Problem rebuilding MPSS-3.3 RPMs for a later RHEL kernel

This is only a fractionally later kernel than the one the pre-packaged software comes compiled for, so you'd think it would compile straight off. Unfortunately everything is made more complicated by the fact that the spec files are rather badly written. Many are completely or partially lacking pre-reqs required for rebuilding, and the errors are not easy to decipher.

Having waded through many errors, I have come up against this one, which I am unsure of. Some help would be appreciated in finding a workaround. It would also be good if Intel could feed back to the developers that they need to fix their RPM packages, as they're only marginally better than those of Mellanox, which isn't saying much. It would be nice if someone outside of the open source community could get this right.

Processing files: dapl-2.0.42.2-1.el6.x86_64
Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.P933I0
+ umask 022
+ cd /home/me/rpmbuild/BUILD
+ cd dapl-2.0.42.2
+ DOCDIR=/home/me/rpmbuild/BUILDROOT/dapl-2.0.42.2-1.glibc2.12.2.x86_64/usr/share/doc/dapl-2.0.42.2
+ export DOCDIR
+ rm -rf /home/me/rpmbuild/BUILDROOT/dapl-2.0.42.2-1.glibc2.12.2.x86_64/usr/share/doc/dapl-2.0.42.2
+ /bin/mkdir -p /home/me/rpmbuild/BUILDROOT/dapl-2.0.42.2-1.glibc2.12.2.x86_64/usr/share/doc/dapl-2.0.42.2
+ cp -pr AUTHORS README COPYING ChangeLog LICENSE.txt LICENSE2.txt LICENSE3.txt README.mcm /home/me/rpmbuild/BUILDROOT/dapl-2.0.42.2-1.glibc2.12.2.x86_64/usr/share/doc/dapl-2.0.42.2
+ exit 0
Provides: config(dapl) = 2.0.42.2-1.el6 libdaplofa.so.2()(64bit) libdaplofa.so.2(DAPL_CMA_2.0)(64bit) libdaploscm.so.2()(64bit) libdaploscm.so.2(DAPL_SCM_2.0)(64bit) libdaploucm.so.2()(64bit) libdaploucm.so.2(DAPL_OCM_2.0)(64bit) libdat2.so.2()(64bit) libdat2.so.2(DAT_2.0)(64bit)
Requires(interp): /bin/sh /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /sbin/ldconfig /sbin/chkconfig /bin/sh
Requires(preun): /sbin/chkconfig /bin/sh
Requires(postun): /sbin/ldconfig /bin/sh
Obsoletes: intel-mic-ofed-dapl
Processing files: dapl-devel-2.0.42.2-1.el6.x86_64
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libdaplofa.so.2()(64bit) libdaploscm.so.2()(64bit) libdaploucm.so.2()(64bit) libdat2.so.2()(64bit)
Obsoletes: intel-mic-ofed-dapl-devel
Processing files: dapl-devel-static-2.0.42.2-1.el6.x86_64
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Obsoletes: intel-mic-ofed-dapl-devel-static
Processing files: dapl-utils-2.0.42.2-1.el6.x86_64
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libdat2.so.2()(64bit) libdat2.so.2(DAT_2.0)(64bit) libdl.so.2()(64bit) libpthread.so.0()(64bit) libpthread.so.0(GLIBC_2.2.5)(64bit) libpthread.so.0(GLIBC_2.3.2)(64bit) rtld(GNU_HASH)
Obsoletes: intel-mic-ofed-dapl-utils
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/me/rpmbuild/BUILDROOT/dapl-2.0.42.2-1.glibc2.12.2.x86_64
error: Unable to write temp header

RPM build errors:
user build does not exist - using root
group build does not exist - using root
user build does not exist - using root
group build does not exist - using root
Unable to write temp header

I meant to say that the original command is:

rpmbuild --rebuild --define "MOFED 1" src/dapl-2.0.42.2-1.glibc2.12.2.src.rpm src/libibscif-1.0.0-1.fc13.src.rpm src/ofed-driver-3.3-1.src.rpm

as per the "MPSS users guide".

I will pass on your comment about the spec files to the developers. Were there any requirements other than those listed in section 2.1 of the MPSS User's Guide?

I notice that you did not uninstall the existing dapl, libibscif and ofed-driver packages before doing the build. Before you install your rebuilt copies of those packages, it is recommended that you uninstall the old version. The same is true for the entire MPSS, although in this limited case, I suspect it is not strictly necessary. I know - not what you would expect to be asked to do for the polished open source code you are used to. 

On to the errors - 

Did you run this command as yourself or as root? I suspect it is right that there is no user 'build' on your system. But this does not seem to be a fatal error. When one of my teammates did the same build yesterday, he did it as root and I don't believe he got those errors.

Was your /tmp directory full or otherwise unwritable? I looked through some of the rpm source code and it looks like that message "Unable to write temp header" occurs when a temporary file that was created becomes unwritable because there is no space available.
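
One quick way to rule out the temporary-directory theory is to probe the directory rpmbuild actually writes to. The first log shows its scripts under /var/tmp, which is not necessarily the directory TMPDIR points at. A minimal sketch follows; check_tmp is a hypothetical helper (not part of rpm or MPSS), and the 100 MB default threshold is an arbitrary assumption.

```shell
# Report whether a directory exists, is writable, and has some free space.
# Usage: check_tmp DIR [MIN_FREE_KB]
# MIN_FREE_KB defaults to 102400 (100 MB), an arbitrary illustrative threshold.
check_tmp() {
    dir=$1
    min_kb=${2:-102400}
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    # Create and remove a scratch file, much as a build would.
    probe=$(mktemp "$dir/probe.XXXXXX" 2>/dev/null) || { echo "unwritable: $dir"; return 1; }
    rm -f "$probe"
    # df -Pk prints POSIX-format output; field 4 of line 2 is free space in KB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt "$min_kb" ]; then
        echo "low space: $dir ($free_kb KB free)"
        return 1
    fi
    echo "ok: $dir"
}
```

To check the locations relevant here, one could run check_tmp "$(rpm --eval '%{_tmppath}')" and check_tmp "${TMPDIR:-/tmp}".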

Actually there were no existing libibscif or ofed-driver packages installed. This is on a fresh build. The dapl package got installed as the build originally failed stating it had to be present to build dapl. There are a lot of OFED-2 packages (mostly libraries and their -devel parts) that are required to get this far in the build without it failing with obscure errors. Digging shows linking problems. None of these dependencies are listed in the MPSS user guide as far as I can see, and the spec files don't specify them as build-requires either.

I set a custom TMPDIR, which is being honoured as we already know about that problem.

You should never build RPMs as root. Best practice is to rebuild these RPMs in a chroot using a tool like mock, which is what I use when building RPMs for the Fedora project. The developers might find the Fedora documentation on RPM packaging helpful, as it is detailed and thorough. The practices it insists on for packaged software are there to make packages maintainable and ensure high quality, so Intel could benefit from that hard work and experience rather than trying to muddle through. For example: https://fedoraproject.org/wiki/Packaging:Guidelines

The lack of a 'build' user should not be an issue, and is a harmless warning. It should not prevent the RPM from building. I have seen it a fair few times with commercial software (I think a lot of people copy spec files around and keep much of the cruft without asking themselves what it does).

I see there is some confusion because I posted the wrong error (from before TMPDIR was changed). Since I am unable to edit the original post, here is the actual error:

set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/server_tests.o";
make[3]: *** wait: No child processes. Stop.
make[2]: *** [/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/hw/scif] Error 2
make[2]: *** Waiting for unfinished jobs....
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/sa.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/rc_pingpong.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/sa_test.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/rdma_bw.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/tests.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/cm_test.o";
set -e ; perl /usr/src/kernels/2.6.32-358.6.2.el6.x86_64/scripts/recordmcount.pl "x86_64" "64" "objdump" "objcopy" "gcc" "ld" "nm" "" "" "1" "/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/main.o";
ld -m elf_x86_64 -r -o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/kmtest.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/client.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/ib_tests.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/tcp_utils.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/clientserver_utils.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/ib_utils.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/time_utils.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/server_tests.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/rc_pingpong.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/rdma_bw.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/frequency.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/cm.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/sa.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/sa_test.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/cm_test.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/tests.o /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/main.o ; scripts/mod/modpost /home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp/kmtest/kmtest.o
make[2]: *** [/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1/drivers/infiniband/ibp] Error 2
make[1]: *** [_module_/home/me/rpmbuild/BUILD/ofed-driver/ofa_kernel-1.5.4.1] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-358.6.2.el6.x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /scratch/me/tmp/rpm-tmp.Pg11Vd (%build)

I have omitted the errors about the build user as they are meaningless.

After reading back over your original post, I have a couple of questions. I want to make sure you are at the place in the install instructions that I think you are at. The documentation that comes with the MPSS can be confusing (which is why it is undergoing a complete rewrite).

Do you have the kernel-headers and kernel-devel packages installed (or kernel-default-devel, if using SuSE)?

Did you rebuild mpss-modules-3.3-1.src.rpm, following the directions in the "Warning" box in section 2.1 of the readme.txt file (not the User's Guide but the readme.txt file)? If you are rebuilding kernel modules because you are running a later kernel, that module is the first thing you should rebuild. Did you have any trouble rebuilding that module?

Did you complete the install of the basic MPSS following the directions in section 2.2 of the readme.txt file?

Are you using the Mellanox* 2.1 version of OFED? Did you complete the install of OFED, using the directions from Mellanox?

What section of the MPSS User's Guide are you referring to when you say:

I meant to say that the original command is:
rpmbuild --rebuild --define "MOFED 1" src/dapl-2.0.42.2-1.glibc2.12.2.src.rpm src/libibscif-1.0.0-1.fc13.src.rpm src/ofed-driver-3.3-1.src.rpm
as per the "MPSS users guide".

Can you provide a list of the packages you installed in an effort to clean up the missing prerequisite problem?

Looking back at the particular software you are having trouble with, you should actually only need the kernel-headers and kernel-devel packages, plus some headers and libraries that come with the MPSS and OFED. If you are doing a full build of OFED, you need the packages listed in section 2.1 of the User's Guide. But I believe for the pieces you are attempting to rebuild here you only need, as I said, the kernel-headers and kernel-devel packages, plus some headers and libraries that come with the MPSS and OFED.

Thanks for getting back to me so quickly.

Yes, the kernel-headers and kernel-devel packages are installed, as are the freshly rebuilt mpss-modules and mpss-modules-devel packages. These should be build-requires (I think the kernel packages are in the mpss-modules SPEC files already, though I haven't looked). I had no trouble building these packages at all, and they installed without problems. Those instructions are actually from the readme.txt.

We are using Mellanox OFED 2.0, not 2.1, and this is for kernel 2.6.32-358.6.2.el6.x86_64 on RHEL 6. These two are not something I can really change, but the changes between 2.0 and 2.1 are minimal so it should not be a problem for this. The same for the kernel version. There are already pre-compiled MPSS RPMs for 2.6.32-358, so I don't expect problems.

The command is from page 11 (section 2.5) of http://registrationcenter.intel.com/irc_nas/4433/MPSS_Users_Guide.pdf

The pre-requisites noticed so far are:

libibverbs-devel dapl dapl-devel libibumad-devel librdmacm-devel mpss-modules-devel

Most of these can be guessed at, but it is not always obvious when something is missing, and specific build-requires in the SPEC files would also make it easy to check that the correct versions are present. At the moment, finding out exactly what is missing involves hunting through the source and trying to work out which libraries the linker errors are about. It's easier when a header file is missing, as that usually makes it pretty clear.
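
To make the suggestion concrete, the list above can be turned into the BuildRequires tags a spec file would carry. This is only a sketch: build_requires is a hypothetical helper, and which of the three spec files each package belongs in is not established in this thread.

```shell
# Packages found necessary so far, taken from the list above.
PREREQS="libibverbs-devel dapl dapl-devel libibumad-devel librdmacm-devel mpss-modules-devel"

# Emit one BuildRequires tag per package name given as an argument.
build_requires() {
    for pkg in "$@"; do
        printf 'BuildRequires: %s\n' "$pkg"
    done
}

# Word splitting of $PREREQS is intentional: one argument per package.
build_requires $PREREQS
```

A BuildRequires tag can also carry a version constraint (for example "name >= version"), which is what would let rpmbuild reject a build against the wrong OFED release up front instead of failing at link time.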

I haven't got any further than the end of part (3) of section 2.2 of the readme.txt, as beyond that it talks about installing software that we don't have yet, as it must be re-built. Note that step (3) is about the installation of the mpss-modules RPM, which I have explained I have built and installed in the run up to trying to build libibscif, dapl and the ofed-driver.

OK, I don't know why this isn't working for you or why it says it needs those extra packages. Some of them, like libibverbs-devel, dapl-devel and dapl, are in the source package you are trying to build, and mpss-modules-devel was in the mpss-modules source package that you built and whose output you installed previously.

Since the make error 2 is so unhelpful, saying in effect only that make hit some error, could you perhaps try doing the make by hand and add a debug option?
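
The "Waiting for unfinished jobs" line in the log means make was running jobs in parallel, so the real failure may sit well above the final "Error 2". One way to narrow it down is to capture the whole build output to a file and pull out the first line where make reports a failure. This is a sketch only: first_make_error is a hypothetical helper, and the commented rpmbuild line assumes the spec file honours the standard _smp_mflags macro, which nothing in this thread confirms.

```shell
# Capture a serial build to a log first (assumption: the spec uses %{?_smp_mflags}):
#   rpmbuild --rebuild --define "MOFED 1" --define "_smp_mflags -j1" \
#       src/ofed-driver-3.3-1.src.rpm 2>&1 | tee build.log

# Print the first line where make reports a failure, prefixed with its line number.
first_make_error() {
    grep -n '^make.*\*\*\*' "$1" | head -n 1
}
```

Running first_make_error build.log then gives a line number to start reading from, rather than working backwards from the end of the log.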

I have also asked someone who knows more about building this software to take a look. I am worried that the answer will come back that you must use Mellanox OFED 2.1, that 2.0 will keep our code from building. But we will see.

I managed to get it to build manually, though I'm not sure why. It seemed to be more luck than anything else and I'm not sure what I did that made it build. Unfortunately it doesn't work, though it does install. This seems to be a problem with the OFED version I think, given the error messages in dmesg.

On a slightly different note, there doesn't seem to be a mpxyd daemon installed, which is mentioned in the user guide and was necessary in earlier versions to get IPoIB working with Mellanox hardware. Has this need gone away, or am I missing something?

micctrl seems to replace the entire contents of the /etc/hosts file with just two entries, for mic0 and mic1, throwing away whatever was in there before. This is extremely unhelpful and actually we need the entries it discards, so we are currently using a workaround to prevent it from editing the file at all, then adding the mic entries manually later on in the build. Surely it wouldn't be too difficult to just append rather than overwrite this file? In any case, many Linux services do not function correctly if the localhost entries aren't kept at the very least.
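
For anyone wanting to script the manual step described above, here is a minimal sketch of appending an entry only when the name is not already present, which leaves the localhost lines and everything else untouched. add_host_entry is a hypothetical helper, and the addresses in the usage note are documentation placeholders, not the addresses MPSS assigns.

```shell
# Append "IP<TAB>NAME" to a hosts-format file unless NAME is already listed.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Look for NAME in any hostname field (field 2 onwards) of a non-comment line.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

Usage would look like add_host_entry /etc/hosts 192.0.2.1 mic0 followed by add_host_entry /etc/hosts 192.0.2.2 mic1, with the real card addresses substituted. Running it twice is harmless because the second call finds the name and does nothing.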

Another point of confusion is the mlnx-ofa_kernel package (from Mellanox OFED 2.*), which conflicts (and seems always to have done) with the ofed-driver from Intel, but the latter seems to depend on several files from the first package, while trying and failing to overwrite others. It isn't mentioned in the installation instructions in any of the Intel documentation as far as I can see. It would be good if someone could look at this.

I am told that Mellanox 2.0 will not work. It lacks necessary support for the coprocessor. So you must use at least 2.1.

mpxyd is in dapl-2.0.42.2-1.glibc2.12.2.x86_64.rpm which is supposed to be installed. dapl-2.0.42.2-1.glibc2.12.2.x86_64.rpm is also supposed to be rebuilt from the source code in mpss3.3/src. The directions say to rebuild with "MOFED 1" defined. I suspect it should be "_MELLANOX 1" instead. I am checking on this.

Among the micctrl options that control networking functions, there is an option, --modhost=no, that should prevent the /etc/hosts entries from disappearing.

For the conflicts between Mellanox OFED and the ofed-driver from Intel, was this with Mellanox OFED 2.0 or 2.1? If it was 2.0, please switch to 2.1. If it was 2.1, could you list what the conflicts were?

Thanks, I will see whether using "_MELLANOX 1" instead helps when rebuilding. dapl is installed, so it should be working, but if as you suggest it needs a different macro value set then that might explain it.

We have (with much cursing and pain) switched to OFED 2.1, but on these machines only. Hopefully this won't cause problems elsewhere. The conflict occurs with all 2.x OFEDs from Mellanox. I have even tried 2.2 just in case.

This is the conflict itself, when you try to install ofed-driver if you don't first remove the mlnx-ofa_kernel:

  file /etc/infiniband/connectx.conf from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /etc/infiniband/openib.conf from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /etc/infiniband/truescale.cmds from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /etc/init.d/openibd from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /etc/modprobe.d/ib_ipoib.conf from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /etc/udev/rules.d/90-ib.rules from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /sbin/connectx_port_config from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /sbin/sysctl_perf_tuning from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
  file /usr/bin/ibdev2netdev from install of ofed-driver-3.0.76-0.11-default-3.3-1.x86_64 conflicts with file from package mlnx-ofa_kernel-2.1-OFED.2.1.197.g008fbee.rhel6u4.x86_64
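
A conflict list like this can be reduced to the bare file paths, which makes it easier to see exactly what both packages claim. The following is a sketch; conflicting_files is a hypothetical helper that assumes the message wording shown above.

```shell
# Read rpm file-conflict messages on stdin and print the unique conflicting paths.
conflicting_files() {
    sed -n 's/^ *file \(.*\) from install of .* conflicts with file from package .*$/\1/p' | sort -u
}
```

With the messages saved to a file, conflicting_files < conflicts.txt | xargs rpm -qf would then show which installed package currently owns each path.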

This is what we see in the logs at the moment when the MIC software starts being used, having just removed mlnx-ofa_kernel and installed ofed-driver:

Sep  5 15:49:44 yellow12 kernel: mic0: Transition from state booting to online
Sep  5 15:49:44 yellow12 kernel: mic1: Transition from state booting to online
Sep  5 15:51:44 yellow12 kernel: ibp_server: disagrees about version of symbol ib_unregister_client
Sep  5 15:51:44 yellow12 kernel: ibp_server: Unknown symbol ib_unregister_client
Sep  5 15:51:44 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_ah
Sep  5 15:51:44 yellow12 kernel: ibp_server: Unknown symbol ib_query_ah
Sep  5 15:51:44 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_srq
Sep  5 15:51:44 yellow12 kernel: ibp_server: Unknown symbol ib_query_srq
Sep  5 15:51:44 yellow12 kernel: ibp_server: disagrees about version of symbol ib_dereg_mr
Sep  5 15:51:44 yellow12 kernel: ibp_server: Unknown symbol ib_dereg_mr
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_qp
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_query_qp
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_register_event_handler
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_register_event_handler
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_detach_mcast
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_detach_mcast
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_unregister_event_handler
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_unregister_event_handler
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_create_ah
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_create_ah
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_register_client
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_register_client
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_destroy_cq
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_destroy_cq
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_set_client_data
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_set_client_data
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_port
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_query_port
Sep  5 15:51:45 yellow12 kernel: ibp_server: disagrees about version of symbol ib_get_client_data
Sep  5 15:51:45 yellow12 kernel: ibp_server: Unknown symbol ib_get_client_data
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_destroy_srq
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_destroy_srq
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_device
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_query_device
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_destroy_ah
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_destroy_ah
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_dealloc_xrcd
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_dealloc_xrcd
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_pkey
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_query_pkey
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_destroy_qp
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_destroy_qp
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_dealloc_pd
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_dealloc_pd
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_query_gid
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_query_gid
Sep  5 15:51:46 yellow12 kernel: ibp_server: disagrees about version of symbol ib_attach_mcast
Sep  5 15:51:46 yellow12 kernel: ibp_server: Unknown symbol ib_attach_mcast

Note that much of what the mlnx-ofa_kernel package provides is in use (I don't know exactly which parts), and the nodes are diskless, so there isn't much more we can do at this end at this stage.

Unless I force it, which I'd much rather not, I can't install the ofed-driver package at all without uninstalling mlnx-ofa_kernel, and then once the mics come up, these errors appear and IPoIB can't be set up.
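
For reference, "disagrees about version of symbol" is what the kernel prints when module versioning is enabled and the checksum a module recorded for an imported symbol does not match the checksum of the symbol exported by the running kernel or its loaded modules. In other words, ibp_server was built against a different set of InfiniBand core symbols than the ones present at load time. A sketch for pulling the affected symbols out of the log follows; mismatched_symbols is a hypothetical helper.

```shell
# Read kernel log lines on stdin and print each symbol with a version mismatch once.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}
```

Each symbol can then be checked by hand: modprobe --dump-modversions on the ibp_server .ko file lists the checksums the module expects, and these can be compared against the Module.symvers produced by the OFED kernel build that is actually installed (its location varies between installs).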

I tried building with --define "_MELLANOX 1", but it failed with an obscure error:

...
                $(if $(CONFIG_XEN),-D__XEN_INTERFACE_VERSION__=$(CONFIG_XEN_INTERFACE_VERSION)) \
                $(if $(CONFIG_XEN),-I$(srctree)/arch/x86/include/mach-xen) \
                -I$(srctree)/arch/$(hdr-arch)/include \
                -Iinclude \
                $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \
                -I$(srctree)/arch/$(SRCARCH)/include \
                ' \
                modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-358.6.2.el6.x86_64'
make[1]: *** No rule to make target `modules'.  Stop.
make[1]: Leaving directory `/usr/src/kernels/2.6.32-358.6.2.el6.x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /scratch/zan/tmp/rpm-tmp.g3Ax9t (%build)