Intel MPI and InfiniBand uDAPL

Hi,

I am trying to use the Intel compilers and MPI libraries to run over
InfiniBand.
From the documentation and from all the searches I did on the Intel
forums, I could not figure out what the problem might be. We are running
a small test with 8 nodes connected via InfiniBand. I can ping all the
nodes and start up mpd on all of them via IP over IB:

hpcp5551(salmr0)192:mpdtrace
192.168.0.1
192.168.0.5
192.168.0.4
192.168.0.3
192.168.0.2
192.168.0.8
192.168.0.7
192.168.0.6

I can run fine using the "sock" network fabric or IP over IB:
hpcp5551(salmr0)193:mpiexec -genv I_MPI_DEVICE sock -n 8 ./cpi
Process 0 on 192.168.0.1
Process 2 on 192.168.0.4
Process 1 on 192.168.0.5
Process 3 on 192.168.0.3
Process 4 on 192.168.0.2
Process 5 on 192.168.0.8
Process 6 on 192.168.0.7
Process 7 on 192.168.0.6
pi is approximately 3.1416009869231245, Error is 0.0000083333333314
wall clock time = 0.007859

The problem is when I try to run over the native IB fabric using the
"rdma" network fabric:

hpcp5551(salmr0)194:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 -env I_MPI_DEBUG 2 ./cpi
rank 4 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 4: killed by signal 11
rank 1 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 1: killed by signal 11
rank 0 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 0: killed by signal 11

I have the correct entries in /etc/dat.conf:
hpcp5551:~ # tail /etc/dat.conf
# Simple (OpenIB-cma) default with netdev name provided first on list
# to enable use of same dat.conf version on all nodes
#
# Add examples for multiple interfaces and IPoIB HA fail over, and bonding
#
OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "bond0 0" ""

hpcp5551:~ # ls -l /usr/local/ofed/lib64/libdaplcma.so
lrwxrwxrwx 1 root root 19 Jan 18 17:20 /usr/local/ofed/lib64/libdaplcma.so -> libdaplcma.so.1.0.2
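
The manual check above (read /etc/dat.conf, then ls the library) can be scripted so that it is easy to repeat on every node. What follows is a minimal POSIX sh sketch, not part of the original post. The function name check_dat_conf is hypothetical, and it assumes each provider entry sits on a single line with the library path in the fifth whitespace-separated field. It only confirms that the path exists, not that the library loads or matches the installed OFED release.

```sh
# check_dat_conf [FILE]: for every provider entry in a dat.conf-style file,
# report whether the library path it names exists on this node.
# Assumption: one entry per line, library path in field 5.
check_dat_conf() {
    conf=${1:-/etc/dat.conf}
    # Skip blank lines and comments; emit "provider-name library-path" pairs.
    awk 'NF >= 5 && $1 !~ /^#/ { print $1, $5 }' "$conf" |
    while read -r name lib; do
        if [ -e "$lib" ]; then
            echo "OK $name $lib"
        else
            echo "MISSING $name $lib"
        fi
    done
}
```

Calling check_dat_conf with no argument reads /etc/dat.conf; a MISSING line on any node points at an entry whose library is not installed there.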

hpcp5551:~ # ifconfig ib0
ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::208:f104:398:2999/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:851583 errors:0 dropped:0 overruns:0 frame:0
TX packets:824427 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:11834748000 (11286.4 Mb) TX bytes:11786736324 (11240.7 Mb)

Is there any way to get more debug or verbose messages out of mpiexec or
mpirun, so that it can give me a hint as to what the problem
might be?

This is with OFED 1.2.5.4

Thanks
Rene

export I_MPI_DEBUG=2 (or whatever level of verbosity you want)

Thanks for the reply. I guess I should have mentioned that in my post.
I did try the I_MPI_DEBUG option at various levels, but I don't seem to get any more info than what I originally posted.

hpcp5551(salmr0)196:setenv I_MPI_DEBUG 2
hpcp5551(salmr0)197:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 ./cpi
rank 3 in job 11 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 3: killed by signal 11

hpcp5551(salmr0)198:setenv I_MPI_DEBUG 4
hpcp5551(salmr0)199:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 ./cpi
rank 3 in job 12 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 3: killed by signal 11

hpcp5551(salmr0)200:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 -env I_MPI_DEBUG 3 ./cpi
rank 0 in job 13 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 0: killed by signal 11

Any other ideas? Is there a way to check whether I have the right uDAPL libs installed, other than looking for /usr/local/ofed/lib64/libdaplcma.so?

Thanks
Rene

Hi Rene,

Were you able to run the dapltest program on your cluster? Do I understand correctly that you did not get additional debug information even when cpi was linked against the debug version of the MPI library?

Best regards,

Andrey

Hi,

I guess I was not asking for enough debug info. I tried debug levels 2, 3 and 4 and was getting nowhere. Once I increased the level to 10 or above, I got somewhat more useful info.
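
Since the low levels printed nothing extra and level 10 did, it can save time to step through several levels in one pass. This is a sketch in POSIX sh (the thread itself uses csh). The helper name debug_sweep and the chosen levels are illustrative, while the mpiexec options are the ones already used in this thread. It only prints the command lines, so they can be reviewed before anything runs.

```sh
# debug_sweep PROG [NPROCS]: print one mpiexec command line per debug level.
# The levels 2, 10 and 100 are the ones tried in this thread.
debug_sweep() {
    prog=$1
    nprocs=${2:-2}
    for level in 2 10 100; do
        echo "mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n $nprocs -env I_MPI_DEBUG $level $prog"
    done
}
# To execute the commands instead of printing them: debug_sweep ./cpi 8 | sh
```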

I think we found the problem. We like to compile things statically here
so we would typically do something like this:

hpcp5551(salmr0)77:mpicc -static cpi.c
hpcp5551(salmr0)108:ldd a.out
not a dynamic executable

and this works fine and we can run it anywhere over gigabit ethernet or
using the sock interface over IB.

If we build the same way and try to run over native IB, we get nowhere,
as you can see from the previous post.

But for some reason if we compile with the "-static_mpi" flag things
seem to work.

hpcp5551(salmr0)109:mpicc -static_mpi cpi.c
hpcp5551(salmr0)110:ldd a.out
librt.so.1 => /lib64/librt.so.1 (0x00002b666073b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b6660844000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b666095a000)
libc.so.6 => /lib64/libc.so.6 (0x00002b6660a5f000)
/lib64/ld-linux-x86-64.so.2 (0x00002b666061e000)

hpcp5551(salmr0)111:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out
[0] MPI startup(): DAPL provider OpenIB-cma
[1] MPI startup(): DAPL provider OpenIB-cma
[0] MPI startup(): RDMA data transfer mode
[0] MPI Startup(): process is pinned to CPU00 on node hpcp5551
[1] MPI startup(): RDMA data transfer mode
[1] MPI Startup(): process is pinned to CPU00 on node hpcp5555
Process 1 on 192.168.0.5
Process 0 on 192.168.0.1
[0] Rank Pid Pin cpu Node name
[0] 0 7515 0 hpcp5551
[0] 1 5192 0 hpcp5555
[0] Init(): I_MPI_DEBUG=10
[0] Init(): I_MPI_DEVICE=rdma
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000111

The only problem is that the a.out executable is not really static: it
still needs some libs to be loaded dynamically. What flags or options do
we need to generate a truly static executable that will run
over IB?

thanks
Rene
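
The ldd checks in this post can be wrapped in a small helper when several binaries need checking. This is a sketch in POSIX sh. The function name link_type is hypothetical, and it assumes ldd reports a static binary with the wording "not a dynamic executable", as in the output quoted above.

```sh
# link_type: read ldd output on stdin and print "static" or "dynamic".
# Assumption: ldd says "not a dynamic executable" for a static binary.
link_type() {
    if grep -q 'not a dynamic executable'; then
        echo static
    else
        echo dynamic
    fi
}
# Usage: ldd ./a.out 2>&1 | link_type
```

The 2>&1 is there because some ldd versions print that message on stderr rather than stdout.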

mpicc -static should have the same effect as gcc -static in choosing static versions of the libraries known to gcc. As you figured out, -static_mpi controls the choice of the Intel MPI libraries. According to your stated requirement, you would want to use both options.

Hi,

Thanks for the reply. Yes, I can compile using both flags just fine, but if I do that I can no longer run the executable over IB. Here is an example.

Compile semi statically just using -static_mpi works fine:
----------------------------------------------------------
hpcp5551(salmr0)140:mpicc -static_mpi cpi.c
hpcp5551(salmr0)141:ldd a.out
librt.so.1 => /lib64/librt.so.1 (0x00002b3805bbe000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3805cc7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b3805ddd000)
libc.so.6 => /lib64/libc.so.6 (0x00002b3805ee2000)
/lib64/ld-linux-x86-64.so.2 (0x00002b3805aa1000)
hpcp5551(salmr0)142:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out
[0] MPI startup(): DAPL provider OpenIB-cma
[1] MPI startup(): DAPL provider OpenIB-cma
[0] MPI startup(): RDMA data transfer mode
[0] MPI Startup(): process is pinned to CPU00 on node hpcp5551
[1] MPI startup(): RDMA data transfer mode
[1] MPI Startup(): process is pinned to CPU00 on node hpcp5555
Process 1 on 192.168.0.5
[0] Rank Pid Pin cpu Node name
[0] 0 23443 0 hpcp5551
[0] 1 19241 0 hpcp5555
[0] Init(): I_MPI_DEBUG=10
[0] Init(): I_MPI_DEVICE=rdma
Process 0 on 192.168.0.1
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000159

Compiling with both flags, -static_mpi and -static, produces an executable that does not run:
--------------------------------------------------------------------------------------
hpcp5551(salmr0)144:mpicc -static_mpi -static cpi.c
/opt/intel/impi/3.1/lib64/libmpi.a(I_MPI_wrap_dat.o): In function `I_MPI_dlopen_dat':
I_MPI_wrap_dat.c:(.text+0x30f): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(rdma_iba_util.o): In function `get_addr_by_host_name':
rdma_iba_util.c:(.text+0x21a): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(sock.o): In function `MPIDU_Sock_get_host_description':
sock.c:(.text+0x5956): warning: Using 'gethostbyaddr' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(simple_pmi.o): In function `PMII_Connect_to_pm':
simple_pmi.c:(.text+0x29a8): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
hpcp5551(salmr0)145:
hpcp5551(salmr0)145:ldd a.out
not a dynamic executable
hpcp5551(salmr0)146:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out
rank 1 in job 18 192.168.0.1_54412 caused collective abort of all ranks
exit status of rank 1: killed by signal 11

As you can see, the executable does not run when compiled statically. Here is more verbose output, from debug=100:

hpcp5551(salmr0)147:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 100 a.out
[0] MPI startup(): attributes for device:
[0] MPI startup(): NEEDS_LDAT MAYBE
[0] MPI startup(): HAS_COLLECTIVES (null)
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD Fri Oct 5 15:41:02 MSD 2007
[0] MPI startup(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20071005
[0] MPI startup(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.102 2007/09/13 07:41:42 Exp $
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20071005.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20071005 -all -copyout -noinstall
[0] MPI startup(): I_MPI_VERSION_MACHINENAME svsmpi020
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.1.20071005
[0] MPI startup(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2)
[0] MPI startup(): I_MPI_ICC_VERSION Version 9.1 Beta Build 20060131 Package ID: l_cc_bc_9.1.023
[0] MPI startup(): I_MPI_IFORT_VERSION Version 9.1 Beta Build 20060131 Package ID: l_fc_bc_9.1.020
[0] MPI startup(): attributes for device:
[0] MPI startup(): NEEDS_LDAT MAYBE
[0] MPI startup(): HAS_COLLECTIVES (null)
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD Fri Oct 5 15:41:02 MSD 2007
[0] MPI startup(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20071005
[0] MPI startup(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.102 2007/09/13 07:41:42 Exp $
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20071005.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20071005 -all -copyout -noinstall
[0] MPI startup(): I_MPI_VERSION_MACHINENAME svsmpi020
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.1.20071005
[0] MPI startup(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2)
[0] MPI startup(): I_MPI_ICC_VERSION Version 9.1 Beta Build 20060131 Package ID: l_cc_bc_9.1.023
[0] MPI startup(): I_MPI_IFORT_VERSION Version 9.1 Beta Build 20060131 Package ID: l_fc_bc_9.1.020
[0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
[0] my_dlopen(): trying to dlopen: libdat.so
rank 0 in job 19 192.168.0.1_54412 caused collective abort of all ranks
exit status of rank 0: killed by signal 11

Thanks
Rene

Rene,

You cannot build a truly static executable that is guaranteed to run over IB. This is due to libc runtime limitations: there is a dlopen() call inside the MPI library, which requires the same runtime to be present on the target cluster. You probably saw the warning messages when you tried the mpicc -static option.

Best regards,

Andrey
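
The warnings mentioned here are the glibc linker warnings quoted in the earlier post. What follows is a sketch for pulling them out of a build log, in POSIX sh. The function name static_link_warnings is hypothetical, and the pattern is taken from the warning text quoted in this thread, so other linker versions may word it differently.

```sh
# static_link_warnings: read compiler/linker output on stdin and print each
# glibc function named in a "Using '...' in statically linked applications"
# warning, one per line, without duplicates.
static_link_warnings() {
    sed -n "s/.*warning: Using '\([^']*\)' in statically linked applications.*/\1/p" | sort -u
}
# Usage: mpicc -static_mpi -static cpi.c 2>&1 | static_link_warnings
```

Every function it lists still needs the matching shared glibc at run time, so a -static build that triggers these warnings is not fully self-contained.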
