Intel MPI and InfiniBand uDAPL


salmr0

Hi,

I am trying to use the Intel compilers and MPI libraries to run over InfiniBand. From the documentation, and from all the searches I did on the Intel forums, I could not figure out what the problem might be. We are running a small test with 8 nodes connected via InfiniBand. I can ping all the nodes and start up mpd on all of them via IP over IB:



hpcp5551(salmr0)192:mpdtrace

192.168.0.1

192.168.0.5

192.168.0.4

192.168.0.3

192.168.0.2

192.168.0.8

192.168.0.7

192.168.0.6

I can run fine using the "sock" network fabric or IP over IB:

hpcp5551(salmr0)193:mpiexec -genv I_MPI_DEVICE sock -n 8 ./cpi

Process 0 on 192.168.0.1

Process 2 on 192.168.0.4

Process 1 on 192.168.0.5

Process 3 on 192.168.0.3

Process 4 on 192.168.0.2

Process 5 on 192.168.0.8

Process 6 on 192.168.0.7

Process 7 on 192.168.0.6

pi is approximately 3.1416009869231245, Error is 0.0000083333333314

wall clock time = 0.007859
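
(Side note: cpi here is the standard pi-integration example distributed with MPICH and Intel MPI. The sketch below is a roughly equivalent program, not the exact cpi.c source, for anyone who wants to reproduce these runs.)

/* Rough equivalent of the cpi example (not the shipped source): each rank
 * integrates part of 4/(1+x^2) over [0,1] and rank 0 reduces the result. */
#include <mpi.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    const double PI25DT = 3.141592653589793238462643;
    int rank, size, i, n = 100000, namelen;
    double h, x, sum = 0.0, mypi, pi;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d on %s\n", rank, name);

    h = 1.0 / (double)n;                      /* midpoint rule */
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
    MPI_Finalize();
    return 0;
}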



The problem is when I try to run over the native IB fabric using the "rdma" network fabric:



hpcp5551(salmr0)194:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 -env I_MPI_DEBUG 2 ./cpi
rank 4 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 4: killed by signal 11
rank 1 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 1: killed by signal 11
rank 0 in job 9 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 0: killed by signal 11


I have the correct entries in /etc/dat.conf:

hpcp5551:~ # tail /etc/dat.conf
# Simple (OpenIB-cma) default with netdev name provided first on list
# to enable use of same dat.conf version on all nodes
#
# Add examples for multiple interfaces and IPoIB HA fail over, and bonding
#
OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so dapl.1.2 "bond0 0" ""


hpcp5551:~ # ls -l /usr/local/ofed/lib64/libdaplcma.so
lrwxrwxrwx 1 root root 19 Jan 18 17:20 /usr/local/ofed/lib64/libdaplcma.so -> libdaplcma.so.1.0.2





hpcp5551:~ # ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::208:f104:398:2999/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:851583 errors:0 dropped:0 overruns:0 frame:0
          TX packets:824427 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:11834748000 (11286.4 Mb)  TX bytes:11786736324 (11240.7 Mb)


Is there any way to get more debug or verbose messages out of mpiexec or mpirun so that it can maybe provide me with a hint as to what the problem might be?


This is with OFED 1.2.5.4


Thanks

Rene

Tim Prince

export I_MPI_DEBUG=2 (or whatever level of verbosity you want)

salmr0

Thanks for the reply. I guess I should have mentioned that in my post.
I did try the I_MPI_DEBUG option with various levels but don't seem to get any more info than what I originally posted.

hpcp5551(salmr0)196:setenv I_MPI_DEBUG 2
hpcp5551(salmr0)197:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 ./cpi
rank 3 in job 11 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 3: killed by signal 11

hpcp5551(salmr0)198:setenv I_MPI_DEBUG 4
hpcp5551(salmr0)199:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 ./cpi
rank 3 in job 12 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 3: killed by signal 11


hpcp5551(salmr0)200:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -n 8 -env I_MPI_DEBUG 3 ./cpi
rank 0 in job 13 192.168.0.1_35933 caused collective abort of all ranks
exit status of rank 0: killed by signal 11


Any other ideas? Is there a way to check if I have the right uDAPL libs installed other than looking for /usr/local/ofed/lib64/libdaplcma.so?


Thanks
Rene


Andrey Derbunovich (Intel)

Hi Rene,


Were you able to run the dapltest program on your cluster? Do I understand right that you did not get additional debug information even when cpi was linked against the debug version of the MPI library?


Best regards,


Andrey
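
(In addition to dapltest, a quick sanity check that is independent of MPI is whether the DAT/uDAPL libraries named in dat.conf can be found and loaded at all. The probe below is only a sketch; the library names are taken from the dat.conf and debug output shown in this thread, so adjust the paths for your installation and build it with something like gcc dat_probe.c -ldl.)

/* Sketch: try to dlopen the DAT dispatch library and the uDAPL provider
 * listed in dat.conf.  Library names below come from this thread's
 * dat.conf and debug output and are only an example. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    const char *libs[] = {
        "libdat.so",                            /* what Intel MPI dlopens   */
        "/usr/local/ofed/lib64/libdaplcma.so"   /* provider from dat.conf   */
    };
    int i, failures = 0;

    for (i = 0; i < 2; i++) {
        void *h = dlopen(libs[i], RTLD_NOW);
        if (h) {
            printf("loaded %s OK\n", libs[i]);
            dlclose(h);
        } else {
            printf("FAILED to load %s: %s\n", libs[i], dlerror());
            failures++;
        }
    }
    return failures;
}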

salmr0

Hi,

I guess I was not asking for enough debug info. I tried debug levels of 2, 3, and 4 and was getting nowhere. Once I increased to level 10 or above I got a bit more useful info.

I think we found the problem. We like to compile things statically here, so we would typically do something like this:

hpcp5551(salmr0)77:mpicc -static cpi.c

hpcp5551(salmr0)108:ldd a.out

not a dynamic executable



This works fine, and we can run it anywhere over gigabit Ethernet or using the sock interface over IB.



If we do the same and try to run over IB we get nowhere, as you can see from the previous post.



But for some reason, if we compile with the "-static_mpi" flag things seem to work.



hpcp5551(salmr0)109:mpicc -static_mpi cpi.c

hpcp5551(salmr0)110:ldd a.out

librt.so.1 => /lib64/librt.so.1 (0x00002b666073b000)

libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b6660844000)

libdl.so.2 => /lib64/libdl.so.2 (0x00002b666095a000)

libc.so.6 => /lib64/libc.so.6 (0x00002b6660a5f000)

/lib64/ld-linux-x86-64.so.2 (0x00002b666061e000)





hpcp5551(salmr0)111:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out

[0] MPI startup(): DAPL provider OpenIB-cma

[1] MPI startup(): DAPL provider OpenIB-cma

[0] MPI startup(): RDMA data transfer mode

[0] MPI Startup(): process is pinned to CPU00 on node hpcp5551

[1] MPI startup(): RDMA data transfer mode

[1] MPI Startup(): process is pinned to CPU00 on node hpcp5555

Process 1 on 192.168.0.5

Process 0 on 192.168.0.1

[0] Rank Pid Pin cpu Node name

[0] 0 7515 0 hpcp5551

[0] 1 5192 0 hpcp5555

[0] Init(): I_MPI_DEBUG=10

[0] Init(): I_MPI_DEVICE=rdma

pi is approximately 3.1416009869231241, Error is 0.0000083333333309

wall clock time = 0.000111







The only problem is that the a.out executable is really not static; it still needs some libs to be loaded dynamically. What are the flags or options we need to generate a truly static executable that would run over IB?



Thanks

Rene

Tim Prince

mpicc -static should have the same effect as gcc -static in choosing static versions of libraries known to gcc. As you figured out, -static_mpi controls the choice of Intel MPI libraries. According to your stated requirement, you would want to use both options.

salmr0

Hi,

Thanks for the reply. Yes, I can compile using both flags just fine, but if I do that I can no longer run the executable over IB. Here is an example.

Compiling semi-statically, using just -static_mpi, works fine:
----------------------------------------------------------
hpcp5551(salmr0)140:mpicc -static_mpi cpi.c
hpcp5551(salmr0)141:ldd a.out
librt.so.1 => /lib64/librt.so.1 (0x00002b3805bbe000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3805cc7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b3805ddd000)
libc.so.6 => /lib64/libc.so.6 (0x00002b3805ee2000)
/lib64/ld-linux-x86-64.so.2 (0x00002b3805aa1000)
hpcp5551(salmr0)142:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out
[0] MPI startup(): DAPL provider OpenIB-cma
[1] MPI startup(): DAPL provider OpenIB-cma
[0] MPI startup(): RDMA data transfer mode
[0] MPI Startup(): process is pinned to CPU00 on node hpcp5551
[1] MPI startup(): RDMA data transfer mode
[1] MPI Startup(): process is pinned to CPU00 on node hpcp5555
Process 1 on 192.168.0.5
[0] Rank Pid Pin cpu Node name
[0] 0 23443 0 hpcp5551
[0] 1 19241 0 hpcp5555
[0] Init(): I_MPI_DEBUG=10
[0] Init(): I_MPI_DEVICE=rdma
Process 0 on 192.168.0.1
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000159


Now we compile using both flags, -static_mpi and -static, and the resulting executable does not run:
--------------------------------------------------------------------------------------
hpcp5551(salmr0)144:mpicc -static_mpi -static cpi.c
/opt/intel/impi/3.1/lib64/libmpi.a(I_MPI_wrap_dat.o): In function `I_MPI_dlopen_dat':
I_MPI_wrap_dat.c:(.text+0x30f): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(rdma_iba_util.o): In function `get_addr_by_host_name':
rdma_iba_util.c:(.text+0x21a): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(sock.o): In function `MPIDU_Sock_get_host_description':
sock.c:(.text+0x5956): warning: Using 'gethostbyaddr' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel/impi/3.1/lib64/libmpi.a(simple_pmi.o): In function `PMII_Connect_to_pm':
simple_pmi.c:(.text+0x29a8): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
hpcp5551(salmr0)145:
hpcp5551(salmr0)145:ldd a.out
not a dynamic executable
hpcp5551(salmr0)146:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 10 a.out
rank 1 in job 18 192.168.0.1_54412 caused collective abort of all ranks
exit status of rank 1: killed by signal 11


As you can see, the executable does not run when compiled statically. Here is more verbose output with I_MPI_DEBUG=100:

hpcp5551(salmr0)147:mpiexec -genv I_MPI_DEVICE rdma:OpenIB-cma -np 2 -env I_MPI_DEBUG 100 a.out
[0] MPI startup(): attributes for device:
[0] MPI startup(): NEEDS_LDAT MAYBE
[0] MPI startup(): HAS_COLLECTIVES (null)
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.1
[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD Fri Oct 5 15:41:02 MSD 2007
[0] MPI startup(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20071005
[0] MPI startup(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.102 2007/09/13 07:41:42 Exp $
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20071005.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20071005 -all -copyout -noinstall
[0] MPI startup(): I_MPI_VERSION_MACHINENAME svsmpi020
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.1.20071005
[0] MPI startup(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2)
[0] MPI startup(): I_MPI_ICC_VERSION Version 9.1 Beta Build 20060131 Package ID: l_cc_bc_9.1.023
[0] MPI startup(): I_MPI_IFORT_VERSION Version 9.1 Beta Build 20060131 Package ID: l_fc_bc_9.1.020
[0] MPI startup(): attributes for device:
[0] MPI startup(): NEEDS_LDAT MAYBE
[0] MPI startup(): HAS_COLLECTIVES (null)
[0] MPI startup(): I_MPI_LIBRARY_VERSION 3.1

[0] MPI startup(): I_MPI_VERSION_DATE_OF_BUILD Fri Oct 5 15:41:02 MSD 2007
[0] MPI startup(): I_MPI_VERSION_PKGNAME_UNTARRED mpi_src.32.svsmpi004.20071005
[0] MPI startup(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.102 2007/09/13 07:41:42 Exp $
[0] MPI startup(): I_MPI_VERSION_MY_CMD_LINE ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20071005.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20071005 -all -copyout -noinstall
[0] MPI startup(): I_MPI_VERSION_MACHINENAME svsmpi020
[0] MPI startup(): I_MPI_DEVICE_VERSION 3.1.20071005
[0] MPI startup(): I_MPI_GCC_VERSION 3.4.4 20050721 (Red Hat 3.4.4-2)
[0] MPI startup(): I_MPI_ICC_VERSION Version 9.1 Beta Build 20060131 Package ID: l_cc_bc_9.1.023
[0] MPI startup(): I_MPI_IFORT_VERSION Version 9.1 Beta Build 20060131 Package ID: l_fc_bc_9.1.020
[0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
[0] my_dlopen(): trying to dlopen: libdat.so
rank 0 in job 19 192.168.0.1_54412 caused collective abort of all ranks
exit status of rank 0: killed by signal 11


Thanks
Rene







Andrey Derbunovich (Intel)

Rene,


You cannot build a truly static executable that is guaranteed to run over IB. This is due to libc runtime limitations: there is a dlopen() call inside the MPI library which requires the presence of the same runtime on the other cluster nodes. You probably saw the warning messages when you tried the mpicc -static option.


Best regards,


Andrey
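
(The limitation Andrey describes can be seen without MPI at all. The debug output above shows that libmpi dlopen()s the DAT library ("trying to dlopen default -ldat: libdat.so"), so a trivial program doing the same dlopen, sketched below, produces the same glibc warning when linked with -static and is exposed to the same runtime failure.)

/* Minimal reproduction of the static-link + dlopen caveat, outside MPI.
 *
 *   gcc repro.c -ldl           builds and runs normally
 *   gcc -static repro.c -ldl   emits the same "Using 'dlopen' in statically
 *                              linked applications..." warning seen above,
 *                              and the dlopen can fail or crash at runtime
 *                              if the shared glibc on the node does not
 *                              match the glibc used for linking.
 */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Same library Intel MPI tries to load for the rdma device. */
    void *h = dlopen("libdat.so", RTLD_NOW);
    if (!h) {
        fprintf(stderr, "dlopen(libdat.so) failed: %s\n", dlerror());
        return 1;
    }
    printf("libdat.so loaded\n");
    dlclose(h);
    return 0;
}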
