Intel MPI 3.0 over IB (uDAPL)

Could anybody help me run Intel MPI over IB?

My steps were:
1. Got the Intel MPI 3.0 evaluation (30 days)
2. Installed it in a shared directory
3. Configured password-less SSH between nodes
4. Configured IPoIB (for other purposes); confirmed working
5. Compiled the test MPI application that comes with Intel MPI

Now it works over Ethernet, but I can't run it over IB:

$ mpirun -n 4 -r ssh /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
Hello world: rank 0 of 4 running on n1
Hello world: rank 1 of 4 running on n3
Hello world: rank 2 of 4 running on n4
Hello world: rank 3 of 4 running on n2

$ mpirun -n 4 -r ssh -env I_MPI_DEVICE rdssm:OpenIB-cma -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEBUG 5 /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
[0] DAPL provider is not found and fallback device is not enabled
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(925): Initialization failed
MPIDD_Init(95).......: channel initialization failed
MPIDI_CH3_Init(144)..: generic failure with errno = -1
(unknown)():
rank 3 in job 1 n1_36568 caused collective abort of all ranks
exit status of rank 3: return code 13
[output from other nodes skipped]
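
For what it's worth, "DAPL provider is not found" usually means the name given to I_MPI_DEVICE (here OpenIB-cma) has no matching entry in /etc/dat.conf on that node, or the uDAPL library the entry points to doesn't exist there. A quick per-node sanity check, as a sketch (the function name and the optional second argument are my own, not part of any tool):

```shell
# check_dapl_provider NAME [CONF]
# Succeeds only if NAME has an entry in the DAT configuration file and the
# provider library that entry names (field 5) exists on this node.
check_dapl_provider() {
    name=$1
    conf=${2:-/etc/dat.conf}
    # Find the first entry whose first field is exactly the provider name
    entry=$(awk -v n="$name" '$1 == n { print; exit }' "$conf" 2>/dev/null)
    [ -n "$entry" ] || { echo "no entry for $name in $conf"; return 1; }
    # Field 5 of a dat.conf entry is the provider library path
    lib=$(echo "$entry" | awk '{print $5}')
    [ -e "$lib" ] || { echo "provider library $lib missing"; return 1; }
    echo "$name -> $lib"
}
```

Running e.g. `check_dapl_provider OpenIB-cma` on every node before launching with I_MPI_FALLBACK_DEVICE 0 would at least rule out a missing or inconsistent dat.conf.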

My IB configuration: OFED 1.2.5 from Cisco:
OFED-1.2.5

ofa_kernel-1.2.5:
Git:
git://git.openfabrics.org/ofed_1_2/linux-2.6.git ofed_1_2_c
commit 21ec9ff84cba24ea6e53a268da21a72e6ab190d0

ofa_user-1.2.5:
libibverbs:
git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git master
commit d5052fa0bf8180be9edf1c4c1c014dde01f8a4dd
libmthca:
git://git.kernel.org/pub/scm/libs/infiniband/libmthca.git master
commit f29c1d8a198a8d7f322c3924205a62770a9862a3
libmlx4:
git://git.kernel.org/pub/scm/libs/infiniband/libmlx4.git master
commit fc9edce51069fd38e33c9e627d9a89bc1e329b67
libehca:
git://git.openfabrics.org/ofed_1_2/libehca.git ofed_1_2
commit 00b26973092c949b11b8372eb027059fda7a8061
libipathverbs:
git://git.openfabrics.org/ofed_1_2/libipathverbs.git ofed_1_2
commit 15f62c3f045295dd2a941ae8d4e0e36035aad5cf
tvflash:
git://git.openfabrics.org/ofed_1_2/tvflash.git ofed_1_2
commit e0a0903b2a998a397ada053554fd678ed7914cc6
libibcm:
git://git.openfabrics.org/ofed_1_2/libibcm.git ofed_1_2
commit 8154d4d57f69789be6d26fdc8f10b552c83a87ec
libsdp:
git://git.openfabrics.org/ofed_1_2/libsdp.git ofed_1_2
commit 9e1c2cce1cbe030bf8fc9c03db4e80a703946af1
mstflint:
git://git.openfabrics.org/~mst/mstflint.git master
commit a9579dfbd259133cb50bf6b12ff247d5a04a9473
perftest:
git://git.openfabrics.org/~mst/perftest.git master
commit 20ea8b29537dda3f0a217b95ac50a0aaa7b24477
srptools:
git://git.openfabrics.org/ofed_1_2/srptools.git ofed_1_2
commit 883a08f0db168f4eb20293552f6416529da982f1
ipoibtools:
git://git.openfabrics.org/ofed_1_2/ipoibtools.git ofed_1_2
commit e29da6049cb725b175423fddc80181980ebfa0b4
librdmacm:
git://git.openfabrics.org/ofed_1_2/librdmacm.git ofed_1_2
commit 87b2be8cf17cca4f2212c32ecfd06c35d7ac7719
dapl:
git://git.openfabrics.org/ofed_1_2/dapl.git ofed_1_2
commit 3654c6ef425f94b9f27a593b0b8c1f3d7cc39029
management:
git://git.openfabrics.org/ofed_1_2/management.git ofed_1_2
commit 46bdba974ee2e1c8a64101effdb7358fd9060c8b
libcxgb3:
git://git.openfabrics.org/ofed_1_2/libcxgb3.git ofed_1_2
commit f97dcedc6d5af5c222542d69755ad4193f2114fc
qlvnictools:
git://git.openfabrics.org/ofed_1_2/qlvnictools.git ofed_1_2
commit bcfd11d4b5369398f2f816d0e1d89b6e98b25961
sdpnetstat:
git://git.openfabrics.org/ofed_1_2/sdpnetstat.git ofed_1_2
commit d726c17c3b54739ad71e2234c521aa3ee81a5905
ofascripts:
git://git.openfabrics.org/~vlad/ofascripts.git ofed_1_2_c
commit 598684991ff6127dd803540c757f56b289872bef

# MPI
mvapich-0.9.9-1458.src.rpm
mvapich2-0.9.8-15.src.rpm
openmpi-1.2.2-1.src.rpm
mpitests-2.0-705.src.rpm

$ ibv_devinfo
hca_id: mthca0
fw_ver: 4.8.917
node_guid: 0005:ad00:000b:b224
sys_image_guid: 0005:ad00:0100:d050
vendor_id: 0x05ad
vendor_part_id: 25208
hw_ver: 0xA0
board_id: HCA.HSDC.A0.Boot
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid: 6
port_lmc: 0x00

port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid: 0
port_lmc: 0x00

$ cat /etc/dat.conf
#
# DAT 1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
#           <provider_version> <ia_params> <platform_params>
#
# For the uDAPL cma provider, specify <ia_params> as one of the following:
# network address, network hostname, or netdev name and 0 for port
#
# Simple (OpenIB-cma) default with netdev name provided first on list
# to enable use of same dat.conf version on all nodes
#
# Add examples for multiple interfaces and IPoIB HA fail over, and bonding
#
OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "bond0 0" ""
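
Reading the first entry field by field (the annotation is mine; the quoted "ib0 0" pair is the netdev name and port handed to the cma provider):

```
OpenIB-cma  u1.2  nonthreadsafe  default  /usr/lib64/libdaplcma.so  dapl.1.2  "ib0 0"   ""
# ia_name   api   threadsafety   default  provider library path     version   ia_params platform_params
```

So the provider name passed via I_MPI_DEVICE (rdssm:OpenIB-cma) must match the ia_name column exactly.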


My customers don't get much information from Cisco, so we're not sufficiently in the loop. However, I received the following comment this week:

The current Topspin release 3.2.0-118 has fixes for uDAPL and Intel MPI; the release notes state:

uDAPL

Fixed uDAPL startup scalability problem when using Intel MPI. (PR CSCse88951)

Thanks for your prompt reply.

I'm not using the old Cisco MPI (as I remember, Cisco got it when it acquired Topspin, and it derives from MPICH). Cisco now ships OFED, and I'm trying to run Intel MPI on the newest OFED version.

Were you able to run Intel MPI on the newest OFED version? Output with a higher I_MPI_DEBUG value can be useful if you still have problems with your runs.

After a number of unsuccessful attempts, it now works (don't ask me why; I don't know).

The next question is how to compile 64-bit MPI applications with Intel MPI on the x86_64 arch:

$ mpicc -o osu_acc_latency-intel-mpi osu_acc_latency.c
$ file osu_acc_latency-intel-mpi
osu_acc_latency-intel-mpi: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Please make sure that you have set up the 64-bit MPI environment: source the mpivars.[c]sh file from the $install_dir/bin64 directory to be able to build 64-bit MPI applications. You should also have a 64-bit version of gcc as your default gcc compiler when using the mpicc compiler driver.
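
Besides `file`, the ELF class byte tells you directly whether mpicc produced a 32- or 64-bit binary; this little helper is my own sketch, not an Intel MPI tool:

```shell
# elf_class FILE: print 32 or 64 based on the ELF class byte.
# Byte 4 of an ELF header is EI_CLASS: 1 = ELF32, 2 = ELF64.
elf_class() {
    case $(od -An -j4 -N1 -tu1 "$1" | tr -d ' ') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

After sourcing bin64/mpivars.sh and rebuilding, `elf_class ./osu_acc_latency-intel-mpi` (the binary name from the transcript above) should print 64.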

Best regards,

Andrey
