Intel MPI 3.0 over IB (uDAPL)

Could anybody help me run Intel MPI over IB?

My steps were:
1. Got the 30-day evaluation version of Intel MPI 3.0
2. Installed it in a shared directory
3. Configured password-less SSH between nodes
4. Configured (for other purposes) IPoIB - confirmed working
5. Compiled the test MPI application that comes with Intel MPI

Now it works over Ethernet, but I can't run it over IB:

$ mpirun -n 4 -r ssh /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
Hello world: rank 0 of 4 running on n1
Hello world: rank 1 of 4 running on n3
Hello world: rank 2 of 4 running on n4
Hello world: rank 3 of 4 running on n2

$ mpirun -n 4 -r ssh -env I_MPI_DEVICE rdssm:OpenIB-cma -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEBUG 5 /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
[0] DAPL provider is not found and fallback device is not enabled
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(925): Initialization failed
MPIDD_Init(95).......: channel initialization failed
MPIDI_CH3_Init(144)..: generic failure with errno = -1
rank 3 in job 1 n1_36568 caused collective abort of all ranks
exit status of rank 3: return code 13
[output from other nodes skipped]

My IB configuration: OFED 1.2.5 from Cisco:

$ ibv_devinfo
hca_id: mthca0
fw_ver: 4.8.917
node_guid: 0005:ad00:000b:b224
sys_image_guid: 0005:ad00:0100:d050
vendor_id: 0x05ad
vendor_part_id: 25208
hw_ver: 0xA0
board_id: HCA.HSDC.A0.Boot
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid: 6
port_lmc: 0x00

port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
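
With two ports per HCA it is easy to lose track of which one is actually up. A minimal sketch, assuming the ibv_devinfo layout shown above (the function name active_ports is made up for illustration and is not part of OFED), that prints only the ports reported as PORT_ACTIVE:

```shell
# Hypothetical helper: reads ibv_devinfo output on stdin and prints
# "<hca_id> <port>" for every port whose state is PORT_ACTIVE.
active_ports() {
    awk '
        $1 == "hca_id:" { hca = $2 }     # remember the current HCA
        $1 == "port:"   { port = $2 }    # remember the current port
        $1 == "state:" && $2 == "PORT_ACTIVE" { print hca, port }
    '
}
```

It would be used as "ibv_devinfo | active_ports" on each node.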

$ cat /etc/dat.conf
# DAT 1.2 configuration file
# Each entry should have the following fields:
# For the uDAPL cma provder, specify as one of the following:
# network address, network hostname, or netdev name and 0 for port
# Simple (OpenIB-cma) default with netdev name provided first on list
# to enable use of same dat.conf version on all nodes
# Add examples for multiple interfaces and IPoIB HA fail over, and bonding
OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/ dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default /usr/lib64/ dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default /usr/lib64/ dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default /usr/lib64/ dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/lib64/ dapl.1.2 "bond0 0" ""
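
A quick sanity check on dat.conf that may help with a "DAPL provider is not found" error is to confirm that the library path of every entry points at a real file. The sketch below assumes the usual DAT 1.2 field order, with the provider name first and the library path fifth. The function name check_dat_conf is hypothetical; this is not a tool shipped with OFED or Intel MPI.

```shell
# Hypothetical helper: for each non-comment entry in a dat.conf file,
# report whether the fifth field (library path) is an existing regular file.
check_dat_conf() {
    conf="${1:-/etc/dat.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    while read -r name api thread dflt lib rest || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;          # skip blank lines and comments
        esac
        if [ -f "$lib" ]; then
            echo "$name ok $lib"
        else
            echo "$name MISSING $lib"
        fi
    done < "$conf"
}
```

Run it as "check_dat_conf /etc/dat.conf" on every node. An entry that gives only a bare library name with no directory is resolved through the dynamic loader's search path, so a MISSING result for such an entry is not conclusive.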

My customers don't get much information from Cisco, so we're not sufficiently in the loop. However, I received the following comment this week:

The current Topspin release 3.2.0-118 has fixes for uDAPL and Intel MPI. The release notes state:


Fixed uDAPL startup scalability problem when using Intel MPI. (PR


Thanks for your prompt reply.

I'm not using the old Cisco MPI (as I remember, Cisco acquired it from Topspin, and it derives from MPICH). Cisco now uses OFED, and I am trying to run Intel MPI on the newest OFED version.

Were you able to run Intel MPI on the newest OFED version? The output with a higher I_MPI_DEBUG value can be useful if you still have problems with your runs.

After a number of unsuccessful attempts, it now works (don't ask me why, I don't know).

My next question: how do I compile 64-bit MPI applications with Intel MPI on the x86_64 architecture?

$ mpicc -o osu_acc_latency-intel-mpi osu_acc_latency.c
$ file osu_acc_latency-intel-mpi
osu_acc_latency-intel-mpi: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Please make sure that you have set up the 64-bit MPI environment. Source the mpivars.[c]sh file from the $install_dir/bin64 directory to be able to build 64-bit MPI applications. You should also have a 64-bit version of the gcc compiler as your default gcc when using the mpicc compiler driver.
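
To confirm the result after rebuilding, the ELF class of the binary can be read directly, which gives the same information as the file command in a scriptable form. A minimal sketch, where the function name elf_class is made up for illustration; it relies only on the standard ELF header layout, in which the byte at offset 4 is 1 for 32-bit and 2 for 64-bit objects:

```shell
# Hypothetical helper: print 32, 64, or unknown for the given file,
# based on the ELF magic number and the EI_CLASS byte at offset 4.
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then   # 0x7f 'E' 'L' 'F'
        echo unknown
        return
    fi
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

For example, after sourcing $install_dir/bin64/mpivars.sh in a Bourne-type shell and rerunning mpicc, "elf_class osu_acc_latency-intel-mpi" should print 64 if the 64-bit environment took effect.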

Best regards,


