Intel MPI on Mellanox InfiniBand

emoreno@dim.uchile.cl:

Hi everybody.

I have spent several months trying to run Intel MPI on our Itanium cluster with a Mellanox InfiniBand interconnect using IB Gold (it works perfectly over Ethernet).

Apparently, MPI can't find the DAPL provider. My /etc/dat.conf says:
ib0 u1.2 nonthreadsafe default /opt/ibgd/lib/libdapl.so ri.1.1 "InfiniHost0 1" ""
ib1 u1.2 nonthreadsafe default /opt/ibgd/lib/libdapl.so ri.1.1 "InfiniHost0 2" ""

but when I run an MPI code, I get:
mpiexec -genv I_MPI_DEVICE rdma -env I_MPI_DEBUG 4 -n 2 ./a.out
I_MPI: [0] my_dlopen(): dlopen failed: libmpi.def.so
I_MPI: [0] set_up_devices(): will use static-default device
couldn't open /dev/ts_ua_cm0: No such file or directory

With a higher debug level, I get something strange:
I_MPI: [0] try_one_device(): trying device: libmpi.rdma.so
I_MPI: [0] my_dlsym(): dlsym for dats_get_ia_handle failed: /usr/lib/libdat.so: undefined symbol: dats_get_ia_handle
I_MPI: [0] can_use_dapl_provider(): returning; DAPL provider not ok to use: ib0
I_MPI: [0] can_use_dapl_provider(): returning; DAPL provider not ok to use: ib1
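The dlsym failure usually means the libdat.so that Intel MPI loaded does not export the uDAPL 1.2 entry points. As a sketch, `nm -D` lists the dynamic symbols a shared object actually exports, so you can compare the system libdat against the IBGD copy named in dat.conf (paths taken from the thread above; adjust for your system):

```shell
# Check which copy of the DAT/DAPL library actually exports the uDAPL 1.2
# entry point that Intel MPI is looking up. Paths are the ones from the
# dat.conf and error output above -- adjust for your installation.
for lib in /usr/lib/libdat.so /opt/ibgd/lib/libdapl.so; do
    if [ -f "$lib" ]; then
        echo "== $lib =="
        nm -D "$lib" | grep dats_get_ia_handle || echo "  symbol not exported"
    fi
done
```

If /usr/lib/libdat.so does not export `dats_get_ia_handle` but the IBGD library does, the loader is resolving the wrong copy; pointing LD_LIBRARY_PATH at /opt/ibgd/lib is one way to test that theory.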


Anybody have a hint?

Thanks.

Tim Prince:

Unfortunately, this is a frequent problem with those DAPL drivers. Some have avoided it by switching to OpenIB gen2.

Community Admin:

Which version of the Mellanox IBGD package are you using?

If it is 1.8.0 or later, you may have to enable DAPL before you can use Intel MPI.

Install the Mellanox package with everything selected. This makes sure you have the DAPL software installed.

The DAPL driver is not enabled by default on these versions. To enable it you need to make a minor change to a file:
/etc/infiniband/openib.conf

Change the answer for loading UDAPL to YES in the copy on the master node, and do the same on all of the other nodes in the cluster. Once you have finished, I recommend shutting down all of the compute nodes and then rebooting the master node. This runs the openib init script correctly, and you should see the fabric come up as each node is turned on.
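A minimal sketch of that edit with sed; the variable name `UDAPL_LOAD` is an assumption, so verify the exact name in your own /etc/infiniband/openib.conf. The sketch runs against a throwaway copy rather than the live config:

```shell
# Hypothetical openib.conf fragment -- the UDAPL_LOAD variable name is an
# assumption; check the real flag name in /etc/infiniband/openib.conf.
conf=$(mktemp)
printf 'ONBOOT=yes\nUDAPL_LOAD=no\n' > "$conf"

# Flip the uDAPL answer to YES, as described above.
sed -i 's/^UDAPL_LOAD=no/UDAPL_LOAD=yes/' "$conf"

grep '^UDAPL_LOAD=' "$conf"   # prints UDAPL_LOAD=yes
rm -f "$conf"
```

The same sed command, pointed at the real file and pushed to every node (e.g. over ssh), automates the per-node step before the reboot sequence.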

Once all of the nodes are up you should be able to use the Intel MPI with the proper switch to utilize the RDMA driver.
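As a sketch of "the proper switch": older Intel MPI releases accept a DAPL provider name from /etc/dat.conf after the device name. Using the `ib0` entry from the dat.conf above (verify the exact syntax against your release's reference manual):

```shell
# Select the RDMA device and explicitly name the DAPL provider from
# /etc/dat.conf. The rdma:<provider> form is from older Intel MPI
# documentation -- confirm it for your installed version.
mpiexec -genv I_MPI_DEVICE rdma:ib0 -genv I_MPI_DEBUG 4 -n 2 ./a.out
```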
