impi-4.0.0.007 run on IB network with "dapl_cma_active: ARP_ERR, retries(15) exhausted"

Hello,

I have encountered a problem that did not show up before.

nod727:16036: dapl_cma_active: ARP_ERR, retries(15) exhausted -> DST 172.40.108.10,11233

I am using these options:

-genv I_MPI_PIN 0
-genv I_MPI_FALLBACK_DEVICE 0
-genv I_MPI_RDMA_RNDV_WRITE 1
-genv I_MPI_RDMA_MAX_MSG_SIZE 4194304
-genv I_MPI_DEVICE rdssm:OpenIB-mlx4_0-1
-genv I_MPI_DEBUG +2

Please help.

Thanks.

--Terrence Liao

Hello,
I found the problem. It was due to a faulty QLogic switch (only one year old!!) that could no longer initiate connections in ONE direction (i.e., nodeA -> nodeB was unreachable). Once pinged from the reverse direction (i.e., from nodeB, pinging nodeA works fine), the previously unreachable connection (nodeA -> nodeB) comes back to life.
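
For anyone hitting the same symptom, a quick sketch of the check described above. The hostnames nodeA/nodeB are placeholders for your own nodes; the point is to test reachability in BOTH directions, since a faulty switch can break ARP one way only:

```shell
# Placeholder hostnames: substitute your own IPoIB node names/addresses.
# A one-directional ARP failure shows up as one of these two pings failing.
ssh nodeA "ping -c 3 nodeB" && echo "nodeA -> nodeB OK" || echo "nodeA -> nodeB FAILED"
ssh nodeB "ping -c 3 nodeA" && echo "nodeB -> nodeA OK" || echo "nodeB -> nodeA FAILED"
```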
-- Terrence

Hi Terrence,

Thank you for sharing your findings with everyone.

Could you tell me why you use I_MPI_PIN=0? Could you check performance with I_MPI_PIN set to 0 and to 1?
The Intel MPI Library, starting from version 4.0, works with I_MPI_FABRICS instead of I_MPI_DEVICE. The format is: I_MPI_FABRICS=shm:dapl. You can also use 'shm:tcp', 'shm:ofa', or 'shm:tmi' if tmi is supported (QLogic and Myrinet only). The provider can be set with the I_MPI_DAPL_PROVIDER environment variable.
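
As a sketch of the migration, assuming the rdssm:OpenIB-mlx4_0-1 setup from the original post, the 4.0-style equivalent would look something like:

```shell
# Replaces the deprecated I_MPI_DEVICE=rdssm:OpenIB-mlx4_0-1 pair:
export I_MPI_FABRICS=shm:dapl                 # shm intra-node, DAPL inter-node
export I_MPI_DAPL_PROVIDER=OpenIB-mlx4_0-1    # the former ":OpenIB-mlx4_0-1" suffix
```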

Regards!
Dmitry

Dmitry,

We set I_MPI_PIN=0 for historical reasons: the code uses OpenMP as well. In the early days of our development, to make sure the threads could use all available cores, we used this setting to disable process pinning.

-- Terrence

Hi Terrence,

Intel MPI Library version 4.0 handles hybrid (MPI+OpenMP) applications much better than previous versions. You can keep I_MPI_PIN enabled and set the I_MPI_PIN_DOMAIN environment variable. You can find a detailed description in the Reference Manual, chapter 3.2 (especially 3.2.3). The idea is to place one MPI process in each domain, and all the remaining free cores will be used by OpenMP threads.

As an example:
$ export OMP_NUM_THREADS=4
$ export I_MPI_FABRICS=shm:dapl
$ export KMP_AFFINITY=compact

$ mpirun -perhost 4 -n <total_ranks> ./app_name
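
To make the pinning explicit, a possible variant of the setup above (the I_MPI_PIN_DOMAIN line is my addition, following the Reference Manual chapter 3.2.3; adjust the counts to your node):

```shell
export OMP_NUM_THREADS=4
export I_MPI_FABRICS=shm:dapl
export I_MPI_PIN=1              # keep process pinning enabled
export I_MPI_PIN_DOMAIN=omp     # one domain per rank, sized by OMP_NUM_THREADS
export KMP_AFFINITY=compact     # pack OpenMP threads within each domain
```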

Please give it a try and compare the performance.

BTW: 4.0 Update 1 is available and shows even better performance.

Regards!
Dmitry
