Intel MPI 3.2 issue - "open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?"

Hi,

I ran into an issue when upgrading the Intel MPI Library from 3.1 to 3.2.
For the same source code, there's an issue when linking the 3.1 library.
The error log is below:

hpc-p-19:18677: open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?
hpc-p-17:18094: open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?
hpc-p-5:18168: open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?
hpc-p-6:18157: open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?
hpc-p-15:18059: open_hca: getaddr_netdev ERROR: Connection refused. Is ib0 configured?
.............
[12] MPI startup(): DAPL provider OpenIB-mthca0-1 specified in DAPL configuration file /etc/dat.conf
[98] MPI startup(): DAPL provider OpenIB-mthca0-1 specified in DAPL configuration file /etc/dat.conf
[77] MPI startup(): DAPL provider OpenIB-mthca0-1 specified in DAPL configuration file /etc/dat.conf

Thanks!

Quoting - bamboo7413
Hi,

Sorry for the typo. It should be:
For the same source code, there's an issue when linking the 3.2 library.
But it's OK when linking the 3.1 library.

Thanks!

Hi bamboo7413,

Thanks for your interest in the Intel MPI Library.
It seems to me that something is wrong with your environment or settings. The message "open_hca: getaddr_netdev ERROR" comes from the DAPL library, not from MPI, and should not depend on the MPI version.

Could you provide your dat.conf and your command line? You can also try running your application with '-genv I_MPI_DEBUG 2' to get additional debug information from the MPI library.

Regards!
Dmitry

Thanks, here is the information you asked for. Could you please help me locate the issue?
1) dat.conf
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""

2) command line
time mpirun -f $LSB_DJOB_HOSTFILE -r ssh -env I_MPI_DEBUG 3 -np $LSB_DJOB_NUMPROC _our_mpi_program_

Any comments?
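
For readers following along, here is a minimal Python sketch that splits dat.conf entries like the ones listed above into fields, to see which providers name a network interface such as ib0 (the name in the error message). The function name `parse_dat_conf` is hypothetical, and the field meanings (provider name first, library fifth, quoted "device port" string seventh) are an assumption read off the layout of the listing, not taken from the DAPL documentation.

```python
import shlex

def parse_dat_conf(text):
    """Return one dict per active (non-comment) entry in dat.conf-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        fields = shlex.split(line)  # keeps quoted strings like "ib0 0" together
        if len(fields) < 7:
            continue  # not in the assumed layout, ignore
        device, _, port = fields[6].partition(" ")
        entries.append({
            "provider": fields[0],
            "library": fields[4],
            "device": device,
            "port": port,
        })
    return entries

# Three entries copied from the dat.conf posted above.
SAMPLE = '''
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
'''

entries = parse_dat_conf(SAMPLE)
for entry in entries:
    print(entry["provider"], entry["library"], entry["device"], entry["port"])
```

In this sample only the two libdaplcma.so.1 entries name ib0 and ib1; the libdaplscm.so.1 entry names the HCA device mthca0 instead.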

Hi bamboo7413,

Could you try commenting out the first two lines in your dat.conf file?
Like this:

#OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
#OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""

and let me know if it helps.

Regards!
Dmitry
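
The same edit could be scripted when there are many nodes. The following is a rough Python sketch, not an Intel tool: `comment_out_interfaces` is a hypothetical helper, and it assumes the seventh field of each entry is the quoted "device port" string, as the listing suggests. It works on a string and returns the modified text, so nothing under /etc is touched.

```python
import shlex

def comment_out_interfaces(text, interfaces):
    """Prefix '#' to every active entry whose device name is in `interfaces`."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            fields = shlex.split(stripped)
            # Assumed layout: fields[6] is the quoted "device port" string.
            parts = fields[6].split() if len(fields) >= 7 else []
            if parts and parts[0] in interfaces:
                line = "#" + line
        result.append(line)
    return "\n".join(result)

# Entries copied from the dat.conf posted earlier in the thread.
ORIGINAL = (
    'OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""\n'
    'OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""\n'
    'OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""'
)

edited = comment_out_interfaces(ORIGINAL, {"ib0", "ib1"})
print(edited)
```

Keep a backup of the original file before writing any edited text back, since other software on the node may rely on the entries being commented out.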
