please help me!! return code 254

I have installed MPI 3.2.2. When I run Intel Linpack, I get an error; please see below.

I use an InfiniBand network.

Hi Zhang,

This error message comes from the DAPL library. Is your OFED stack up to date? The latest release is 1.5.1, and you can download it from http://www.openfabrics.org/downloads/OFED/

If the issue persists with the new library, could you provide your /etc/dat.conf file, the command line, and the output of a run with I_MPI_DEBUG set to 5?

Regards!
Dmitry

Thank you very much!
My OFED is 1.4.2

[root@cn001 em64t]# ofed_info

OFED-1.4.2

My dat.conf is below:

OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""

OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""

OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""

OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""

OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""

OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""

OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 1" ""

OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 2" ""

OpenIB-ehca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ehca0 1" ""

OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""

ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""

ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""

ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""

ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""

ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""

ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""

ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""

ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""

ofa-v2-qib0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "qib0 1" ""

ofa-v2-qib0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "qib0 2" ""

ofa-v2-ehca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""

ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
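
For anyone comparing a file like this against the provider named in the I_MPI_DEBUG output, here is a minimal sketch that prints each provider's name, API version, and library. It assumes the usual dat.conf layout, where those are whitespace-separated fields 1, 2, and 5 and comment lines start with `#`. The function name `list_dapl_providers` is made up for illustration and is not part of any DAPL or Intel MPI tool.

```shell
# Hypothetical helper: summarize provider entries in a dat.conf-style file.
# Prints "<provider name> <API version> <library>" for each entry.
# Usage: list_dapl_providers [path]   (path defaults to /etc/dat.conf)
list_dapl_providers() {
    # Skip comment lines and lines too short to be a provider entry
    # (blank lines have NF == 0, so they are skipped as well).
    awk '!/^[[:space:]]*#/ && NF >= 5 { print $1, $2, $5 }' "${1:-/etc/dat.conf}"
}
```

Running `list_dapl_providers` with no argument reads /etc/dat.conf; passing a path makes it easy to check a copy taken from another node.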

The error output:

[root@cn001 em64t]# mpiexec -n 128 ./xhpl_em64t

================================================================================

HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008

Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK

Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK

Modified by Julien Langou, University of Colorado Denver

================================================================================

An explanation of the input/output parameters follows:

T/V : Wall time / encoded variant.

N : The order of the coefficient matrix A.

NB : The partitioning blocking factor.

P : The number of process rows.

Q : The number of process columns.

Time : Time in seconds to solve the linear system.

Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 5000 10000

NB : 168

PMAP : Row-major process mapping

P : 8

Q : 16

PFACT : Right

NBMIN : 4

NDIV : 2

RFACT : Crout

BCAST : 1ringM

DEPTH : 0

SWAP : Mix (threshold = 64)

L1 : transposed form

U : transposed form

EQUIL : yes

ALIGN : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.

- The following scaled residual check will be computed:

||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )

- The relative machine precision (eps) is taken to be 1.110223e-16

- Computational tests pass if scaled residuals are less than 16.0

Column=000168 Fraction=0.005 Mflops=54327.78

Column=000336 Fraction=0.010 Mflops=57272.34

Column=000504 Fraction=0.015 Mflops=79439.70

Column=000672 Fraction=0.020 Mflops=98088.18

Column=000840 Fraction=0.025 Mflops=111594.45

Column=001008 Fraction=0.030 Mflops=123415.88

Column=001176 Fraction=0.035 Mflops=132172.10

Column=001344 Fraction=0.040 Mflops=139603.32

Column=001512 Fraction=0.045 Mflops=142278.36

Column=001680 Fraction=0.050 Mflops=146694.84

Column=001848 Fraction=0.055 Mflops=149609.30

Column=002016 Fraction=0.060 Mflops=151598.00

Column=002184 Fraction=0.065 Mflops=151309.32

Column=002352 Fraction=0.070 Mflops=151842.16

Column=002520 Fraction=0.075 Mflops=152216.35

Column=002688 Fraction=0.080 Mflops=152570.72

Column=002856 Fraction=0.085 Mflops=153657.87

Column=003024 Fraction=0.090 Mflops=152353.27

Column=003192 Fraction=0.095 Mflops=151422.48

Column=003360 Fraction=0.100 Mflops=150112.05

Column=003528 Fraction=0.105 Mflops=148068.02

Column=003696 Fraction=0.110 Mflops=146578.42

Column=003864 Fraction=0.115 Mflops=145088.08

cn010:7731: dereg_pd Device or resource busy

rtc_invalidate error 1114112

rank 63 in job 1 cn001_54224 caused collective abort of all ranks

exit status of rank 63: return code 254

[root@cn001 em64t]#
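
The abort message names a rank (63) and a return code (254), and the DAPL message names a host (cn010). As a rough sketch, assuming the mpiexec output has been saved to a file, the two helpers below pull the rank and return code out of that file and look up which node a rank was pinned to in an I_MPI_DEBUG log. The function names `failed_ranks` and `node_of_rank` are hypothetical, and the patterns only match the message formats shown in this thread.

```shell
# Hypothetical helper: print "<rank> <return code>" for each
# "exit status of rank N: return code M" line in the given files (or stdin).
failed_ranks() {
    sed -n 's/^exit status of rank \([0-9][0-9]*\): return code \([0-9][0-9]*\)$/\1 \2/p' "$@"
}

# Hypothetical helper: print "<node> <cpu>" for the given rank, based on
# "process is pinned to CPUxx on node yyy" lines in an I_MPI_DEBUG log.
# Only complete lines match; lines where the output of several ranks
# is interleaved are skipped.
# Usage: node_of_rank RANK [file...]
node_of_rank() {
    rank_to_find=$1
    shift
    sed -n "s/^\[$rank_to_find] MPI Startup(): process is pinned to \(CPU[0-9][0-9]*\) on node \([^ ]*\)\$/\2 \1/p" "$@"
}
```

For example, `node_of_rank "$(failed_ranks run.log | cut -d' ' -f1)" debug.log` would report the node of the failing rank, where `run.log` and `debug.log` are placeholder file names.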

I ran mpiexec with I_MPI_DEBUG set to 5; please see below.

[root@cn001 em64t]# mpiexec -genv I_MPI_DEBUG 5 -n 128 ./xhpl_em64t

[2] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[6] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[22] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[1] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[13] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[0] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[24] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[5] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[3] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[4] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[9] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[7] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[17] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[12] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[8] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[47] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[11] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[16] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[28] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[18] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[38] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[20] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[10] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[14] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[15] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[34] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[23] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[21] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[25] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[29] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[37] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[19] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[33] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[26] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[30] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[48] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[35] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[32] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[27] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[31] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[39] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[36] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[40] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[41] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[49] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[46] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[53] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[42] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[44] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[52] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[45] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[43] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[54] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[50] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[62] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[72] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[51] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[55] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[56] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[68] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[65] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[64] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[82] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[60] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[59] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[73] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[57] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[58] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[69] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[67] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[61] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[63] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[71] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[66] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[74] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[70] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[78] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[80] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[75] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[76] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[77] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[81] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[79] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[83] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[89] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[93] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[84] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[85] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[88] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[86] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[96] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[87] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[99] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[94] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[95] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[91] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[90] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[92] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[113] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[106] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[109] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[115] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[104] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[125] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[110] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[97] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[107] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[108] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[105] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[100] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[103] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[111] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[98] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[101] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[119] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[102] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[112] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[117] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[118] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[116] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[114] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[122] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[120] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[123] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[127] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[126] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[124] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[121] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf

[0] MPI startup(): RDMA, shared memory, and socket data transfer modes

[2] MPI startup(): RDMA, shared memory, and socket data transfer modes

[1] MPI startup(): RDMA, shared memory, and socket data transfer modes

[0] MPI Startup(): process is pinned to CPU01 on node cn001

[2] MPI Startup(): process is pinned to CPU03 on node cn001

[4] MPI startup(): RDMA, shared memory, and socket data transfer modes

[3] MPI startup(): RDMA, shared memory, and socket data transfer modes

[1] MPI Startup(): process is pinned to CPU05 on node cn001

[4] MPI Startup(): process is pinned to CPU00 on node cn001

[3] MPI Startup(): process is pinned to CPU07 on node cn001

[5] MPI startup(): RDMA, shared memory, and socket data transfer modes

[7] MPI startup(): RDMA, shared memory, and socket data transfer modes

[8] MPI startup(): RDMA, shared memory, and socket data transfer modes

[8] MPI Startup(): process is pinned to CPU01 on node cn003

[6] MPI startup(): RDMA, shared memory, and socket data transfer modes

[6] MPI Startup(): process is pinned to CPU02 on node cn001

[9] MPI startup(): RDMA, shared memory, and socket data transfer modes

[9] MPI Startup(): process is pinned to CPU05 on node cn003

[10] MPI startup(): RDMA, shared memory, and socket data transfer modes

[10] MPI Startup(): process is pinned to CPU03 on node cn003

[7] MPI Startup(): process is pinned to CPU06 on node cn001

[5] MPI Startup(): process is pinned to CPU04 on node cn001

[11] MPI startup(): RDMA, shared memory, and socket data transfer modes

[11] MPI Startup(): process is pinned to CPU07 on node cn003

[12] MPI startup(): RDMA, shared memory, and socket data transfer modes

[13] MPI startup(): RDMA, shared memory, and socket data transfer modes

[12] MPI Startup(): process is pinned to CPU00 on node cn003

[14] MPI startup(): RDMA, shared memory, and socket data transfer modes

[15] MPI startup(): RDMA, shared memory, and socket data transfer modes

[14] MPI Startup(): process is pinned to CPU02 on node cn003

[15] MPI Startup(): process is pinned to CPU06 on node cn003

[16] MPI startup(): RDMA, shared memory, and socket data transfer modes

[13] MPI Startup(): process is pinned to CPU04 on node cn003

[17] MPI startup(): RDMA, shared memory, and socket data transfer modes

[16] MPI Startup(): process is pinned to CPU01 on node cn016

[19] MPI startup(): RDMA, shared memory, and socket data transfer modes

[20] MPI startup(): RDMA, shared memory, and socket data transfer modes

[18] MPI startup(): RDMA, shared memory, and socket data transfer modes

[21] MPI startup(): RDMA, shared memory, and socket data transfer modes

[17] MPI Startup(): process is pinned to CPU05 on node cn016

[19] MPI Startup(): process is pinned to CPU07 on node cn016

[20] MPI Startup(): process is pinned to CPU00 on node cn016

[24] MPI startup(): RDMA, shared memory, and socket data transfer modes

[22] MPI startup(): RDMA, shared memory, and socket data transfer modes

[23] MPI startup(): RDMA, shared memory, and socket data transfer modes

[24] MPI Startup(): process is pinned to CPU01 on node cn015

[25] MPI startup(): RDMA, shared memory, and socket data transfer modes

[21] MPI Startup(): process is pinned to CPU04 on node cn016

[27] MPI startup(): RDMA, shared memory, and socket data transfer modes

[18] MPI Startup(): process is pinned to CPU03 on node cn016

[22] MPI Startup(): process is pinned to CPU02 on node cn016

[28] MPI startup(): RDMA, shared memory, and socket data transfer modes

[31] MPI startup(): RDMA, shared memory, and socket data transfer modes

[23] MPI Startup(): process is pinned to CPU06 on node cn016

[26] MPI startup(): RDMA, shared memory, and socket data transfer modes

[26] MPI Startup(): process is pinned to CPU03 on node cn015

[25] MPI Startup(): process is pinned to CPU05 on node cn015

[29] MPI startup(): RDMA, shared memory, and socket data transfer modes

[30] MPI startup(): RDMA, shared memory, and socket data transfer modes

[32] MPI startup(): RDMA, shared memory, and socket data transfer modes

[31] MPI Startup(): process is pinned to CPU06 on node cn015

[30] MPI Startup(): process is pinned to CPU02 on node cn015

[34] MPI startup(): RDMA, shared memory, and socket data transfer modes

[27] MPI Startup(): process is pinned to CPU07 on node cn015

[33] MPI startup(): RDMA, shared memory, and socket data transfer modes

[29] MPI Startup(): process is pinned to CPU04 on node cn015

[32] MPI Startup(): process is pinned to CPU01 on node cn014

[35] MPI startup(): RDMA, shared memory, and socket data transfer modes

[36] MPI startup(): RDMA, shared memory, and socket data transfer modes

[28] MPI Startup(): process is pinned to CPU00 on node cn015

[33] MPI Startup(): process is pinned to CPU05 on node cn014

[37] MPI startup(): RDMA, shared memory, and socket data transfer modes

[38] MPI startup(): RDMA, shared memory, and socket data transfer modes

[39] MPI startup(): RDMA, shared memory, and socket data transfer modes

[40] MPI startup(): RDMA, shared memory, and socket data transfer modes

[41] MPI startup(): RDMA, shared memory, and socket data transfer modes

[36] MPI Startup(): process is pinned to CPU00 on node cn014

[42] MPI startup(): RDMA, shared memory, and socket data transfer modes

[34] MPI Startup(): process is pinned to CPU03 on node cn014

[40] MPI Startup(): process is pinned to CPU01 on node cn004

[41] MPI Startup(): process is pinned to CPU05 on node cn004

[37] MPI Startup(): process is pinned to CPU04 on node cn014

[44] MPI startup(): RDMA, shared memory, and socket data transfer modes

[38] MPI Startup(): process is pinned to CPU02 on node cn014

[43] MPI startup(): RDMA, shared memory, and socket data transfer modes

[42] MPI Startup(): process is pinned to CPU03 on node cn004

[39] MPI Startup(): process is pinned to CPU06 on node cn014

[43] MPI Startup(): process is pinned to CPU07 on node cn004

[35] MPI Startup(): process is pinned to CPU07 on node cn014

[44] MPI Startup(): process is pinned to CPU00 on node cn004

[45] MPI startup(): RDMA, shared memory, and socket data transfer modes

[46] MPI startup(): RDMA, shared memory, and socket data transfer modes

[46] MPI Startup(): process is pinned to CPU02 on node cn004

[45] MPI Startup(): process is pinned to CPU04 on node cn004

[47] MPI startup(): RDMA, shared memory, and socket data transfer modes

[48] MPI startup(): RDMA, shared memory, and socket data transfer modes

[47] MPI Startup(): process is pinned to CPU06 on node cn004

[49] MPI startup(): RDMA, shared memory, and socket data transfer modes

[48] MPI Startup(): process is pinned to CPU01 on node cn011

[51] MPI startup(): RDMA, shared memory, and socket data transfer modes

[49] MPI Startup(): process is pinned to CPU05 on node cn011

[51] MPI Startup(): process is pinned to CPU07 on node cn011

[50] MPI startup(): RDMA, shared memory, and socket data transfer modes

[50] MPI Startup(): process is pinned to CPU03 on node cn011

[52] MPI startup(): RDMA, shared memory, and socket data transfer modes

[53] MPI startup(): RDMA, shared memory, and socket data transfer modes

[52] MPI Startup(): process is pinned to CPU00 on node cn011

[53] MPI Startup(): process is pinned to CPU04 on node cn011

[54] MPI startup(): RDMA, shared memory, and socket data transfer modes

[54] MPI Startup(): process is pinned to CPU02 on node cn011

[55] MPI startup(): RDMA, shared memory, and socket data transfer modes

[56] MPI startup(): RDMA, shared memory, and socket data transfer modes

[55] MPI Startup(): process is pinned to CPU06 on node cn011

[58] MPI startup(): RDMA, shared memory, and socket data transfer modes

[57] MPI startup(): RDMA, shared memory, and socket data transfer modes

[56] MPI Startup(): process is pinned to CPU01 on node cn010

[58] MPI Startup(): process is pinned to CPU03 on node cn010

[57] MPI Startup(): process is pinned to CPU05 on node cn010

[60] MPI startup(): RDMA, shared memory, and socket data transfer modes

[59] MPI startup(): RDMA, shared memory, and socket data transfer modes

[59] MPI Startup(): process is pinned to CPU07 on node cn010

[60] MPI Startup(): process is pinned to CPU00 on node cn010

[62] MPI startup(): RDMA, shared memory, and socket data transfer modes

[61] MPI startup(): RDMA, shared memory, and socket data transfer modes

[63] MPI startup(): RDMA, shared memory, and socket data transfer modes

[64] MPI startup(): RDMA, shared memory, and socket data transfer modes

[62] MPI Startup(): process is pinned to CPU02 on node cn010

[63] MPI Startup(): process is pinned to CPU06 on node cn010

[61] MPI Startup(): process is pinned to CPU04 on node cn010

[65] MPI startup(): RDMA, shared memory, and socket data transfer modes

[64] MPI Startup(): process is pinned to CPU01 on node cn013

[67] MPI startup(): RDMA, shared memory, and socket data transfer modes

[66] MPI startup(): RDMA, shared memory, and socket data transfer modes

[65] MPI Startup(): process is pinned to CPU05 on node cn013

[67] MPI Startup(): process is pinned to CPU07 on node cn013

[68] MPI startup(): RDMA, shared memory, and socket data transfer modes

[69] MPI startup(): RDMA, shared memory, and socket data transfer modes

[66] MPI Startup(): process is pinned to CPU03 on node cn013

[68] MPI Startup(): process is pinned to CPU00 on node cn013

[70] MPI startup(): RDMA, shared memory, and socket data transfer modes

[71] MPI startup(): RDMA, shared memory, and socket data transfer modes

[71] MPI Startup(): process is pinned to CPU06 on node cn013

[72] MPI startup(): RDMA, shared memory, and socket data transfer modes

[72] MPI Startup(): process is pinned to CPU01 on node cn012

[70] MPI Startup(): process is pinned to CPU02 on node cn013

[73] MPI startup(): RDMA, shared memory, and socket data transfer modes

[69] MPI Startup(): process is pinned to CPU04 on node cn013

[74] MPI startup(): RDMA, shared memory, and socket data transfer modes

[74] MPI Startup(): process is pinned to CPU03 on node cn012

[75] MPI startup(): RDMA, shared memory, and socket data transfer modes

[73] MPI Startup(): process is pinned to CPU05 on node cn012

[75] MPI Startup(): process is pinned to CPU07 on node cn012

[77] MPI startup(): RDMA, shared memory, and socket data transfer modes

[76] MPI startup(): RDMA, shared memory, and socket data transfer modes

[76] MPI Startup(): process is pinned to CPU00 on node cn012

[77] MPI Startup(): process is pinned to CPU04 on node cn012

[78] MPI startup(): RDMA, shared memory, and socket data transfer modes

[80] MPI startup(): RDMA, shared memory, and socket data transfer modes

[79] MPI startup(): RDMA, shared memory, and socket data transfer modes

[81] MPI startup(): RDMA, shared memory, and socket data transfer modes

[79] MPI Startup(): process is pinned to CPU06 on node cn012

[80] MPI Startup(): process is pinned to CPU01 on node cn005

[81] MPI Startup(): process is pinned to CPU05 on node cn005

[82] MPI startup(): RDMA, shared memory, and socket data transfer modes

[84] MPI startup(): RDMA, shared memory, and socket data transfer modes

[78] MPI Startup(): process is pinned to CPU02 on node cn012

[83] MPI startup(): RDMA, shared memory, and socket data transfer modes

[85] MPI startup(): RDMA, shared memory, and socket data transfer modes

[84] MPI Startup(): process is pinned to CPU00 on node cn005

[82] MPI Startup(): process is pinned to CPU03 on node cn005

[86] MPI startup(): RDMA, shared memory, and socket data transfer modes

[86] MPI Startup(): process is pinned to CPU02 on node cn005

[83] MPI Startup(): process is pinned to CPU07 on node cn005

[88] MPI startup(): RDMA, shared memory, and socket data transfer modes

[85] MPI Startup(): process is pinned to CPU04 on node cn005

[87] MPI startup(): RDMA, shared memory, and socket data transfer modes

[89] MPI startup(): RDMA, shared memory, and socket data transfer modes

[88] MPI Startup(): process is pinned to CPU01 on node cn002

[89] MPI Startup(): process is pinned to CPU05 on node cn002

[87] MPI Startup(): process is pinned to CPU06 on node cn005

[91] MPI startup(): RDMA, shared memory, and socket data transfer modes

[90] MPI startup(): RDMA, shared memory, and socket data transfer modes

[90] MPI Startup(): process is pinned to CPU03 on node cn002

[91] MPI Startup(): process is pinned to CPU07 on node cn002

[92] MPI startup(): RDMA, shared memory, and socket data transfer modes

[94] MPI startup(): RDMA, shared memory, and socket data transfer modes

[92] MPI Startup(): process is pinned to CPU00 on node cn002

[93] MPI startup(): RDMA, shared memory, and socket data transfer modes

[95] MPI startup(): RDMA, shared memory, and socket data transfer modes

[97] MPI startup(): RDMA, shared memory, and socket data transfer modes

[94] MPI Startup(): process is pinned to CPU02 on node cn002

[93] MPI Startup(): process is pinned to CPU04 on node cn002

[96] MPI startup(): RDMA, shared memory, and socket data transfer modes

[96] MPI Startup(): process is pinned to CPU01 on node cn006

[95] MPI Startup(): process is pinned to CPU06 on node cn002

[99] MPI startup(): RDMA, shared memory, and socket data transfer modes

[98] MPI startup(): RDMA, shared memory, and socket data transfer modes

[98] MPI Startup(): process is pinned to CPU03 on node cn006

[100] MPI startup(): RDMA, shared memory, and socket data transfer modes

[97] MPI Startup(): process is pinned to CPU05 on node cn006

[101] MPI startup(): RDMA, shared memory, and socket data transfer modes

[99] MPI Startup(): process is pinned to CPU07 on node cn006

[104] MPI startup(): RDMA, shared memory, and socket data transfer modes

[102] MPI startup(): RDMA, shared memory, and socket data transfer modes

[103] MPI startup(): RDMA, shared memory, and socket data transfer modes

[100] MPI Startup(): process is pinned to CPU00 on node cn006

[104] MPI Startup(): process is pinned to CPU01 on node cn007

[105] MPI startup(): RDMA, shared memory, and socket data transfer modes

[101] MPI Startup(): process is pinned to CPU04 on node cn006

[103] MPI Startup(): process is pinned to CPU06 on node cn006

[106] MPI startup(): RDMA, shared memory, and socket data transfer modes

[106] MPI Startup(): process is pinned to CPU03 on node cn007

[102] MPI Startup(): process is pinned to CPU02 on node cn006

[107] MPI startup(): RDMA, shared memory, and socket data transfer modes

[110] MPI startup(): RDMA, shared memory, and socket data transfer modes

[109] MPI startup(): RDMA, shared memory, and socket data transfer modes

[108] MPI startup(): RDMA, shared memory, and socket data transfer modes

[111] MPI startup(): RDMA, shared memory, and socket data transfer modes

[107] MPI Startup(): process is pinned to CPU07 on node cn007

[105] MPI Startup(): process is pinned to CPU05 on node cn007

[110] MPI Startup(): process is pinned to CPU02 on node cn007

[112] MPI startup(): RDMA, shared memory, and socket data transfer modes

[113] MPI startup(): RDMA, shared memory, and socket data transfer modes

[113] MPI Startup(): process is pinned to CPU05 on node cn009

[115] MPI startup(): RDMA, shared memory, and socket data transfer modes

[109] MPI Startup(): process is pinned to CPU04 on node cn007

[108] MPI Startup(): process is pinned to CPU00 on node cn007

[114] MPI startup(): RDMA, shared memory, and socket data transfer modes

[111] MPI Startup(): process is pinned to CPU06 on node cn007

[114] MPI Startup(): process is pinned to CPU03 on node cn009

[116] MPI startup(): RDMA, shared memory, and socket data transfer modes

[115] MPI Startup(): process is pinned to CPU07 on node cn009

[118] MPI startup(): RDMA, shared memory, and socket data transfer modes

[116] MPI Startup(): process is pinned to CPU00 on node cn009

[117] MPI startup(): RDMA, shared memory, and socket data transfer modes

[119] MPI startup(): RDMA, shared memory, and socket data transfer modes

[120] MPI startup(): RDMA, shared memory, and socket data transfer modes

[112] MPI Startup(): process is pinned to CPU01 on node cn009

[118] MPI Startup(): process is pinned to CPU02 on node cn009

[121] MPI startup(): RDMA, shared memory, and socket data transfer modes

[117] MPI Startup(): process is pinned to CPU04 on node cn009

[120] MPI Startup(): process is pinned to CPU01 on node cn008

[121] MPI Startup(): process is pinned to CPU05 on node cn008

[119] MPI Startup(): process is pinned to CPU06 on node cn009

[122] MPI startup(): RDMA, shared memory, and socket data transfer modes

[123] MPI startup(): RDMA, shared memory, and socket data transfer modes

[123] MPI Startup(): process is pinned to CPU07 on node cn008

[124] MPI startup(): RDMA, shared memory, and socket data transfer modes

[122] MPI Startup(): process is pinned to CPU03 on node cn008

[125] MPI startup(): RDMA, shared memory, and socket data transfer modes

[125] MPI Startup(): process is pinned to CPU04 on node cn008

[126] MPI startup(): RDMA, shared memory, and socket data transfer modes

[127] MPI startup(): RDMA, shared memory, and socket data transfer modes

[127] MPI Startup(): process is pinned to CPU06 on node cn008

[124] MPI Startup(): process is pinned to CPU00 on node cn008

[126] MPI Startup(): process is pinned to CPU02 on node cn008

[0] Rank Pid Node name Pin cpu

[0] 0 8482 cn001 1

[0] 1 8475 cn001 5

[0] 2 8476 cn001 3

[0] 3 8477 cn001 7

[0] 4 8478 cn001 0

[0] 5 8479 cn001 4

[0] 6 8480 cn001 2

[0] 7 8481 cn001 6

[0] 8 8551 cn003 1

[0] 9 8552 cn003 5

[0] 10 8553 cn003 3

[0] 11 8554 cn003 7

[0] 12 8555 cn003 0

[0] 13 8556 cn003 4

[0] 14 8557 cn003 2

[0] 15 8558 cn003 6

[0] 16 8278 cn016 1

[0] 17 8277 cn016 5

[0] 18 8279 cn016 3

[0] 19 8280 cn016 7

[0] 20 8281 cn016 0

[0] 21 8282 cn016 4

[0] 22 8283 cn016 2

[0] 23 8284 cn016 6

[0] 24 8260 cn015 1

[0] 25 8259 cn015 5

[0] 26 8261 cn015 3

[0] 27 8262 cn015 7

[0] 28 8263 cn015 0

[0] 29 8264 cn015 4

[0] 30 8265 cn015 2

[0] 31 8266 cn015 6

[0] 32 7283 cn014 1

[0] 33 7284 cn014 5

[0] 34 7285 cn014 3

[0] 35 7286 cn014 7

[0] 36 7287 cn014 0

[0] 37 7288 cn014 4

[0] 38 7289 cn014 2

[0] 39 7290 cn014 6

[0] 40 7366 cn004 1

[0] 41 7367 cn004 5

[0] 42 7368 cn004 3

[0] 43 7369 cn004 7

[0] 44 7370 cn004 0

[0] 45 7371 cn004 4

[0] 46 7372 cn004 2

[0] 47 7373 cn004 6

[0] 48 8322 cn011 1

[0] 49 8321 cn011 5

[0] 50 8323 cn011 3

[0] 51 8324 cn011 7

[0] 52 8325 cn011 0

[0] 53 8326 cn011 4

[0] 54 8328 cn011 2

[0] 55 8327 cn011 6

[0] 56 8257 cn010 1

[0] 57 8263 cn010 5

[0] 58 8258 cn010 3

[0] 59 8259 cn010 7

[0] 60 8260 cn010 0

[0] 61 8261 cn010 4

[0] 62 8262 cn010 2

[0] 63 8264 cn010 6

[0] 64 8256 cn013 1

[0] 65 8257 cn013 5

[0] 66 8258 cn013 3

[0] 67 8259 cn013 7

[0] 68 8260 cn013 0

[0] 69 8261 cn013 4

[0] 70 8262 cn013 2

[0] 71 8263 cn013 6

[0] 72 8251 cn012 1

[0] 73 8252 cn012 5

[0] 74 8253 cn012 3

[0] 75 8254 cn012 7

[0] 76 8255 cn012 0

[0] 77 8256 cn012 4

[0] 78 8257 cn012 2

[0] 79 8258 cn012 6

[0] 80 8258 cn005 1

[0] 81 8257 cn005 5

[0] 82 8259 cn005 3

[0] 83 8260 cn005 7

[0] 84 8261 cn005 0

[0] 85 8262 cn005 4

[0] 86 8263 cn005 2

[0] 87 8264 cn005 6

[0] 88 8267 cn002 1

[0] 89 8268 cn002 5

[0] 90 8269 cn002 3

[0] 91 8274 cn002 7

[0] 92 8270 cn002 0

[0] 93 8271 cn002 4

[0] 94 8272 cn002 2

[0] 95 8273 cn002 6

[0] 96 7335 cn006 1

[0] 97 7336 cn006 5

[0] 98 7337 cn006 3

[0] 99 7338 cn006 7

[0] 100 7339 cn006 0

[0] 101 7340 cn006 4

[0] 102 7341 cn006 2

[0] 103 7342 cn006 6

[0] 104 8254 cn007 1

[0] 105 8255 cn007 5

[0] 106 8256 cn007 3

[0] 107 8257 cn007 7

[0] 108 8258 cn007 0

[0] 109 8259 cn007 4

[0] 110 8260 cn007 2

[0] 111 8261 cn007 6

[0] 112 8256 cn009 1

[0] 113 8257 cn009 5

[0] 114 8258 cn009 3

[0] 115 8259 cn009 7

[0] 116 8260 cn009 0

[0] 117 8261 cn009 4

[0] 118 8262 cn009 2

[0] 119 8263 cn009 6

[0] 120 8257 cn008 1

[0] 121 8264 cn008 5

[0] 122 8258 cn008 3

[0] 123 8259 cn008 7

[0] 124 8260 cn008 0

[0] 125 8261 cn008 4

[0] 126 8262 cn008 2

[0] 127 8263 cn008 6

[0] Init(): I_MPI_DEBUG=5

================================================================================

HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008

Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK

Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK

Modified by Julien Langou, University of Colorado Denver

================================================================================

An explanation of the input/output parameters follows:

T/V : Wall time / encoded variant.

N : The order of the coefficient matrix A.

NB : The partitioning blocking factor.

P : The number of process rows.

Q : The number of process columns.

Time : Time in seconds to solve the linear system.

Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 5000 10000

NB : 168

PMAP : Row-major process mapping

P : 8

Q : 16

PFACT : Right

NBMIN : 4

NDIV : 2

RFACT : Crout

BCAST : 1ringM

DEPTH : 0

SWAP : Mix (threshold = 64)

L1 : transposed form

U : transposed form

EQUIL : yes

ALIGN : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.

- The following scaled residual check will be computed:

||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )

- The relative machine precision (eps) is taken to be 1.110223e-16

- Computational tests pass if scaled residuals are less than 16.0

Column=000168 Fraction=0.005 Mflops=88320.02

Column=000336 Fraction=0.010 Mflops=120242.45

Column=000504 Fraction=0.015 Mflops=130753.95

Column=000672 Fraction=0.020 Mflops=152165.04

Column=000840 Fraction=0.025 Mflops=159092.29

Column=001008 Fraction=0.030 Mflops=170499.45

Column=001176 Fraction=0.035 Mflops=178243.98

Column=001344 Fraction=0.040 Mflops=180653.55

Column=001512 Fraction=0.045 Mflops=186599.44

Column=001680 Fraction=0.050 Mflops=189944.18

Column=001848 Fraction=0.055 Mflops=189058.72

Column=002016 Fraction=0.060 Mflops=188567.28

Column=002184 Fraction=0.065 Mflops=186382.08

Column=002352 Fraction=0.070 Mflops=185879.00

Column=002520 Fraction=0.075 Mflops=185556.57

Column=002688 Fraction=0.080 Mflops=185180.12

Column=002856 Fraction=0.085 Mflops=185927.74

Column=003024 Fraction=0.090 Mflops=183693.76

Column=003192 Fraction=0.095 Mflops=181621.10

Column=003360 Fraction=0.100 Mflops=179312.61

Column=003528 Fraction=0.105 Mflops=176017.40

Column=003696 Fraction=0.110 Mflops=173622.55

Column=003864 Fraction=0.115 Mflops=171382.03

cn003:8558: dereg_pd Device or resource busy

rtc_invalidate error 1114112

cn010:8264: dereg_pd Device or resource busy

rtc_invalidate error 1114112

rank 63 in job 3 cn001_54224 caused collective abort of all ranks

exit status of rank 63: killed by signal 9

rank 15 in job 3 cn001_54224 caused collective abort of all ranks

exit status of rank 15: return code 254

[root@cn001 em64t]#
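
The two `dereg_pd` lines in the log above identify the failing processes by node and PID, while the abort messages identify them by rank. A minimal sketch of how the two can be cross-referenced with the rank table printed under I_MPI_DEBUG=5 follows. The function name `map_failed_ranks` and the log file name are hypothetical, and the sketch assumes the log has the same layout as the one posted here.

```shell
# Hypothetical helper: map DAPL "dereg_pd" error lines to MPI ranks
# using the "[0] <rank> <pid> <node> <cpu>" table printed with I_MPI_DEBUG=5.
# Usage (file name is an assumption): map_failed_ranks hpl_run.log
map_failed_ranks() {
    awk '
        # Rank table rows, e.g. "[0] 15 8558 cn003 6"
        $1 == "[0]" && NF == 5 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            rank[$4 ":" $3] = $2        # key is "node:pid"
            next
        }
        # Error rows, e.g. "cn003:8558: dereg_pd Device or resource busy"
        $2 == "dereg_pd" {
            key = $1
            sub(/:$/, "", key)          # strip the trailing colon
            if (key in rank)
                print "rank " rank[key] " on " key
            else
                print "unknown rank for " key
        }
    ' "$1"
}
```

On the log above this pairs cn003:8558 with rank 15 and cn010:8264 with rank 63, the same two ranks named in the abort messages.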

Zhang,

What interconnect card do you use?
If it is a Mellanox card, you can use the I_MPI_DEVICE environment variable. Something like this:
mpirun -env I_MPI_DEVICE rdssm:OpenIB-mlx4_0-1 -n 128 ...
or
mpirun -env I_MPI_DEVICE rdssm:ofa-v2-mlx4_0-1 -n 128 ...

If your interconnect is not Mellanox you could try to use another provider.

Regards!
Dmitry

I use a QLogic QDR card. Its chip is not Mellanox; it is QLogic's own. Please see below.
# lspci
07:00.0 InfiniBand: QLogic Corp. Unknown device 7322 (rev 01)

I tried these:
mpirun -env I_MPI_DEVICE rdssm:OpenIB-mlx4_0-1 -n 128 ...
mpirun -env I_MPI_DEVICE rdssm:ofa-v2-mlx4_0-1 -n 128 ...
I still got an error!

Zhang,

The mlx4 providers are for Mellanox cards.
Could you try I_MPI_DEVICE rdssm:ofa-v2-ib0?

If that doesn't help, try adding '-env I_MPI_RDMA_TRANSLATION_CACHE off'.

Since we have sometimes seen issues with QLogic HCAs, we'd recommend upgrading the Intel MPI Library to version 4.0 and using the tmi provider. The latest OFED version could also help in this situation.

Regards!
Dmitry
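
The provider name given after `rdssm:` has to match an entry in /etc/dat.conf, and each entry is bound to a specific device (for example `ib0` or `mlx4_0`). As a rough sketch, the helper below lists each provider next to the device it names, which makes it easier to spot entries that target hardware not present in the node. The function name `list_dat_providers` is hypothetical, and the parsing assumes the dat.conf layout shown earlier in this thread (device and port in the first quoted field).

```shell
# Hypothetical helper: print "<provider> <device> <port>" for each dat.conf entry.
# Assumes the first double-quoted field holds the device name and port,
# as in the dat.conf posted in this thread.
list_dat_providers() {
    conf=${1:-/etc/dat.conf}
    # Split on double quotes: $1 is everything before the first quote,
    # $2 is the quoted "device port" field. Comment lines are skipped.
    awk -F'"' '!/^#/ && NF >= 3 { split($1, f, " "); print f[1], $2 }' "$conf"
}
```

Running `list_dat_providers` with no argument reads /etc/dat.conf. An entry such as `ofa-v2-ib0` would be printed as `ofa-v2-ib0 ib0 0`.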

Thank you! I will try it according to your suggestion again!

Hi,

Disabling the RDMA cache will dramatically decrease performance, and it may not resolve the issue.

I suggest you use the thread-safe version of the Intel MPI Library first. The "rtc_invalidate error" message usually appears when the application creates threads while the single-threaded version of the Intel MPI Library is used.

I guess that you use a math library which may automatically create threads at run time.

Best regards,
Andrey

Thank you. You said that I may be using a math library which automatically creates threads at run time, but I used Intel's Linpack, which is l_lpk_p_10.2.4.010.tgz. Is the Intel MPI Library single-threaded?
Do you mean that I should compile HPL with another math library, for example GotoBLAS?

You need to add the '-mt_mpi' option to the compilation command line.

Regards!
Dmitry
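
As an illustration of the advice above, the sketch below appends `-mt_mpi` to a compile or link command if it is not already there and prints the result without running it. Only the `-mt_mpi` option comes from this thread. The function name `add_mt_mpi`, the `mpiicc` wrapper, and the file names in the usage comment are assumptions made for the example.

```shell
# Hypothetical dry-run helper: echo a compile command with -mt_mpi added.
# Nothing is compiled; the command is only printed for inspection.
# Usage (wrapper and file names are assumptions):
#   add_mt_mpi mpiicc -O3 -o xhpl hpl.c
add_mt_mpi() {
    case " $* " in
        *" -mt_mpi "*) echo "$*" ;;          # option already present
        *)             echo "$* -mt_mpi" ;;  # append the option
    esac
}
```

For an HPL build, the same option would typically go into the compiler and linker flags of the build's make include file rather than onto a single command line.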

I get the right result when I run Linpack with -genv I_MPI_DEVICE rdma.
mpiexec -n 1024 -genv I_MPI_DEVICE rdma ./xhpl_em64
Thank you, everybody!
