[solved] random problems with MPI + DAPL initialization in RedHat 5.4

Hi, I sometimes have problems with the execution of a program with Intel MPI.
It fails with an error on stderr (or stdout):
problem with execution of   on  wn20:  [Errno 13] Permission denied

What could the problem be?

here is my ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 135167
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 135167
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

I've checked logs on this node (wn20):

Mar 8 16:11:52 wn20 mpd: mpd starting; no mpdid yet
Mar 8 16:11:52 wn20 mpd: mpd has mpdid=wn20_45723 (port=45723)
Mar 8 16:11:53 wn20 mpd: wn20_45723 (run 1485): Warning: the directory pointed by TMPDIR (/tmp/pbs.2045.mgmt1) does not exist! /tmp will be used.
Mar 8 16:11:53 wn20 mpd: wn20_45723 (__init__ 1045): Warning: the directory pointed by TMPDIR (/tmp/pbs.2045.mgmt1) does not exist! /tmp will be used.
Mar 8 16:11:53 wn20 sshd[11867]: pam_unix(sshd:session): session closed for user routnwp
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_120
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_121
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_122
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_123
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_124
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_125
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_126
Mar 8 16:11:57 wn20 mpdman: mpdman starting new log; wn20_mpdman_127
Mar 8 16:12:07 wn20 mpd: mpd ending mpdid=wn20_45723 (inside cleanup)



Hi,

Could you provide the command line and the output in verbose mode, if possible?

Regards!

Dmitry

>Presumably meaning with the environment variable I_MPI_DEBUG=9

It seems to me that the issue is related to mpdboot (or mpirun), so this would be the '--verbose' option of that command.
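
For instance, a verbose ring start-up might look like this (a sketch; the host file name and node count are placeholders for your setup):

```shell
# Bring up the mpd ring with verbose output to see where start-up fails.
# 'mpd.hosts' and the node count (-n 8) are placeholders.
mpdboot -n 8 -f mpd.hosts -r ssh --verbose
```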

Regards!

Dmitry

Hi Dmitry,
thanks for the tip. Unfortunately I cannot reproduce the problem.

I've raised I_MPI_DEBUG to 5, as stated in the documentation; will 9 give more verbosity?

The problem is that we are now getting this (with I_MPI_DEBUG=5):

[56] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf
[cli_56]: got unexpected response to get :cmd=get kvsname=kvs_wn3_49596_0_0 key=DAPL_MISMATCH
:
[cli_56]: got unexpected response to put :cmd=put kvsname=kvs_wn3_49596_0_0 key=P56-businesscard value=rdma_port#21114$rdma_host#2:0:0:192:168:20:10:0:0:0:0:0:0:0:0$
:
[cli_56]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(283)...: Initialization failed
MPIDD_Init(98)..........: channel initialization failed
MPIDI_CH3_Init(261).....:
MPIDI_CH3U_Init_rdma(64): PMI_KVS_Put returned -1


What could be causing this?

Here is also the output of env from one of the MPI processes (I'm running a bash script under mpirun to debug it more closely at the MPI-process level):

I_MPI_INFO_LCPU=16
I_MPI_INFO_SIGN=67237
VT_MPI=impi3
I_MPI_INFO_PACK=1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
I_MPI_PIN_MAP=56 1,57 5,58 3,59 7,60 0,61 4,62 2,63 6
I_MPI_PIN_INFO=6
I_MPI_INFO_CACHE_SHARE=2,2,16
I_MPI_PIN_UNIT=6
I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
I_MPI_INFO_CACHES=3
I_MPI_INFO_CORE=0,0,2,2,1,1,3,3,0,0,2,2,1,1,3,3
I_MPI_DEVICE=rdma
I_MPI_RDMA_EAGER_THRESHOLD=25972
I_MPI_INFO_CACHE_SIZE=32768,262144,8388608
I_MPI_DEBUG=5
I_MPI_INFO_CACHE1=8,0,10,2,9,1,11,3,8,0,10,2,9,1,11,3
I_MPI_PIN_MAP_SIZE=8
I_MPI_INFO_CACHE2=8,0,10,2,9,1,11,3,8,0,10,2,9,1,11,3
I_MPI_INFO_CACHE3=1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
I_MPI_PERHOST=allcores
MPICH_INTERFACE_HOSTNAME=192.168.0.10
I_MPI_ROOT=/opt/intel/impi/3.2.1.009

One more thing: we run it through a batch scheduler. After the task had run, I saw that an mpd process still existed; could that be connected with the problem?

python /opt/intel/impi/3.2.1.009/bin64/mpd.py -h wn9 -p 34585 --ifhn=192.168.0.13 --ncpus=1 --myhost=wn13 --myip=192.168.0.13 -e -d -s 5

This problem is most likely related to the configuration of OFED or the IP addresses for IPoIB.

Again, I don't see your command line; it might be useful in some cases.
What is your DAPL version (run the 'ofed_info' command)?
Could you provide /etc/dat.conf?
What interconnect cards do you use?

The higher you set I_MPI_DEBUG, the more information you get.

Please try to run your application with I_MPI_DEVICE set to 'sock'.
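
For example (a sketch; the application path and process count are placeholders for your own launch line):

```shell
# Force the sock (TCP) device so the DAPL/RDMA path is bypassed entirely;
# if the random failures disappear, the fabric side is the suspect.
mpirun -r ssh -env I_MPI_DEBUG 5 -env I_MPI_DEVICE sock -np 196 ./your_app
```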

Regards!
Dmitry

>This problem is most likely related to configuration of OFED or IP addresses for IPoIB.

I'll check that, thanks. The problem is that it happens randomly, and only in particular jobs, while the configuration is static...

My command line is
mpirun -r ssh -env I_MPI_DEBUG 5 -env I_MPI_DEVICE rdssm -np 196 /full/path/bin/cm_w_00.0.0.2.sh

where cm_w_00.0.0.2.sh contains
/full/path/bin/cm > $logbin 2>&1
and a few other commands that redirect log files (like $logbin) with names unique to each MPI process, to gather debugging data such as the output of ps, ulimit, etc., but there is nothing interesting in those logs.

I'm using the stock RHEL 5.4 OFED.

dapl is dapl-2.0.19-2.el5 from the repos; I do not have the ofed_info command.
/etc/dat.conf is /etc/ofed/dat.conf in RHEL:

ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""


I'm thinking about trying the sock device, but that would rather avoid a problem which happens randomly than solve it. Do you think that's a good idea?

I'm using Mellanox ConnectX:

ibv_devinfo
hca_id: mlx4_0
fw_ver: 2.6.100
node_guid: 0023:7dff:ff94:4518
sys_image_guid: 0023:7dff:ff94:451b
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xA0
board_id: HP_0120000009
phys_port_cnt: 2
port: 1
state: active (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 6
port_lid: 4
port_lmc: 0x00

port: 2
state: active (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 6
port_lid: 5
port_lmc: 0x00

Your first output refers to the OpenIB-cma provider, but there is no such provider in the dat.conf you sent me.
So you probably need to set the DAT_OVERRIDE=/etc/ofed/dat.conf variable to point to the correct dat.conf file.
Could you also change the device env variable to:
-env I_MPI_DEVICE rdssm:ofa-v2-mlx4_0-1
In this case mlx4_0 will be used explicitly.

>The problem is that it happens randomly only in particular jobs
This is very strange. There might be something wrong with the cluster configuration, or some nodes may be unstable.
Could you also add:
-env I_MPI_FALLBACK_DEVICE off
to your command line.
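
Putting those suggestions together, a launch line might look like this (a sketch; the application path and process count are placeholders):

```shell
# Point libdat at the intended dat.conf, select the mlx4_0 provider
# explicitly, and disable fallback so a provider mismatch fails fast
# instead of silently switching to another device.
export DAT_OVERRIDE=/etc/ofed/dat.conf
mpirun -r ssh \
    -env I_MPI_DEBUG 5 \
    -env I_MPI_DEVICE rdssm:ofa-v2-mlx4_0-1 \
    -env I_MPI_FALLBACK_DEVICE off \
    -np 196 ./your_app
```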

Let me know the result.

Regards!
Dmitry

I've tried your suggestion; it gave me:

[0] DAPL provider is not found and fallback device is not enabled
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(283): Initialization failed
MPIDD_Init(98).......: channel initialization failed
MPIDI_CH3_Init(163)..: generic failure with errno = -1
(unknown)():
[0] MPI startup(): Intel MPI Library, Version 3.2.1 Build 20090312
[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation. All rights reserved.
rank 0 in job 1 wn1_33304 caused collective abort of all ranks
exit status of rank 0: return code 13

I've checked what the dapl library's default dat.conf is:

wn3 ~]$ dapltest
Dapltest: Service Point Ready - ofa-v2-ib0

I've tried to use the same name with Intel mpirun, with the same result:
[0] DAPL provider is not found and fallback device is not enabled

The weird thing is when running with just I_MPI_DEVICE=rdssm:
[0] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file /etc/dat.conf
[0] MPI startup(): RDMA, shared memory, and socket data transfer modes
[0] MPI startup(): Intel MPI Library, Version 3.2.1 Build 20090312
[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation. All rights reserved.

so it's trying to use OpenIB-cma, but that is not defined anywhere. The weird thing is, it works - but not always...

So Intel MPI is not using the dat.conf that dat itself uses?

I'll also try to link dat.conf from /etc/ofed to /etc

> I'll also try to link dat.conf from /etc/ofed to /etc

nope, linking won't work :/

I realized that Intel MPI on RHEL 5.4 is not using dapl; it's using compat-dapl, which has a different dat.conf (don't ask me why):

# cat /etc/ofed/compat-dapl/dat.conf
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""
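
To double-check which provider names a given dat.conf actually defines, a small helper like this can be used (a sketch; the compat-dapl path is the one from this thread):

```shell
# Print the DAPL provider names: the first whitespace-separated field of
# every non-comment, non-empty line of a dat.conf file.
dat_providers() {
    awk '!/^#/ && NF { print $1 }' "$1"
}

# Example (only runs if the compat-dapl file exists on this machine):
if [ -f /etc/ofed/compat-dapl/dat.conf ]; then
    dat_providers /etc/ofed/compat-dapl/dat.conf
fi
```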

I've used OpenIB-mlx4_0-1 in I_MPI_DEVICE and it runs OK for now; I'm waiting to see whether this error appears again.

Do you think this is what you wanted me to do?

Perhaps the following link may be helpful for understanding DAPL providers.

I would recommend not having invalid DAPL entries in your dat.conf file; you may want to offer your cluster users only those providers which are fully functional.

http://software.intel.com/en-us/articles/intel-mpi-library-for-linux-experience-with-various-interconnects-and-dapl-providers/

Hi, thanks. I've already read that.
The problem was that the entries were not completely bad; I was just using the wrong names (from the other dat.conf). But what you wanted is for me to use a fixed name for the DAPL provider, right?

Could that help in solving this issue?:

[94] MPI startup(): DAPL provider OpenIB-cma specified in DAPL configuration file
[cli_94]: got unexpected response to get :cmd=get kvsname=kvs_wn3_49596_0_0 key=DAPL_MISMATCH
:
[cli_94]: got unexpected response to put :cmd=put kvsname=kvs_wn3_49596_0_0 key=P94-businesscard
value=rdma_port#18839$rdma_host#2:0:0:192:168:20:14:0:0:0:0:0:0:0:0$

It's an Intel MPI message; could you explain to me what it means? I cannot find any docs about it.
It looks like the DAPL provider had been chosen (it was the same when it was running fine).

After changing I_MPI_DEVICE to OpenIB-mlx4_0-1, I got the following.

From wn3, from the rank 0 process:

[0] MPI startup(): DAPL provider OpenIB-mlx4_0-1
[cli_0]: got unexpected response to get :cmd=get kvsname=kvs_wn3_37604_0_0 key=DAPL_MISMATCH
:
[cli_0]: got unexpected response to put :cmd=put kvsname=kvs_wn3_37604_0_0 key=shm_name value=2D1921C52957AD9B5645EBCD4BA371D0
:
[0] MPI startup(): Intel MPI Library, Version 3.2.1 Build 20090312
[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation. All rights reserved.
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(283)....: Initialization failed
MPIDD_Init(98)...........: channel initialization failed
MPIDI_CH3_Init(319)......:
MPIDI_CH3U_Init_sshm(239): PMI_KVS_Put returned -1
(unknown)():

From the other processes:

[14] MPI startup(): DAPL provider OpenIB-mlx4_0-1
[cli_14]: got unexpected response to get :cmd=get kvsname=kvs_wn3_37604_0_0 key=DAPL_MISMATCH
:
[cli_14]: PMIU_parse_keyvals: unexpected key delimiter at character 1 in !
[cli_14]: expecting cmd=barrier_out, got !
[cli_14]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(283)....: Initialization failed
MPIDD_Init(98)...........: channel initialization failed
MPIDI_CH3_Init(319)......:
MPIDI_CH3U_Init_sshm(257): PMI_Barrier returned -1
(unknown)():

I've found a solution to my problem.
I believe it was the same problem as described here: http://software.intel.com/en-us/articles/random-fabric-errors-on-rhel5U4/ (the workaround I_MPI_RDMA_CREATE_CONN_QUAL=0 seemed to work too).

After upgrading to OFED 1.5 with the new DAPL, the problem was finally solved.

The DAPL version shipped with RedHat 5.4 seems buggy.

BTW: if anyone knows why RedHat decided to have two separate dat.conf files, one for each dapl version (1 and 2), please give me a note.

I have successfully used the new UCM interface (v2) with ConnectX (ofa-v2-mlx4_0-1u in dat.conf), which seems to be much, much faster with many-core jobs than the old CMA provider.

I believe that, when sticking to the RH-provided OFED, it's good to have one common dat.conf (via DAT_OVERRIDE) with the providers from both DAPL 1 and DAPL 2.
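
The workaround variable and the suggested common dat.conf can be sketched as shell fragments (the application path, process count, and merged file name are placeholders):

```shell
# Workaround from the linked article: set I_MPI_RDMA_CREATE_CONN_QUAL=0
# for the job.
mpirun -r ssh -env I_MPI_RDMA_CREATE_CONN_QUAL 0 -np 196 ./your_app

# One common dat.conf holding both the DAPL 1.x and 2.x providers,
# published to all DAT consumers via DAT_OVERRIDE.
cat /etc/ofed/dat.conf /etc/ofed/compat-dapl/dat.conf > /etc/dat.conf.merged
export DAT_OVERRIDE=/etc/dat.conf.merged
```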

Rafal, thanks for sharing this information.
