pmi_proxy not found

Hi,

I have installed, under CentOS 6.5 (the latest release), the latest Intel MPI and compiler packages as of this writing: l_mpi_p_4.1.3.049 and parallel_studio_xe_2013_sp1_update3. This is on a Dell T620 system with 24 cores (Ivy Bridge, 12 cores x 2 CPUs). I have four of these nodes, and the other nodes do not show this trouble. I have reproduced the problem below: when I attempt to start a process using MPI Hydra, it always hangs with the error "pmi_proxy: No such file or directory". On the other hand, if I use the MPD system, the program (in this case the ab-initio calculation software VASP) starts up and runs without trouble. I have reinstalled both the MPI and compiler packages and have no idea what is causing this problem. Another symptom is that the simple diagnostics "mpirun -n 24 hostname" and "mpiexec -n 24 hostname" produce different results: mpirun hangs with the same pmi_proxy error, while mpiexec runs fine (reproduced below). On the other nodes, "mpirun -n 24 hostname" prints the hostnames as expected.

Any suggestions as to how to fix this would be greatly appreciated.

Paul Fons

Output relating to the failure of Hydra to run:

matstud@draco.a04.aist.go.jp:>source /opt/intel/bin/compilervars.sh intel64

matstud@draco.a04.aist.go.jp:>source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh

matstud@draco.a04.aist.go.jp:>mpdallexit

matstud@draco.a04.aist.go.jp:>mpiexec.hydra -n 24 -env I_MPI_FABRICS shm:ofa vasp

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

-rwxr-xr-x 1 root root 1001113 Mar  3 17:51 /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

matstud@draco.a04.aist.go.jp:>mpdboot

matstud@draco.a04.aist.go.jp:>mpiexec -n 24 vasp

 running on   24 total cores

 distrk:  each k-point on   24 cores,    1 groups

 distr:  one band on    1 cores,   24 groups

 using from now: INCAR   

 vasp.5.3.5 31Mar14 (build Apr 04 2014 15:18:05) complex

 POSCAR found :  2 types and     128 ions

 scaLAPACK will be used

Output showing the different behavior of mpirun and mpiexec:

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpirun -n 24 hostname

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpiexec -n 24 hostname

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpirun

/opt/intel/impi/4.1.3.049/intel64/bin/mpirun

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpiexec

/opt/intel/impi/4.1.3.049/intel64/bin/mpiexec

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>

What is the output from:

env | grep I_MPI

The result from "env | grep I_MPI" is the same as that from the other machines in the cluster that do not have this problem:

I_MPI_FABRICS=shm:ofa

I_MPI_ROOT=/opt/intel/impi/4.1.3.049

Try running with I_MPI_HYDRA_DEBUG=1 and attach the output.
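
For example, a minimal sketch of such a run (with I_MPI_HYDRA_DEBUG=1, Hydra should print the launch commands it uses to start pmi_proxy on each node, which should show exactly which path is failing; the log file name here is just illustrative):

# rerun the failing command with Hydra's debug output enabled
export I_MPI_HYDRA_DEBUG=1
mpirun -n 24 hostname 2>&1 | tee hydra_debug.log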

Hi, I want to run a MIC executable from the host. I followed the procedure in the doc from the Intel site, but I got the following errors when I tried to run the MIC executable:

[kiran@compute012 mpi_program]$ mpirun -f mpi_host -n 4 ./hello_mic
pmi_proxy: line 0: exec: pmi_proxy: not found
Ctrl-C caught... cleaning up processes
[mpiexec@compute012] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@compute012] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@compute012] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@compute012] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@compute012] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
[kiran@compute012 mpi_program]$ cat mpi_host
compute012-mic0

Do you have /opt/intel available via NFS on the coprocessor?  If not, you will need to ensure that pmi_proxy (along with whichever MPI libraries you have linked) is available in the path on the coprocessor.
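
As a rough sketch of that check (assuming the default Intel MPI install layout under /opt/intel/impi/4.1.3.049 and a coprocessor reachable as compute012-mic0, per the host file above):

# check whether the coprocessor can see the MIC-side pmi_proxy
ssh compute012-mic0 'ls -l /opt/intel/impi/4.1.3.049/mic/bin/pmi_proxy'

# if /opt/intel is not mounted on the card, copy the MIC-side binaries and libraries over by hand
scp /opt/intel/impi/4.1.3.049/mic/bin/pmi_proxy compute012-mic0:/bin/
scp /opt/intel/impi/4.1.3.049/mic/lib/libmpi* compute012-mic0:/lib64/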

I found a fix!

This page didn't really help me, but I think I found a solution so I thought I'd post it.

I found that setting the I_MPI_MIC_PROXY_PATH environment variable to the directory in which the pmi_proxy command for the MIC resides (on the MIC itself) corrects this issue!
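
For example, a minimal sketch assuming the MIC-side binaries live in the default install location (I_MPI_MIC=1, which enables coprocessor support, is an assumption here and not part of Jim's fix):

# tell Hydra where pmi_proxy lives on the coprocessor
export I_MPI_MIC=1
export I_MPI_MIC_PROXY_PATH=/opt/intel/impi/4.1.3.049/mic/bin
mpirun -f mpi_host -n 4 ./hello_mic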

HTH,

Jim

It seems to me that there is an internal storage limitation in the implementation of the MPI-3 shared-memory feature in Intel MPI 5.0. I cannot use Intel MPI in my real CFD code with that limitation, because for very large grids the total storage allocated simultaneously by the shared windows can exceed 10 GB.

Greetings to you all

