pmi_proxy not found

Hi,

I have installed the latest (as of this writing) Intel MPI library and compilers, l_mpi_p_4.1.3.049 and parallel_studio_xe_2013_sp1_update3, under CentOS 6.5 (latest release). This is on a Dell T620 system with 24 cores (two 12-core Ivy Bridge CPUs). I have four of these nodes, and the other nodes do not show the trouble this one does. I have reproduced the trouble below: when I attempt to start a process using mpiexec.hydra, it always hangs with the error "pmi_proxy: No such file or directory". On the other hand, if I use the mpd system, the program (in this case the ab initio calculation software VASP) starts up and runs without trouble. I have reinstalled both the MPI and compiler systems, and I have no idea what is causing this problem. Another symptom is that simple diagnostics such as "mpirun -n 24 hostname" and "mpiexec -n 24 hostname" produce different results. While mpirun results in the same hang with pmi_proxy, mpiexec runs fine (reproduced below). On the other nodes, "mpirun -n 24 hostname" prints out the hostnames as expected.

 

Any suggestions as to how to fix this would be greatly appreciated.

 

Paul Fons

 

Output relating to the failure of hydra to run.

 

 

matstud@draco.a04.aist.go.jp:>source /opt/intel/bin/compilervars.sh intel64

matstud@draco.a04.aist.go.jp:>source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh

matstud@draco.a04.aist.go.jp:>mpdallexit

matstud@draco.a04.aist.go.jp:>mpiexec.hydra -n 24 -env I_MPI_FABRICS shm:ofa vasp

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

-rwxr-xr-x 1 root root 1001113 Mar  3 17:51 /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

matstud@draco.a04.aist.go.jp:>mpdboot

matstud@draco.a04.aist.go.jp:>mpiexec -n 24 vasp

 running on   24 total cores

 distrk:  each k-point on   24 cores,    1 groups

 distr:  one band on    1 cores,   24 groups

 using from now: INCAR   

 vasp.5.3.5 31Mar14 (build Apr 04 2014 15:18:05) complex                      

 

 POSCAR found :  2 types and     128 ions

 scaLAPACK will be used

 

 

 

 

Output showing the different behavior of mpirun and mpiexec

 

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpirun -n 24 hostname

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpiexec -n 24 hostname

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpirun

/opt/intel/impi/4.1.3.049/intel64/bin/mpirun

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpiexec

/opt/intel/impi/4.1.3.049/intel64/bin/mpiexec

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>

 


What is the output from

env | grep I_MPI

 

The result from "env | grep I_MPI" is the same as that from the other machines in the cluster that do not have this problem:

 

 

I_MPI_FABRICS=shm:ofa

I_MPI_ROOT=/opt/intel/impi/4.1.3.049

Try running with I_MPI_HYDRA_DEBUG=1 and attach the output.
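
One way to collect that output is sketched below. This is only an illustration: the log file name, the two-process test job, and the 60-second limit are assumptions chosen for the example, not anything prescribed in this thread.

```shell
# Sketch: capture Hydra debug output from a small test job.
# I_MPI_HYDRA_DEBUG is the variable named above; the log file name and the
# 60-second limit are arbitrary choices for this example.
export I_MPI_HYDRA_DEBUG=1
log=hydra_debug.log
status=0

if command -v mpirun >/dev/null 2>&1; then
    # timeout stops the job if it hangs instead of waiting for Ctrl-C.
    timeout 60 mpirun -n 2 hostname >"$log" 2>&1 || status=$?
    echo "mpirun exit status: $status" >>"$log"
else
    echo "mpirun not found in PATH; source mpivars.sh first" >"$log"
fi
```

The resulting hydra_debug.log can then be attached to a reply.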

Hi, I want to run a MIC executable from the host. I followed the procedure in the documentation on the Intel site, but got the following errors:

 

[kiran@compute012 mpi_program]$ mpirun -f mpi_host -n 4 ./hello_mic
pmi_proxy: line 0: exec: pmi_proxy: not found
Ctrl-C caught... cleaning up processes
[mpiexec@compute012] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@compute012] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@compute012] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@compute012] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@compute012] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
[kiran@compute012 mpi_program]$ cat mpi_host
compute012-mic0

Do you have /opt/intel available via NFS on the coprocessor?  If not, you will need to ensure that pmi_proxy (along with whichever MPI libraries you have linked) is available in the path on the coprocessor.
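
A quick way to test this might look like the sketch below. The helper name check_pmi_proxy is hypothetical, and the sketch assumes passwordless ssh to the coprocessor and a remote shell that supports "command -v". It asks through a non-interactive ssh session, whose PATH can differ from the one you see after logging in.

```shell
# Sketch: ask a remote host where (or whether) it finds pmi_proxy.
# check_pmi_proxy is a hypothetical helper; it assumes passwordless ssh and
# a remote shell that supports "command -v".
check_pmi_proxy() {
    host=$1
    # BatchMode avoids password prompts; ConnectTimeout avoids a long hang.
    if path=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" 'command -v pmi_proxy'); then
        echo "$host: pmi_proxy found at $path"
    else
        echo "$host: pmi_proxy not found in the non-interactive PATH (or ssh failed)"
        return 1
    fi
}

# Usage (host name taken from the mpi_host file above):
#   check_pmi_proxy compute012-mic0
```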

I found a fix!

This page didn't really help me, but I think I found a solution so I thought I'd post it.

I found that setting the I_MPI_MIC_PROXY_PATH environment variable to the directory in which the pmi_proxy command for the MIC resides (on the MIC itself) corrects this issue!

HTH,

Jim
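
For readers who want to try the fix described above, a minimal sketch follows. The directory shown is a placeholder, not a real location; substitute the directory on your own coprocessor that holds the MIC build of pmi_proxy.

```shell
# Sketch: point Intel MPI at the pmi_proxy directory on the coprocessor.
# The path below is a placeholder (an assumption for this example);
# replace it with the actual directory on the MIC itself.
mic_proxy_dir=/path/on/mic/to/impi/bin
export I_MPI_MIC_PROXY_PATH="$mic_proxy_dir"

# Then launch as before, for example:
#   mpirun -f mpi_host -n 4 ./hello_mic
```

The variable has to be set in the shell that runs mpirun on the host.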
