mpiexec.hydra -ppn 1 and intel-mpi 4.1.2.040

I have just installed intel-mpi 4.1.2.040 on a cluster...

If I use mpiexec.hydra to start jobs with one process per node... it still spawns processes on all available resources...

mpiexec.hydra -ppn 1 hostname

on two nodes shows me 40 lines as opposed to the two expected.

I have attached a file with debug info from running

I_MPI_HYDRA_DEBUG=1 mpiexec.hydra -ppn 1 hostname 2>&1 | tee debug.txt

 

regards,

Alin

Attachment: debug.txt (29.22 KB)
Without Questions there are no Answers!

Forgot to say! Any help in solving the issue, or in better understanding it, would be much appreciated.

 

regards,

Alin

Without Questions there are no Answers!

Hi Alin,

Using -ppn does not limit the total number of ranks on a host, only the number of consecutive ranks placed on each host in one pass.  If you have more ranks than one pass covers, the placement will cycle back to the first host and begin again.  So if I have a hostfile with two hosts (node0 and node1), here's what I should see:

$mpirun -n 4 -ppn 2 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node0
Hello world: rank 2 of 4 running on node1
Hello world: rank 3 of 4 running on node1
$mpirun -n 4 -ppn 1 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node1
Hello world: rank 2 of 4 running on node0
Hello world: rank 3 of 4 running on node1
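
For reference, the hostfile assumed in these examples would simply list the two hosts, one per line, and can be passed to Hydra with -f (a sketch; the host names are placeholders):

node0
node1

$mpirun -f hostfile -n 4 -ppn 2 ./hello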

In your command line, you didn't specify the number of ranks to run.  If you don't specify that number, it will be determined from your job (or if that can't be found, then the number of cores available on the host).  In this case, your job says to use 40 ranks, so 40 ranks were launched.
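
For example, on your two nodes with 20 cores each, explicitly capping the count should give exactly one rank per node (a sketch):

$mpiexec.hydra -n 2 -ppn 1 hostname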

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Hi James,

Thank you for your answer. I still cannot reproduce your results with the approach above...
mpiexec.hydra is the one provided by the current version of the Intel MPI library...
mpiexec.hydra.good is from 4.0; as you can see, one offers the right output and the other does not.
Also, in the past the presence of -n was not mandatory, but maybe I missed something in the manual. I have also attached the nodes file.
The same happens when using mpirun.

[alin@service56:~]: mpiexec.hydra -n 4 -ppn 1 ./hello.X
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 1 out of 4 running on service56 with MPI version 2.2
I am process 3 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service56 with MPI version 2.2
[alin@service56:~]: mpiexec.hydra -n 4 -ppn 2 ./hello.X
I am process 1 out of 4 running on service56 with MPI version 2.2
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 3 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service56 with MPI version 2.2
[alin@service56:~]: mpirun -n 4 -ppn 1 ./hello.X
I am process 1 out of 4 running on service56 with MPI version 2.2
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service56 with MPI version 2.2
I am process 3 out of 4 running on service56 with MPI version 2.2
[alin@service56:~]: mpirun -n 4 -ppn 2 ./hello.X
I am process 1 out of 4 running on service56 with MPI version 2.2
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 3 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service56 with MPI version 2.2

[alin@service56:~]: mpiexec.hydra.good -n 4 -ppn 2 ./hello.X
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 1 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service54 with MPI version 2.2
I am process 3 out of 4 running on service54 with MPI version 2.2
[alin@service56:~]: mpiexec.hydra.good -n 4 -ppn 1 ./hello.X
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service56 with MPI version 2.2
I am process 3 out of 4 running on service54 with MPI version 2.2
I am process 1 out of 4 running on service54 with MPI version 2.2
[alin@service56:~]: mpiexec.hydra.good -ppn 1 ./hello.X
I am process 0 out of 2 running on service56 with MPI version 2.2
I am process 1 out of 2 running on service54 with MPI version 2.2
[alin@service56:~]: mpiexec.hydra.good -ppn 2 ./hello.X
I am process 0 out of 4 running on service56 with MPI version 2.2
I am process 2 out of 4 running on service54 with MPI version 2.2
I am process 3 out of 4 running on service54 with MPI version 2.2
I am process 1 out of 4 running on service56 with MPI version 2.2

[alin@service56:~]: cat $PBS_NODEFILE > nodes.txt
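
For reference, hello.X is just a minimal MPI hello-world along these lines (a sketch of the kind of program used, not the exact source; built with something like mpiicc hello.c -o hello.X):

/* Minimal MPI hello-world printing rank, size, host, and MPI version. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, ver, subver, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(host, &len);     /* host this rank runs on */
    MPI_Get_version(&ver, &subver);         /* e.g. 2.2 */
    printf("I am process %d out of %d running on %s with MPI version %d.%d\n",
           rank, size, host, ver, subver);
    MPI_Finalize();
    return 0;
}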

I looked more into
I_MPI_HYDRA_DEBUG=1 mpiexec.hydra.good -n 4 -ppn 2 ./hello.X > good
I_MPI_HYDRA_DEBUG=1 mpiexec.hydra -n 4 -ppn 2 ./hello.X > bad

I have attached them both.

Looking into them, I found these differences that may help in understanding the issue:
[alin@abaddon:~]: grep -A 3 "Proxy information" bad
Proxy information:
*********************
[1] proxy: service56 (20 cores)
Exec list: ./hello.X (4 processes);
[alin@abaddon:~]: grep -A 6 "Proxy information" good
Proxy information:
*********************
[1] proxy: service56 (2 cores)
Exec list: ./hello.X (2 processes);

[2] proxy: service54 (2 cores)
Exec list: ./hello.X (2 processes);

Moreover, the arguments passed to the proxies are different...

[alin@abaddon:~]: grep -A 2 "Arguments being" good
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname service56 --global-core-map 0,2,2 --filler-process-map 0,2,2 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname kvs_38696_0 --pmi-process-mapping (vector,(0,2,2)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve -1 --ckpoint off --ckpoint-num -1 --global-inherited-env 117 'I_MPI_PERHOST=allcores' 'I_MPI_ROOT=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040' 'COLORTERM=1' 'PBS_O_PATH=/ichec/home/staff/alin/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/c3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/mam/bin:/opt/moab/bin:/opt/moab/sbin:.:/opt/sgi/sbin:/opt/sgi/bin' 'module=() { eval `/usr/bin/modulecmd bash $*`
}' '_=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040/intel64/bin/mpiexec.hydra.good' --global-user-env 0 --global-system-env 2 'MPICH_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 2 --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /ichec/home/staff/alin --exec-args 1 ./hello.X
--
Arguments being passed to proxy 1:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname service54 --global-core-map 2,2,0 --filler-process-map 2,2,0 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname kvs_38696_0 --pmi-process-mapping (vector,(0,2,2)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve -1 --ckpoint off --ckpoint-num -1 --global-inherited-env 117 'I_MPI_PERHOST=allcores' 'I_MPI_ROOT=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040' 'COLORTERM=1' 'PBS_O_PATH=/ichec/home/staff/alin/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/c3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/mam/bin:/opt/moab/bin:/opt/moab/sbin:.:/opt/sgi/sbin:/opt/sgi/bin' 'module=() { eval `/usr/bin/modulecmd bash $*`
}' '_=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040/intel64/bin/mpiexec.hydra.good' --global-user-env 0 --global-system-env 2 'MPICH_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 2 --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /ichec/home/staff/alin --exec-args 1 ./hello.X
[alin@abaddon:~]:
[alin@abaddon:~]:
[alin@abaddon:~]: grep -A 2 "Arguments being" bad
Arguments being passed to proxy 0:
--version 1.4.1p1 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname service56 --global-core-map 0,20,0 --filler-process-map 0,20,0 --global-process-count 4 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname kvs_38714_0 --pmi-process-mapping (vector,(0,2,20)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 117 'I_MPI_PERHOST=allcores' 'I_MPI_ROOT=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040' 'COLORTERM=1' 'PBS_O_PATH=/ichec/home/staff/alin/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/c3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/mam/bin:/opt/moab/bin:/opt/moab/sbin:.:/opt/sgi/sbin:/opt/sgi/bin' 'module=() { eval `/usr/bin/modulecmd bash $*`
}' '_=/ichec/home/packages/intel-cluster-studio/2013-sp1-u1/impi/4.1.2.040/intel64/bin/mpiexec.hydra' --global-user-env 0 --global-system-env 2 'MPICH_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 20 --exec --exec-appnum 0 --exec-proc-count 4 --exec-local-env 0 --exec-wdir /ichec/home/staff/alin --exec-args 1 ./hello.X

If I collapse my hostfile into unique hosts and pass it with -f, I get the correct behaviour with or without -n.
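For example (a sketch of what I mean; the file name is illustrative):

sort -u $PBS_NODEFILE > nodes.uniq
mpiexec.hydra -f nodes.uniq -ppn 1 ./hello.X
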
Did the behaviour change between versions of intel-mpi, or is this a bug?

regards,
Alin


Attachments:
nodes.txt (400 bytes)
bad.txt (27.64 KB)
good.txt (41.57 KB)
Without Questions there are no Answers!

Hi Alin,

What is the full version number for the working one?

James.

Hi James,

Since Alin is not available right now, I'll answer the question.

The Intel MPI version the working mpiexec.hydra comes from is 4.1.0.024.
More precisely, it says: Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831

Cheers.

Gilles

Hi,

Any chance of seeing an update on this issue? From the outside it looks like a trivial regression in mpiexec.hydra, yet it is very annoying from a user's point of view... Am I missing some critical element here?

Although using the old version lets us run, it might have unexpected side effects we don't see. Moreover, since we plan to use symmetric MPI mode intensively on Xeon Phi, a clean and up-to-date Intel MPI environment would be highly desirable.

Cheers.

Gilles

Hi Gilles,

I currently do not have any additional information about this issue.  Several other customers are reporting it.  I can suggest using a machinefile as a workaround, or specifying a different hostfile, rather than allowing Hydra to automatically get the hosts from your job scheduler.
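
For example, something along these lines (a sketch; the file name is illustrative):

sort -u $PBS_NODEFILE > machines.txt
mpiexec.hydra -machinefile machines.txt -n 2 -ppn 1 ./hello.X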

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

I'm working on a cluster and learning how to launch different processes. Today I tried to use a script with the command to execute the program. Suddenly, when I use the top command, this appears:

28210 jazmin    25   0 13088  928  712 R 100.2  0.0 383:56.27 mpiexec.hydra 

and I cannot kill this process. How can I do it? Thanks in advance.
