Wrong job dispatching on CPUs using Intel MPI 2.0

When I run a job with Intel MPI 2.0 using a file that references two machines with 4 CPUs each, I can see that only 2 processes are running on the first machine and 6 on the second, instead of 4 on the first machine and 4 on the other. Can you explain the reason for this behaviour?

Hi martialp,

If I understand correctly, you're simply trying to run 2 jobs, 4 MPI processes each, on 2 different machines - is that true?

Could you provide us with how you run your application (mpdboot/mpiexec command line, or mpirun, if you use that), as well as any mpd.hosts files, machine files, or config files, you might use. At this point, we need a bit more information to make a suggestion.

Thanks,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

What I am trying to do is run an application on 8 CPUs (4 on one machine and 4 on the other). The command line is the following:

$ mpirun -f host.list -np8 /easd/apps/vendor_appl/devl/Interwell5.3/bin/Linux/csh_presti_exe

The host.list file contains 2 lines:
lnx_137_1e051:4
lnx_137_1e033:4

Hi martialp,

The -f option in this case would only read the names of the hosts you want to use. What you can do here is use the -perhost option, which helps you indicate how many processes the Intel MPI Library should put on each node. Your command line will look like this:

$ mpirun -f host.list -perhost 4 -np 8 /easd/apps/vendor_appl/devl/Interwell5.3/bin/Linux/csh_presti_exe
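One way to check the resulting placement is to launch `hostname` in place of the application and tally the output per host. The snippet below is only a sketch: the helper name `count_per_host` is made up for illustration, and it assumes the launcher accepts a non-MPI executable such as `hostname` with the same options.

```shell
# Hypothetical helper: reads one hostname per line on stdin and
# prints "host: count" for each distinct host, sorted by host name.
count_per_host() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Assumed usage (requires a working MPI environment, so it is left commented out):
# mpirun -f host.list -perhost 4 -np 8 hostname | count_per_host
```

If the placement matches the request, each of the two hosts in host.list should be reported with a count of 4.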

Let us know how this goes.

Regards,
~Gergana

Hello Gergana

My client has used this new parameter with success. Thank you very much for this suggestion. We now just have to find out why the jobs are not equally dispatched across the CPUs of the first node of the cluster (some people have suggested to me that it may be related to a cluster configuration file for thread management).
Once again thanks a lot.

Martial
