Intel mpirun allocates different parallel jobs on the same CPUs

I recently received an SGI cluster with the full Intel package, i.e., compiler, MKL, and MPI. It was set up by SGI.

I found out that "mpirun" is allocating different parallel jobs to the same CPUs on a particular node, with a large loss of efficiency.

For example, job1 is submitted to run on 4 cores and is allocated to the first 4 CPUs of node n001 (node n001 has 16 cores); a second job, job2, is submitted to run on 4 cores (mpirun -n 4 exe) and, in principle, should run on the next 4 free CPUs. However, that is not what happens: the two jobs share the same 4 CPUs, each running at 50% efficiency for the whole run.

I compiled OpenMPI and tested it; I do not have this problem with OpenMPI.

Has anyone encountered this problem before?

Is there a simple solution for this?

Any help is highly welcome.

Juarez L. F. Da Silva

If you are starting MPI jobs manually on the same node, you can use the Intel MPI environment variables to assign each job to a different set of cores. Intel MPI sets affinity by default; unless you point each job at a different group of cores, each will get the same default set. The same would happen with OpenMPI if you set affinities, as you must do to get full performance. True, if you let the processes float, you can hope that the OS scheduler will assign them to different cores, but you cannot count on a placement that keeps each job local to one cache.
Of course, the usual way to keep MPI jobs separate on a cluster is to use a job scheduler, such as PBS, Torque, or SGE.
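As a sketch of the manual approach described above, two jobs started by hand on the same node can be pinned to disjoint core ranges with the I_MPI_PIN_PROCESSOR_LIST environment variable (the variable is documented in the Intel MPI Reference Manual; the executable names and core numbers are illustrative):

```shell
# Job 1: pin its 4 ranks to cores 0-3 of this node
I_MPI_PIN_PROCESSOR_LIST=0-3 mpirun -n 4 ./exe1 &

# Job 2: pin its 4 ranks to cores 4-7, so the two jobs never overlap
I_MPI_PIN_PROCESSOR_LIST=4-7 mpirun -n 4 ./exe2 &

wait
```

Since each job sees a different processor list, the default-affinity collision described above cannot occur.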

Quoting - Juarez L. F. Da Silva
I recently received an SGI cluster with the full Intel package, i.e., compiler, MKL, and MPI. It was set up by SGI.

I found out that "mpirun" is allocating different parallel jobs to the same CPUs on a particular node, with a large loss of efficiency.

Is there a simple solution for this?

Any help is highly welcome.

Juarez L. F. Da Silva

Hi Juarez,

tim18 is right - the best way to control the workload is to use a job scheduler. Intel's mpirun has built-in support for PBS Pro. If you need to use SGE, please read this document.
If you want to control CPU allocation on your own, you can use the I_MPI_PIN_DOMAIN environment variable - please see the Reference Manual for a detailed description.
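For instance, a minimal sketch of per-job domain pinning (the size:layout syntax follows the Intel MPI Reference Manual; check your manual version, as the exact accepted values are an assumption here):

```shell
# Give each MPI process its own 4-core domain, packed onto adjacent cores
# (value syntax per the Intel MPI Reference Manual)
export I_MPI_PIN_DOMAIN=4:compact

mpirun -n 4 ./exe
```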

Generally speaking, mpirun and mpiexec do not know about current CPU usage by other jobs.

Best wishes,
Dmitry

Quoting - Juarez L. F. Da Silva

Dear All

I am using the Torque PBS queue system to manage all jobs in the cluster; however, it does not solve the problem.

I have several nodes with 16 cores each, and I would like to run 4 parallel jobs per node using 4 cores each. At the moment, using Intel's MPI mpirun and submitting through the Torque PBS system, all 4 parallel jobs submitted to the same node share the same 4 CPUs, i.e., 25% for each parallel job, while the other 12 CPUs sit idle. This happens whether the jobs are submitted through PBS or locally; it also happens when submitting locally on the head node using mpirun. I understand that somehow the configuration assumes ONE parallel job per node, and I would like to change it to allow at least 4 parallel jobs per node.
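For reference, a minimal Torque job script for one of the four 4-core jobs might look like this (the #PBS directives are standard Torque; the pinning variable is from the Intel MPI Reference Manual, and whether it fully avoids the overlap is an assumption to verify on the cluster):

```shell
#!/bin/bash
#PBS -N job1
#PBS -l nodes=1:ppn=4

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Ask Intel MPI to pick a non-conflicting pin domain for this job
export I_MPI_PIN_DOMAIN=auto

mpirun -n 4 ./exe
```

Submitting four copies of this script should then place four 4-core jobs on one 16-core node without them stacking onto the same CPUs.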

I will check all your suggestions.

Juarez L. F. Da Silva

Hi Juarez,

There was a thread earlier within this forum that dealt with a similar issue. Here are all the details.

Basically, you can also try setting the I_MPI_PIN_DOMAIN environment variable to auto before running each Intel MPI job. For example:

$ export I_MPI_PIN_DOMAIN=auto
$ mpirun ...

or, alternatively (if you're using ssh):

$ mpirun -r ssh -genv I_MPI_PIN_DOMAIN auto ...

Let us know how it goes.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
