I recently received an SGI cluster with the full Intel package (compiler, MKL, and MPI). It was set up by SGI.
I found that "mpirun" places different parallel jobs on the same CPUs of a given node, with a big loss of efficiency.
For example, job1 is submitted to run on 4 cores and is allocated the first 4 CPUs of node n001 (node n001 has 16 cores). A second job, job2, is then submitted to run on 4 cores (mpirun -n 4 exe); in principle, it should run on the next 4 free CPUs. However, that is not what happens. The two jobs share the same 4 CPUs, each running at about 50% efficiency for the whole run.
I compiled Open MPI and tested it. I do not have this problem with Open MPI.
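To confirm that the two jobs really are restricted to the same cores, a small script like the one below can be started under mpirun in each job so that every process reports the CPUs it is allowed to run on. This is only a sketch, assuming Linux and Python 3 on the compute nodes; the file name show_affinity.py and the helper names format_cpu_set and allowed_cpus are made up for illustration and are not part of any MPI package.

```python
import os


def format_cpu_set(cpus):
    """Collapse a set of CPU ids into ranges, e.g. {0, 1, 2, 3, 8} -> '0-3,8'."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu          # first id opens a range
        elif cpu == prev + 1:
            prev = cpu                  # consecutive id extends the range
        else:
            ranges.append((start, prev))
            start = prev = cpu          # gap: close the range, open a new one
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def allowed_cpus(pid=0):
    """Return the set of CPUs the process may run on (Linux only; 0 = this process)."""
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # One line per process: host name, process id, allowed CPUs.
    print(f"host={os.uname().nodename} pid={os.getpid()} "
          f"cpus={format_cpu_set(allowed_cpus())}")
```

For example, running "mpirun -n 4 python3 show_affinity.py" once as job1 and once as job2 on the same node gives one line per process. If the CPU lists printed by the two jobs overlap, the slowdown would be consistent with both jobs being pinned to the same cores rather than with the application itself.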
Has anyone seen this problem before?
Is there a simple solution for it?
Any help is highly welcome.
Juarez L. F. Da Silva