Simultaneous MPI Runs at the same time...

I apologize if this has been answered in the past, but the search feature didn't reveal any useful information for me.

We are running RHEL 5.4 and Intel MPI Library 3.2.2.006 on a single machine with dual 4-core processors. Our users connect to the machine via NX (essentially ssh) and do all their processing/runs from there. Today we ran into an issue where two different users each wanted to kick off a run consuming 4 cores, but MPI used only 4 cores in total and time-sliced the two jobs across them. This left the other 4 cores completely unused.

I read what seemed to be the relevant documentation and attempted starting mpd with a listen port of 50552 under a separate third user's account, and edited their login scripts to run "mpd -h -p 50552" instead of "mpdboot". However, this made no change at all.

I am taking over for the original IT person who installed and set up the system, and have little knowledge of MPI in general, so I'm sorry if this is something that's generally common knowledge. Any help is greatly appreciated.

Hi,

You don't need to play with port numbers. The reason for this behavior is the Intel MPI Library's automatic process pinning. You just need to switch it off, either for all users or for those who want to run several MPI applications simultaneously. Use 'export I_MPI_PIN=0'
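For example, each user could disable pinning in their own session before launching. This is only a sketch: `./my_app` stands in for the actual application binary, and the `mpiexec` invocation assumes the usual MPD-based launch style of Intel MPI 3.x.

```shell
# Disable Intel MPI's automatic process pinning for this shell session,
# so the OS scheduler is free to spread two concurrent 4-process jobs
# over all 8 cores instead of stacking them on the same 4.
export I_MPI_PIN=0

# Launch a 4-process job as usual ('my_app' is a placeholder binary).
mpiexec -np 4 ./my_app
```

Setting the variable in a shell session (rather than system-wide) keeps the change scoped to users who actually run concurrent jobs.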

BTW: please update the library to the latest one - 3.2.2 is quite old and probably doesn't know about modern processors.

Regards!
Dmitry

Turning off the default affinity, as Dmitry advised, is an important first step. You may find that you need to set I_MPI_PIN_PROCS explicitly so that each job is restricted to a single 4-core CPU in order to get reasonable efficiency. Unfortunately, this requires the two job submitters to agree on which CPU each will use, and to look up (e.g. via I_MPI_DEBUG=5) the core numbers associated with each CPU.
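That coordination might look like the following sketch, assuming the debug output shows cores 0-3 on one package and 4-7 on the other; the actual numbering must be confirmed with I_MPI_DEBUG=5 on your machine, and `./my_app` is again a placeholder binary.

```shell
# User A: restrict the 4 processes to the cores of the first CPU package.
export I_MPI_PIN_PROCS=0,1,2,3
export I_MPI_DEBUG=5          # print the pinning map at startup to verify
mpiexec -np 4 ./my_app

# User B (in a separate session): restrict to the second package instead.
export I_MPI_PIN_PROCS=4,5,6,7
export I_MPI_DEBUG=5
mpiexec -np 4 ./my_app
```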
As Dmitry hinted, the automatic pinning would not recognize CPUs released after that Intel MPI version shipped. However, version 3.2.2 is still preferred by some customers who don't need the newer features, on account of issues still under investigation.
