Intel MPI with pthread

I am trying to run a program that uses Pthreads with Intel MPI. The program compiled and linked successfully. I ran it on a dual-socket machine with two quad-core processors, but no threads seemed to be created. Below is the command I used:

mpirun -n 2 executable

The program is supposed to generate 8 threads in one of the 2 processes. Thanks.

Quoting - jackyjngwn
I am trying to run a program that uses Pthread with Intel MPI.

Intel MPI is said to be more particular than certain other MPI implementations in requiring MPI_Init_thread to be used correctly for this case. The I_MPI_PIN_DOMAIN environment variable settings are provided (since Intel MPI 3.2) to give some control over thread affinity.
Other than that, I doubt much can be said with the information you gave.

Hi, Jacky

One other thing I would point out: when doing hybrid programming (threads + MPI), make sure you're using the thread-safe Intel MPI library. You can specify this for your Intel MPI Library compiler script (such as mpiicc, or mpiifort) by using the -mt_mpi flag. Or you can manually link with libmpi_mt.so in the lib/lib64 directory.

Regards,
~Gergana

Quoting - tim18

Intel MPI is said to be more particular than certain other MPI implementations in requiring MPI_Init_thread to be used correctly for this case. The I_MPI_PIN_DOMAIN environment variable settings are provided (since Intel MPI 3.2) to give some control over thread affinity.
Other than that, I doubt much can be said with the information you gave.

Thanks for the reply. Do you mean that I have to use MPI_Init_thread in my program instead of MPI_Init, or can I specify the thread level using the -mt_mpi option?

Also, according to the reference manual, I_MPI_PIN_DOMAIN is for OpenMP only, isn't it?

According to my understanding, MPI_Init_thread is required, and the -mt_mpi option would be needed for some of the supported threading models (perhaps not yours, except that it would be important anyway if you don't segregate threads from each other).
I_MPI_PIN_DOMAIN evidently has options to correspond with OpenMP, and may use the KMP_AFFINITY mechanism in that case to place the threads. I'm trying to refresh my installation so as to look at the current documentation on I_MPI_PIN_DOMAIN. If you care about placement of threads, you would want some mechanism such as this or taskset.

It appears that I_MPI_PIN_DOMAIN doesn't depend on OpenMP, but you may have to work out the right option if you are trying to reserve more than half the cores for one process.

I've linked to mpi_mt using -lmpi_mt, and changed MPI_INIT to MPI_INIT_thread, but it seems that these tricks didn't work.

My program works like this. There are two processes: one is the master, the other is the slave. The slave generates 8 threads. The master process reads the input data and dispatches it to the threads of the slave. The 8 threads are supposed to use the 8 cores of the two quad-core processors. However, according to the output of "top", only two cores were used: one by the master process and one by the slave process.

I tried I_MPI_PIN_DOMAIN, too. When "sock" was specified, two cores were used by the slave process, but nothing changed when other values were used. What does this mean? Thanks!

Hi Jacky,

Could you try setting I_MPI_PIN_DOMAIN to 'auto'? If that doesn't change the situation, try setting I_MPI_PIN to 'off'.
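
If you want to confirm that pinning is the cause, you can have the slave process report how many cores it is allowed to run on. Here is a rough sketch, assuming Linux with glibc (sched_getaffinity and CPU_COUNT are GNU extensions); the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Number of CPUs the calling thread is allowed to run on, or -1 on error.
 * Linux/glibc specific. */
int allowed_cpu_count(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Thread body: store the count seen by this worker thread. */
static void *report_affinity(void *arg)
{
    *(int *)arg = allowed_cpu_count();
    return NULL;
}

/* Hypothetical helper: start one worker thread and return the number of
 * CPUs it may use, or -1 on error. A thread created with default
 * attributes inherits the affinity mask of the thread that created it. */
int worker_cpu_count(void)
{
    pthread_t tid;
    int count = -1;

    if (pthread_create(&tid, NULL, report_affinity, &count) != 0)
        return -1;
    pthread_join(tid, NULL);
    return count;
}
```

Build with -pthread and print allowed_cpu_count() from the slave process when it runs under mpirun. If it reports 1, all eight threads are sharing that single core; with pinning switched off, or with a domain that covers the whole node, it should report 8 on your machine.
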

Best wishes,
Dmitry

I've finally got it working, after I set I_MPI_PIN_MODE to node. Thank you all for the help!
