Intel MPI strange behavior with I_MPI_PIN_DOMAIN

Hi,

I tried to pin processes to the cores allocated by SGE, which supports processor affinity (6.2u5 and later).
Plain MPI programs work when I use I_MPI_PIN_PROCESSOR_LIST.
However, hybrid (MPI + OpenMP) programs do not work when I use I_MPI_PIN_DOMAIN.

For example:
$ mpiexec -genv I_MPI_PIN_DOMAIN [1] -n 1 ./affinity
rank = 0, affinity = 0 <-- works !
$ mpiexec -genv I_MPI_PIN_DOMAIN [2] -n 1 ./affinity
rank = 0, affinity = 0 2 3 4 5 6 7 8 9 10 <-- does not work
$ mpiexec -genv I_MPI_PIN_DOMAIN [3] -n 1 ./affinity
rank = 0, affinity = 0 1 <-- works
$ mpiexec -genv I_MPI_PIN_DOMAIN [4] -n 1 ./affinity
rank = 0, affinity = 0 1 3 4 5 6 7 8 9 10 11

If Open MPI is used with the -rf (rankfile) option, these cases work.
Can you help me?

Thank you in advance.

Sincerely,
T.Ikeda

Hi,

This probably comes from a misunderstanding of the I_MPI_PIN_DOMAIN logic.

I_MPI_PIN_DOMAIN doesn't limit the set of processors to the ones you put in a mask. It creates domains!
In your case (-genv I_MPI_PIN_DOMAIN [2]) 2 domains will be created: the first one will contain only one processor, processor 1, and the second domain will contain all the other processors.
The problem here is that the domain containing processor 0 will be used first. That is why you see this behaviour.
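
To make the mask logic above concrete, here is a minimal Python sketch of the behaviour as described in this reply. It assumes that the value in brackets is a hexadecimal bit mask (bit i set means logical processor i belongs to the domain), that the remaining processors form a second domain, and that the node has 12 logical processors (0 to 11), as the output for mask [4] in the question suggests. The function names are made up for illustration and are not part of Intel MPI.

```python
def mask_to_cpus(mask_hex, ncpus):
    """Return the logical processors whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(ncpus) if mask & (1 << cpu)]


def domains_for_mask(mask_hex, ncpus):
    """Split processors into the masked domain and the leftover domain."""
    masked = mask_to_cpus(mask_hex, ncpus)
    rest = [cpu for cpu in range(ncpus) if cpu not in masked]
    return [domain for domain in (masked, rest) if domain]


def domain_of_first_rank(mask_hex, ncpus):
    """Assumption taken from the reply: the domain holding processor 0 is used first."""
    for domain in domains_for_mask(mask_hex, ncpus):
        if 0 in domain:
            return domain
    return []  # only reached when ncpus is 0


if __name__ == "__main__":
    # The four masks tried in the question, on an assumed 12-processor node.
    for mask in ("1", "2", "3", "4"):
        print(f"I_MPI_PIN_DOMAIN [{mask}] -> rank 0 on", domain_of_first_rank(mask, ncpus=12))
```

Under these assumptions, masks [1] and [3] put rank 0 on exactly the masked processors because they include processor 0, while masks [2] and [4] put rank 0 on the leftover domain, which is the pattern seen in the question.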

It is much better to use a domain size rather than an exact mask. For example, say you know that a node has 2 processors with 4 cores each (8 cores in total). You can create 2 domains of size 4 (I_MPI_PIN_DOMAIN=4), and the Intel MPI library will automatically build these domains so that processes are placed as close together as possible inside a domain.
Or, even better, use I_MPI_PIN_DOMAIN=socket.
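
The domain-size form can be pictured with a short Python sketch. It assumes 8 logical processors numbered so that neighbouring numbers are physically close, which real machines do not guarantee; Intel MPI works from the actual topology. `split_into_domains` is a hypothetical helper, not an Intel MPI function.

```python
def split_into_domains(cpus, size):
    """Chunk an ordered list of processors into domains of `size` members."""
    if size <= 0:
        raise ValueError("domain size must be positive")
    return [cpus[i:i + size] for i in range(0, len(cpus), size)]


if __name__ == "__main__":
    # 8 cores with a domain size of 4 (as with I_MPI_PIN_DOMAIN=4) give 2 domains.
    for rank, domain in enumerate(split_into_domains(list(range(8)), 4)):
        print(f"rank {rank}: processors {domain}")
```

In this simplified picture, each rank's OpenMP threads would stay within that rank's own four-processor domain.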

You can also create domains so that processes share cache memory, e.g. I_MPI_PIN_DOMAIN=cache2.
Each MPI process and its OpenMP threads will share one domain.

Maybe you need to try the I_MPI_PIN_PROCESSOR_LIST environment variable?

Regards!
Dmitry

Thank you for your reply.

However, I still don't understand how to pin the processes of hybrid (MPI + OpenMP) programs.
If I run 2 processes with 4 threads each on 2 nodes, I can pin them with Open MPI as follows:

$ export KMP_AFFINITY=compact
$ export OMP_NUM_THREADS=4
$ cat machinefile
node1
node2
$ cat rankfile
rank 0=node1 slot=1,2,6,10 <-- This slot information will be provided by Sun Grid Engine
rank 1=node2 slot=1,4,7,11 <-- This slot information will be provided by Sun Grid Engine
$ mpirun -np 2 -machinefile machinefile -rf rankfile -x OMP_NUM_THREADS -x KMP_AFFINITY ./affinity
hostname = node1, rank = 0, thread = 0, affinity = 1
hostname = node1, rank = 0, thread = 1, affinity = 2
hostname = node1, rank = 0, thread = 2, affinity = 6
hostname = node1, rank = 0, thread = 3, affinity = 10
hostname = node2, rank = 1, thread = 0, affinity = 1
hostname = node2, rank = 1, thread = 1, affinity = 4
hostname = node2, rank = 1, thread = 2, affinity = 7
hostname = node2, rank = 1, thread = 3, affinity = 11

Could you tell me how to get the same pinning with the Intel Compiler + Intel MPI?

Thank you in advance.

Sincerely,
T.Ikeda
