question about "I_MPI_PIN_DOMAIN=<masklist>"

Dear mic forum,

What I'd like to do is to split my 60-core coprocessor into 4 domains, pin one MPI process to each domain, and in each process let 60 threads be bound to 60 logical cores.

I was reading this post on process pinning and thread affinity, and had a feeling that the "masklist" form could help achieve this. What confused me is the description of masklist, which states:

Each mi number defines one separate domain. The following rule is used: the ith logical processor is included into the domain if the corresponding mi value is set to 1. All remaining processors are put into a separate domain. BIOS numbering is used

I don't quite understand what it is talking about. What is BIOS numbering? Could someone give me a solid example of a masklist with a detailed explanation? The example given in that post still looks confusing to me: what does =[0001E,001E0,01E00,1E000] mean? What is the best way to achieve what I want?

Thanks a lot for your time!


A way of accomplishing this is to set a value for MIC_KMP_PLACE_THREADS in the environment of each rank: assign each rank 15 cores with 4 threads per core, using a different core offset per rank so that the assignments don't overlap.
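A sketch of what those per-rank settings might look like, assuming the KMP_PLACE_THREADS format <cores>C,<threads>T,<offset>O with the offset counted in cores (the exact values here are illustrative, not taken from a verified run):

# one non-overlapping placement per rank on a 60-core card
# rank 0: MIC_KMP_PLACE_THREADS=15C,4T,0O    (cores  0-14)
# rank 1: MIC_KMP_PLACE_THREADS=15C,4T,15O   (cores 15-29)
# rank 2: MIC_KMP_PLACE_THREADS=15C,4T,30O   (cores 30-44)
# rank 3: MIC_KMP_PLACE_THREADS=15C,4T,45O   (cores 45-59)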

This automatically sets OMP_NUM_THREADS=60 for each rank.

As MPI usually increases the workload significantly on the core which is running MPSS, you may find that that core doesn't have sufficient resources left to perform its share of user work, so it may work better if you assign only 56 or 59 cores in total.

You will probably want to set a value for OMP_PROC_BIND as well.

With this scheme, you can set 2 or 3 threads per core if that suits your workload better, and still have the work spread across the cores without MPI processes contending for the same cores.

Each rank has to be listed separately on the mpirun command line so that this one environment difference (the offset) can be set per rank, as sketched below.
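With Intel MPI's hydra mpirun, one way to do that is the colon-separated argument-set syntax, giving each rank its own -env setting; this is only a sketch, with the same illustrative placements as above and an illustrative binary name:

mpirun -host mic0 -env MIC_KMP_PLACE_THREADS 15C,4T,0O  -n 1 ./a.out.mic : \
       -host mic0 -env MIC_KMP_PLACE_THREADS 15C,4T,15O -n 1 ./a.out.mic : \
       -host mic0 -env MIC_KMP_PLACE_THREADS 15C,4T,30O -n 1 ./a.out.mic : \
       -host mic0 -env MIC_KMP_PLACE_THREADS 15C,4T,45O -n 1 ./a.out.mic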

The Jeffers, Reinders book gives an example from before KMP_PLACE_THREADS was made available.

On Intel MPI 4.1.1, starting a job from the host, the following works for me:

export I_MPI_MIC=1
export I_MPI_PIN_MODE=pm # let hydra process manager generate appropriate pinning domain masks
# 4 MPI ranks on mic0; KMP_PLACE_THREADS limits each rank's OpenMP threads to 15 cores x 4 threads per core
mpirun -np 4 -env KMP_PLACE_THREADS 15C,4T -host mic0 ./a.out.mic

As mentioned by Tim, you could get better results if you reserve the mic's core 0 for system tasks, especially when using tcp or dapl/scif0 for heavy MPI communication. Reducing the number of threads placed on each core from 4 to 3 or 2 can usually help as well.
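One possible variant along those lines (the numbers are only illustrative and should be tuned to your application): 4 ranks x 14 cores x 3 threads leaves a few cores and some hardware threads free for MPSS and system work:

export I_MPI_MIC=1
export I_MPI_PIN_MODE=pm
mpirun -np 4 -env KMP_PLACE_THREADS 14C,3T -host mic0 ./a.out.mic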

You can also take a look at this article. It is similar to the one you pointed out but has more examples with illustrations. 

You can also use I_MPI_PIN_DOMAIN with an explicit domain size and layout. This will let you have a domain size of 60, i.e. 60 logical cores in each domain, with the logical cores within each domain located as close together as possible. You can also use the other layout options described in the document.
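For instance, the setting might look like this; 60:compact is one plausible value for the domain size and layout described above (60 logical cores per domain, i.e. four domains on a 60-core card, with domain members packed close together), so treat it as a sketch and adjust as needed:

export I_MPI_MIC=1
export I_MPI_PIN_DOMAIN=60:compact
mpirun -np 4 -host mic0 ./a.out.mic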

In my experience, I have found it helpful to set KMP_PLACE_THREADS and KMP_AFFINITY in addition to I_MPI_PIN_DOMAIN. You can read more about KMP_PLACE_THREADS in the Intel compiler documentation.
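Building on the I_MPI_PIN_DOMAIN line above, the combination might be launched roughly like this (again a sketch with illustrative values; how KMP_PLACE_THREADS interacts with the domain mask is worth verifying on your own setup):

export I_MPI_MIC=1
export I_MPI_PIN_DOMAIN=60:compact
mpirun -np 4 -host mic0 \
       -env KMP_PLACE_THREADS 15C,4T \
       -env KMP_AFFINITY granularity=fine,compact \
       ./a.out.mic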

I hope this is what you were looking for. 

The affinity you describe is what already happens by default.
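On the masklist example from the original question, my reading of the quoted rule (each hex mask selects logical processors by bit position, using the processor numbering the system itself reports, e.g. in /proc/cpuinfo on the card) is:

0001E = binary 1 1110           -> logical processors 1-4   (domain 1)
001E0 = binary 1 1110 0000      -> logical processors 5-8   (domain 2)
01E00 = binary 1 1110 0000 0000 -> logical processors 9-12  (domain 3)
1E000                           -> logical processors 13-16 (domain 4)

So that masklist creates four domains of four logical processors each, and all remaining processors (including processor 0) end up together in one extra domain.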
