Environment Variables for Process Pinning
I_MPI_PIN
Turn on/off process pinning.
Syntax
I_MPI_PIN=<arg>
Arguments
<arg> | Binary indicator |
enable | yes | on | 1 | Enable process pinning. This is the default value. |
disable | no | off | 0 | Disable process pinning. |
Description
Set this environment variable to control the process pinning feature of the Intel® MPI Library.
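For example, to disable pinning for a single run:
$ mpirun -genv I_MPI_PIN=disable -n <# of processes> <executable>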
I_MPI_PIN_PROCESSOR_LIST
(I_MPI_PIN_PROCS)
Define a processor subset and the mapping rules for MPI processes within this subset.
Syntax
I_MPI_PIN_PROCESSOR_LIST=<value>
The environment variable value has the following syntax forms:
1. <proclist>
2. [<procset>][:[grain=<grain>][,shift=<shift>][,preoffset=<preoffset>][,postoffset=<postoffset>]]
3. [<procset>][:map=<map>]
The following paragraphs provide detailed descriptions of the values for these syntax forms.
The postoffset keyword has an alias: offset.
The second form of the pinning procedure has three steps:
- Cyclic shift of the source processor list on the preoffset*grain value.
- Round robin shift of the list derived on the first step on the shift*grain value.
- Cyclic shift of the list derived on the second step on the postoffset*grain value.
The grain, shift, preoffset, and postoffset parameters have a unified definition style.
This environment variable is available for both Intel® and non-Intel microprocessors, but it may perform additional optimizations for Intel microprocessors beyond those it performs for non-Intel microprocessors.
Syntax
I_MPI_PIN_PROCESSOR_LIST=<proclist>
Arguments
<proclist> | A comma-separated list of logical processor numbers and/or ranges of processors. The process with the i-th rank is pinned to the i-th processor in the list. The number should not exceed the number of processors on a node. |
<l> | Processor with logical number <l> |
<l>-<m> | Range of processors with logical numbers from <l> to <m> |
<k>,<l>-<m> | Processors <k>, as well as <l> through <m> |
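For example, the following command combines a single number and a range, pinning rank 0 to logical processor 0 and ranks 1 through 3 to logical processors 2 through 4:
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST=0,2-4 -n 4 <executable>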
Syntax
I_MPI_PIN_PROCESSOR_LIST=[<procset>][:[grain=<grain>][,shift=<shift>][,preoffset=<preoffset>][,postoffset=<postoffset>]]
Arguments
<procset> | Specify a processor subset based on the topological numeration. The default value is allcores. |
all | All logical processors. Specify this subset to define the number of CPUs on a node. |
allcores | All cores (physical CPUs). Specify this subset to define the number of cores on a node. This is the default value. If Intel® Hyper-Threading Technology is disabled, allcores equals all. |
allsocks | All packages/sockets. Specify this subset to define the number of sockets on a node. |
<grain> | Specify the pinning granularity cell for a defined <procset>. The minimal <grain> value is a single element of the <procset>. The maximal <grain> value is the number of <procset> elements in a socket. The <grain> value must be a multiple of the <procset> value; otherwise, the minimal <grain> value is assumed. The default value is the minimal <grain> value. |
<shift> | Specify the granularity of the round robin scheduling shift of the cells for the <procset>. <shift> is measured in the defined <grain> units. The <shift> value must be a positive integer; otherwise, no shift is performed. The default value is no shift, which is equal to 1 normal increment. |
<preoffset> | Specify the cyclic shift of the processor subset <procset> defined before the round robin shifting on the <preoffset> value. The value is measured in the defined <grain> units. The <preoffset> value must be a non-negative integer; otherwise, no shift is performed. The default value is no shift. |
<postoffset> | Specify the cyclic shift of the processor subset <procset> derived after round robin shifting on the <postoffset> value. The value is measured in the defined <grain> units. The <postoffset> value must be a non-negative integer; otherwise, no shift is performed. The default value is no shift. |
The following table displays the values for the <grain>, <shift>, <preoffset>, and <postoffset> options:
<n> | Specify an explicit value of the corresponding parameter. <n> is a non-negative integer. |
fine | Specify the minimal value of the corresponding parameter. |
core | Specify the parameter value equal to the number of the corresponding parameter units contained in one core. |
cache1 | Specify the parameter value equal to the number of the corresponding parameter units that share an L1 cache. |
cache2 | Specify the parameter value equal to the number of the corresponding parameter units that share an L2 cache. |
cache3 | Specify the parameter value equal to the number of the corresponding parameter units that share an L3 cache. |
cache | The largest value among cache1, cache2, and cache3. |
socket | sock | Specify the parameter value equal to the number of the corresponding parameter units contained in one physical package/socket. |
half | mid | Specify the parameter value equal to socket/2. |
third | Specify the parameter value equal to socket/3. |
quarter | Specify the parameter value equal to socket/4. |
octavo | Specify the parameter value equal to socket/8. |
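For example, the following sketch requests a grain of two logical processors shifted round robin by three grains, matching the grain/shift illustration later in this section (the value is quoted, as noted in the Description below, to avoid conflicts with the shell):
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST="all:grain=2,shift=3" -n <# of processes> <executable>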
Syntax
I_MPI_PIN_PROCESSOR_LIST=[<procset>][:map=<map>]
Arguments
<map> | The mapping pattern used for process placement. |
bunch | The processes are mapped as close as possible on the sockets. |
scatter | The processes are mapped as remotely as possible so as not to share common resources: FSB, caches, and cores. |
spread | The processes are mapped consecutively with the possibility not to share common resources. |
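For example, to map processes onto all cores as remotely as possible:
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST=allcores:map=scatter -n <# of processes> <executable>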
Description
Set the I_MPI_PIN_PROCESSOR_LIST environment variable to define the processor placement. To avoid conflicts with different shell versions, the environment variable value may need to be enclosed in quotes.
This environment variable is valid only if I_MPI_PIN is enabled.
The I_MPI_PIN_PROCESSOR_LIST environment variable has the following syntax variants:
has the following different syntax variants:- Explicit processor list. This comma-separated list is defined in terms of logical processor numbers. The relative node rank of aprocessis an index to the processor list such that the i-thprocessis pinned on i-th list member. This permits the definition of anyprocessplacement on the CPUs.For example,processmapping forI_MPI_PIN_PROCESSOR_LIST=p0,p1,p2,...,pnis as follows:Rank on a node012...n-1NLogical CPUp0p1p2...pn-1Pn
- grain/shift/offset mapping. This method provides cyclic shift of a defined grain along the processor list with steps equal to shift*grain and a single shift on offset*grain at the end. This shifting action is repeated shift times.
For example: grain = 2 logical processors, shift = 3 grains, offset = 0.
Legend:
gray - MPI process grains
A) red - processor grains chosen on the 1st pass
B) cyan - processor grains chosen on the 2nd pass
C) green - processor grains chosen on the final 3rd pass
D) Final map table ordered by MPI ranks
A)
0 1 | 2 3 | ... | 2n-2 2n-1 |
0 1 | 2 3 | 4 5 | 6 7 | 8 9 | 10 11 | ... | 6n-6 6n-5 | 6n-4 6n-3 | 6n-2 6n-1 |
B)
0 1 | 2n 2n+1 | 2 3 | 2n+2 2n+3 | ... | 2n-2 2n-1 | 4n-2 4n-1 |
0 1 | 2 3 | 4 5 | 6 7 | 8 9 | 10 11 | ... | 6n-6 6n-5 | 6n-4 6n-3 | 6n-2 6n-1 |
C)
0 1 | 2n 2n+1 | 4n 4n+1 | 2 3 | 2n+2 2n+3 | 4n+2 4n+3 | ... | 2n-2 2n-1 | 4n-2 4n-1 | 6n-2 6n-1 |
0 1 | 2 3 | 4 5 | 6 7 | 8 9 | 10 11 | ... | 6n-6 6n-5 | 6n-4 6n-3 | 6n-2 6n-1 |
D)
0 1 | 2 3 | … | 2n-2 2n-1 | 2n 2n+1 | 2n+2 2n+3 | … | 4n-2 4n-1 | 4n 4n+1 | 4n+2 4n+3 | … | 6n-2 6n-1 |
0 1 | 6 7 | … | 6n-6 6n-5 | 2 3 | 8 9 | … | 6n-4 6n-3 | 4 5 | 10 11 | … | 6n-2 6n-1 |
- Predefined mapping scenario. In this case, popular process pinning schemes are defined as keywords selectable at runtime. There are two such scenarios: bunch and scatter.
In the bunch scenario the processes are mapped proportionally to sockets, as closely as possible. This mapping makes sense for partial processor loading, in which case the number of processes is less than the number of processors.
In the scatter scenario the processes are mapped as remotely as possible so as not to share common resources: FSB, caches, and cores.
In the example, there are two sockets, four cores per socket, one logical CPU per core, and two cores per shared cache.
Legend:
gray - MPI processes
cyan - 1st socket processors
green - 2nd socket processors
The same color defines a processor pair sharing a cache.
0 | 1 | 2 | 3 | 4 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
bunch scenario for 5 processes
0 | 4 | 2 | 6 | 1 | 5 | 3 | 7 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
scatter scenario for full loading
Examples
To pin the processes to CPU0 and CPU3 on each node
globally, use the following command:
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST=0,3 -n <# of processes> <executable>
To pin the processes to different CPUs on each node individually (CPU0 and CPU3 on host1; CPU1, CPU2, and CPU3 on host2), use the
following command:
$ mpirun -host host1 -env I_MPI_PIN_PROCESSOR_LIST=0,3 -n <# of processes> <executable> : \ -host host2 -env I_MPI_PIN_PROCESSOR_LIST=1,2,3 -n <# of processes> <executable>
To print extra debug information about process
pinning, use the following command:
$ mpirun -genv I_MPI_DEBUG=4 -m -host host1 \ -env I_MPI_PIN_PROCESSOR_LIST=0,3 -n <# of processes> <executable> : \ -host host2 -env I_MPI_PIN_PROCESSOR_LIST=1,2,3 -n <# of processes> <executable>
If the number of processes is greater than the number of CPUs used for pinning,
the process list is wrapped around to the start of the processor list.
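For example, running four processes per node against a two-processor list pins ranks 0 and 1 to CPU0 and CPU3, and wraps ranks 2 and 3 around to CPU0 and CPU3 again:
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST=0,3 -n 4 <executable>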
I_MPI_PIN_PROCESSOR_EXCLUDE_LIST
Define a subset of logical processors to be excluded from pinning on the intended hosts.
Syntax
I_MPI_PIN_PROCESSOR_EXCLUDE_LIST=<proclist>
Arguments
<proclist> | A comma-separated list of logical processor numbers and/or ranges of processors. |
<l> | Processor with logical number <l> |
<l>-<m> | Range of processors with logical numbers from <l> to <m> |
<k>,<l>-<m> | Processors <k>, as well as <l> through <m> |
Description
Set this environment variable to define the logical processors that the Intel® MPI Library does not use for the pinning capability on the intended hosts. Logical processors are numbered as in /proc/cpuinfo.
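For example, to keep MPI processes off logical processors 0 and 1, leaving them to other workloads:
$ mpirun -genv I_MPI_PIN_PROCESSOR_EXCLUDE_LIST=0,1 -n <# of processes> <executable>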
I_MPI_PIN_CELL
Set this environment variable to define the pinning resolution granularity. I_MPI_PIN_CELL specifies the minimal processor cell allocated when an MPI process is running.
Syntax
I_MPI_PIN_CELL=<cell>
Arguments
<cell> | Specify the resolution granularity |
unit | Basic processor unit (logical CPU) |
core | Physical processor core |
Description
Set this environment variable to define the processor
subset used when a process is running. You can choose from two scenarios:
- all possible CPUs in a node (unit value)
- all cores in a node (core value)
This environment variable has an effect on both pinning types:
- one-to-one pinning through the I_MPI_PIN_PROCESSOR_LIST environment variable
- one-to-many pinning through the I_MPI_PIN_DOMAIN environment variable
The default value rules are:
- If you use I_MPI_PIN_DOMAIN, the cell granularity is unit.
- If you use I_MPI_PIN_PROCESSOR_LIST, the following rules apply:
  - When the number of processes is greater than the number of cores, the cell granularity is unit.
  - When the number of processes is equal to or less than the number of cores, the cell granularity is core.
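For example, to set the minimal pinning cell to a physical core explicitly, overriding the default rules above:
$ mpirun -genv I_MPI_PIN_CELL=core -n <# of processes> <executable>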
The core value is not affected by the enabling/disabling of Intel® Hyper-Threading Technology in a system.
I_MPI_PIN_RESPECT_CPUSET
Respect the process affinity mask.
Syntax
I_MPI_PIN_RESPECT_CPUSET=<value>
Arguments
<value> | Binary indicator |
enable | yes | on | 1 | Respect the process affinity mask. This is the default value. |
disable | no | off | 0 | Do not respect the process affinity mask. |
Description
If you set I_MPI_PIN_RESPECT_CPUSET=enable, the Hydra process launcher uses the job manager's process affinity mask on each intended host to determine the logical processors for applying the Intel MPI Library pinning capability.
If you set I_MPI_PIN_RESPECT_CPUSET=disable, the Hydra process launcher uses its own process affinity mask to determine the logical processors for applying the Intel MPI Library pinning capability.
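For example, to ignore an affinity mask inherited from a job manager (whether this is appropriate depends on how the cluster is administered):
$ mpirun -genv I_MPI_PIN_RESPECT_CPUSET=disable -n <# of processes> <executable>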
I_MPI_PIN_RESPECT_HCA
In the presence of an InfiniBand* Architecture host channel adapter (IBA* HCA), adjust the pinning according to the location of the IBA HCA.
Syntax
I_MPI_PIN_RESPECT_HCA=<value>
Arguments
<value> | Binary indicator |
enable | yes | on | 1 | Use the location of the IBA HCA if available. This is the default value. |
disable | no | off | 0 | Do not use the location of the IBA HCA. |
Description
If you set I_MPI_PIN_RESPECT_HCA=enable, the Hydra process launcher uses the location of the IBA HCA on each intended host for applying the Intel MPI Library pinning capability.
If you set I_MPI_PIN_RESPECT_HCA=disable, the Hydra process launcher does not use the location of the IBA HCA on each intended host for applying the Intel MPI Library pinning capability.