I_MPI_ADJUST Family Environment Variables

I_MPI_ADJUST_<opname>

Control collective operation algorithm selection.

Syntax

I_MPI_ADJUST_<opname>="<algid>[:<conditions>][;<algid>:<conditions>[...]]"

Arguments

<algid>

Algorithm identifier

>= 0

The default value of zero selects reasonable settings automatically

 

<conditions>

A comma-separated list of conditions. An empty list selects all message sizes and process combinations

<l>

Messages of size <l>

<l>-<m>

Messages of size from <l> to <m>, inclusive

<l>@<p>

Messages of size <l> and number of processes <p>

<l>-<m>@<p>-<q>

Messages of size from <l> to <m> and number of processes from <p> to <q>, inclusive

Description

Set this environment variable to select the desired algorithm(s) for the collective operation <opname> under particular conditions. Each collective operation has its own environment variable and algorithms.
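
For instance, a setting of the following form (values chosen purely for illustration) would select recursive doubling (algorithm 2 in the table below) for MPI_Bcast messages of up to 8192 bytes and the binomial algorithm (algorithm 1) for all other messages:

I_MPI_ADJUST_BCAST="2:0-8192;1"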

 

 

 

Environment Variables, Collective Operations, and Algorithms

Environment Variable

Collective Operation

Algorithms

I_MPI_ADJUST_ALLGATHER

MPI_Allgather

  1. Recursive doubling
  2. Bruck's
  3. Ring
  4. Topology aware Gatherv + Bcast
  5. Knomial

I_MPI_ADJUST_ALLGATHERV

MPI_Allgatherv

  1. Recursive doubling
  2. Bruck's
  3. Ring
  4. Topology aware Gatherv + Bcast

I_MPI_ADJUST_ALLREDUCE

MPI_Allreduce

  1. Recursive doubling
  2. Rabenseifner's
  3. Reduce + Bcast
  4. Topology aware Reduce + Bcast
  5. Binomial gather + scatter
  6. Topology aware binomial gather + scatter
  7. Shumilin's ring
  8. Ring
  9. Knomial
  10. Topology aware SHM-based flat
  11. Topology aware SHM-based Knomial
  12. Topology aware SHM-based Knary

I_MPI_ADJUST_ALLTOALL

MPI_Alltoall

  1. Bruck's
  2. Isend/Irecv + waitall
  3. Pairwise exchange
  4. Plum's

I_MPI_ADJUST_ALLTOALLV

MPI_Alltoallv

  1. Isend/Irecv + waitall
  2. Plum's

I_MPI_ADJUST_ALLTOALLW

MPI_Alltoallw

Isend/Irecv + waitall

I_MPI_ADJUST_BARRIER

MPI_Barrier

  1. Dissemination
  2. Recursive doubling
  3. Topology aware dissemination
  4. Topology aware recursive doubling
  5. Binomial gather + scatter
  6. Topology aware binomial gather + scatter
  7. Topology aware SHM-based flat
  8. Topology aware SHM-based Knomial
  9. Topology aware SHM-based Knary

I_MPI_ADJUST_BCAST

MPI_Bcast

  1. Binomial
  2. Recursive doubling
  3. Ring
  4. Topology aware binomial
  5. Topology aware recursive doubling
  6. Topology aware ring
  7. Shumilin's
  8. Knomial
  9. Topology aware SHM-based flat
  10. Topology aware SHM-based Knomial
  11. Topology aware SHM-based Knary
  12. NUMA aware SHM-based (SSE4.2)
  13. NUMA aware SHM-based (AVX2)
  14. NUMA aware SHM-based (AVX512)

I_MPI_ADJUST_EXSCAN

MPI_Exscan

  1. Partial results gathering
  2. Partial results gathering regarding layout of processes

I_MPI_ADJUST_GATHER

MPI_Gather

  1. Binomial
  2. Topology aware binomial
  3. Shumilin's
  4. Binomial with segmentation

I_MPI_ADJUST_GATHERV

MPI_Gatherv

  1. Linear
  2. Topology aware linear
  3. Knomial

I_MPI_ADJUST_REDUCE_SCATTER

MPI_Reduce_scatter

  1. Recursive halving
  2. Pairwise exchange
  3. Recursive doubling
  4. Reduce + Scatterv
  5. Topology aware Reduce + Scatterv

I_MPI_ADJUST_REDUCE

MPI_Reduce

  1. Shumilin's
  2. Binomial
  3. Topology aware Shumilin's
  4. Topology aware binomial
  5. Rabenseifner's
  6. Topology aware Rabenseifner's
  7. Knomial
  8. Topology aware SHM-based flat
  9. Topology aware SHM-based Knomial
  10. Topology aware SHM-based Knary
  11. Topology aware SHM-based binomial

I_MPI_ADJUST_SCAN

MPI_Scan

  1. Partial results gathering
  2. Topology aware partial results gathering

I_MPI_ADJUST_SCATTER

MPI_Scatter

  1. Binomial
  2. Topology aware binomial
  3. Shumilin's

I_MPI_ADJUST_SCATTERV

MPI_Scatterv

  1. Linear
  2. Topology aware linear

I_MPI_ADJUST_IALLGATHER

MPI_Iallgather

  1. Recursive doubling
  2. Bruck’s
  3. Ring

I_MPI_ADJUST_IALLGATHERV

MPI_Iallgatherv

  1. Recursive doubling
  2. Bruck’s
  3. Ring

I_MPI_ADJUST_IALLREDUCE

MPI_Iallreduce

  1. Recursive doubling
  2. Rabenseifner’s
  3. Reduce + Bcast
  4. Ring (Patarasuk)
  5. Knomial
  6. Binomial
  7. Reduce scatter allgather
  8. SMP
  9. Nreduce

I_MPI_ADJUST_IALLTOALL

MPI_Ialltoall

  1. Bruck’s
  2. Isend/Irecv + Waitall
  3. Pairwise exchange

I_MPI_ADJUST_IALLTOALLV

MPI_Ialltoallv

Isend/Irecv + Waitall

I_MPI_ADJUST_IALLTOALLW

MPI_Ialltoallw

Isend/Irecv + Waitall

I_MPI_ADJUST_IBARRIER

MPI_Ibarrier

Dissemination

I_MPI_ADJUST_IBCAST

MPI_Ibcast

  1. Binomial
  2. Recursive doubling
  3. Ring
  4. Knomial
  5. SMP
  6. Tree knomial
  7. Tree kary

I_MPI_ADJUST_IEXSCAN

MPI_Iexscan

  1. Recursive doubling
  2. SMP

I_MPI_ADJUST_IGATHER

MPI_Igather

  1. Binomial
  2. Knomial

I_MPI_ADJUST_IGATHERV

MPI_Igatherv

  1. Linear
  2. Linear ssend

I_MPI_ADJUST_IREDUCE_SCATTER

MPI_Ireduce_scatter

  1. Recursive halving
  2. Pairwise
  3. Recursive doubling

I_MPI_ADJUST_IREDUCE

MPI_Ireduce

  1. Rabenseifner’s
  2. Binomial
  3. Knomial

I_MPI_ADJUST_ISCAN

MPI_Iscan

  1. Recursive doubling
  2. SMP

I_MPI_ADJUST_ISCATTER

MPI_Iscatter

  1. Binomial
  2. Knomial

I_MPI_ADJUST_ISCATTERV

MPI_Iscatterv

Linear

The following table describes the message size calculation rules for the collective operations. In the table, "n/a" means that the corresponding interval <l>-<m> should be omitted.

Message Sizes for Collective Functions

Collective Function

Message Size Formula

MPI_Allgather

recv_count*recv_type_size

MPI_Allgatherv

total_recv_count*recv_type_size

MPI_Allreduce

count*type_size

MPI_Alltoall

send_count*send_type_size

MPI_Alltoallv

n/a

MPI_Alltoallw

n/a

MPI_Barrier

n/a

MPI_Bcast

count*type_size

MPI_Exscan

count*type_size

MPI_Gather

recv_count*recv_type_size if MPI_IN_PLACE is used, otherwise send_count*send_type_size

MPI_Gatherv

n/a

MPI_Reduce_scatter

total_recv_count*type_size

MPI_Reduce

count*type_size

MPI_Scan

count*type_size

MPI_Scatter

send_count*send_type_size if MPI_IN_PLACE is used, otherwise recv_count*recv_type_size

MPI_Scatterv

n/a
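
As an illustration of how these formulas relate to the <l>-<m> intervals, here is a minimal C sketch (the count and datatype are arbitrary example values, not part of the reference) that computes what the MPI_Bcast formula count*type_size yields for a broadcast of 1000 MPI_DOUBLE elements; the resulting 8000 bytes would match an interval such as 0-8192:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int count = 1000;                       /* example element count */
    int type_size = 0;
    MPI_Type_size(MPI_DOUBLE, &type_size);  /* typically 8 bytes */

    /* MPI_Bcast row of the table: message size = count * type_size */
    long message_size = (long)count * type_size;
    printf("MPI_Bcast message size: %ld bytes\n", message_size);

    MPI_Finalize();
    return 0;
}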

Examples

Use the following settings to select the second algorithm for the MPI_Reduce operation:
I_MPI_ADJUST_REDUCE=2

Use the following settings to define the algorithms for the MPI_Reduce_scatter operation:
I_MPI_ADJUST_REDUCE_SCATTER="4:0-100,5001-10000;1:101-3200;2:3201-5000;3"

In this case, algorithm 4 is used for message sizes between 0 and 100 bytes and from 5001 to 10000 bytes, algorithm 1 is used for message sizes between 101 and 3200 bytes, algorithm 2 is used for message sizes between 3201 and 5000 bytes, and algorithm 3 is used for all other messages.
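
The <l>@<p> and <l>-<m>@<p>-<q> forms combine message size and process count conditions in the same way. For instance, the following hypothetical setting selects the ring algorithm (algorithm 8) for MPI_Allreduce messages of up to 262144 bytes on 2 to 32 processes, and recursive doubling (algorithm 1) in all other cases:

I_MPI_ADJUST_ALLREDUCE="8:0-262144@2-32;1"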

I_MPI_COLL_INTRANODE

Syntax

I_MPI_COLL_INTRANODE=<mode>

Arguments

<mode> 

Intranode collectives type

pt2pt

Use only point-to-point communication-based collectives

shm

Enable shared memory collectives. This is the default value

Description

Set this environment variable to switch the intranode communication type for collective operations. If there is a large set of communicators, you can switch off SHM collectives to avoid memory overconsumption.
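
For example, to fall back to point-to-point based intranode collectives (for instance, in an application that creates a large number of communicators), you can set:

I_MPI_COLL_INTRANODE=pt2pt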

I_MPI_COLL_INTRANODE_SHM_THRESHOLD

Syntax

I_MPI_COLL_INTRANODE_SHM_THRESHOLD=<nbytes>

Arguments

<nbytes> 

Define the maximum data block size processed by shared memory collectives.

> 0

Use the specified size. The default value is 16384 bytes.

Description

Set this environment variable to define the size of the shared memory area available to each rank for data placement. Messages larger than this value are not processed by the SHM-based collective operation; they are processed by the point-to-point based collective operation instead. The value must be a multiple of 4096.
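
For example, to let SHM-based collectives handle data blocks of up to 32 KB per rank (the value must remain a multiple of 4096), you can set:

I_MPI_COLL_INTRANODE_SHM_THRESHOLD=32768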

I_MPI_CBWR

Control the reproducibility of floating-point operation results across different platforms, networks, and topologies when the number of processes is the same.

Syntax

I_MPI_CBWR=<arg>

Arguments

<arg>

CBWR compatibility mode

0 (None)

Do not use CBWR in a library-wide mode. CNR-safe communicators may be created with MPI_Comm_dup_with_info explicitly. This is the default value.

1 (Weak mode)

Disable topology aware collectives. The result of a collective operation does not depend on the rank placement. This mode guarantees reproducible results across different runs on the same cluster (independent of the rank placement).

2 (Strict mode)

Disable topology aware collectives and ignore the CPU architecture and interconnect during algorithm selection. This mode guarantees reproducible results across different runs on different clusters (independent of the rank placement, CPU architecture, and interconnect).

Description

Conditional Numerical Reproducibility (CNR) provides controls for obtaining reproducible floating-point results on collective operations. With this feature, Intel MPI collective operations are designed to return the same floating-point results from run to run as long as the number of MPI ranks is the same.

Control this feature with the I_MPI_CBWR environment variable in a library-wide manner, where all collectives on all communicators are guaranteed to have reproducible results. To control floating-point reproducibility in a more precise, per-communicator way, pass the {"I_MPI_CBWR", "yes"} key-value pair to the MPI_Comm_dup_with_info call.

Note

Setting I_MPI_CBWR in a library-wide mode using the environment variable leads to a performance penalty.
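
For example, a library-wide request for the weak mode looks as follows:

I_MPI_CBWR=1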

CNR-safe communicators created using MPI_Comm_dup_with_info always work in the strict mode. For example:

MPI_Info hint;
MPI_Comm cbwr_safe_world, cbwr_safe_copy;
/* Request a CNR-safe duplicate of MPI_COMM_WORLD */
MPI_Info_create(&hint);
MPI_Info_set(hint, "I_MPI_CBWR", "yes");
MPI_Comm_dup_with_info(MPI_COMM_WORLD, hint, &cbwr_safe_world);
/* Duplicates of a CNR-safe communicator are also CNR-safe */
MPI_Comm_dup(cbwr_safe_world, &cbwr_safe_copy);
MPI_Info_free(&hint);

In the example above, both cbwr_safe_world and cbwr_safe_copy are CNR-safe. Use cbwr_safe_world and its duplicates to get reproducible results for critical operations.

Note that MPI_COMM_WORLD itself may be used for performance-critical operations without reproducibility limitations.
