Code Change Guide

The example in this section shows one way to change a legacy program to take advantage of the MPI_THREAD_SPLIT threading model. In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and work_portion_3() represent a CPU load that modifies the content of the memory pointed to by the in and out pointers. In this particular example, these functions perform correctness checking of the MPI_Allreduce() function.
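For reference, here is a minimal sketch of the legacy single-threaded pattern being converted. It is not the actual contents of thread_split.cpp: COUNT and the body of work_portion_2() are placeholders invented for illustration.

    #include <mpi.h>

    #define COUNT 1024

    // Hypothetical stand-in for the original work_portion_*() CPU load.
    static void work_portion_2(int* in, int count) {
        for (int i = 0; i < count; i++)
            in[i] = i;
    }

    int main(int argc, char** argv) {
        int in[COUNT], out[COUNT];
        MPI_Init(&argc, &argv);
        work_portion_2(in, COUNT);
        // The collective whose results the original work_portion_*() functions check.
        MPI_Allreduce(in, out, COUNT, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }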

Changes Required to Use the OpenMP* Threading Model

  1. To run MPI functions in a multithreaded environment, MPI_Init_thread() with the argument equal to MPI_THREAD_MULTIPLE must be called instead of MPI_Init().
  2. According to the MPI_THREAD_SPLIT model, in each thread you must execute MPI operations over the communicator specific to that thread only. So, in this example, the MPI_COMM_WORLD communicator must be duplicated several times so that each thread has its own copy of MPI_COMM_WORLD.
    Note
    The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
  3. The data to transfer must be split so that each thread handles its own portion of the input and output data.
  4. The barrier becomes a two-stage one: the barriers on the MPI level and the OpenMP level must be combined (see the sketch after this list).
  5. Check that the runtime sets up a reasonable affinity for OpenMP threads. Typically, the OpenMP runtime does this out of the box, but sometimes setting the OMP_PLACES=cores environment variable is necessary for optimal multithreaded MPI performance.
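The sketch below shows how these steps might fit together. It is an illustration, not the full thread_split.cpp from the guide; NTHREADS, COUNT, and the stub work_portion_2() are assumptions made for the example.

    #include <mpi.h>
    #include <omp.h>
    #include <vector>

    #define NTHREADS 4
    #define COUNT    1024

    // Hypothetical stand-in for the original work_portion_*() CPU load.
    static void work_portion_2(int* in, int count, int tid) {
        for (int i = 0; i < count; i++)
            in[i] = tid;
    }

    int main(int argc, char** argv) {
        int provided;
        // Step 1: request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        // Step 2: duplicate MPI_COMM_WORLD so each thread has its own communicator.
        MPI_Comm comm[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            MPI_Comm_dup(MPI_COMM_WORLD, &comm[i]);

        std::vector<int> in(NTHREADS * COUNT), out(NTHREADS * COUNT);

        #pragma omp parallel num_threads(NTHREADS)
        {
            int tid = omp_get_thread_num();
            // Step 3: each thread handles its own slice of the input/output data.
            int* in_chunk  = in.data()  + tid * COUNT;
            int* out_chunk = out.data() + tid * COUNT;

            work_portion_2(in_chunk, COUNT, tid);
            MPI_Allreduce(in_chunk, out_chunk, COUNT, MPI_INT, MPI_SUM, comm[tid]);

            // Step 4: two-stage barrier - MPI level first, then OpenMP level.
            MPI_Barrier(comm[tid]);
            #pragma omp barrier
        }

        for (int i = 0; i < NTHREADS; i++)
            MPI_Comm_free(&comm[i]);
        MPI_Finalize();
        return 0;
    }

To activate the thread-split model at run time, the program is typically launched with I_MPI_THREAD_SPLIT=1 set in the environment (and, per step 5, with OMP_PLACES=cores if the default thread affinity proves unsatisfactory).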

Changes Required to Use the POSIX Threading Model

  1. To run MPI functions in a multithreaded environment, MPI_Init_thread() with the argument equal to MPI_THREAD_MULTIPLE must be called instead of MPI_Init().
  2. In each thread, you must execute MPI collective operations over a communicator specific to that thread. So MPI_COMM_WORLD must be duplicated, creating a dedicated communicator for each thread.
  3. The info key thread_id must be properly set for each of the duplicated communicators.
    Note
    The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
  4. The data to transfer must be split so that each thread handles its own portion of the input and output data.
  5. The barrier becomes a two-stage one: the barriers on the MPI level and the POSIX level must be combined (see the sketch after this list).
  6. The affinity of POSIX threads can be set up explicitly to reach optimal multithreaded MPI performance.
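Below is a minimal sketch of the POSIX variant under the same assumptions (NTHREADS, COUNT, and the stub work load are invented for illustration). The thread_id info key is attached to each duplicated communicator using the standard MPI_Info_set() and MPI_Comm_set_info() calls.

    #include <mpi.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define COUNT    1024

    static MPI_Comm comms[NTHREADS];
    static pthread_barrier_t barrier;
    static int in[NTHREADS][COUNT], out[NTHREADS][COUNT];

    // Hypothetical stand-in for the original work_portion_*() CPU load.
    static void work_portion(int* buf, int count, int tid) {
        for (int i = 0; i < count; i++)
            buf[i] = tid;
    }

    static void* worker(void* arg) {
        int tid = (int)(intptr_t)arg;
        // Step 4: each thread handles its own portion of the data,
        // over its own communicator.
        work_portion(in[tid], COUNT, tid);
        MPI_Allreduce(in[tid], out[tid], COUNT, MPI_INT, MPI_SUM, comms[tid]);
        // Step 5: two-stage barrier - MPI level first, then POSIX level.
        MPI_Barrier(comms[tid]);
        pthread_barrier_wait(&barrier);
        return NULL;
    }

    int main(int argc, char** argv) {
        int provided;
        // Step 1: request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        pthread_barrier_init(&barrier, NULL, NTHREADS);

        for (int i = 0; i < NTHREADS; i++) {
            // Step 2: a dedicated duplicate of MPI_COMM_WORLD per thread.
            MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
            // Step 3: tag the communicator with its thread_id info key.
            MPI_Info info;
            char idx[16];
            snprintf(idx, sizeof(idx), "%d", i);
            MPI_Info_create(&info);
            MPI_Info_set(info, "thread_id", idx);
            MPI_Comm_set_info(comms[i], info);
            MPI_Info_free(&info);
        }

        pthread_t threads[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void*)(intptr_t)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);

        pthread_barrier_destroy(&barrier);
        for (int i = 0; i < NTHREADS; i++)
            MPI_Comm_free(&comms[i]);
        MPI_Finalize();
        return 0;
    }

For step 6, thread affinity can be pinned explicitly before the threads start communicating, for example with the GNU extension pthread_setaffinity_np().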
