Developer Guide

Code Change Guide

The example in this section shows one way to change a legacy program to take advantage of the MPI_THREAD_SPLIT threading model.

In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and work_portion_3() represent a CPU load that modifies the content of the memory pointed to by the in and out pointers. In this particular example, these functions perform correctness checking of the MPI_Allreduce() function.
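As a rough illustration of that pattern, the sketch below shows what such a fill-and-check pair might look like. The helper names fill_input() and check_output() are hypothetical; only the idea of writing a known per-rank pattern into in and verifying the reduced values in out comes from the description above.

// Hypothetical sketch of the fill/check pattern; the real work_portion_*()
// functions in thread_split.cpp may differ.
#include <cassert>
#include <cstddef>

void fill_input(int* in, size_t count, int rank)
{
    for (size_t i = 0; i < count; i++)
        in[i] = rank + 1;  // input pattern that depends on the rank
}

void check_output(const int* out, size_t count, int nranks)
{
    // After MPI_Allreduce with MPI_SUM, every element should hold the sum
    // of (rank + 1) over all ranks: nranks * (nranks + 1) / 2.
    for (size_t i = 0; i < count; i++)
        assert(out[i] == nranks * (nranks + 1) / 2);
}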
Changes Required to Use the OpenMP* Threading Model
  1. To run MPI functions in a multithreaded environment, MPI_Init_thread() must be called with the required argument set to MPI_THREAD_MULTIPLE instead of MPI_Init().
  2. According to the MPI_THREAD_SPLIT model, each thread must execute MPI operations only over the communicator specific to that thread. So, in this example, the MPI_COMM_WORLD communicator must be duplicated several times so that each thread has its own copy of MPI_COMM_WORLD (see the sketch after this list).
    Note
    The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
  3. The data to transfer must be split so that each thread handles its own portion of the input and output data.
  4. The barrier becomes two-stage: the barriers on the MPI level and on the OpenMP level must be combined.
  5. Check that the runtime sets up a reasonable affinity for OpenMP threads. Typically, the OpenMP runtime does this out of the box, but sometimes setting the OMP_PLACES=cores environment variable might be necessary for optimal multithreaded MPI performance.
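Putting these steps together, a minimal sketch of the resulting OpenMP structure might look as follows. It is an illustration under stated assumptions, not the guide's actual modified source: the per-thread portion size, the int/MPI_SUM reduction, and the omission of error checking and of the work_portion_*() calls are all simplifications.

// Minimal sketch of the converted OpenMP structure. Assumptions: int data,
// MPI_SUM, and a fixed per-thread portion size; error checking is omitted.
#include <mpi.h>
#include <omp.h>
#include <vector>

int main(int argc, char** argv)
{
    int provided;
    // Step 1: request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    const int nthreads = omp_get_max_threads();
    const int count = 1024;  // assumed per-thread portion size
    std::vector<int> in(nthreads * count, 1), out(nthreads * count);

    // Step 2: duplicate MPI_COMM_WORLD so each thread has its own copy.
    std::vector<MPI_Comm> comms(nthreads);
    for (int i = 0; i < nthreads; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    #pragma omp parallel
    {
        const int tid = omp_get_thread_num();
        // Step 3: each thread handles its own portion of in/out.
        MPI_Allreduce(&in[tid * count], &out[tid * count], count,
                      MPI_INT, MPI_SUM, comms[tid]);
        // Step 4: two-stage barrier, MPI level first, then OpenMP level.
        MPI_Barrier(comms[tid]);
        #pragma omp barrier
    }

    for (int i = 0; i < nthreads; i++)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}

Step 5 is an environment setting rather than a code change: OMP_PLACES=cores is exported before the run, and with the Intel MPI Library the split model itself is typically enabled with the I_MPI_THREAD_SPLIT=1 environment variable.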
Changes Required to Use the POSIX Threading Model
  1. To run MPI functions in a multithreaded environment, MPI_Init_thread() must be called with the required argument set to MPI_THREAD_MULTIPLE instead of MPI_Init().
  2. In each thread, MPI collective operations must be executed over a communicator specific to that thread. So MPI_COMM_WORLD should be duplicated, creating a specific communicator for each thread.
  3. The info key thread_id must be properly set for each of the duplicated communicators, as shown in the sketch after this list.
    Note
    The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
  4. The data to transfer must be split so that each thread handles its own portion of the input and output data.
  5. The barrier becomes two-stage: the barriers on the MPI level and on the POSIX level must be combined.
  6. The affinity of POSIX threads can be set up explicitly to reach optimal multithreaded MPI performance.
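A corresponding minimal sketch of the POSIX threading structure follows. The thread_id info key is the one named in step 3; the fixed thread count, the portion size, and the int/MPI_SUM reduction are assumptions for illustration, and error checking is omitted.

// Minimal pthreads sketch: a duplicated communicator per thread with the
// thread_id info key attached (step 3), per-thread data portions (step 4),
// and a combined MPI-level plus POSIX-level barrier (step 5).
#include <mpi.h>
#include <pthread.h>
#include <cstdio>

#define NUM_THREADS 4   // assumed thread count
#define COUNT 1024      // assumed per-thread portion size

static MPI_Comm comms[NUM_THREADS];
static pthread_barrier_t barrier;
static int tids[NUM_THREADS];
static int in[NUM_THREADS * COUNT], out[NUM_THREADS * COUNT];

static void* worker(void* arg)
{
    const int tid = *(const int*)arg;
    // Each thread uses only its own communicator and its own data portion.
    MPI_Allreduce(&in[tid * COUNT], &out[tid * COUNT], COUNT,
                  MPI_INT, MPI_SUM, comms[tid]);
    // Step 5: two-stage barrier, MPI level first, then POSIX level.
    MPI_Barrier(comms[tid]);
    pthread_barrier_wait(&barrier);
    return NULL;
}

int main(int argc, char** argv)
{
    int provided;
    // Step 1: request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);

    // Steps 2 and 3: duplicate MPI_COMM_WORLD for each thread and set the
    // thread_id info key on every duplicate.
    for (int i = 0; i < NUM_THREADS; i++) {
        MPI_Info info;
        char tid_str[16];
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
        MPI_Info_create(&info);
        snprintf(tid_str, sizeof(tid_str), "%d", i);
        MPI_Info_set(info, "thread_id", tid_str);
        MPI_Comm_set_info(comms[i], info);
        MPI_Info_free(&info);
    }

    // Step 6 (optional): affinity could be set explicitly before creation,
    // e.g. with the GNU extension pthread_attr_setaffinity_np().
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        tids[i] = i;
        pthread_create(&threads[i], NULL, worker, &tids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&barrier);
    for (int i = 0; i < NUM_THREADS; i++)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}

Alternatively, MPI_Comm_dup_with_info() could attach the info key at duplication time instead of the separate MPI_Comm_set_info() call.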