Improve Performance and Stability with Intel® MPI Library on InfiniBand*

Published: 02/07/2020   Last Updated: 08/14/2020

Overview

Intel® MPI Library 2019 has transitioned to using libfabric exclusively for communication. libfabric delegates the actual message transfer to providers, each of which implements support for a particular fabric or hardware vendor. The MLX provider is the one that supports InfiniBand* hardware.
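To check which providers the libfabric in use exposes on a node, the fi_info utility that comes with libfabric can list them. The sketch below assumes fi_info is on PATH and that fi_info -l prints each provider name at the start of a line followed by a colon; provider_listed is a hypothetical helper name, not part of Intel® MPI Library or libfabric.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if provider "$1" appears in `fi_info -l`
# output read from stdin. Assumes each provider is printed as "<name>:".
provider_listed() {
    grep -Eq "^[[:space:]]*$1:"
}

if command -v fi_info >/dev/null 2>&1; then
    if fi_info -l | provider_listed mlx; then
        echo "libfabric reports the mlx provider"
    else
        echo "mlx provider not listed; check which libfabric is loaded" >&2
    fi
else
    echo "fi_info not found on PATH" >&2
fi
```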

Rationale

Stability and performance on InfiniBand* were suboptimal in the initial and early update releases of Intel® MPI Library 2019. The MLX provider in libfabric addresses these concerns.

Availability

The MLX provider is available in Intel® MPI Library 2019 Update 5 for Linux* as a technical preview, and as a full feature in Intel® MPI Library 2019 Update 6 for Linux*.

Requirements

  • Intel® MPI Library 2019 Update 5 or higher
  • Mellanox UCX* Framework v1.4 or higher
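The UCX requirement can be checked from a shell before launching jobs. This is a minimal sketch assuming GNU sort -V is available and that the first dotted number printed by ucx_info -v is the UCX version; version_ge and first_version are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version "$1" >= version "$2" (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first dotted version number from text on stdin.
# Assumption: this is the UCX version in `ucx_info -v` output.
first_version() {
    grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

MIN_UCX=1.4
if command -v ucx_info >/dev/null 2>&1; then
    ucx_ver=$(ucx_info -v | first_version)
    if version_ge "$ucx_ver" "$MIN_UCX"; then
        echo "UCX $ucx_ver meets the v$MIN_UCX requirement"
    else
        echo "UCX ${ucx_ver:-unknown} is older than v$MIN_UCX" >&2
    fi
else
    echo "ucx_info not found; is UCX installed?" >&2
fi
```

Comparing with sort -V rather than as plain strings keeps versions such as 1.10 ordered after 1.4.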

Basic Usage

Make sure you are using the libfabric version that ships with Intel® MPI Library. In Intel® MPI Library 2019 Update 5, the MLX provider is a technical preview and is not selected by default. To enable it, set FI_PROVIDER=mlx.

Intel® MPI Library 2019 Update 6 and later use the MLX provider by default if InfiniBand* is detected at runtime.
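A minimal launch sketch for Update 5, where the MLX provider has to be requested explicitly. It assumes the Intel® MPI Library environment (mpirun and its bundled libfabric) is already loaded in the shell, and that I_MPI_DEBUG=5 makes the startup output name the selected libfabric provider; ./my_app and the rank count are placeholders.

```shell
#!/usr/bin/env bash
# Update 5: request the MLX provider explicitly.
# On Update 6 and later this is unnecessary when InfiniBand is detected.
export FI_PROVIDER=mlx

# Assumption: this debug level prints startup details, including the
# libfabric provider actually selected, so the choice can be confirmed.
export I_MPI_DEBUG=5

# ./my_app is a placeholder application; launch only if both are present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_app ]; then
    mpirun -n 2 ./my_app
else
    echo "mpirun or ./my_app not found; load the Intel MPI environment first" >&2
fi
```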

Performance Tuning Options

  • I_MPI_COLL_EXTERNAL: set to 1 to enable external collective operations (HCOLL). Reference: I_MPI_ADJUST Family Environment Variables
  • Autotuner: automatically tunes the application at the beginning of the run. Reference: Autotuning
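The two options listed above can be set together in a job script. This sketch assumes the HCOLL library is installed on the nodes and that the autotuner is enabled with I_MPI_TUNING_MODE=auto; that variable name is an assumption here, so confirm it against the Autotuning reference for your release before relying on it.

```shell
#!/usr/bin/env bash
# Use external (HCOLL) collective operations.
export I_MPI_COLL_EXTERNAL=1

# Assumption: the autotuner is switched on through this variable;
# check the Autotuning reference for the exact name and accepted values.
export I_MPI_TUNING_MODE=auto

# Launch with mpirun as usual after these variables are set.
```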

Limitations

  • Dynamic process management is not yet supported as of Intel® MPI Library 2019 Update 6. Support will be implemented in a future release.
  • Older InfiniBand* hardware doesn't support all of the expected transports. To check and resolve transport issues:
    $ ucx_info -d | grep Transport

    Output should include dc, rc, and ud transports. On older hardware, the dc transport will likely be missing. As a workaround, set:

    UCX_TLS=rc,ud,sm,self

    If none of the required transports are present, the cause is usually a driver misconfiguration, missing libraries, or another fabric software problem. Check the InfiniBand* and UCX installation using one of the following:

    $ ibv_devinfo
    $ lspci | grep Mellanox
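The transport check and the UCX_TLS fallback described above can be scripted. This sketch assumes ucx_info -d prints one "Transport: <name>" line per transport and that names such as dc_mlx5 start with the transport family (dc, rc, ud); list_transports and suggest_ucx_tls are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ucx_info -d` output on stdin and prints the
# transport names from "Transport:" lines, one per line, deduplicated.
list_transports() {
    grep -E 'Transport:' | sed -E 's/.*Transport:[[:space:]]*//' | sort -u
}

# Prints nothing if a dc transport exists (keep UCX defaults), prints the
# rc,ud,sm,self workaround if only rc/ud exist, and fails otherwise.
# Names are matched by prefix, so "dc_mlx5" counts as dc.
suggest_ucx_tls() {
    local tls
    tls=$(list_transports)
    if grep -q '^dc' <<<"$tls"; then
        return 0
    elif grep -qE '^(rc|ud)' <<<"$tls"; then
        echo "rc,ud,sm,self"
    else
        echo "no rc/ud transport found; check ibv_devinfo and lspci" >&2
        return 1
    fi
}

if command -v ucx_info >/dev/null 2>&1; then
    if tls=$(ucx_info -d | suggest_ucx_tls) && [ -n "$tls" ]; then
        export UCX_TLS="$tls"
        echo "dc transport missing; using UCX_TLS=$UCX_TLS"
    fi
fi
```

UCX_TLS is only exported when dc is missing, so nodes with newer hardware keep the UCX default transport selection.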

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804