Intel® MPI Library 2019 over libfabric*

Overview

DAPL, TMI, and OFA fabrics have been deprecated since Intel® MPI Library 2017 Update 1. Intel MPI Library 2019 does not support these fabrics; it supports only libfabric*. The required libfabric version is 1.5.0 or higher.

The Intel MPI Library 2019 package contains the libfabric 1.7.0 Alpha library, which is used when at least one of the following conditions is met (see the example after the list):

  • the mpivars.sh script is executed with the -ofi_internal argument (or I_MPI_OFI_LIBRARY_INTERNAL is set to a positive value)
  • the system on which the Intel MPI Library is running contains a libfabric version lower than the packaged libfabric.
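
For example, a minimal sketch of forcing the packaged libfabric (the installation path below is illustrative; adjust it to your actual Intel MPI Library installation):

$ . /opt/intel/impi/2019/intel64/bin/mpivars.sh -ofi_internal

or, equivalently, set the variable before launching the application:

$ export I_MPI_OFI_LIBRARY_INTERNAL=1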

What is libfabric?

The Open Fabrics Interfaces1 (OFI) is a framework focused on exporting communication services to applications. OFI is specifically designed to meet the performance and scalability requirements of high-performance computing (HPC) applications running in a tightly coupled network environment. The key components2 of OFI are: application interfaces, provider libraries, kernel services, daemons, and test applications.

Libfabric is a library that defines and exports the userspace API of OFI, and is typically the only software that applications deal with directly. The libfabric API does not depend on the underlying networking protocols or on the implementation of the particular networking devices over which it may run. OFI is based on the notion of application-centric I/O, meaning that the libfabric library is designed to align fabric services with application needs, providing a tight semantic fit between applications and the underlying fabric hardware. This reduces overall software overhead and improves application efficiency when transmitting or receiving data over a fabric.

See the OFI web site for more details. It provides a project overview and detailed documentation for the libfabric APIs.

Installing libfabric

You can install libfabric directly from the Intel® Omni-Path Fabric Software package or build it manually from the source available in the GitHub* repository.

Installing libfabric with the Intel® Omni-Path Fabric Software package

The Intel® Omni-Path Fabric Software package versions 10.5 and later already contain the libfabric libraries.

Building and installing libfabric from the source

Distribution tar packages are available from the GitHub* releases tab. If you are building libfabric from a developer Git clone, run the autogen.sh script first. This will invoke the GNU* Autotools to bootstrap libfabric's configuration and build mechanisms. If you are building libfabric from an official distribution tarball, do not run autogen.sh, because distribution tarballs are already bootstrapped for you.
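
For example, a typical developer build from a Git clone might start as follows (the repository URL refers to the upstream ofiwg/libfabric project):

$ git clone https://github.com/ofiwg/libfabric.git
$ cd libfabric
$ ./autogen.sh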

Libfabric currently supports GNU/Linux*, Microsoft Windows*, and OS X*. Please note that the Intel MPI Library does not support OS X.

Configuration options

The configure script has built-in command-line options. Run the following command:

 ./configure --help

to view available options. Some of the configuration switches are:

--prefix=<directory>

Where <directory> is a meta-symbol for the actual directory path that is to be supplied by the user. By default, make install places the files in the /usr directory tree. If the --prefix option is used, libfabric files are installed into the directory tree specified by <directory>. The executables resulting from the build are placed into <directory>/bin.

--with-valgrind=<directory>

The meta-symbol <directory> is the directory where valgrind is installed. If valgrind is found, valgrind annotations are enabled. This may incur a performance penalty.

--enable-debug

Enable debug code paths. This enables various extra checks and allows for using the highest verbosity logging output that is normally compiled out in production builds.

--enable-<provider>=[yes | no | auto | dl | <directory>]

--disable-<provider>

This enables or disables the fabric provider referenced by the meta-symbol <provider>. Valid options are:

  • auto (This is the default if the --enable-<provider> option is not specified).
    The provider will be enabled if all its requirements are satisfied. If one of the requirements cannot be satisfied, the provider is disabled.
  • yes (This is the default if the --enable-<provider> option is specified).
    The configure script will abort if the provider cannot be enabled (for example, due to some of its requirements not being available).
  • no
    Disable the provider. This is synonymous with --disable-<provider>.
  • dl
    Enable the provider and build it as a loadable library.
  • <PATH>
    Enable the provider and use the installation given in <PATH>.

    The providers most relevant for the Intel MPI Library are gni*, psm, psm2, sockets, and verbs.

Example

The command below tells libfabric to disable the sockets provider and explicitly enable the PSM2 provider. All other providers will be enabled if possible.

$ ./configure --disable-sockets --enable-psm2=yes
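
A further variant, assuming you also want a custom installation prefix and the verbs provider built as a loadable library (the path is purely illustrative):

$ ./configure --prefix=/opt/libfabric --enable-verbs=dl --enable-psm2=yes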

Compilation and Installation

To compile and install the already configured libfabric package, run:

$ make && make install

Note: If the library is installed in a non-default location via the --prefix configuration parameter, modify LD_LIBRARY_PATH accordingly.
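
For example, assuming the illustrative /opt/libfabric prefix used above:

$ export LD_LIBRARY_PATH=/opt/libfabric/lib:$LD_LIBRARY_PATH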

Validate installation

The fi_info utility can be used to validate the libfabric and provider installation, as well as provide details about provider support and available interfaces. See the fi_info man page for details on using the fi_info utility. fi_info is installed as part of the libfabric package.

A more comprehensive test suite is available via the fabtests software package. Also, fi_pingpong, a ping-pong test that transmits data between two processes, may be used for validation purposes as well.
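
For example, a quick check could look like this (the provider name is an assumption; <server_address> is the address of the server node):

$ fi_info -p psm2
$ fi_pingpong                   # on the server node
$ fi_pingpong <server_address>  # on the client node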

Readme

For further Information on building and installing libfabric, refer to the libfabric readme file.

OFI provider requirements

Intel MPI Library requires that the underlying OFI provider support the following features:

Endpoint types:

  • FI_EP_RDM

Capabilities:

  • FI_MSG, FI_SEND, FI_RECV, FI_MULTI_RECV (to support Active Message protocol used for 1-sided and 2-sided operations)
  • FI_TAGGED, FI_SEND, FI_RECV (to support tag-matching for MPI 2-sided)
  • FI_RMA, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_WRITE, FI_READ (to support large messages and MPI 1-sided)
  • FI_ATOMIC (to support MPI 1-sided)
  • FI_DIRECTED_RECV (so that the source address does not have to be placed in the match bits)

Other:

  • FI_RM_ENABLED
  • Counters
  • Scalable endpoints (for Intel MPI Library 2019 MT)

Intel MPI Library can work with the following:

Modes:

  • FI_CONTEXT
  • FI_ASYNC_IOV
  • FI_RX_CQ_DATA

Memory Registration modes:

  • FI_MR_BASIC
  • FI_MR_SCALABLE

Address Vector types:

  • FI_AV_TABLE
  • FI_AV_MAP

Each OFI provider is built as a separate dynamic library so that a single libfabric library can run on top of different network adapters. Set the FI_PROVIDER_PATH environment variable to specify the path to the provider libraries.
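
For example (the directory below is an assumption; point it at the location where your provider libraries are actually installed):

$ export FI_PROVIDER_PATH=/usr/lib64/libfabric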

OFI providers support

The Intel MPI Library 2019 supports the following OFI providers:

psm2

The PSM2 provider runs over the PSM 2.x interface supported by the Intel® Omni-Path Fabric. PSM 2.x has all the PSM 1.x features plus a set of new functions with enhanced capabilities. Since PSM 1.x and PSM 2.x are not application binary interface (ABI) compatible, the PSM2 provider works with PSM 2.x only and does not support the Intel® True Scale Fabric.

The provider uses the following runtime parameters:

FI_PSM2_INJECT_SIZE

Define the maximum message size allowed for fi_inject and fi_tinject calls.  The default value is 64.

FI_PSM2_LAZY_CONN

Control the connection mode established between PSM2 endpoints that OFI endpoints are built on top of. When set to 0 (eager connection mode), connections are established when addresses are inserted into the address vector. When set to 1 (lazy connection mode), connections are established when addresses are used the first time in communication.

Note: Lazy connection mode may reduce the start-up time on large systems at the expense of higher data path overhead.

See the fi_psm2(7) man page for more details.
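
For example, to try lazy connection establishment on a large system (the provider selection and application name are illustrative):

$ export FI_PROVIDER=psm2
$ export FI_PSM2_LAZY_CONN=1
$ mpirun -n 1024 ./your_app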

psm

The PSM provider runs over the PSM 1.x interface that is currently supported by the Intel® True Scale Fabric. PSM provides tag-matching message queue functions optimized for MPI implementations. PSM also has limited Active Message support, which is not officially published but is quite stable and well documented in the source code (a part of the OFED release). The PSM provider makes use of both the tag-matching message queue functions and the Active Message functions to support a variety of libfabric data transfer APIs, including the tagged message queue, message queue, RMA, and atomic operations.

The provider uses the following runtime parameters:

FI_PSM_TAGGED_RMA

The RMA functions are implemented on top of the PSM Active Message functions. The Active Message functions have a limit on the size of data to be transferred in a single message. Large transfers can be divided into small chunks and pipelined; however, in this case the bandwidth is sub-optimal.

The psm provider uses PSM tag-matching message queue functions to achieve higher bandwidth for large size RMA. For this purpose, a bit is reserved from the tag space to separate the RMA traffic from the regular tagged message queue.

The option is enabled by default. To turn it off, set the variable to 0.

FI_PSM_AM_MSG

The psm provider implements the non-tagged message queue over the PSM tag-matching message queue. One tag bit is reserved for this purpose. Alternatively, the non-tagged message queue can be implemented over Active Message. This experimental feature has a slightly larger latency.

This option is disabled by default. To turn it on, set the variable to 1.

See the fi_psm(7) man page for more details.

Note:

In order to use Intel MPI Library 2019 over the OFI/PSM provider, the following variables must be set:

FI_PSM_TAGGED_RMA=0
FI_PSM_AM_MSG=1
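
A complete launch sketch under these settings might look like this (the provider selection and application name are illustrative):

$ export FI_PROVIDER=psm
$ export FI_PSM_TAGGED_RMA=0
$ export FI_PSM_AM_MSG=1
$ mpirun -n 2 ./your_app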

sockets

The sockets provider is a general purpose provider that can be used on any system that supports TCP sockets. The provider is not intended to provide performance improvements over regular TCP sockets, but rather to allow developers to write, test, and debug application code even on platforms that do not have high-performance fabric hardware. The sockets provider supports all libfabric provider requirements and interfaces.

The provider uses the following runtime parameters:

FI_SOCKETS_IFACE

Define the prefix or the name of the network interface. By default, it uses any.

See the fi_sockets(7) man page for more details.
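
For example, to debug an application over a regular Ethernet interface (the interface name and application name are assumptions; check your system for the actual interface):

$ export FI_PROVIDER=sockets
$ export FI_SOCKETS_IFACE=eth0
$ mpirun -n 2 ./your_app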

verbs

The verbs provider enables applications using OFI to be run over any verbs hardware (InfiniBand*, iWARP*, and so on). It uses the Linux Verbs API for network transport and provides a translation of OFI calls to appropriate verbs API calls. It uses librdmacm for communication management and libibverbs for other control and data transfer operations.

By default, the verbs provider uses the RxM utility provider to emulate the FI_EP_RDM endpoint type over the verbs FI_EP_MSG endpoint.

The provider uses the following runtime parameters:

FI_VERBS_INLINE_SIZE

Define the maximum message size allowed for fi_inject and fi_tinject calls.  The default value is 64.

FI_VERBS_IFACE

Define the prefix or the full name of the network interface associated with the verbs device. By default, it is ib.

FI_VERBS_MR_CACHE_ENABLE

Enable Memory Registration caching. The default value is 0.

Note: Set the value to 1 to increase bandwidth for medium and large messages.

See the fi_verbs(7) man page for more details.
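
For example, to enable the memory registration cache for a verbs run (the application name is illustrative):

$ export FI_PROVIDER=verbs
$ export FI_VERBS_MR_CACHE_ENABLE=1
$ mpirun -n 2 ./your_app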

Dependencies

The verbs provider requires libibverbs (v1.1.8 or newer) and librdmacm (v1.0.16 or newer). RDMA Connection Manager requires:

If you are compiling libfabric from source and want to enable verbs support, it is essential to have the matching header files for the above two libraries. If the libraries and header files are not in default paths, specify them in the CFLAGS, LDFLAGS, and LD_LIBRARY_PATH environment variables.

RxM

The RxM (RDM over MSG) provider (ofi_rxm) is a utility provider that supports the FI_EP_RDM endpoint type emulated over the FI_EP_MSG endpoints of a core provider.

The RxM provider requires the core provider to support the following features:

  • MSG endpoints (FI_EP_MSG)
  • FI_MSG transport (to support data transfers)
  • FI_RMA transport (to support rendezvous protocol for large messages and RMA transfers)
  • FI_OPT_CM_DATA_SIZE of at least 24 bytes

The provider uses the following runtime parameters:

FI_OFI_RXM_BUFFER_SIZE

Define the transmit buffer/inject size. Messages smaller than this size are transmitted via an eager protocol; larger messages are transmitted via the SAR or rendezvous protocol. Transmitted data is copied up to the specified size. The default size is 16 KB.

FI_OFI_RXM_USE_SRX

Control the RxM receive path. If the variable is set to 1, RxM uses the Shared Receive Context of the core provider. The default value is 0.

Note: This mode reduces memory consumption, but may increase small message latency as a side effect.

FI_OFI_RXM_SAR_LIMIT

Control the maximum message size for the RxM SAR (Segmentation and Reassembly) protocol. Messages larger than this limit are transmitted via the rendezvous protocol. The default value is 256 KB.

See the fi_rxm(7) man page for more details.
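
For example, tuning the RxM path on top of the verbs provider (the values are illustrative, not recommendations):

$ export FI_PROVIDER=verbs
$ export FI_OFI_RXM_USE_SRX=1
$ export FI_OFI_RXM_SAR_LIMIT=131072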

Fabric selection

By default, libfabric is used for both intra-node and inter-node communication. You can set the I_MPI_FABRICS environment variable to change this behavior. The default value is I_MPI_FABRICS=shm:ofi.

Syntax
export I_MPI_FABRICS=[shm:]ofi

To select the OFI provider from the libfabric library, use the FI_PROVIDER environment variable, which defines the name of the OFI provider to load.

Syntax
export FI_PROVIDER=<name>

where <name> is the OFI provider to load. Alternatively, you can specify the OFI provider with the I_MPI_OFI_PROVIDER environment variable.
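
Putting it together, a typical selection of the shm:ofi fabric with the psm2 provider might look like this (the provider and application name are illustrative):

$ export I_MPI_FABRICS=shm:ofi
$ export FI_PROVIDER=psm2
$ mpirun -n 4 ./your_app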

References

  1. Open Fabrics Initiative Working Group
  2. Paul Grun, Sean Hefty, Sayantan Sur, David Goodell, Robert D. Russell, Howard Pritchard, Jeffrey M. Squyres, "A Brief Introduction to the OpenFabrics Interfaces"