Experience with various interconnects and DAPL* providers


What DAPL* versions does the Intel® MPI Library support?
The Intel® MPI Library uses the Direct Access Programming Library (DAPL*) as a fabric-independent API to run on fast interconnects like InfiniBand* or Myrinet*. Currently, the Intel MPI Library supports DAPL* version 1.2 as well as DAPL* version 2.0-capable providers. The Intel MPI Library automatically determines the version of the DAPL* standard to which the provider conforms.


How to select one DAPL* provider when several are available?
Look into the /etc/dat.conf file or its equivalent on your system. Note that some vendors put this file into a directory different from /etc and may name this file differently. Consult your DAPL* provider documentation to clarify this.

Every line of the aforementioned dat.conf file that does not start with the hash sign "#" describes a DAPL* provider. The first word of the respective line is the provider identifier you should look for. Append a colon ":" and this word to the I_MPI_DEVICE environment variable value when selecting the rdma or rdssm device. For example, if the first word is "OpenIB-cma0", set the I_MPI_DEVICE variable to "rdma:OpenIB-cma0" or "rdssm:OpenIB-cma0".
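
For example, a minimal sketch of an mpiexec invocation that passes this setting to a run (the process count and application name are illustrative):

mpiexec -np 4 -env I_MPI_DEVICE rdssm:OpenIB-cma0 ./myapp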


How to run MPI programs over InfiniBand*?
To use InfiniBand*, do the following:

  1. Install and configure your InfiniBand hardware. See your InfiniBand hardware vendor's documentation for details.
  2. Install and configure your InfiniBand software, drivers, and DAPL* provider. See your InfiniBand software vendor's documentation for details. Alternatively, if you do not have vendor-supplied InfiniBand DAPL* software, download and install an OpenFabrics* Enterprise Distribution† (OFED*). Use OFED 1.4 or higher.
  3. Test that the installed InfiniBand hardware, software, and DAPL provider work as expected, independent of the Intel MPI Library.
  4. Make sure that the directory containing libdat.so is either listed in /etc/ld.so.conf (and that ldconfig has been run) or is listed in LD_LIBRARY_PATH.
  5. The Intel MPI Library selects the most appropriate fabric combination automatically. Set I_MPI_DEVICE=<device>:<provider> (for example, I_MPI_DEVICE=rdma:OpenIB-cma0) to select InfiniBand explicitly, as shown in the sketch after this list.
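
For example, a minimal sketch that first checks whether libdat.so is visible to the dynamic linker and then selects InfiniBand explicitly (the provider and application names are illustrative; take the real provider name from your /etc/dat.conf):

ldconfig -p | grep libdat
mpiexec -np 4 -env I_MPI_DEVICE rdma:OpenIB-cma0 ./myapp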

How to run MPI programs over Myrinet*?
To use Myrinet*, do the following:

  1. Install and configure your Myrinet hardware. See your Myrinet hardware vendor's documentation for details.
  2. Install and configure your Myrinet software, drivers, and DAPL* provider. See your Myrinet software vendor's documentation for details. Alternatively, if you do not have vendor-supplied Myrinet DAPL* software, download and install the open source DAPL* provider for Myrinet*†.
  3. Test that the installed Myrinet hardware, software, and DAPL provider work as expected, independent of the Intel MPI Library.
  4. Make sure that the directory containing libdat.so is either listed in /etc/ld.so.conf (and that ldconfig has been run) or is listed in LD_LIBRARY_PATH.
  5. Make sure that an entry for the DAPL provider (libdapl.so) exists, with the correct library path, in /etc/dat.conf.
  6. When executing an MPI program, use the mpiexec -gm or -mx option, or set I_MPI_DEVICE=<device>:<provider> (for example, I_MPI_DEVICE=rdma:GmHca0) to activate Myrinet, as shown in the sketch after this list.
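
For example, a minimal sketch of activating Myrinet GM via the -gm option, assuming the option is given before the process count and with an illustrative application name (see the Reference Manual referenced below for the exact option syntax):

mpiexec -gm -np 4 ./myapp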

 

See the Intel MPI Library Reference Manual for more details.


What does the <provider> field mean in the Intel MPI device specification rdma[:<provider>] or rdssm[:<provider>]?
Use this field to select a particular provider instead of the first valid provider described in the dat.conf file. The default location of the configuration file is /etc/dat.conf.

For example, suppose you have the following dat.conf file:

#
# Generic DAT configuration file
#
# This is a DAPL provider configuration for InfiniBand fast fabric
OpenIB-cma0 u1.2 nonthreadsafe default /opt/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
# This is a DAPL provider configuration for Myrinet fast fabric
GmHca0 u1.2 nonthreadsafe default /Myrinet_DAPL_Providers/libdapl.so gm_dapl.1.2 "GmHca0 0" ""

 

Use the following command to utilize the Myrinet* fabric instead of the default InfiniBand* fabric:

mpiexec -np $numproc -env I_MPI_DEVICE rdma:GmHca0 $application $app_args

 

How to verify which device/provider is used for communication?
Set the I_MPI_DEBUG environment variable to 2. The Intel MPI Library will report which device and provider are in use.

For example:

mpiexec -np $numproc -env I_MPI_DEBUG 2 $executable

 

How to specify an alternative path to the DAPL* configuration file dat.conf?
Use the DAT_OVERRIDE environment variable to override the default location of the dat.conf file.
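
For example, a minimal sketch that points the DAPL* library at an alternative configuration file (the file path and the application variables are illustrative):

mpiexec -np $numproc -env DAT_OVERRIDE /tmp/my_dat.conf $application $app_args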


I got the "CMA: unable to open /dev/infiniband/rdma_cm" error message while using the Intel MPI Library over OFED*. How do I fix it?
Some of the nodes have not loaded the rdma_cm modules. Run "modprobe rdma_ucm" on the affected nodes to load the modules and eliminate this problem.
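
To check whether the modules are loaded on a given node, you can run, for example:

lsmod | grep rdma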


Why do I get the error message "librdmacm: kernel ABI version 4 does not match library version 2" while starting the Intel MPI Library over InfiniBand*?
Update the librdmacm library to avoid this issue. Download it from http://www.openfabrics.org† as a separate package or as part of the OFED* distribution.


Why do I get DAT_INTERNAL_ERROR or DAT_INVALID_ADDRESS errors during Intel MPI application startup over InfiniBand*?
Make sure you have properly configured IP addresses for your InfiniBand* interfaces. This is required for the proper functioning of the OpenSM subnet manager.
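
For example, you can inspect the IP configuration of the IPoIB interface (here ib0, as referenced in the dat.conf example above):

ifconfig ib0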


How to disable the DAPL* provider version compatibility check at runtime?
Set the I_MPI_CHECK_DAPL_PROVIDER_MISMATCH environment variable to none.
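
For example:

mpiexec -np $numproc -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none $application $app_args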


Operating System:

SUSE* Linux Enterprise Server 10, Red Hat* Enterprise Linux 5.0, SUSE* Linux Enterprise Server 9, Red Hat* Enterprise Linux 4.0

 


For more complete information about compiler optimizations, see our Optimization Notice.