What DAPL* versions does the Intel® MPI Library support? The Intel® MPI Library uses the Direct Access Programming Library (DAPL*) as a fabric-independent API to run on fast interconnects like InfiniBand* or Myrinet*. Currently, the Intel MPI Library supports DAPL* version 1.2 as well as DAPL* version 2.0-capable providers. The Intel MPI Library automatically determines the version of the DAPL* standard to which the provider conforms.
How to select one DAPL* provider out of many others? Look into the /etc/dat.conf file or its equivalent on your system. Note that some vendors put this file into a directory different from /etc and may name this file differently. Consult your DAPL* provider documentation to clarify this.
Every line of the dat.conf file that does not start with a hash sign ("#") describes a DAPL* provider. The first word in each such line is the provider identifier you should look for. When selecting the rdma or rdssm device, append a colon (":") and this identifier to the device name in the I_MPI_DEVICE environment variable. For example, if the first word is "OpenIB-cma0", set the I_MPI_DEVICE variable to "rdma:OpenIB-cma0" or "rdssm:OpenIB-cma0".
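The lookup described above can be sketched in shell. This is a minimal illustration, not part of the Intel MPI Library: the function names list_dapl_providers and make_device_value are hypothetical, and the sketch assumes the dat.conf layout described in this FAQ (provider identifier as the first word of each non-comment line).

```shell
#!/bin/sh
# Hypothetical helper: print the provider identifier (first word) of every
# line in a dat.conf-style file that is neither blank nor a "#" comment.
# Falls back to /etc/dat.conf when no file is given.
list_dapl_providers() {
    conf="${1:-/etc/dat.conf}"
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$conf"
}

# Hypothetical helper: join a device name (rdma or rdssm) and a provider
# identifier into an I_MPI_DEVICE value.
make_device_value() {
    printf '%s:%s\n' "$1" "$2"
}

# Example usage (commented out so that sourcing this file changes nothing):
#   list_dapl_providers /etc/dat.conf
#   export I_MPI_DEVICE="$(make_device_value rdma OpenIB-cma0)"
```

If your vendor installs dat.conf elsewhere, pass that path as the first argument to list_dapl_providers.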
How to run MPI programs over InfiniBand*? To use InfiniBand*, do the following:
- Install and configure your InfiniBand hardware. See your InfiniBand hardware vendor's documentation for details.
- Install and configure your InfiniBand software, drivers, and DAPL* provider. See your InfiniBand software vendor's documentation for details. Alternatively, if you do not have vendor-supplied InfiniBand DAPL* software, download and install an OpenFabrics* Enterprise Distribution† (OFED*). Use OFED 1.4 or higher.
- Test that the installed InfiniBand hardware, software, and DAPL provider work as expected, independent of the Intel MPI Library.
- Make sure that the directory containing libdat.so is either listed in /etc/ld.so.conf (and that ldconfig has been run) or is listed in LD_LIBRARY_PATH.
- The Intel MPI Library selects the most appropriate fabric combination automatically. Set I_MPI_DEVICE=<device>:<provider> (for example, I_MPI_DEVICE=rdma:OpenIB-cma0) to select InfiniBand explicitly.
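The libdat.so check in the steps above can be approximated with a short shell sketch. The helper names lib_in_ld_library_path and lib_in_ldconfig_cache are hypothetical and only illustrate the idea. The second helper assumes a glibc-style ldconfig that supports the -p option and is reachable through PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file whose name starts with the given
# library name exists in one of the directories listed in LD_LIBRARY_PATH.
# The body runs in a subshell so the IFS change does not leak to the caller.
lib_in_ld_library_path() (
    lib="$1"
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        [ -n "$dir" ] || continue          # skip empty path components
        for f in "$dir"/"$lib"*; do
            [ -e "$f" ] && exit 0
        done
    done
    exit 1
)

# Hypothetical helper: succeed if the dynamic linker cache lists the library.
lib_in_ldconfig_cache() {
    command -v ldconfig >/dev/null 2>&1 || return 1
    ldconfig -p 2>/dev/null | grep -F -q -- "$1"
}

# Example usage (commented out so that sourcing this file changes nothing):
#   if lib_in_ld_library_path libdat.so || lib_in_ldconfig_cache libdat.so; then
#       export I_MPI_DEVICE=rdma:OpenIB-cma0
#   else
#       echo "libdat.so was not found" >&2
#   fi
```

Neither check proves that the provider works. It only confirms that the dynamic linker has a chance of locating the library.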
How to run MPI programs over Myrinet*? To use Myrinet*, do the following:
- Install and configure your Myrinet hardware. See your Myrinet hardware vendor's documentation for details.
- Install and configure your Myrinet software, drivers, and DAPL* provider. See your Myrinet software vendor's documentation for details. Alternatively, if you do not have vendor-supplied Myrinet DAPL* software, download and install the open source DAPL* provider for Myrinet*†.
- Test that the installed Myrinet hardware, software, and DAPL provider work as expected, independent of the Intel MPI Library.
- Make sure that the directory containing libdat.so is either listed in /etc/ld.so.conf (and that ldconfig has been run) or is listed in LD_LIBRARY_PATH.
- Make sure that an entry for the DAPL provider (libdapl.so) exists, with the correct library path, in /etc/dat.conf.
- When executing an MPI program, use the mpiexec -gm or -mx option, or set I_MPI_DEVICE=<device>:<provider> (for example, I_MPI_DEVICE=rdma:GmHca0) to activate Myrinet.
See the Intel MPI Library Reference Manual for more details.
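The dat.conf check from the list above (an entry for the DAPL provider with the correct library path) can be sketched as follows. The function name check_dat_conf_libs is hypothetical. The sketch assumes that the library path is the fifth whitespace-separated field of each provider line, as in the example dat.conf entries shown in this FAQ.

```shell
#!/bin/sh
# Hypothetical helper: for every provider line in a dat.conf-style file,
# report whether the library path (assumed to be the fifth field) exists.
# Returns non-zero if at least one library is missing.
check_dat_conf_libs() {
    conf="${1:-/etc/dat.conf}"
    status=0
    while read -r name _api _thread _default libpath _rest || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;           # skip blank lines and comments
        esac
        if [ -e "$libpath" ]; then
            echo "ok: $name -> $libpath"
        else
            echo "missing: $name -> $libpath"
            status=1
        fi
    done < "$conf"
    return "$status"
}

# Example usage (commented out so that sourcing this file changes nothing):
#   check_dat_conf_libs /etc/dat.conf
```

A "missing" line usually points to a wrong path in dat.conf or to a provider that was not installed on that node.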
What does the <provider> field mean in the Intel MPI device specification rdma[:<provider>] or rdssm[:<provider>]? Use this field to select a particular provider instead of the first valid provider described in the dat.conf file. The default location of the configuration file is /etc/dat.conf.
For example, you have a dat.conf file:
#
# Generic DAT configuration file
#
# This is a DAPL provider configuration for InfiniBand fast fabric
OpenIB-cma0 u1.2 nonthreadsafe default /opt/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
# This is a DAPL provider configuration for Myrinet fast fabric
GmHca0 u1.2 nonthreadsafe default /Myrinet_DAPL_Providers/libdapl.so gm_dapl.1.2 "GmHca0 0" ""
Use the following command to utilize the Myrinet* fabric instead of the default InfiniBand* fabric:
mpiexec -np $numproc -env I_MPI_DEVICE rdma:GmHca0 $application $app_args
How to verify which device/provider is used for communication? Set the I_MPI_DEBUG environment variable to 2. The Intel MPI Library then reports which device/provider is in use.
For example:
mpiexec -np $numproc -env I_MPI_DEBUG 2 $executable
How to specify an alternative path to the DAPL* configuration file dat.conf? Use the DAT_OVERRIDE environment variable to override the default location of the dat.conf file.
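As a small illustration, the following shell sketch shows which configuration file would be in effect. The function name effective_dat_conf is hypothetical. The sketch relies only on what this FAQ states: /etc/dat.conf is the default location and DAT_OVERRIDE replaces it. The commented mpiexec line reuses the -env syntax shown earlier in this FAQ, and the file path in it is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print the dat.conf path that would be used, that is,
# the value of DAT_OVERRIDE if it is set and non-empty, else /etc/dat.conf.
effective_dat_conf() {
    printf '%s\n' "${DAT_OVERRIDE:-/etc/dat.conf}"
}

# Example usage (commented out so that sourcing this file changes nothing):
#   export DAT_OVERRIDE="$HOME/my_dat.conf"      # placeholder path
#   effective_dat_conf
#   mpiexec -np $numproc -env DAT_OVERRIDE "$HOME/my_dat.conf" $application $app_args
```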
I got the "CMA: unable to open /dev/infiniband/rdma_cm" error message while using the Intel MPI Library over OFED*. How do I fix it? Some of the nodes have not loaded the rdma_cm modules. Run "modprobe rdma_ucm" on the nodes that report the error to load the modules.
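To find out whether a node needs this fix, you can check the list of loaded kernel modules first. The sketch below is one possible way to do that. The function name module_loaded is hypothetical, and the sketch assumes a Linux system where /proc/modules lists one loaded module per line with the module name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module appears in the
# first column of the modules list (defaults to /proc/modules).
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example usage (commented out; loading a module requires root privileges):
#   module_loaded rdma_ucm || modprobe rdma_ucm
```

Run the check on every node that reports the error, for example through your cluster's remote shell tool.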
Why do I get the error message "librdmacm: kernel ABI version 4 does not match library version 2" while starting the Intel MPI Library over InfiniBand*? Update the librdmacm library to avoid this issue. Download it from http://www.openfabrics.org† as a separate package or as part of the OFED* distribution.
Why do I get DAT_INTERNAL_ERROR or DAT_INVALID_ADDRESS errors during Intel MPI application start over InfiniBand*? Make sure you have properly configured IP addresses for InfiniBand*. This is required for proper functioning of the OpenSM subnet manager.
How to disable the DAPL* provider version compatibility check at runtime? Set the I_MPI_CHECK_DAPL_PROVIDER_MISMATCH environment variable to none.
Operating System:
SUSE* Linux Enterprise Server 10, Red Hat* Enterprise Linux 5.0, SUSE* Linux Enterprise Server 9, Red Hat* Enterprise Linux 4.0