Intel® MPI Library

Examples of MPI Failures

This section provides examples of typical MPI failures, including descriptions, output messages, and related recommendations. It covers the following kinds of problems that may cause MPI failures:

  • Communication problems

  • Environmental problems

  • Other problems

Prerequisite Steps

Before you start using any of the Intel® MPI Library functionality, make sure to establish the proper environment for Intel MPI Library. Follow these steps:

  1. Set up the Intel MPI Library environment. Source the mpivars.[c]sh script:

DDT* Debugger

You can debug MPI applications using the Allinea* DDT* debugger. Intel does not provide support for this debugger; obtain support from Allinea*. According to the DDT documentation, DDT supports the Express Launch feature for the Intel® MPI Library. You can debug your application as follows:

$ ddt mpirun -n <# of processes> [<other mpirun arguments>] <executable>

If you have issues with the DDT debugger, refer to the DDT documentation for help.

Communication Problems

Communication problems with the Intel® MPI Library are usually caused by an MPI process being terminated by a signal (SIGTERM, SIGKILL, or another signal). Such terminations may result from a host reboot, an unexpected signal, an out-of-memory (OOM) manager error, or other causes.

To deal with such failures, find out why the MPI process was terminated (for example, by checking the system log files).
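
As one way to check the system logs for an OOM-related termination, the sketch below scans log text for kernel OOM-killer records. It is a minimal illustration, not part of the Intel MPI Library: the helper name find_oom_kills is hypothetical, the pattern assumes the "Killed process <pid> (<name>)" wording commonly written by Linux kernels, and the sample log lines are made up for the demonstration.

```python
import re
from typing import List, Tuple

# Assumption: Linux kernels typically log OOM kills as
# "Killed process <pid> (<name>)"; other systems may word this differently.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) pairs for every OOM-kill record found."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(log_text)]


# Illustrative sample only; these lines are not real output from any system.
SAMPLE_LOG = (
    "[12345.678901] Out of memory: Killed process 4242 (myprog) total-vm:1024kB\n"
    "[12346.000000] usb 1-1: new device\n"
)

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"pid {pid} ({name}) was killed by the OOM killer")
```

In practice, you would pass in the text of the kernel log from the host where the MPI process ran and compare the reported process names with your executable.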

Using -gtool for Debugging

The -gtool runtime option helps when you need to attach a debugger to several processes at once. Instead of attaching to each process individually, you can specify all the processes in a single command line. For example:

$ mpirun -n 16 -gtool "gdb:3,5,7-9=attach" ./myprog

The command line above attaches the GNU* Debugger (GDB*) to processes 3, 5, 7, 8 and 9.
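
To see how a rank set such as 3,5,7-9 expands to individual processes, the following sketch reproduces the expansion described above. The function expand_rank_set is a hypothetical helper written for illustration; it is not part of the Intel MPI Library and handles only the comma-and-range form shown in the example.

```python
from typing import List


def expand_rank_set(spec: str) -> List[int]:
    """Expand a rank set such as "3,5,7-9" into a sorted list of ranks."""
    ranks = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # An inclusive range, for example "7-9" -> 7, 8, 9
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            ranks.update(range(first, last + 1))
        else:
            ranks.add(int(part))
    return sorted(ranks)


if __name__ == "__main__":
    print(expand_rank_set("3,5,7-9"))  # prints [3, 5, 7, 8, 9]
```

A helper like this can be useful for confirming which ranks a debugger will attach to before you launch a large job.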

Environment Problems

Environmental errors may occur when there are problems with the system environment, for example when mandatory system services are not running or shared resources are unavailable.

When you encounter environmental errors, check the environment. For example, verify the current state of important services.

Example 1

Symptom/Error Message

librdmacm: Warning: couldn't read ABI version.
librdmacm: Warning: assuming: 4
librdmacm: Fatal: unable to get RDMA device list

or:

Compiling an MPI Program

This topic describes the basic steps required to compile and link an MPI program, using the Intel® MPI Library SDK.

To simplify linking with MPI library files, Intel MPI Library provides a set of compiler wrapper scripts with the mpi prefix for all supported compilers. To compile and link an MPI program, do the following:

Statistics and Analysis

Intel® MPI Library provides a variety of options for analyzing MPI applications. Some of these options are available within the Intel MPI Library, while some require additional analysis tools. For such tools, Intel MPI Library provides compilation and runtime options and environment variables for easier interoperability.

See details in the following sections:
