Random fabric errors on Red Hat Enterprise Linux* 5.4

By Jeremy C Siadal, Published: 12/02/2009, Last Updated: 12/02/2009

Problem:

The Intel® MPI Library fails intermittently when run over the RDSSM or RDMA devices. Approximately 5-10% of runs fail on Red Hat Enterprise Linux (RHEL) 5.4. The problem does not occur on earlier versions of RHEL.

The debug output of a failing run contains the following error:

setup_listener Cannot assign requested address

Environment:

Red Hat Enterprise Linux 5.4 only

Root Cause:

This error occurs when the Intel MPI Library is used with the version of OFED (OpenFabrics Enterprise Distribution) included with RHEL 5.4. In this configuration, the port that the Intel MPI Library requests through uDAPL can conflict with a port already allocated by RDS (Reliable Datagram Sockets), and uDAPL does not resolve the conflict correctly.

By default, the Intel MPI Library derives its port number from its process ID. On RHEL 5.4, this process ID can occasionally match a port number that the RDS driver has already allocated, which creates a port space conflict. When that happens, the uDAPL version shipped with RHEL 5.4 returns the wrong return code to the Intel MPI Library, and communication fails. Because the conflict depends on the process ID that a run happens to receive, only some runs are affected.

Resolution:

As a temporary workaround, set the following environment variable on all nodes:

$ export I_MPI_RDMA_CREATE_CONN_QUAL=0

With this variable set to 0, the Intel MPI Library no longer derives its port number from its process ID.
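Because the variable must be set on every node, the workaround can be scripted. The sketch below is illustrative only. It assumes the RHEL release string is read from `/etc/redhat-release` (for example, `Red Hat Enterprise Linux Server release 5.4 (Tikanga)`), and `apply_conn_qual_workaround` is a hypothetical helper name. The script exports the variable only on RHEL 5.4, which is the only release the problem is reported on, so it could be sourced from a login script shared by all nodes.

```shell
#!/bin/sh
# Sketch: export the Intel MPI workaround variable only on RHEL 5.4.
# Assumption: the release string lives in /etc/redhat-release, e.g.
#   "Red Hat Enterprise Linux Server release 5.4 (Tikanga)".
# apply_conn_qual_workaround is a hypothetical helper name.
# Source this file (". ./this_file") so the export reaches the current shell.

apply_conn_qual_workaround() {
    _release_file=${1:-/etc/redhat-release}

    # No readable release file: not RHEL, so leave the environment alone.
    if [ ! -r "$_release_file" ]; then
        return 1
    fi

    # Match "release 5.4" followed by a non-digit or end of line,
    # so that a hypothetical "release 5.40" would not match.
    if grep -Eq 'release 5\.4([^0-9]|$)' "$_release_file"; then
        # No spaces around "=": with spaces, the shell would treat "="
        # and "0" as separate (invalid) variable names.
        I_MPI_RDMA_CREATE_CONN_QUAL=0
        export I_MPI_RDMA_CREATE_CONN_QUAL
        return 0
    fi
    return 1
}

# "|| :" keeps a non-RHEL-5.4 host from aborting a shell running with set -e.
apply_conn_qual_workaround || :
```

To pass the variable explicitly to every MPI process at launch time, the Intel MPI Library `mpiexec` command also accepts `-genv`, for example `mpiexec -genv I_MPI_RDMA_CREATE_CONN_QUAL 0 -n 4 ./my_app`, where `./my_app` and the process count are placeholders.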

This error is resolved in DAPL 2.0.25, which is to be included in OpenFabrics Enterprise Distribution (OFED) 1.5. For the current status of the resolution, see the latest OFED release notes.
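To check whether a node already has the fixed library, the following sketch compares the installed uDAPL version with 2.0.25. It assumes uDAPL was installed from an RPM package named `dapl` and that versions are plain dotted numbers. `version_ge` and `dapl_has_fix` are hypothetical helper names. The comparison uses `awk` instead of `sort -V`, which older GNU coreutils releases do not provide.

```shell
#!/bin/sh
# Sketch: report whether the installed uDAPL is at least 2.0.25.
# Assumptions: uDAPL comes from an RPM named "dapl", and versions are
# purely numeric, dot-separated fields (e.g. 2.0.19).
# version_ge and dapl_has_fix are hypothetical helper names.

# version_ge A B: succeed if dotted version A >= dotted version B.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# dapl_has_fix: return 0 if every installed dapl package is >= 2.0.25,
# 1 if any is older, and 2 if the version cannot be determined.
dapl_has_fix() {
    # One version per line, in case several architectures are installed.
    _vers=$(rpm -q --queryformat '%{VERSION}\n' dapl 2>/dev/null) || return 2
    _status=0
    for _ver in $_vers; do
        if version_ge "$_ver" 2.0.25; then
            echo "dapl $_ver: includes the fix"
        else
            echo "dapl $_ver: older than 2.0.25; keep I_MPI_RDMA_CREATE_CONN_QUAL=0"
            _status=1
        fi
    done
    return $_status
}
```

After sourcing the file, run `dapl_has_fix` on each node. A return code of 2 means the version could not be determined, for example because `rpm` is missing or no `dapl` package is installed.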
