Error Message: Fatal Error

Case 1

Error Message

Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack: 
MPIR_Init_thread(653)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(698): OFI addrinfo() failed
(ofi_init.h:698:MPIDI_NM_mpi_init_hook:No data available)

Cause

The current provider cannot run on these nodes. This happens when the MPI application runs over the psm2 provider on nodes without an Intel® Omni-Path card, or over the verbs provider on nodes without an InfiniBand*, iWARP, or RoCE card.

Solution

  1. Change the provider or run the MPI application on the right nodes. Use the fi_info utility to get information about the current provider.
  2. Check that the required services are running on the nodes (opafm for Intel® Omni-Path and opensmd for InfiniBand).
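The two checks above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not an official diagnostic: it assumes the libfabric fi_info utility is on the PATH and exits with a non-zero status when no matching provider is found, and that the nodes use systemd, so systemctl can report service state. The function names check_provider and check_service are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: report whether a libfabric provider is usable on this node and
# whether the fabric services named above are active.
# Assumptions: fi_info (libfabric) and systemctl (systemd) are installed.

check_provider() {
    # $1: provider name, for example psm2 or verbs
    if ! command -v fi_info >/dev/null 2>&1; then
        echo "fi_info not found"
        return 2
    fi
    # fi_info -p filters by provider name; assumed to fail when none matches
    if fi_info -p "$1" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: not available"
        return 1
    fi
}

check_service() {
    # $1: service name, for example opafm or opensmd
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found"
        return 2
    fi
    if systemctl is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: not active"
        return 1
    fi
}

for p in psm2 verbs; do
    check_provider "$p" || true
done
for s in opafm opensmd; do
    check_service "$s" || true
done
```

If the provider you intend to use is reported as not available, either select another provider or move the job to nodes with the matching hardware.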

Case 2

Error Message

Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: 
Other MPI error, error stack:

MPIDI_OFI_send_handler(704)............: OFI tagged inject failed
(ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)

Cause

The OFI transport uses an IP interface that has no access to the remote ranks.

Solution

Set FI_SOCKET_IFACE if the sockets provider is used, or FI_TCP_IFACE or FI_VERBS_IFACE if the TCP or verbs provider is used, respectively. To retrieve the list of configured and active IP interfaces, use the ifconfig utility.
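As an illustration, the variable selection described above might look like the following sketch. The variable names are taken from the solution text as written. The helper iface_var_for, the interface name eth0, and the commented launch line are hypothetical and only stand in for your own values, and the choice of the tcp provider is an assumption made for the example.

```shell
#!/bin/sh
# Sketch: pick the interface variable that matches the provider in use,
# then export it before launching the MPI job.

iface_var_for() {
    # $1: provider name; prints the matching variable from the text above
    case "$1" in
        sockets) echo FI_SOCKET_IFACE ;;
        tcp)     echo FI_TCP_IFACE ;;
        verbs)   echo FI_VERBS_IFACE ;;
        *)       return 1 ;;
    esac
}

provider=tcp   # assumption: the tcp provider is in use
iface=eth0     # hypothetical name; choose one listed by ifconfig that
               # can reach the nodes hosting the remote ranks

var=$(iface_var_for "$provider") || {
    echo "unknown provider: $provider" >&2
    exit 1
}

export "$var=$iface"
echo "$var=$iface"

# mpirun -n 2 ./my_app   # hypothetical launch line, shown for context only
```

The chosen interface must exist on every node in the job and must route to the other nodes. A loopback or management-only interface reproduces the same error.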

Case 3

Error Message

Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: 
Other MPI error, error stack:

MPIDI_OFI_send_handler(704)............: OFI tagged inject failed
(ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)

Cause

Ethernet is used as the interconnection network.

Solution

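Run FI_PROVIDER=sockets mpirun … to overcome this problem. Do not put spaces around the equals sign, because the shell would then treat FI_PROVIDER as a command name instead of a variable assignment.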
