Developer Guide

Error Message: Fatal Error

Case 1

Error Message
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack: MPIR_Init_thread(653)......: MPID_Init(860).............: MPIDI_NM_mpi_init_hook(698): OFI addrinfo() failed (ofi_init.h:698:MPIDI_NM_mpi_init_hook:No data available)
Cause
The current provider cannot be run on these nodes. The MPI application was run with the psm2 provider on a node without an Intel® Omni-Path card, or with the verbs provider on a node without an InfiniBand*, iWARP, or RoCE card.
Solution
  1. Change the provider, or run the MPI application on the correct nodes. Use fi_info to get information about the current provider (see the sketch after this list).
  2. Check that the fabric services are running on the nodes (opafm for Intel® Omni-Path, opensmd for InfiniBand).
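For example, a minimal check might look like the following. This is a sketch: it assumes systemd-managed services, and everything except fi_info, psm2, opafm, and opensmd is illustrative:

    # List the providers that libfabric detects on this node
    fi_info | grep provider
    # Query a specific provider, for example psm2
    fi_info -p psm2
    # Check the fabric manager services (assumes systemd)
    systemctl status opafm      # Intel® Omni-Path
    systemctl status opensmd    # InfiniBand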

Case 2

Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack: … MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
The OFI transport is using an IP interface that has no connectivity to the remote ranks.
Solution
Set FI_SOCKETS_IFACE if the sockets provider is used, or FI_TCP_IFACE and FI_VERBS_IFACE in the case of the tcp and verbs providers, respectively. To retrieve the list of configured and active IP interfaces, use the ifconfig utility, as shown below.
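A minimal sketch for the tcp provider; the interface name eth0, the hostnames node1 and node2, and the application name ./myapp are hypothetical placeholders:

    # List configured and active IP interfaces
    ifconfig
    # Pin the tcp provider to an interface that can reach the remote ranks
    FI_TCP_IFACE=eth0 mpirun -n 2 -ppn 1 -hosts node1,node2 ./myapp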

Case 3

Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack: … MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
Ethernet is used as the interconnection network.
Solution
Run FI_PROVIDER=sockets mpirun … to work around this problem.
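For example, a minimal invocation might look like this (the rank count and the application name ./myapp are illustrative):

    # Force the sockets provider, which runs over plain Ethernet
    export FI_PROVIDER=sockets
    mpirun -n 4 ./myapp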
