MPI doesn't work (Fatal error in MPI_Init)

Hi,

I have the following problem:

I have two nodes and the following config file:

-n 1 -host node0 myapp
-n 1 -host node1 myapp

With this order it works fine. However, if I change the order of the lines in the config to:

-n 1 -host node1 myapp
-n 1 -host node0 myapp

It fails with the error:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(658)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(104)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3102):
gen_cnting_fail_handler(1816)........: connect failed - The semaphore timeout period has expired.
 (errno 121)

job aborted:
rank: node: exit code[: error message]
0: node1: 1: process 0 exited without calling finalize
1: node0: 123

What could be the reason for this? Any ideas?

Hi Ivan,

Are you able to ssh from node0 to node1 and from node1 to node0? Does each node's hostname resolve to the same IP address on both nodes?
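
If it helps, the name-resolution question can be checked with a small script run on both nodes. This is only a sketch, assuming Python 3 is available on each node; the helper names `resolve_ipv4` and `report` are made up for illustration and are not part of any MPI tool. Run it on node0 and on node1 and compare the output: if the two nodes report different addresses for the same hostname, that mismatch is a likely suspect.

```python
import socket


def resolve_ipv4(host):
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    Returns an empty list if the name cannot be resolved on this machine.
    """
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def report(hosts):
    """Build a text report of how each hostname resolves on the local machine."""
    lines = [f"resolved on {socket.gethostname()}:"]
    for host in hosts:
        addrs = resolve_ipv4(host)
        lines.append(f"  {host} -> {', '.join(addrs) if addrs else 'UNRESOLVED'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hostnames taken from the config file in the original post.
    print(report(["node0", "node1"]))
```

The script only covers name resolution; it does not test whether a TCP connection between the nodes actually succeeds, so a firewall problem would not show up here.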

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools