MPI Crash (Hydra) in FMS

Gilad Berman

Hello,

I'm running the FMS application (http://www.gfdl.noaa.gov/fms), and some of the runs fail with the following error:

[proxy:0:1@n04] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@n01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
[mpiexec@n01] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@n01] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:521): bootstrap server returned error waiting for completion
[mpiexec@n01] main (./ui/mpich/mpiexec.c:548): process manager error waiting for completion
set date_name = `$time_stamp -eh

Please note that some of the runs are successful, so I'm aware that this might not be an MPI issue. Setting I_MPI_DEBUG to 3 does not provide any additional useful information. Any idea how I can find the reason for this failure? Any debugging tips, or environment variables that might help?
I also tried running with I_MPI_FABRICS set to "shm:tcp", with the same result.

Thanks in advance!
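
For readers who land on this thread with the same log: the first line reports a failed assertion on `pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP`, which only says that poll() returned some event bit other than POLLIN, POLLOUT, or POLLHUP on one of the proxy's descriptors. It does not identify the cause. The sketch below reproduces that bit test in Python on a Unix system, purely as an illustration; `unexpected_events` and `describe` are hypothetical helper names and are not part of Hydra or Intel MPI.

```python
import select

# Event bits that the assertion quoted in the log tolerates.
ALLOWED = select.POLLIN | select.POLLOUT | select.POLLHUP

# Other standard poll() result bits, named for readable output.
OTHER_BITS = {
    select.POLLERR: "POLLERR",
    select.POLLNVAL: "POLLNVAL",
    select.POLLPRI: "POLLPRI",
}


def unexpected_events(revents):
    """Return the bits that would make the assertion fail (0 means it passes)."""
    return revents & ~ALLOWED


def describe(revents):
    """List the names of the known offending bits (hypothetical helper)."""
    bad = unexpected_events(revents)
    return [name for bit, name in OTHER_BITS.items() if bad & bit]


if __name__ == "__main__":
    samples = (select.POLLIN, select.POLLIN | select.POLLERR, select.POLLNVAL)
    for revents in samples:
        print(revents, describe(revents) or "assertion passes")
```

In other words, a descriptor that reports an error or invalid-descriptor condition trips the check, while ordinary readable, writable, or hang-up events do not.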

James Tullos (Intel)

Hi Gilad,

You also submitted this issue to Intel® Premier Support, and it is being handled there.  I'm noting this for others who see this thread.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
