MPI Crash (Hydra) in FMS

Gilad Berman

Hello,

I'm running the FMS application (http://www.gfdl.noaa.gov/fms), and some of the runs fail with the following error:

[proxy:0:1@n04] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@n01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
[mpiexec@n01] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@n01] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:521): bootstrap server returned error waiting for completion
[mpiexec@n01] main (./ui/mpich/mpiexec.c:548): process manager error waiting for completion
set date_name = `$time_stamp -eh

Please note that some of the runs succeed, so I'm aware that this might not be an MPI issue. Setting I_MPI_DEBUG to 3 does not provide any additional useful information. Any idea how I can find the reason for this failure? Any debugging tips, or environment variables that might help?
I also tried running with I_MPI_FABRICS set to "shm:tcp", with the same result.

Thanks in advance!
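
Since only some runs fail, one low-tech way to narrow this down is to collect the Hydra messages from several failed runs and check whether the failing proxy always sits on the same node (n04 in the output above). Below is a minimal sketch in Python, assuming the messages keep the `[process@host] function (file:line): message` layout seen in the error output. The names `parse_hydra_errors` and `proxy_hosts` are made up for illustration and are not part of Intel MPI or FMS.

```python
import re
from collections import Counter

# Matches lines shaped like:
#   [proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error ...
# The layout is inferred from the error output in this thread (assumption).
HYDRA_LINE = re.compile(
    r"^\[(?P<proc>[^@\]]+)@(?P<host>[^\]]+)\]\s+"
    r"(?P<func>\w+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


def parse_hydra_errors(text):
    """Return one dict per line that matches the Hydra message layout."""
    records = []
    for raw in text.splitlines():
        match = HYDRA_LINE.match(raw.strip())
        if match:
            record = match.groupdict()
            record["line"] = int(record["line"])
            records.append(record)
    return records


def proxy_hosts(records):
    """Count messages per host, looking only at proxy processes."""
    return Counter(r["host"] for r in records if r["proc"].startswith("proxy"))


# Three lines copied from the error output in this thread.
SAMPLE = """\
[proxy:0:1@n04] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@n01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
"""

if __name__ == "__main__":
    for host, count in proxy_hosts(parse_hydra_errors(SAMPLE)).items():
        print(host, count)
```

Running the same functions over the logs of every failed job would show whether one host keeps turning up. If it does, that points toward a node or network problem on that host and away from the application itself.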

James Tullos (Intel)

Hi Gilad,

You also submitted this issue to Intel® Premier Support, and it is being handled there.  I'm noting this for others who see this thread.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
