MPI program behavior on node crash ...


In our production environment, nodes crash once in a while. How does Intel MPI behave when an MPI program loses contact with some of its processes? Does it make any difference if the crashed node hosts rank 0? Is there an Intel MPI option to control this behavior, so that the program is cleaned up when one of its MPI processes is lost?

Thank you very much,
Tofu


Hi Tofu,

If a node hosting any of the job's processes crashes, the entire job will end. You can use the -cleanup option (or the I_MPI_HYDRA_CLEANUP environment variable) to create a temporary file that lists the PID of each process; the mpicleanup utility then uses this file to clean up the environment if the job does not end correctly. If you are using MPD instead of Hydra, use I_MPI_MPIRUN_CLEANUP.
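As a rough illustration of the two Hydra choices, here is a minimal Python sketch that only assembles the launch command and environment; it does not start anything. The helper name build_launch, the application path ./my_app, the process count, and the value "1" for I_MPI_HYDRA_CLEANUP are assumptions for the example, so please check them against the reference manual for your Intel MPI version.

```python
import os
import shlex


def build_launch(app, nprocs, use_env=False, base_env=None):
    """Return (argv, env) for a Hydra launch with cleanup-file creation requested.

    Hypothetical helper for illustration only; it builds the command
    and environment but does not run anything.
    """
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    argv = ["mpirun"]
    if use_env:
        # Assumption: "1" is an accepted "enable" value for this variable.
        env["I_MPI_HYDRA_CLEANUP"] = "1"
    else:
        # Command-line form of the same request.
        argv.append("-cleanup")
    argv += ["-n", str(nprocs), app]
    return argv, env


if __name__ == "__main__":
    # "./my_app" and 4 processes are placeholder values.
    argv, _env = build_launch("./my_app", 4)
    print(shlex.join(argv))
```

The resulting argv and env could then be handed to subprocess.run(argv, env=env). If the job does not end correctly, mpicleanup would be run afterwards against the file the launcher created.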

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
