Failure to launch (ssh becomes zombie)

I have a problem where, in about 1-2% of mpirun invocations (0.1-0.2% of the ssh processes launched by mpiexec.hydra), one of the ssh processes fails to launch and becomes a zombie. As a consequence, the overall job hangs forever.

With setenv I_MPI_DEBUG 1 and -verbose added to the mpirun command I get some information (see bug.txt attached). The node that failed to start in this case is qnode0708, and if you wade through the file you will see that there is no "Start PMI_proxy 5" line for it.

At this moment I do not know whether this is an impi issue (version 4.1 is being used), an ssh race condition (this appears to be possible), something specific to the large cluster I am using, or something else. Two specific questions:

a) Has anyone seen anything like this?

b) Is there a way to launch with "ssh -v", which might be informative? I cannot find anything about how to do this.

N.B., I am 99.99% certain that this has nothing to do with the code being run, its compilation, or anything else of that nature. In fact the failure occurs equally for three very different mpi executables.

Attachment: bug.txt (162.35 KB)

You can launch with ssh -v by putting the ssh -v invocation into a wrapper script and setting I_MPI_HYDRA_BOOTSTRAP_EXEC to point to that script.
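
A minimal sketch of such a wrapper (the script name, log path, and csh-style setenv usage below are illustrative assumptions, not part of the original reply):

    #!/bin/sh
    # ssh_verbose.sh (hypothetical name): forward all of hydra's arguments to ssh,
    # adding -v and appending the verbose output to a per-user log file.
    exec ssh -v "$@" 2>> /tmp/ssh_verbose_$USER.log

Then, before launching:

    chmod +x /path/to/ssh_verbose.sh
    setenv I_MPI_HYDRA_BOOTSTRAP_EXEC /path/to/ssh_verbose.sh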

Try using I_MPI_DEBUG=5 instead of I_MPI_DEBUG=1.  This will provide additional information.
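
With the csh-style setenv used earlier in this thread, that could look like (the process count and executable name are placeholders):

    setenv I_MPI_DEBUG 5
    mpirun -verbose -n <nprocs> ./your_mpi_app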

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Thanks for the comment. For reference, in case others run across the same problem please see the thread starting at http://lists.mindrot.org/pipermail/openssh-unix-dev/2013-July/031518.html and http://lists.mindrot.org/pipermail/openssh-unix-dev/2013-July/031527.html.

I was not able to trace the fault beyond localizing it to ssh/sshd on that system, and I came to the conclusion that ssh is, for some reason, just not robust enough on the Quest computer at Northwestern. Since I don't have rights to see any of the log files on that cluster, I gave up and replaced ssh with openmpi/rmpirun as the bootstrap. While this is an ugly hack, it has proved to be 100% reliable.

N.B., for the future I suggest that hydra should check whether the ssh process it has launched has become a zombie.
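
As an illustration of the kind of check meant here (a shell sketch, not hydra's actual code; how the launcher obtains the ssh PID is an assumption):

    # Report whether a given child PID has turned into a zombie (state Z).
    pid=$1
    state=$(ps -o state= -p "$pid" | tr -d ' ')
    case "$state" in
        Z*) echo "ssh process $pid is a zombie; its launch presumably failed" ;;
        *)  echo "ssh process $pid is in state '$state'" ;;
    esac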

As an addendum, I now have a way to reproduce this issue, and it is a "bug" since the end result is highly undesirable (the consequence is mpi tasks running forever). Curing the bug may not be trivial, and there are probably many ways to reproduce it which are not particularly user friendly.

To reproduce, arrange for an impi job to run on more than one node where a secondary node has a cooling problem and, as a consequence, the oom-killer gets invoked to terminate the mpi task on that node. For whatever reason (beyond my pay grade) this leaves the ssh connection as a zombie. The other nodes/cores do not know and will continue to run forever, probably issuing send/receive requests that go into a black hole.
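
For anyone trying to approximate this without an overheating node: the oom-killer terminates its victim with SIGKILL, so one rough way to mimic the failing step is to kill the remote rank by hand. The node name below is reused from the earlier example, the executable name is a placeholder, and whether this reproduces the zombie on every system is not guaranteed:

    # On the launch node, forcibly kill the MPI rank on a secondary node,
    # imitating what the oom-killer does to its victim.
    ssh qnode0708 "pkill -KILL -f your_mpi_app"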
