Erroneous [pmi_proxy] <defunct> left behind

My application makes heavy use of MPI_Comm_spawn calls to dynamically create and abandon processes.

I am using Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522, in a Linux cluster environment.

Each subsequent call of MPI_Comm_spawn unfortunately leaves a

 [pmi_proxy] <defunct>

process behind, even if the spawned subprocess has finished normally. These processes are only cleaned up when the whole application finishes. Individually they take up almost no resources, but since I make about 2000 MPI_Comm_spawn calls, they can turn into a serious and hard-to-detect bug if the OS reaches its file handle limit.
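To watch the leftover proxies accumulate while the application runs, something like the following rough sketch could be used. It assumes a Linux node with a procps-style `ps` and Python 3.7 or later; the helper names `count_defunct` and `current_defunct` are made up for illustration and are not part of Intel MPI or Hydra.

```python
import subprocess


def count_defunct(ps_output, name="pmi_proxy"):
    """Count zombie entries called `name` in `ps -e -o stat= -o comm=` output."""
    count = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        stat, comm = parts
        # A state beginning with "Z" marks a zombie, which ps labels <defunct>.
        if stat.startswith("Z") and comm.strip().startswith(name):
            count += 1
    return count


def current_defunct(name="pmi_proxy"):
    """Run ps on this node and count defunct processes with the given name."""
    # Separate -o options avoid the header ambiguity of "-o stat=,comm=".
    result = subprocess.run(
        ["ps", "-e", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_defunct(result.stdout, name)


if __name__ == "__main__":
    print(current_defunct())
```

Running it periodically on a compute node should show the count growing by roughly one per MPI_Comm_spawn call if the behaviour described above is present.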

Searching the web turns up related reports on the MPICH bug tracker, namely tickets 670 and 1504 (the spam filter prevents me from posting convenient links), and on the MPICH discussion list:

http://lists.mpich.org/pipermail/discuss/2013-March/000515.html

Could this still be an issue in the Hydra implementation used by Intel MPI?

Thank you very much for your help!

Hi,

Thank you for the message.

Please submit a ticket for this issue via Intel(R) Premier Support.

--

Dmitry

It seems this issue still persists in Intel MPI 5.0.3.048.

Are there any workarounds for the issue? I'm also spawning lots of MPI processes dynamically, and it will hit the ulimit -u limit.
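This is not a workaround, but for checking how much room is left under that limit, here is a small sketch assuming Linux and Python's standard `resource` module. The helper name `nproc_headroom` is hypothetical, and the `in_use` figure has to come from your own process count.

```python
import resource


def nproc_headroom(in_use, soft_limit):
    """Return how many more processes fit under the soft limit.

    Returns None when the limit is unlimited, and 0 once it is reached.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - in_use, 0)


if __name__ == "__main__":
    # RLIMIT_NPROC is the per-user limit that `ulimit -u` reports.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("soft limit:", soft, "hard limit:", hard)
    # 2000 mirrors the spawn count mentioned in the original post.
    print("headroom after 2000 leftovers:", nproc_headroom(2000, soft))
```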

Thanks.

 
