We have a client-server application using MPI_Comm_connect and MPI_Comm_accept. It works most of the time, but intermittently crashes. On the server side we call MPI_Open_port, transfer the port name to the client side, then call MPI_Comm_accept. On the client side we receive the port name, then call MPI_Comm_connect.
Occasionally MPI_Comm_connect on the client crashes without reporting any error. We are unable to use try/catch blocks to dig further (complicated setup using Fortran calling C++; we get unresolved globals if we try to use exception handling).
We think there may be a race condition due to how the client and server apps are started (we are not starting them with mpiexec, which is also a long story). If the client calls connect before the server calls accept, will this cause a crash? If so, has this been fixed so that connect gives an error instead, which we could detect and then retry? Would setting an MPI error handler help us handle the error in a controlled way?
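To make the retry idea concrete, here is a minimal sketch of what we have in mind. It assumes a failed connect can be made to return an error code rather than crash. `retry_connect` is a hypothetical helper of ours, not an MPI routine, and the connect attempt is passed in as a callback so the sketch does not depend on our actual setup.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of MPI): calls `attempt` until it returns 0
// (the value of MPI_SUCCESS) or `max_attempts` is used up, sleeping `delay`
// between failed attempts. Returns the last code seen.
int retry_connect(const std::function<int()>& attempt,
                  int max_attempts,
                  std::chrono::milliseconds delay)
{
    int rc = -1;  // returned unchanged if max_attempts <= 0
    for (int i = 0; i < max_attempts; ++i) {
        rc = attempt();
        if (rc == 0) {
            return rc;  // connected
        }
        if (i + 1 < max_attempts) {
            // Give the server time to reach its accept call before retrying.
            std::this_thread::sleep_for(delay);
        }
    }
    return rc;
}
```

On the client, `attempt` would wrap something like `MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm)`, issued after `MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN)` so that a failure comes back as a return code instead of going through the default `MPI_ERRORS_ARE_FATAL` handler. The use of `MPI_COMM_SELF` here is an assumption for illustration; the handler would need to be set on whichever communicator is passed to connect. Whether the failing connect actually returns in our case, rather than crashing, is exactly what we are unsure about.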
Note the following notes in MPICH2
which refers to this same problem and a fix...
All help appreciated - thanks!