Recover from crash

Hi

I'm developing an MPI application on a single-CPU shared-memory machine, and sometimes after a crash I can't start my program again. The message I get is shown below. I've tried terminating all the MPI processes and restarting the service, but the only way I've found to get going again is to reboot the machine. Is there another way to recover without rebooting?

c:\Users\John\Documents\xyz\DP>mpiexec -n 3 -l -mapall ..\2009\xyz_dbg_64 paralleldp
op_read error on left context: generic socket failure, error stack:
MPIDU_Sock_wait(2815): The specified network name is no longer available. (errno 64)
unable to read the cmd header on the left context, generic socket failure, error stack:
MPIDU_Sock_wait(2815): The specified network name is no longer available. (errno 64).
mpiexec aborting job...
(I had to press ^C several times to get the DOS prompt back.)

The standard way to clean up with Intel MPI or MPICH2 is mpdallexit, after which mpdboot or mpirun should work.
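For anyone who wants to script that sequence, here is a minimal Python sketch of it. It only applies on a system where the MPD tools (mpdallexit, mpdboot) are installed and on PATH. The helper names `missing_tools` and `reset_mpd_ring` are made up for illustration, and the `-n 1` ring size is an assumption for a single-machine setup, not something stated in this thread.

```python
import shutil
import subprocess

# Command sequence from the reply above. The ring size passed to mpdboot
# (-n 1, i.e. a single host) is an assumption for a one-machine setup.
CLEANUP_STEPS = [
    ["mpdallexit"],          # shut down every daemon in the MPD ring
    ["mpdboot", "-n", "1"],  # start a fresh ring on the local host
]


def missing_tools(steps):
    """Return the executables named in `steps` that are not on PATH."""
    return [cmd[0] for cmd in steps if shutil.which(cmd[0]) is None]


def reset_mpd_ring(steps=CLEANUP_STEPS, dry_run=False):
    """Run each step in order and return a list of (command, return code).

    With dry_run=True nothing is executed and every return code is None.
    """
    results = []
    for cmd in steps:
        if dry_run:
            results.append((cmd, None))
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results


if __name__ == "__main__":
    missing = missing_tools(CLEANUP_STEPS)
    if missing:
        print("MPD tools not found on PATH:", ", ".join(missing))
    else:
        for cmd, code in reset_mpd_ring():
            print(" ".join(cmd), "->", code)
```

A non-zero return code from mpdallexit is not treated as fatal here, since the ring may already be gone after a crash; the script still tries to boot a new one.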

Quoting - tim18

The standard way to clean up with Intel MPI or MPICH2 is mpdallexit, after which mpdboot or mpirun should work.

Hello,

I'm seeing the same problem on Windows XP. I think something is wrong with the -mapall and -map options for mpiexec on Windows. Since mpdallexit exists only on Linux, that suggestion is no help here.

If someone can give some useful information, it would be great.

Quoting - xuy3@psu.edu

Hello,

I'm seeing the same problem on Windows XP. I think something is wrong with the -mapall and -map options for mpiexec on Windows. Since mpdallexit exists only on Linux, that suggestion is no help here.

If someone can give some useful information, it would be great.

Could you please try "mpdkilljob -a"? Be aware that these commands (mpdallexit and mpdkilljob) don't always work: sometimes it's impossible to get information about the MPD ring.
