mpd error

Hello,

I get the following error on my cluster when I submit jobs:

mpiexec_node050: cannot connect to local mpd (/tmp/mpd2.console_sudharshan); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)

I see that this error has been discussed in earlier threads, but in my case it appears quite unpredictably. A job runs fine with a particular number of processors, and when I submit it again with a different number of processors, the error comes up. It is not clear under what conditions this happens: I have also hit the error with the same number of processors that previously worked, using the same scripts and the same code. Any suggestion or help would be sincerely appreciated.

Quoting sudh
My job runs fine with a particular number of processors, and when I submit it again with a different number of processors, this error comes up. It is not clear under what conditions I get this issue. I have been getting this error for the same number of processors with which I have been able to run jobs fine, with the same scripts and with the same code. Any suggestion/help shall be sincerely appreciated.

Before you execute the mpiexec command, does mpdtrace show a list of all the nodes on which you want to run your job?

Are you using the -machinefile option in your mpiexec command?
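
If it helps, the check behind the two questions above could be scripted roughly as follows. This is only a sketch: the function name missing_hosts and the node names in the demo are made up for illustration, and it assumes that mpdtrace prints one host name per line and that the machinefile uses one "host" or "host:nprocs" entry per line. If your machinefile uses fully qualified names while mpdtrace prints short names (or the other way round), the comparison would need adjusting.

```python
def missing_hosts(mpdtrace_output, machinefile_text):
    """Return machinefile hosts that are absent from the mpd ring.

    Assumptions (not confirmed for any particular MPI version):
    - mpdtrace_output has one host name per line
    - machinefile_text has one "host" or "host:nprocs" entry per line,
      with optional "#" comments
    """
    # Hosts currently reported as part of the mpd ring.
    ring = {line.strip() for line in mpdtrace_output.splitlines() if line.strip()}

    # Hosts requested in the machinefile, in order, without duplicates.
    wanted = []
    for line in machinefile_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        host = entry.split(":", 1)[0]  # drop the optional ":nprocs" suffix
        if host not in wanted:
            wanted.append(host)

    return [host for host in wanted if host not in ring]


if __name__ == "__main__":
    # Hypothetical data: node052 is requested but is not in the ring.
    trace = "node050\nnode051\n"
    machines = "node050:8\nnode052:8\n"
    print(missing_hosts(trace, machines))
```

An empty result would mean every node in the machinefile is in the ring. Any name it returns is a node where mpiexec could fail to find a local mpd, which may explain why the error depends on which nodes the scheduler happens to assign.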

Hi sudh,

Could you provide the command line and the library version?

Regards!

Dmitry
