mpdboot launches mpd via two different nodes: is this correct?

I run jobs using mpirun from Intel MPI 4.0, and sometimes the process goes down with the following message:

[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat nodelist
n202
n204
n21
n22
n23
n24
n25
n26
[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat Ba10.celldynamic-30.out
running mpdallexit on n204
LAUNCHED mpd on n204 via
RUNNING: mpd on n204
LAUNCHED mpd on n202 via n204
LAUNCHED mpd on n26 via n204
LAUNCHED mpd on n25 via n204
LAUNCHED mpd on n24 via n204
RUNNING: mpd on n202
LAUNCHED mpd on n23 via n202
LAUNCHED mpd on n22 via n202
LAUNCHED mpd on n21 via n202
mpdboot_n204 (handle_mpd_output 846): mpdboot: can not get anything from the mpd daemon; please check connection to n22

Is this a problem with an mpirun option?

Regards

Hi JP,

Considering that mpirun launches the daemons successfully on nodes n204 and n202, it might be an issue with your connection to node n22. Can you verify the node is up and running (maybe via 'ping')? Also, make sure you can log into the node without being prompted for a password. For example, can you do:

ssh n22 hostname

from n204 (or any other node)?
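If you would rather check every node in one pass, a rough sketch along these lines might help. It assumes your nodelist file has one hostname per line and that ping accepts the Linux-style -c and -W options; check_nodes is just an illustrative name, not something shipped with Intel MPI.

```shell
#!/bin/sh
# Sketch only: check_nodes is a hypothetical helper, not part of Intel MPI.
# It reports, for each host in a file, whether the host answers ping and
# whether ssh can run a command there without prompting for a password.

check_nodes() {
    # $1: file with one hostname per line
    failed=0
    while IFS= read -r node; do
        # Skip empty lines
        [ -n "$node" ] || continue

        # Assumption: Linux ping (-c count, -W timeout in seconds)
        if ! ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
            echo "$node: no ping reply"
            failed=1
            continue
        fi

        # BatchMode=yes makes ssh fail instead of asking for a password;
        # -n keeps ssh from consuming the rest of the node file on stdin.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" hostname >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: ssh failed"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

After defining the function, you could run `check_nodes nodelist` from n204 (and again from n202, since that node launches the daemon on n22). Any line that does not end in "ok" points at a node worth a closer look.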

Finally, make sure you don't have any security settings and/or firewalls preventing you from connecting to the other nodes on your cluster.

Let me know what happens or if you have questions.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
