Master node issue

Issue

I have a cluster with, for example, 3 nodes, where the mpd.hosts file contains:

$ cat mpd.hosts
node2
node3

The master node, node1, is not listed in the mpd.hosts file. When I issue these shell commands from node1:

$ mpdboot -r ssh -n 2 -f ~/mpd.hosts
$ mpiexec -n 4 /bin/hostname

the output is:
node1
node1
node2
node2
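
A quick way to see how the ranks were distributed is to tally the hostname output per node. The following is only a minimal sketch: tally_hosts is a hypothetical helper name, not part of the MPI tooling, and the sample input simply replays the four lines shown above.

```shell
# tally_hosts: hypothetical helper. Reads one hostname per line on stdin
# and prints "<hostname> <count>" for each node, sorted by hostname.
tally_hosts() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Replay of the output shown above. With a live mpd ring you would
# instead run: mpiexec -n 4 /bin/hostname | tally_hosts
printf 'node1\nnode1\nnode2\nnode2\n' | tally_hosts
```

For the output above, this reports two ranks on node1 and two on node2, and none on node3.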

Solution

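By design, the master node (i.e., the node from which the mpiexec command is launched) needs to have an mpd daemon running. The -n value passed to mpdboot counts that local daemon, which is why -n 2 in the example above started daemons on node1 and node2 only, leaving node3 unused. To run your application on node2 and node3, start three daemons and use the -host option with the mpiexec command as follows: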

$ mpdboot -r ssh -n 3 -f ~/mpd.hosts
$ mpiexec -n 2 -host node2 /bin/hostname : -n 2 -host node3 /bin/hostname
Alternatively, use the -nolocal option for mpiexec:
$ mpdboot -r ssh -n 3 -f ~/mpd.hosts
$ mpiexec -nolocal -n 4 /bin/hostname

This will run the hostname command on nodes node2 and node3 only, even though there are three mpd daemons total.
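
To confirm that no rank ran on the master node, you can check the hostname output for the master's name. This is a sketch under stated assumptions: ran_on_host is a hypothetical helper, and the sample input is illustrative rather than captured output (the exact split of ranks between node2 and node3 is assumed).

```shell
# ran_on_host: hypothetical helper. Succeeds (exit status 0) if any line
# on stdin is exactly the given hostname, and fails otherwise.
ran_on_host() {
    grep -qxF -- "$1"
}

# Illustrative input only. With a live mpd ring you would instead run:
#   mpiexec -nolocal -n 4 /bin/hostname | ran_on_host node1
if printf 'node2\nnode2\nnode3\nnode3\n' | ran_on_host node1; then
    echo "master node ran at least one rank"
else
    echo "master node ran no ranks"
fi
```

The match is on the whole line, so the name passed in must have the same form (short or fully qualified) that /bin/hostname prints on the nodes.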

1 comment

When we run the mpdboot command from the master node, it starts mpd daemons on the master node (node1) and on node2, but not on node3. It is not good to run an mpd daemon on the master node and then tell users to launch jobs with mpiexec and so on. If a user happens to run a heavy MPI job, it will slow down access to the master node and, in turn, the entire cluster. How can we prevent mpd from being started on the master node by any user?

Regards
Jigar