mpdboot for a cluster

Dear all,

I have the following lines in my job script:

# Number of cores:
#SBATCH --nodes=8 --ntasks-per-node=8

## Set up job environment
source /site/bin/jobsetup

# start mpd:
mpdboot -n 64

## Run the program:
../bin/sem3dcaf ../input/test_nproc64_sf2.psem

I am trying to run a coarray program with 64 cores (8 cores per node), but I could not start mpd correctly. I am new to the Intel Cluster Toolkit. I would be grateful for any suggestions.

Thanks.

Hi homng,

It would be worth reading the Getting Started section of the Intel MPI Library documentation.

mpdboot creates an mpd ring, and '-n' specifies how many nodes will be used. With 'mpdboot -n 12', an mpd ring will be created across 12 nodes, regardless of how many cores each node has.

So, you need to change your script:
# start mpd:
mpdboot -n 8

To successfully create an mpd ring you need passwordless ssh connections between the nodes. You can check this with:
$ ssh node_one
From node_one:
$ ssh node_two
and so on.

Usually mpdboot gets the list of nodes from an mpd.hosts file located in the current directory, but you can point it to another file with the '-f' option, as in the sketch below.
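For illustration (the host names below are placeholders, not taken from this thread), mpd.hosts is just a plain list of host names, one per line:

node01
node02

and the ring could then be started with:

$ mpdboot -n 2 -f ./mpd.hosts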

mpdtrace shows you a list of nodes in the ring.

I assumed that the environment was set up properly.


Thank you for the suggestion. I went through the documentation but could not really figure out how to start mpd from the job script. For example, I tried the following job script:
#SBATCH --nodes=2 --ntasks-per-node=8

## Set up job environment
source /site/bin/jobsetup

# start mpd:
mpdboot -n 2

But I get the following errors:

totalnum=2  numhosts=1
there are not enough hosts on which to start all processes
totalnum=2  numhosts=1
there are not enough hosts on which to start all processes

A possible cause would be that the other node is not visible, but I confirmed that I can ssh to all the nodes without a password. I can run an ordinary MPI program without problems using "mpirun -n 16 test" in the job script. But of course I cannot define the mpd.hosts file, because on a big cluster I don't know in advance which nodes will be allocated to my job. I think I am missing some key point!

Your help is greatly appreciated. Thanks.
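(One possible workaround, not tried in this thread: build the host file from the job's allocation at run time. The lines below are only a sketch, assuming the standard scontrol command and the SLURM_JOB_NODELIST / SLURM_JOB_NUM_NODES variables are available on the cluster.)

## build an mpd.hosts file from the nodes SLURM allocated to this job
scontrol show hostnames "$SLURM_JOB_NODELIST" > mpd.hosts

## start one mpd per allocated node
mpdboot -n "$SLURM_JOB_NUM_NODES" -f mpd.hosts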

Finally, I think I solved the problem. Thanks.

If you run your application under any job scheduler, you need to use 'mpirun -n #_of_processes program', because mpirun understands the scheduler's settings by reading the appropriate environment variables.
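As a minimal sketch of what such a job script could look like under SLURM (the setup script and program name are taken from the earlier posts; relying on $SLURM_NTASKS for the process count is an assumption about the cluster's environment):

#SBATCH --nodes=8 --ntasks-per-node=8

## Set up job environment
source /site/bin/jobsetup

## Run the program; mpirun reads the allocation from the scheduler,
## so no separate mpdboot call is needed
mpirun -n $SLURM_NTASKS ../bin/sem3dcaf ../input/test_nproc64_sf2.psem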

In the case of co-arrays, it seems to me that the compiler creates a config file which is used by the program itself, so you don't need to create an mpd ring. I believe the people on the Fortran forum can provide more information about invoking co-array programs.

