running Intel MPI 4.0 with SLURM

Hi,

I have just installed Intel MPI (v4.0.0.027) on a Nehalem/IB-based cluster system which uses the SLURM resource manager. All of the compilers and MPI stacks are installed using modules, including the Intel MPI. After I load the intel-mpi module, build the application, and try to run it using a SLURM batch file, the program crashes, as the Intel MPI runtime environment does not obtain all of the SLURM environment variables. I get the message:

<---
mpiexec_rm1867: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
--->

It does not get the information on the compute nodes and tries instead to run on the login console, which is not the place to run, and hence fails. I assume then that the SLURM environment variables relating to the mpd.hosts file were not captured by the Intel MPI? If so, what runtime parameters or environment variables do I need to pass/define in the SLURM batch script?

BTW, the default setup with OMPI/OpenFabrics and SLURM works fine.

Thanks for any help.

-- Avi


Hi Avi,

Intel MPI Library recognizes SLURM_JOBID, SLURM_NNODES, SLURM_NODELIST, and some other environment variables, but you need to use mpirun to start your application; only in that case will the mpd ring be created.

You probably need to add the '-nolocal' option, because the node on which you start your application will be added to the ring automatically.
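For reference, a minimal batch script along those lines might look as follows; this is a sketch, with the module name and partition taken from scripts shown later in this thread, both of which are site-specific:

```shell
#!/bin/bash
#SBATCH -N 2                 # number of nodes
#SBATCH -n 2                 # number of MPI ranks
#SBATCH -p pbatch            # site-specific partition
#SBATCH -o intel-mpi-%j.log

module load intelMPI/4.0.0.027

# mpirun reads SLURM_JOBID/SLURM_NNODES/SLURM_NODELIST itself,
# boots the mpd ring on the allocated nodes, runs the job,
# and tears the ring down afterwards.
mpirun -r ssh -nolocal -np 2 ./a.out
```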

Regards!
Dmitry

Hi Dmitry,

Thanks for the pointers. I verified that SLURM_NODELIST was recognized and echoed that variable inside the batch script. However, when I run mpirun inside the SLURM batch script with the following options:

mpirun -np 2 -nolocal ./a.out

I am getting the following errors:

<---
Node list is rm[1203-1204]
mpdboot_rm1203 (handle_mpd_output 905): from mpd on rm1204, invalid port info:
connect to address 10.1.4.180 port 544: Connection refused
connect to address 10.1.4.180 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
rm1204: Connection refused
--->

What am I missing? I can print out other relevant SLURM variables to obtain additional clues.

Thanks
-- Avi

Hi Avi,

Are you using an rsh or an ssh connection between nodes?
If you are using ssh, you need to provide the '-r ssh' option.
Please make sure that there is a passwordless connection and that you can log in both from rm1203 to rm1204 and vice versa.
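A quick way to verify this non-interactively might be the following sketch; the host names are the ones from this thread, and BatchMode makes ssh fail instead of prompting for a password:

```shell
# Each command should print the remote host's name with no password prompt.
# Run the first from rm1204, the second from rm1203 (both directions).
ssh -o BatchMode=yes rm1203 hostname
ssh -o BatchMode=yes rm1204 hostname
```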

Regards!
Dmitry

Hi,

passwordless (better: passphraseless) ssh needs to be set up for each user, and they might be tempted to copy these keypairs to other systems to ease their login. It's advantageous to have a hostbased login instead:

http://gridengine.sunsource.net/howto/hostbased-ssh.html

-- Reuti

Hi,

We are using passphraseless ssh, and have used it with our other MPI installs with SLURM. However, when I tried adding '-r ssh' as an mpirun option with the Intel MPI, I keep getting the same error message:

<---
Node list is rr[108,129]
mpdboot_rr108 (handle_mpd_output 905): from mpd on rr129, invalid port info:
connect to address 10.1.0.129 port 544: Connection refused
connect to address 10.1.0.129 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
--->

(the same block appears a second time in the output)

I also looked at 'mpirun --help' and did not see '-r ssh' listed as an option. In fact, there were no options listed for using ssh.

Thanks
-- Avi

mpirun -help
......
--rsh specifies the name of the command used to start remote mpds; it
defaults to rsh; an alternative is ssh
--shell says that the Bourne shell is your default for rsh
--verbose shows the ssh attempts as they occur; it does not provide
confirmation that the sshs were successful

mpirun -version
Intel MPI Library for Linux Version 4.0
Build 20100422 Platform Intel 64 64-bit applications

As Dmitry said, you must try stand-alone ssh in both directions among the offending nodes, from the relevant account, to guard against problems such as ~/.ssh/known_hosts containing stale information.

mpdboot (with the appropriate nodelist) followed by mpdtrace and mpdallexit can be used for one-time check on this problem without actually waiting for a chance to run the entire application.
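That stand-alone check might be scripted roughly as follows; the host names are placeholders from earlier in the thread, and a real run would use the nodes SLURM actually allocated:

```shell
# Write a two-node hosts file by hand, boot an mpd ring over ssh,
# list the ring members, then tear the ring down.
printf 'rm1203\nrm1204\n' > mpd.hosts
mpdboot -n 2 -f mpd.hosts -r ssh
mpdtrace      # should list both hosts if the ring came up
mpdallexit
```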

I had set N=n=2 in the SLURM script so that one process on each node would communicate with the other. The node list was rm[1562-1563]. When I ran

% mpdboot -n 2 -v -r ssh

I got back:

totalnum=2 numhosts=1
there are not enough hosts on which to start all processes

and mpdtrace gave:

mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)

This may be the reason why mpirun is not running: mpd itself is not running, given that the compute node from which the job is launched is both the launch host and a compute node at the same time.

Any suggestions for a fix?

Thanks
-- Avi

Avi,

By default, mpdboot looks for mpd.hosts in the current directory to get information about the nodes. mpdboot doesn't recognize SLURM settings!
If you don't have an mpd.hosts file, use '-f hosts_file.txt'.
In your case it might look like: 'mpdboot -f $SLURM_NODELIST -n 2 -r ssh'
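One caveat worth noting: on typical SLURM installations, $SLURM_NODELIST holds a compressed expression such as rr[10,72] rather than the path to a file, so mpdboot may not be able to read it with '-f' directly. A sketch of expanding it into a hosts file first, assuming the scontrol utility is available on the node:

```shell
# Expand the compressed node list (e.g. "rr[10,72]") into one
# hostname per line, then point mpdboot at the resulting file.
scontrol show hostnames "$SLURM_NODELIST" > mpd.hosts
mpdboot -f mpd.hosts -n "$SLURM_NNODES" -r ssh
```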

Regards!
Dmitry

Hi Dmitry,

I made the change that you suggested in the script file:

% mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh
% mpdtrace

Unfortunately the result is the same as before:

mpdboot hostlists
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes
mpdtrace
mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_apurkaya); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)

So it appears the key to resolving the problem is getting mpdboot to recognize the mpd.hosts file or its equivalent, which is still not happening.

Cheers
-- Avi

Avi,

Could you print out $SLURM_NODELIST? (echo $SLURM_NODELIST)

Regards!
Dmitry

Avi,

Just one thought:
Do you run your commands after salloc? Or might be you use sbatch or srun commands?

Could you provide details about all commands used to start an application?

Regards!
Dmitry

I have actually been doing that; I may not have cut-and-pasted it into the messages on this thread. But here is a snippet from the batch script and the latest output file.

In the batch script, I have:

echo "Node list is" $SLURM_NODELIST

In the output file, I get:

Node list is rr[10,72]

-- Avi

Dmitry,

We use a SLURM batch script with sbatch, srun and mpirun. We do not use salloc, although there is nothing to prevent us from using it.

Here is a simple SLURM batch script that we have used. We submit the job with something like "% sbatch intel.batch", where intel.batch looks something like:

<---
#!/bin/bash
#SBATCH --time=01:00:00 # WALLTIME
#SBATCH -N 2 # Number of nodes
#SBATCH -n 2 # Number of cores/processors
#SBATCH -o intel-mpi-%j.log
#SBATCH -p pbatch
#SBATCH --job-name intel-mpi-test # job name

echo "Node list is" $SLURM_NODELIST

### testing mpd ######
module load intelMPI/4.0.0.027
echo "mpdboot hostlists"
mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh
echo "mpdtrace"
mpdtrace
echo "mpdallexit"
mpdallexit
########

mpirun -np 2 -nolocal --ssh ./a.out
--->

Cheers
-- Avi

Avi,

and what is the output when you run "% sbatch intel.batch"?

Regards!
Dmitry

When the job is submitted, there is a normal response with a job id # coming back:

rrlogin1<8> sbatch intel-rr.batch
Submitted batch job 3852

-- Avi

Avi,

In your script intel-rr.batch, please remove "mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh" - you cannot use 'mpdboot'! You need to use mpirun instead.
Change your command line for mpirun:
mpirun -r ssh -nolocal -np 2 ./a.out
Check the log after "sbatch intel-rr.batch" - the format of the node list ('rr[10,72]') should be parsed correctly.

Please provide the output if the problem persists.

Regards!
Dmitry

Dmitry,

I had originally tried with mpirun only, to no avail. So here's the complete run script and output:

<--- run script
#!/bin/bash
#SBATCH --time=10:00 # WALLTIME
#SBATCH -N 2 # Number of nodes
#SBATCH -n 2 # Number of cores/processors
#SBATCH --job-name intel-mpi_test # Name of job
##SBATCH -p inter # see "Queues" section for details
#SBATCH -o intel-mpi-rr.out.%j

echo "Node list is" $SLURM_NODELIST
cd /home/apurkaya/apps/OMB-3.1.1/intel-mpi/tests/2-node
mpirun -r ssh -nolocal -np 2 ./osu_bw
--->

<--- output
Node list is rr[76-77]
mpdboot_rr76 (handle_mpd_output 846): mpdboot: can not get anything from the mpd daemon; please check connection to rr77
--->

Avi,

mpirun was able to parse the host names correctly.
Please check that you are able to log in to node rr77 from rr76 and vice versa without entering a password:
ssh rr77
(from rr77) ssh rr76

Regards!
Dmitry

Avi,

Could you try the following scenario?
I assume that you have a 'hello' application; it may be any simple MPI test.

1. Create a test.sh file. For instance:
$ cat test.sh

#!/bin/bash

srun hostname -s | sort -u >mpd.hosts
source /opt/mpi-4.0.026/bin64/mpivars.sh

# Example 1
# Launch application using hydra process manager
mpiexec.hydra -f mpd.hosts -n $SLURM_NPROCS -env I_MPI_DEBUG 5 ./hello

# Example 2
# Launch application using MPD process manager
mpdboot -n $SLURM_NNODES -r ssh
mpiexec -n $SLURM_NPROCS -env I_MPI_DEBUG 5 ./hello
mpdallexit

2. submit a job using sbatch command. For instance,
$ sbatch -n 4 test.sh

Please let me know whether the suggestion helps.

Best regards,
Andrey
