I've followed the instructions in the Intel MPI Getting Started documentation, but I'm having problems getting mpd running across my cluster. I've done the following:
1) Verified that no python / mpd processes are running on compute nodes
2) Started mpdboot from the head node with "mpdboot -d -v -n 20 -r ssh"
mpdboot fails with a connection error, and each time I run it, it errors out on a different node:
debug: mpd on n14 on port 43729
mpdboot_n1 (handle_mpd_output 703): Failed to establish a socket connection with n14:43729 : (111, 'Connection refused')
mpdboot_n1 (handle_mpd_output 720): failed to connect to mpd on n14
When I ssh to n14, I do see mpd running:
n14:~ # ps -ef | grep python
root 7535 1 99 Jun09 ? 18:05:45 python /opt/intel/impi/3.1/bin/mpd.py -h icn4 -p 62021 --ifhn=172.18.1.14 --ncpus=1 --myhost=icn14 --myip=172.18.1.14 -e -d -s 20
After I clean up these mpd processes on the compute nodes and re-run mpdboot, I get the same connection error on a different node.
Any ideas? By the way, I can connect via SSH to any of the nodes without problems.
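In case it helps narrow things down, the failing connection can be tested outside of mpdboot with a small script run from the head node. This is only a rough sketch, assuming Python 3 is available there; check_port is a name I made up for illustration, and the host and port to pass in are whatever mpdboot prints in its "debug: mpd on ..." line (n14 and 43729 in the output above).

```python
import socket


def check_port(host, port, timeout=3.0):
    """Attempt a plain TCP connection to host:port.

    Returns (True, None) if the connection succeeds, otherwise
    (False, exc) where exc is the OSError that was raised, e.g.
    ConnectionRefusedError for errno 111.
    """
    try:
        # create_connection resolves the host name and connects;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution errors.
        return False, exc
```

Calling check_port("n14", 43729) right after mpdboot fails should show whether the port is refused for any TCP client or only for mpdboot.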