Problems with mpdboot

carlos.veralive.cl

Hello,

I am having the following problem when executing mpdboot:

$ mpdboot -n 2 -f /home/comsol/mpd.hosts -r ssh
mpdboot_cluster (handle_mpd_output 672): Failed to establish a socket connection with cl1n001:42406 : (111, 'Connection refused')
mpdboot_cluster (handle_mpd_output 689): failed to connect to mpd on cl1n001

I need to use MPI so that Comsol 3.5 can run in parallel.
Comsol is launched in parallel as follows:
cluster comsol35/bin> ./comsol -nn 2 mpd boot -f /home/comsol/mpd.hosts

The error I get is:

mpdboot_cluster (handle_mpd_output 725): from mpd on cl1n001, invalid port info:
cl1n001: Connection refused

Information:
Operating System: SLES 10 sp2
Intel MPI version: 3.1

I really hope someone can help me.

Thank you.

Tim Prince

Does ssh without password connect to that node, or does it refuse to connect? This can be as simple as stale entries in ~/.ssh/known_hosts or a disconnected or powered off component.
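
For reference, a quick way to check both of those (a sketch, assuming the standard OpenSSH client tools; the hostname is taken from the error above) would be:

$ ssh -o BatchMode=yes cl1n001 hostname   # should print the remote hostname without asking for a password
$ ssh-keygen -R cl1n001                   # removes any stale cl1n001 entry from ~/.ssh/known_hosts

If that first command fails, sort out SSH before retrying mpdboot.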

Gergana Slavova (Intel)

Hi Carlos,

The issue here is that, when you try to start the MPD daemons from the 'cluster' node, mpdboot is unable to connect to the 'cl1n001' node.

As Tim mentioned, can you verify that passwordless SSH is set up on the cluster, meaning that you can ssh from cluster to cl1n001 without being prompted for a password? That's a requirement for the Intel MPI Library.
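
If it is not set up yet, a minimal sketch (assuming ssh-keygen and ssh-copy-id are available; otherwise append the contents of ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on cl1n001 by hand) would be:

$ ssh-keygen -t rsa       # accept the defaults and leave the passphrase empty
$ ssh-copy-id cl1n001     # copies your public key into cl1n001's authorized_keys
$ ssh cl1n001 hostname    # should now complete without a password prompt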

Also, make sure that no old MPD daemons are running on the cluster. To do so, execute:

$ ps aux | grep mpd

If you see any 'mpd' Python processes running under your account, kill -9 them to free up the port Intel MPI is trying to use (do this on both cluster and cl1n001).
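
For example, a sketch of that check and cleanup, to be run on each node (double-check the ps output before killing anything):

$ ps aux | grep '[m]pd'   # the bracket keeps the grep itself out of the listing
$ kill -9 <pid>           # for each leftover mpd process owned by your account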

Finally, this could be an issue where Intel MPI tries to create the initial mpd logfile but can't. By default, this is done in /tmp on the node. Can you verify that you have access to /tmp and can indeed write into it, and check whether there is already a file called /tmp/mpd2.logfile_?
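
A few quick checks along those lines (a sketch; the logfile suffix is normally your username):

$ ls -ld /tmp                                            # should show drwxrwxrwt, i.e. world-writable with the sticky bit
$ touch /tmp/.mpd_write_test && rm /tmp/.mpd_write_test  # confirms your account can write there
$ ls -l /tmp/mpd2.logfile_*                              # shows any existing mpd logfile and who owns it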

Generally, I would also recommend upgrading to the latest Intel MPI Library 3.2 Update 1.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
