mpirun failed to start when TMPDIR=. is set

Hi,

We found that when the environment variable TMPDIR is set to the current directory, no matter whether it is '.', './', or the full path, Intel MPI fails to run.

This happens on all MPI versions (including 4.0).

[linfa@babbage testrun]$ setenv TMPDIR .
[linfa@babbage testrun]$ {/opt/intel/impi/3.2.0.011/bin64/mpirun} -n 2
mpdboot_babbage.tx.altair.com (handle_mpd_output 730): Failed to establish a socket connection with babbage:54848 : (111, 'Connection refused')
mpdboot_babbage.tx.altair.com (handle_mpd_output 747): failed to connect to mpd on babbage


Is this a bug? Is there any workaround? Thanks.


Hey linfa,

I would actually recommend upgrading to our newest version: Intel MPI Library 4.0 Update 3. You can grab it from the Intel Registration Center. While I was able to reproduce this with the 4.0 release, I don't see this issue with the 4.0.3 release:

[user@host1:~]> export TMPDIR=.
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
mpdboot_node1 (handle_mpd_output 949): Failed to establish a socket connection with node1:33751 : (111, 'Connection refused')
mpdboot_node1 (handle_mpd_output 969): failed to connect to mpd on node1
[user@node1:~]> /opt/intel/impi/4.0.3/bin64/mpirun -n 2 hostname
node1
node1

We have a new default process manager in 4.0.3. I don't believe our old process manager supported shorthand path symbols such as '.'.

Give this a try and let us know how it goes.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

Hi Gergana,

Thanks for your quick reply. I have several questions:

1) What is the "default process manager"? How is it related to this issue? Could you explain a little bit more?
2) What should I update: the SDK for building executables, or the run-time library only?
3) Updating is not a problem for me, but it is more difficult to ask our customers to do it. So I am wondering if there is a workaround.

Thanks.
Linfa

Hi Linfa,

1) A process manager is the part of the library that launches the MPI ranks, interacts with job or batch schedulers, makes the physical connections between the nodes (e.g. via ssh), etc. It also parses your command and any environment variables you're using (like TMPDIR) to start the job.
In older versions of our library, we used the Multi-Purpose Daemons (MPDs) as the process manager. In version 4.0.3 and later, we use the Hydra process manager. Hydra has some advantages over the MPDs, as you can see here.

2) I recommend updating the full SDK. If you have a valid license, that is free and easy to do. If that's not possible, all of our 4.0.x packages are compatible with each other, so you can simply update the runtimes and be OK.

3) The only workaround would be to use the full path:

[user@node1:~]> export TMPDIR=/home/user
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
node1
node1
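
If your customers' scripts might set a relative TMPDIR, one option is to resolve it to an absolute path before calling mpirun. The snippet below is only a minimal sketch: `abs_tmpdir` is a hypothetical helper name, not something shipped with Intel MPI, and it assumes a POSIX shell (sh/bash) rather than csh. Note that it is not confirmed in this thread whether an absolute path helps when it points at the current working directory, so test it on your setup first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI): print the absolute,
# symlink-resolved form of a directory path. Fails if the argument is
# empty or is not an accessible directory.
abs_tmpdir() {
    [ -n "$1" ] || return 1
    # Change directory inside a subshell so the caller's working
    # directory is left untouched, then print the physical path.
    (cd -- "$1" >/dev/null 2>&1 && pwd -P)
}

# Example use before launching a job (commented out so the snippet
# has no side effects when sourced):
#   TMPDIR=$(abs_tmpdir "$TMPDIR") && export TMPDIR
#   /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
```

If the directory does not exist, the helper returns a non-zero status and TMPDIR is left as it was, so the `&&` guard keeps a bad value from being exported.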

Is your customer just running your application? If yes, they can simply update the runtimes. Those are available as a free download from our website: www.intel.com/go/mpi.

Does that sound reasonable?

Regards,
~Gergana


Thanks. That's what I want
