Unable to run IntelMPI on two different machines

This message was previously posted on the wrong forum (Cluster OpenMP for Intel compiler).

Hello

Here is a summary of the problems my customer is currently facing:

Running the job on 4 CPUs on a single machine works fine. But when he tries to run the job on 2 CPUs on machine A and 2 CPUs on machine B, it fails on machine B. The failure does not depend on the machine: if the machine order is reversed in the host.list file, it is always the second machine that fails to run the job. The message says: "You can't run mpdboot on machine 'name of the second machine' version of python should be >= 2.4 current version is ' ' (empty)". The job is launched from the first machine listed in the host.list file with the following command:

mpirun -f host.list -np $6 $IWRUN/bin/$NOMOS/csh_presti_ex -fl $input_data -output `pwd`/diagnostic -io_driver $_io_driver >> $IWETU/liste_presti 2>&1

($6 is the number of CPUs.)

This was done using Intel MPI 2.0 (which I know is obsolete). With Intel MPI 3.2 there is no error message, but no job starts on either machine A or machine B (even with the verbose and debug options). We asked him to set the environment variable I_MPI_DEBUG to 7 and we are waiting for the result of this test.

The problem seems related to a test in mpdboot.py involving the function getversionpython.
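Since the error reports an empty Python version for the second machine, one thing worth checking is what a non-interactive remote shell on that machine actually returns when asked for the Python version. The following is only a minimal sketch: whether mpdboot.py obtains the version in this way is an assumption, and REMOTE_SH, remote_python_version and classify_version are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# REMOTE_SH is a hypothetical hook: the remote shell to test (ssh or rsh).
REMOTE_SH=${REMOTE_SH:-ssh}

# Print the Python major.minor version as seen through a non-interactive
# remote shell on host $1. Prints nothing if the remote command fails.
remote_python_version() {
    $REMOTE_SH "$1" "python -c 'import sys; print(\"%d.%d\" % sys.version_info[:2])'" </dev/null 2>/dev/null
}

# Classify a version string against the ">= 2.4" requirement quoted in the
# error message. An empty string means the remote command returned nothing.
classify_version() {
    case $1 in
        '')             echo "empty: remote command returned nothing" ;;
        [01].*|2.[0-3]) echo "too old: $1" ;;
        *)              echo "ok: $1" ;;
    esac
}

# Usage (machineB is a placeholder host name):
#   classify_version "$(remote_python_version machineB)"
```

If this reports an empty version for the second machine but works when logged in interactively, the non-interactive environment (PATH, login scripts) on that machine is a likely place to look.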

NB: The tests have been done using both rsh and ssh, with the same results.

Thanks a lot for any suggestions.

Hi Martial,

Indeed, Intel MPI Library 2.0 is no longer fully supported. Upgrading to Intel MPI 3.2 is the right thing to do. We've actually disabled all Python checks starting with that version (3.2).

Seeing the error with 2.0, together with your description of the Intel MPI 3.2 behavior, makes me think it may be an issue with connectivity between the nodes. Are the nodes set up for passwordless ssh? That is, doing ssh node2 from node1 should not prompt you for a password or wait for any other sort of response (e.g. accepting a new authentication key).
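A minimal sketch of such a check, run from the launching node against every entry in host.list. It relies on the OpenSSH options BatchMode and ConnectTimeout, so ssh fails instead of prompting. The function name check_hosts and the SSH_CMD override are hypothetical, and stripping a ":ncpus" suffix from host entries is an assumption about the host file format.

```shell
#!/bin/sh
# SSH_CMD is a hypothetical override hook; it defaults to the real ssh.
SSH_CMD=${SSH_CMD:-ssh}

# Try a non-interactive login to every host listed in the file $1.
# Returns non-zero if at least one host cannot be reached without a prompt.
check_hosts() {
    hostfile=$1
    failed=0
    while read -r host _rest || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case $host in ''|'#'*) continue ;; esac
        # Assumption: entries may look like "host:ncpus"; keep the host part.
        host=${host%%:*}
        # BatchMode=yes makes ssh fail rather than ask for a password
        # or for confirmation of an unknown host key.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return $failed
}

# Usage:
#   check_hosts host.list
```

Any host reported as FAIL would need its keys or known_hosts entry fixed before mpdboot can start a daemon there.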

Actually, it would be nice to run mpdboot -d -v -f host.list -np 2 and see if you get any messages. Also, the debug output will be very helpful so I'll be waiting to hear back on that.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

Hi Gergana

Thank you for your prompt answer.
Unfortunately my client has already tested with passwordless ssh (and also rsh), and he also tried the verbose and debug options (-v -d) without getting any message. Anyway, I'll ask him to (re)do the test.
I also suggested that he use the I_MPI_DEBUG variable to get more information, try -machinefile instead of the -f option, and finally use the -perhost option to force execution on a specific CPU/machine.
My client is in Kuwait, so I think the answer will arrive tomorrow. I will let you know.

Thanks again
Regards

Martial

Thank you, Martial.

One thing to note is that the -d and -v options are for mpdboot only; they won't be recognized by mpirun. I'm saying this because your example included mpirun.

I look forward to hearing back.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
