mpdboot problem

Hi all,

I'm using Intel-MPI3 (icc & ifort 10 compilers) on a two-node cluster with an Ethernet interconnect.

The mpdboot command:

# mpdboot --totalnum=2 --file=/root/mpd.hosts --mpd=/opt/MPI_LIBS/INTEL-MPI/bin64/mpd --verbose --ncpus=4 --ifhn=10a0101

gave the following error:

running mpdallexit on 10a0101
LAUNCHED mpd on 10a0101 via
RUNNING: mpd on 10a0101
LAUNCHED mpd on compute-0-0 via 10a0101
mpdboot_10a0101 (handle_mpd_output 589): from mpd on compute-0-0, invalid port info:
connect to address 10.255.255.254: Connection refused
connect to address 10.255.255.254: Connection refused
trying normal rsh (/usr/bin/rsh)
32833

If the --rsh=/usr/bin/ssh option is used, mpdboot works fine, but an error then appears when a job is submitted across the 2 nodes.

With MPICH2, mpdboot and the job submission are working without any error.

I don't understand why it is not working with Intel MPI.

Can someone help me resolve this issue?

- Sanagmesh

Hi Sanagmesh,

It looks like a known bug. I believe it should not appear in the latest release.

Package ID: l_mpi_p_3.1.026

Could you tell me the package ID of the Intel MPI Library you have? It can be found in the mpisupport.txt file. Would it be possible for you to upgrade if you have an older version?
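
For reference, one way to look up the package ID from a shell is sketched below. The install directory is an assumption taken from the --mpd path in the first post, the "Package ID" search string assumes the file uses the same label as quoted in this thread, and the show_package_id helper name is made up for illustration. Adjust all three to your setup.

```shell
# Assumption: mpisupport.txt sits at the top of the Intel MPI install tree.
# The default directory is inferred from the --mpd path in the original post.
MPI_ROOT="${MPI_ROOT:-/opt/MPI_LIBS/INTEL-MPI}"

# Hypothetical helper: print the "Package ID" line(s) of the given file.
show_package_id() {
    grep -i "Package ID" "$1"
}

if [ -f "$MPI_ROOT/mpisupport.txt" ]; then
    show_package_id "$MPI_ROOT/mpisupport.txt"
else
    echo "mpisupport.txt not found under $MPI_ROOT" >&2
fi
```

If the library is installed elsewhere, set MPI_ROOT before running the snippet.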

Best regards, Andrey

I'm using:
Package ID: l_mpi_p_3.0.043

Does this happen on every cluster when mpdboot is run on more than one node?

Thanks
-Sangamesh

Is it acceptable for you to upgrade to Intel MPI Library 3.1? If not, I would suggest you request a patch for the "invalid port info" issue at https://premier.intel.com. As far as I know, it is available for the 3.0.043 package.

I upgraded Intel MPI to version 3.1. Now mpdboot runs without any errors.

Thanks..

-Sangamesh
