mpiexec fails under SGE

san

Hi everyone,

I'm trying to run Intel MPI 3.2.1 on an SGI Altix Linux cluster under SGE 6.2. It fails with the following error:

cat output.32.Hello
/var/sge/default/spool/r1i0n12/active_jobs/32.1/pe_hostfile
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
mpdroot: cannot connect to local mpd at: /tmp/32.1.all.q/mpd2.console_root_r1i0n12
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/32.1.all.q/mpd2.console_root_r1i0n12 has been removed
mpiexec_r1i0n12 (__init__ 1162): forked process failed; status=255

But if the job is submitted without SGE (i.e. directly from the command line), it works well on the same set of nodes.

The MPI job is submitted with the mpiexec command. The mpds are already booted by root, and the user has MPD_USE_ROOT_MPD=1 in the .mpd.conf file in his home directory.

What could be the reason for failure here?

Thanks

Dmitry Kuzmin (Intel)

Hi San,

It seems to me that SGE changes the TMPDIR environment variable, and after that mpdroot cannot find the console file.
Could you set I_MPI_MPD_TMPDIR=/tmp before you create the mpd ring and give it a try? Don't forget to set this variable for the user as well.
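
To check this from inside the job, something like the following could go at the top of the SGE job script, before the mpiexec line. It is only a sketch: the helper name mpd_console_path is made up for illustration, the "mpd2.console_<user>_<host>" file name pattern is copied from the error message above rather than from Intel documentation, and the host name reported by uname -n may not match the name the mpd used.

```shell
#!/bin/sh
# Diagnostic sketch for an SGE job script (assumptions are noted inline).

# Build the console socket path that mpdroot appears to look for.
# The naming pattern is inferred from the error message in this thread.
mpd_console_path() {
    # $1 = directory, $2 = user who booted the mpd, $3 = host name
    printf '%s/mpd2.console_%s_%s\n' "$1" "$2" "$3"
}

host=$(uname -n)

# Path under the TMPDIR that the job sees, and the path under plain /tmp.
job_path=$(mpd_console_path "${TMPDIR:-/tmp}" root "$host")
tmp_path=$(mpd_console_path /tmp root "$host")

# Report which of the two candidate sockets actually exists on this node.
for p in "$job_path" "$tmp_path"; do
    if [ -S "$p" ]; then
        echo "socket present: $p"
    else
        echo "socket missing: $p"
    fi
done

# Workaround suggested above: point the MPD tools at /tmp for this job.
I_MPI_MPD_TMPDIR=/tmp
export I_MPI_MPD_TMPDIR
```

If the socket shows up only under /tmp, that would match the explanation above. The same variable would also need to be set in root's environment before mpdboot is run, so that both sides agree on the directory.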

Please let me know if it doesn't help.

Regards!
Dmitry
