MPI Library 4.1 and Torque

Marc O.

Dear all,

I'm trying to run a classic MPI test code on our cluster, and I'm still having trouble with it. I have installed Intel Cluster Studio XE 2013 for Linux and Torque 4.1.3.

If I don't use Torque and run "mpirun -f machine -np 18 ./code" directly, it runs fine (machine is the list of nodes). If I use Torque, the job starts but never finishes, and it is killed at the end of the walltime with the following errors:

=>> PBS: job killed: walltime 143 exceeded limit 120
[mpiexec@node4] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:221): assert (!closed) failed
[mpiexec@node4] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:128): unable to send SIGUSR1 downstream
[mpiexec@node4] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@node4] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[mpiexec@node4] main (./ui/mpich/mpiexec.c:718): process manager error waiting for completio

Do you have any idea what is going wrong?

Thanks in advance,

M.

James Tullos (Intel)

Hi Marc,

First I have to ask the obvious question. How long does the job take to complete without Torque*? If it takes longer than the walltime limit (120 in the log above, which Torque reports in seconds), increase the time allocated to the job.

If that is not the case, then please send me the output with I_MPI_DEBUG=5 and we'll proceed from there.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
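As a rough illustration of the walltime check above: the kill message reports both values in seconds, while limits are usually requested as HH:MM:SS. The helper below is a minimal sketch for comparing the two; `walltime_to_seconds` is a hypothetical name, and it assumes the limit is written as seconds, MM:SS, or HH:MM:SS.

```shell
# Convert a walltime given as SS, MM:SS, or HH:MM:SS to seconds.
# (Hypothetical helper; awk reads each field as a decimal number,
# so fields such as "08" are handled correctly.)
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 1 { print $1 + 0; next }
        NF == 2 { print $1 * 60 + $2; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        { exit 1 }'
}

# Values taken from the kill message: 143 s used against a limit of 120 s.
used=143
limit=$(walltime_to_seconds 00:02:00)
if [ "$used" -gt "$limit" ]; then
    echo "job used ${used}s, limit was ${limit}s"
fi
```

For a hello-world test that finishes in seconds when started by hand, exceeding even a two-minute limit suggests the job is hanging rather than running slowly.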

Marc O.

Hi James,

The code should print things like "hello world i'm proccessor number ". I will run it with a longer walltime.

The first test, with "mpirun -genv I_MPI_DEBUG 5 -np 32 ./code" in my batch file, doesn't give any more information.

M.
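For reference, a batch file of the kind described here might look like the output of the sketch below. This is an assumption about the general shape, not the actual file used in this thread: `make_job_script` is a hypothetical helper, and the node count, processes per node, and walltime are placeholder values. `PBS_O_WORKDIR` and `PBS_NODEFILE` are set by Torque inside a job; the `-f` and `-genv I_MPI_DEBUG 5` options are the ones quoted in the posts above.

```shell
# Print a Torque job script for an Intel MPI run to stdout.
# Arguments: nodes, processes per node, walltime (HH:MM:SS), executable.
# (Hypothetical helper; all resource values are placeholders.)
make_job_script() {
    nodes=$1
    ppn=$2
    walltime=$3
    exe=$4
    np=$((nodes * ppn))
    printf '#!/bin/sh\n'
    printf '#PBS -l nodes=%s:ppn=%s\n' "$nodes" "$ppn"
    printf '#PBS -l walltime=%s\n' "$walltime"
    # Jobs do not start in the submit directory, so change back to it.
    printf 'cd "$PBS_O_WORKDIR"\n'
    # PBS_NODEFILE is written by Torque and lists the allocated nodes.
    printf 'mpirun -f "$PBS_NODEFILE" -genv I_MPI_DEBUG 5 -np %s %s\n' "$np" "$exe"
}

# Example: 2 nodes x 16 processes per node gives the 32 ranks used above.
make_job_script 2 16 00:10:00 ./code
```

The generated text can be saved to a file and submitted with qsub. Passing the Torque node file through `-f` mirrors the hand-written "machine" file from the first post; Intel MPI may also be able to pick up the allocation on its own, which is worth checking in its documentation.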

James Tullos (Intel)

Hi Marc,

Please send me the output from the following commands:

which mpirun

env | grep I_MPI

ldd ./code

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Marc O.

Hi James,

Sorry for the delay. Here is the output you asked for:

$ which mpirun
/opt/intel/impi/4.1.0.024/intel64/bin/mpirun

$ env |grep I_MPI
I_MPI_ROOT=/opt/intel/impi/4.1.0.024

$ ldd ./code
linux-vdso.so.1 => (0x00007ffff81ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00000036efc00000)
libmpi.so.4 => /opt/intel/impi/4.1.0.024/intel64/lib/libmpi.so.4 (0x00007f3e13c0f000)
libmpigf.so.4 => /opt/intel/impi/4.1.0.024/intel64/lib/libmpigf.so.4 (0x00007f3e139df000)
librt.so.1 => /lib64/librt.so.1 (0x0000003fac200000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fad200000)
libm.so.6 => /lib64/libm.so.6 (0x00000036f0800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000036fb000000)
libc.so.6 => /lib64/libc.so.6 (0x00000036f0000000)
/lib64/ld-linux-x86-64.so.2 (0x00000036ef800000)

Thanks for your interest,

M.

James Tullos (Intel)

Hi Marc,

Please send the output with -verbose.  Let's see if that offers any insight about what's going on.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Marc O.

Hi James,

I finally solved the problem. It came from interactions between OpenMPI and Intel MPI. Thanks a lot for your help.

M.

James Tullos (Intel)

Hi Marc,

I'm glad to hear it's resolved now.  Are you attempting to use both OpenMPI and the Intel® MPI Library on the same program?  The two are not binary compatible.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Marc O.

Hi James,

On our new cluster, I let the users choose the MPI they want. They only use one of them in any given program (I have set up module files, so they can load the MPI and compilers they want). I have to say that in our first tests, Intel MPI is much more efficient with our codes.

Sincerely,

M.

James Tullos (Intel)

Hi Marc,

There is no problem with having both installed on the same cluster. You just need to make certain that you run each program with the same implementation you used to compile and link it.

I'm glad to hear that our implementation is working well for you.  If you do have performance concerns, or any others, feel free to let us know, and we'll see what can be done.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
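The advice above (launch with the same implementation you linked against) can be checked mechanically by comparing the `which mpirun` and `ldd ./code` results requested earlier in the thread. The sketch below is one way to do that; `same_mpi_prefix` is a hypothetical helper, and it assumes an install layout in which `bin/mpirun` and `lib/libmpi.so.*` sit under a common prefix, as they do in the paths posted above.

```shell
# Succeed if the libmpi that a binary resolves to lives under the same
# install prefix as the given mpirun. Reads `ldd` output on stdin.
# (Hypothetical helper; assumes <prefix>/bin/mpirun and <prefix>/lib/libmpi.so.*)
same_mpi_prefix() {
    mpirun_path=$1
    # /prefix/bin/mpirun -> /prefix
    prefix=$(dirname "$(dirname "$mpirun_path")")
    # Resolved path of the first libmpi.so entry (libmpigf.so does not match).
    libmpi=$(awk '$1 ~ /^libmpi\.so/ && $2 == "=>" { print $3; exit }')
    case $libmpi in
        "$prefix"/*)
            return 0 ;;
        *)
            echo "mismatch: mpirun under $prefix, libmpi is ${libmpi:-not linked}" >&2
            return 1 ;;
    esac
}

# Demonstration with the mpirun path and ldd line posted in this thread.
same_mpi_prefix /opt/intel/impi/4.1.0.024/intel64/bin/mpirun <<'EOF' && echo "consistent"
libmpi.so.4 => /opt/intel/impi/4.1.0.024/intel64/lib/libmpi.so.4 (0x00007f3e13c0f000)
EOF
```

On a live system the same check would be `ldd ./code | same_mpi_prefix "$(command -v mpirun)"`. Running it inside the batch job, not only in the login shell, matters when module files change the environment, because the two may not see the same PATH.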
