Hi Marc,
First I have to ask the obvious question. How long does the job take to complete without Torque*? If the job takes more than 2 hours, increase the allocated time for the job.
If that is not the case, then please send me the output with I_MPI_DEBUG=5 and we'll proceed from there.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools




MPI Library 4.1 and Torque
Dear all,
I'm trying to run a classical MPI test code on our cluster, and I'm still in trouble with it. I have installed the Intel Cluster Studio XE 2013 for Linux and Torque 4.1.3.
If I don't use torque "mpirun -f machine -np 18 ./code", it runs fine (machine is the list of nodes). If i use torque, it runs and stop at the end of walltime with the following errors
=>> PBS: job killed: walltime 143 exceeded limit 120
[mpiexec@node4] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:221): assert (!closed) failed
[mpiexec@node4] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:128): unable to send SIGUSR1 downstream
[mpiexec@node4] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@node4] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[mpiexec@node4] main (./ui/mpich/mpiexec.c:718): process manager error waiting for completio
Do you have any idea ?
Thanks in advance,
M.