Suspend an MPI job

Hi,

How can I suspend all the processes in an MPI job? I tried to use I_MPI_JOB_SIGNAL_PROPAGATION but it didn't seem to work. I am using Intel MPI 4.0.1.007. Thanks.

Jacky

Hi Jacky,

Well, I've just checked with 4.0.2 and it works.
[dk@cl210 ~]$ export I_MPI_JOB_SIGNAL_PROPAGATION=1
[dk@cl210 ~]$ mpiexec -n 8 IMB-MPI1

In another terminal window:
[dk@cl210 ~]$ ps ux | grep mpiexec
dk 13809 0.1 0.0 140860 9876 pts/11 T 12:06 0:00 python /users/dk/impi/4.0.2/intel64/bin/mpiexec -n 8 IMB-MPI1
[dk@cl210 ~]$ kill -20 13809    # sends SIGTSTP

In the first window you'll see:
[1]+ Stopped mpiexec -n 8 IMB-MPI1

Again in the second window type:
[dk@cl210 ~]$ kill -18 13809    # sends SIGCONT

And IMB continues running.
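
The same sequence can be written with signal names instead of numbers, since 20 and 18 are the Linux x86 values for SIGTSTP and SIGCONT and differ on some other platforms. The following is only a sketch: the helper names suspend_job, resume_job and proc_state are made up for illustration and are not part of Intel MPI, and reading /proc assumes Linux. Whether the MPI ranks themselves stop still depends on mpiexec forwarding the signal.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Intel MPI.

# Send a stop signal to the launcher PID given as $1.
# The signal defaults to SIGTSTP, as in the steps above; $2 can override it.
suspend_job() {
    kill -s "${2:-TSTP}" "$1"
}

# Resume a previously stopped launcher by sending SIGCONT.
resume_job() {
    kill -s CONT "$1"
}

# Print the one-letter state of a process (T = stopped).
# Assumption: a Linux system with /proc mounted.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}
```

For example, suspend_job 13809 followed by proc_state 13809 should report T for the mpiexec process itself, and resume_job 13809 lets it continue. Running proc_state on the PIDs of the individual ranks shows whether the signal actually reached them.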

Is that not what you see?

Regards!
Dmitry

Dmitry,

Thanks for your reply. I tried what you did and unfortunately it didn't work in my case. Actually, nothing happened when I used "kill -18" in another terminal window. When I used "Ctrl-Z" in the terminal window where the program was running, only the first process was suspended and all the other processes kept running.

Is this because I am using Intel MPI 4.0.1.007? Or is there anything else I need to configure? Thanks.

Jacky

Hi Jacky,

I've taken a look at the mpiexec code and you are absolutely right: the documentation and the actual behavior differ. SIGTSTP and SIGCONT are not propagated to the application. This is easy to change in the code, but I doubt you'll be able to do it yourself.
You can submit a tracker at premier.intel.com and I'll send you a patch for testing.

Regards!
Dmitry

Thanks for the reply. I tried to submit an issue at premier.intel.com, but the Intel cluster kit is not in my product list. What can I do? Thanks.

Which product do you have? If it is Cluster Toolkit or Cluster Studio, you should be able to submit a tracker against Intel MPI Library for Linux.

Regards!
Dmitry

Dmitry,

I have submitted the issue. Could you please take a look? Thanks.

Hi Jacky,

Got it - will be working on that.

Regards!
Dmitry
