Processes created by MPI_Comm_spawn don't terminate before parent process finalization

I've created a child process with MPI_Comm_spawn, and I need it to really terminate (no longer exist) before the parent process finalizes. I can't find any reason for a child process to still be alive after MPI_Finalize. Is this a bug in the implementation's logic? Most other MPI implementations don't show this behavior.

Thanks,
Fernanda


Hi Fernanda,

Could you please clarify a bit, or perhaps provide a test case?
Do you call MPI_Abort() to terminate the process? Do you terminate the parent process with 'kill -signal'? Do you have anything in the code after MPI_Finalize()?
Strictly speaking, MPI_Finalize() is a collective operation: each process should call it, and MPI communication is not allowed after that.

Regards!
Dmitry

Hi,

My code is simple:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* sleep() */

int main(int argc, char ** argv)
{
    int rank;
    MPI_Comm comm_parent, intercomm;
    int errcodes;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&comm_parent);
    if(comm_parent == MPI_COMM_NULL){
        // Parent process: spawn one copy of this same executable
        MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
        sleep(15);
        printf("parent finalizes\n");
    }
    else{
        // Child process: finishes its work 10 seconds before the parent
        sleep(5);
        printf("child finalizes\n");
    }
    MPI_Finalize();
    return(0);
}

I ran it and watched it with top in another terminal on the same machine. The child process still shows up in top after 5 seconds (in sleeping state); it only really terminates when the parent process finalizes.
This does not happen with another Intel MPI build (same version). There the child process terminates correctly after 5 seconds, so no child process is left in top.

For the first run I used build/version: Intel MPI Library for Linux Version 4.0 Build 20100422 Platform Intel 64 64-bit applications.
The second one was: Intel MPI Library for Linux Version 4.0 Update 1 Build 20100818 Platform Intel 64 64-bit applications.

So it seems to be a difference between the two Intel MPI builds.
Can anyone confirm this?

Thanks,
Fernanda

Fernanda,

Could you try using the 'ps' utility instead?
$ mpiicc -o spawn spawn_test.c
$ mpiexec -n 1 ./spawn
In another terminal window:
$ ps ux | grep spawn

You'll see that the child process exists until finalization in any implementation.

Regards!
Dmitry
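As a complement to ps, the existence of a process on the same machine can also be checked programmatically. The following is a minimal POSIX sketch, not code from either poster: the helper name process_exists() is hypothetical, and it relies on kill() with signal 0, which performs only the existence and permission checks without delivering a signal. A process that has exited but has not yet been reaped (a zombie) still counts as existing, just as it still appears in ps output.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this PID exists,
   0 if it does not, and -1 on invalid input or an unexpected error.
   Signal 0 is never delivered; kill() only checks existence and permissions. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;          /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;           /* exists and we are allowed to signal it */
    if (errno == EPERM)
        return 1;           /* exists, but belongs to another user */
    if (errno == ESRCH)
        return 0;           /* no such process */
    return -1;              /* unexpected error */
}
```

For example, if the child in the test case printed its getpid() value, a small monitor on the same host could poll process_exists() on that PID once per second instead of grepping ps output.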

Hi,

I have to disagree. LAM/MPI and Open MPI, for instance, don't have this behavior: the child process no longer exists after its own finalization. Besides, I didn't have this problem with Intel MPI 4.0 Update 1 either.

Using the same code posted previously, with an added "sleep(5);" before the MPI_Comm_spawn call, I can demonstrate it with these runs:

=====================================================
Using Open MPI 1.3.3:

[fgoliveira@rio1 testes]$ mpirun -V
mpirun (Open MPI) 1.3.3

Report bugs to http://www.open-mpi.org/community/help/
[fgoliveira@rio1 testes]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@rio1 testes]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 17201
13169 17201 0.0 0.0 52276 2468 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 1.0 0.0 92732 3536 pts/1 S 13:32 0:00 ./teste_spawn
13169 17205 0.0 0.0 61176 724 pts/1 S+ 13:32 0:00 grep spawn
-----
13169 17201 0.0 0.0 52276 2484 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 0.6 0.1 92732 4128 pts/1 S 13:32 0:00 ./teste_spawn
13169 17207 2.0 0.1 92736 4108 pts/1 S 13:32 0:00 ./teste_spawn
13169 17209 0.0 0.0 61172 720 pts/1 S+ 13:33 0:00 grep spawn
child finalizes
-----
13169 17201 0.0 0.0 52276 2508 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 0.3 0.1 92732 4128 pts/1 S 13:32 0:00 ./teste_spawn
13169 17231 0.0 0.0 61176 724 pts/1 S+ 13:33 0:00 grep spawn
[fgoliveira@rio1 testes]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@rio1 testes]$

=====================================================
Using Intel MPI 4.0:

[fgoliveira@gsn08 ~]$ mpirun -V
Intel MPI Library for Linux Version 4.0
Build 20100422 Platform Intel 64 64-bit applications
Copyright (C) 2003-2010 Intel Corporation. All rights reserved
[fgoliveira@gsn08 ~]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@gsn08 ~]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 459
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 4.0 0.0 138684 9700 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2156 ? S 13:32 0:00 ./teste_spawn
503 501 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
-----
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 1.3 0.0 138684 9700 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2428 ? S 13:32 0:00 ./teste_spawn
503 504 0.0 0.0 33932 2408 ? S 13:32 0:00 ./teste_spawn
503 506 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
child finalizes
-----
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 0.7 0.0 138688 9704 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2428 ? S 13:32 0:00 ./teste_spawn
503 504 4.8 0.0 33932 2412 ? R 13:32 0:00 ./teste_spawn
503 509 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
[fgoliveira@gsn08 ~]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@gsn08 ~]$

=====================================================
Using Intel MPI 4.0 Update 1:

[fgoliveira@rio1 testes]$ mpirun -V
Intel MPI Library for Linux Version 4.0 Update 1
Build 20100818 Platform Intel 64 64-bit applications
Copyright (C) 2003-2010 Intel Corporation. All rights reserved
[fgoliveira@rio1 testes]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@rio1 testes]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 17736
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 4.0 0.2 138768 9796 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.5 0.0 92740 3520 ? S 13:55 0:00 ./teste_spawn
13169 17778 0.0 0.0 61172 720 pts/1 S+ 13:55 0:00 grep spawn
-----
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 1.3 0.2 138768 9796 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.6 0.1 92740 4112 ? S 13:55 0:00 ./teste_spawn
13169 17780 2.0 0.1 92736 4108 ? S 13:55 0:00 ./teste_spawn
13169 17782 0.0 0.0 61176 724 pts/1 S+ 13:55 0:00 grep spawn
child finalizes
-----
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 0.7 0.2 138772 9800 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.3 0.1 92740 4112 ? S 13:55 0:00 ./teste_spawn
13169 17785 0.0 0.0 61176 728 pts/1 S+ 13:55 0:00 grep spawn
[fgoliveira@rio1 testes]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@rio1 testes]$

Sorry to insist, but I need this functionality in Intel MPI. I see no reason why the child process should continue to exist after its own finalization.

Thanks,
Fernanda


Hi Fernanda,

Maybe you just need to use MPI_Comm_disconnect()? Something like:
if(comm_parent == MPI_COMM_NULL){
    // Parent: spawn the child, then release the connection to it
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
    MPI_Comm_disconnect(&intercomm);   // collective: the child must call it too
    sleep(15);
}
else{
    // Child: do its work, then release the connection to the parent
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    sleep(5);
    MPI_Comm_disconnect(&comm_parent);
}

Regards!
Dmitry

OK, it seems to work.
It is not the same behavior as in LAM/MPI and Open MPI, but it works.

I hope the next version behaves like Intel MPI 4.0 Update 1 in this respect. That would be great!

Thanks!
Fernanda

Fernanda,
Please don't expect any changes to the behavior of MPI_Finalize(). The difference in behavior between Intel MPI versions is very strange. I could not reproduce it with any version (not even with the upcoming 4.0 Update 3), but I work on RHEL and it seems you are using SUSE.

Regards!
Dmitry

OK, Dmitry.
I'm using CentOS, but I don't believe the OS affects the results in this case.
My expectation that the child process should not exist after its finalization comes from using LAM/MPI and Open MPI. I could not find any description of how spawned processes are finalized in the MPI Forum documents, so I don't know exactly which behavior is correct.
Anyway, your solution helps in my specific implementation.
However, with your solution, if I want the parent and child processes to communicate, I would have to implement a termination protocol, such as having the child send an end message to the parent (after which the parent could call MPI_Comm_disconnect).

Fernanda
