Dynamically start MPI processes

Hi,

I have a master/slave MPI program, and I'd like the master to spawn the slave processes dynamically. I tried MPI_Comm_spawn, but it seems I can only start slave processes on nodes where mpd.py is already running, i.e., the nodes listed in mpd.hosts. In my case, I don't know which nodes I will use when I launch the program with mpirun; the nodes where the slave processes run are determined at run time.

I also tried the Hydra process manager in Intel MPI 4.1, and MPI_Comm_spawn failed. Does Hydra support spawning at all?

Could anyone give me some insight or advice on how to solve my problem? Thanks.

Do you have a reproducer? I have been able to use MPI_Comm_spawn with Hydra. Let me check on the exact method for specifying a host outside of the provided host list for launching.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Consulting Engineer

Thanks for replying.

I am attaching the test program I've been using. I compiled the master and the worker with:

                mpiicpc -o master master.cpp
                mpiicpc -o worker worker.cpp

and ran it with:

                mpirun -n 1 -env MPI_UNIVERSE_SIZE 3 ./master

The program completed successfully with Intel MPI 4.0.2.003, but when run with Intel MPI 4.1.1.036 on the same nodes, I got the following output:

 

                universe_size = 8
                node1:2d9d:  dapl_cma_connect: rdma_connect ERR -1 Function not implemented
                [0:node1] unexpected DAPL connection event 0x4006 from 7
                Assertion failed in file ../../dapl_poll_rc.c at line 1679: 0
                internal ABORT - process 0
                APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)

 

I tried with mpdboot & mpiexec, and got the same error. So it's not the hydra manager's fault. Do you know what is wrong? Thanks.

 

Attachment: master.cpp (1.61 KB)
