problem with mpi_comm_accept

kellerd@lle.rochester.edu

I am trying to set up a server/client pair that establishes a connection (after the two are launched independently) using MPI_Comm_accept and MPI_Comm_connect. The server successfully waits for a connection request using MPI_Comm_accept, and the client successfully connects using MPI_Comm_connect. Both calls return without any error, but a negative handle is returned in 'newcomm' on both the server and the client. I launch both using mpiexec and have mpd running.

I cannot figure out what is wrong and it is probably something simple.  Any ideas?

SERVER:

Integer*4 ierr,rt_gang_comm

call MPI_OPEN_PORT(MPI_INFO_NULL,portA,ierr)
write(contape,*) 'Gang Master port = ' // trim(portA)
! … server writes the port name to a file

call MPI_COMM_ACCEPT(portA,MPI_INFO_NULL,0,MPI_COMM_SELF,rt_gang_comm,ierr)
if (ierr /= MPI_SUCCESS) then
  stop 'server problem'
else
  write(contape,*) 'MPI_COMM_WORLD = ',MPI_COMM_WORLD,' rt_gang_com = ',rt_gang_comm
endif

call MPI_Bcast(i_rec,n_i_s,MPI_INTEGER4,id_mpi_rank,rt_gang_comm,ierr)
if (ierr /= MPI_SUCCESS) then
  call MPI_ERROR_STRING(ierr,errs,err_len,ierr1)
  stop
endif


SERVER OUTPUT:

 

Gang Master port = tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:0:0:0:0:0$

MPI_COMM_WORLD =   1140850688  rt_gang_com =  -2080374784

  

CLIENT:

 

Integer*4 ierr,intercomm

! … client reads the port name from the file written by the server
write(contape,*) 'Opening port', trim(portA)

call MPI_COMM_CONNECT(portA, MPI_INFO_NULL, 0, MPI_COMM_WORLD, intercomm, ierr)
if (ierr /= MPI_SUCCESS) then
  stop 'client problem'
else
  print*,"MPI_COMM_WORLD=",MPI_COMM_WORLD,"intercomm=",intercomm
endif

 

CLIENT OUTPUT:

 

Opening port  tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:0:0:0:0:0$

MPI_COMM_WORLD=  1140850688 intercomm= -2080374783


A subsequent broadcast hangs. The server and the client end up trying to communicate over communicator handles that differ by one.

 

I cannot find anything out there that seems to help, so I would appreciate any ideas to try.

 

Thanks,

 

Dave

kellerd@lle.rochester.edu

A few particulars:

ifort version 12.1.0
CentOS release 5.7
impi/4.0.3/lib64

Tim Prince

If you happen to compile with -xHost on AVX hardware, or with any architecture option that enables AVX, I don't know whether it will work, since CentOS 5.7 doesn't support AVX.

kellerd@lle.rochester.edu

Thanks for your input, Tim. I was compiling with -xSSE4.2 because it helped with optimization (at some point). I removed it and recompiled with defaults for both the client and the server, and I still end up with a negative handle.

kellerd@lle.rochester.edu

Are there any other ideas out there? I am so stumped.

James Tullos (Intel)

Hi Dave,

I've tested the basic functionality of your program, and it appears to work even with a negative value for the communicator. That value is simply a handle to the communicator, and as long as both sides refer to the same underlying communicator, the connection should work.
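As a quick sanity check (a sketch using the handle names from your snippets, not your actual code), you can query the remote group of the intercommunicator. On a valid handle this returns the size of the other side's group, regardless of the handle's numeric value:

! Hypothetical check, assuming the declarations from the snippets above
integer :: remote_size, ierr
! Server side: rt_gang_comm came from MPI_COMM_ACCEPT
call MPI_COMM_REMOTE_SIZE(rt_gang_comm, remote_size, ierr)
write(contape,*) 'remote group size = ', remote_size   ! expect 1 for a single client rank
! Client side: intercomm came from MPI_COMM_CONNECT
call MPI_COMM_REMOTE_SIZE(intercomm, remote_size, ierr)
print *, 'remote group size = ', remote_size           ! expect 1 for a single server rank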

From the server side:


$ mpirun -n 1 ./server

 MPI_COMM_WORLD =   1140850688  newcomm =  -2080374784

 value =           25

$ cat portname.txt

 tag#0$rdma_port0#25033$rdma_host0#2:0:0:36:101:26:30:0:0:0:0:0:0:0:0$

And the client side:


$ mpirun -n 1 ./client

 MPI_COMM_WORLD =   1140850688  newcomm =  -2080374784

 value =           25
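The test I ran was roughly along these lines (a sketch reconstructed for illustration, not the exact source; the port name is exchanged through portname.txt as shown above). Note that on an intercommunicator the broadcast root is specified as MPI_ROOT on the side that owns the data and as that root's rank in the remote group (0 here) on the other side:

! server.f90 -- minimal accept/connect sketch (illustration only)
program server
  use mpi
  implicit none
  character(len=MPI_MAX_PORT_NAME) :: port
  integer :: ierr, newcomm, ival
  call MPI_INIT(ierr)
  call MPI_OPEN_PORT(MPI_INFO_NULL, port, ierr)
  open(10, file='portname.txt'); write(10,'(A)') trim(port); close(10)
  call MPI_COMM_ACCEPT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, newcomm, ierr)
  write(*,*) 'MPI_COMM_WORLD = ', MPI_COMM_WORLD, ' newcomm = ', newcomm
  ival = 25
  ! Sending side: the root of an intercommunicator broadcast passes MPI_ROOT
  call MPI_BCAST(ival, 1, MPI_INTEGER, MPI_ROOT, newcomm, ierr)
  write(*,*) 'value = ', ival
  call MPI_COMM_DISCONNECT(newcomm, ierr)
  call MPI_CLOSE_PORT(port, ierr)
  call MPI_FINALIZE(ierr)
end program server

! client.f90 -- minimal connect sketch (illustration only)
program client
  use mpi
  implicit none
  character(len=MPI_MAX_PORT_NAME) :: port
  integer :: ierr, newcomm, ival
  call MPI_INIT(ierr)
  open(10, file='portname.txt'); read(10,'(A)') port; close(10)
  call MPI_COMM_CONNECT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, newcomm, ierr)
  write(*,*) 'MPI_COMM_WORLD = ', MPI_COMM_WORLD, ' newcomm = ', newcomm
  ! Receiving side: pass the rank of the root within the remote (server) group
  call MPI_BCAST(ival, 1, MPI_INTEGER, 0, newcomm, ierr)
  write(*,*) 'value = ', ival
  call MPI_COMM_DISCONNECT(newcomm, ierr)
  call MPI_FINALIZE(ierr)
end program client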

If you're still having problems, please run with I_MPI_DEBUG=5 and attach the output.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

kellerd@lle.rochester.edu

James,
When I run it, the negative communicator handles differ by one, as reflected in the output.
Both the client and the server have valid communicators, but they do not match, so each hangs on a broadcast.
Dave

kellerd@lle.rochester.edu

James,
My example was snipped, so the output does not match exactly, but it is attached.

Attachment: problem.txt (1.58 KB)
kellerd@lle.rochester.edu

James,
My example was snipped, so the output does not match exactly, but it is attached.

Attachment: problem.txt (1.91 KB)
kellerd@lle.rochester.edu

James,
Attached is the output with I_MPI_DEBUG=10.
Dave


kellerd@lle.rochester.edu

I don't think my last attachment made it.

Attachment: problem-mpidb10.txt (7.44 KB)
James Tullos (Intel)

Hi Dave,

I don't see anything immediately wrong in your output (other than the mismatched communicator handle). Can you try using the current version of the Intel® MPI Library (Version 4.1)? Can you send me a reproducer code to test here?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

kellerd@lle.rochester.edu

James,
I sent you the test case but have not heard back.
Dave

James Tullos (Intel)

Hi Dave,

I have not received the test case. Did you send it via private message? How large is the file?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
