MPI 4.1 fails to end gracefully when ranks > 2000

I am testing Intel MPI 4.1 with test.c (the provided test program).

Whenever I run > 2000 ranks the program executes correctly but fails to end gracefully.

Running:

mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:ofa -f hostfile ./testc

It stalls at:

...
Hello World: Rank 2000 running on host xxxx
<stalls here; does not return to the command prompt>

(If I use -n 2000 or less, it runs perfectly.)

I have tested 3000 ranks using OpenMPI, so it does not appear to be a cluster/network issue.
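For reference, test.c is roughly the following (a simplified sketch, not the exact source; the step where every rank sends one message to rank 0 is inferred from the debug output described further down):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len, buf, i;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("Hello World: Rank %d running on host %s\n", rank, host);

    if (rank == 0) {
        /* rank 0 receives one message from every other rank */
        for (i = 1; i < size; i++)
            MPI_Recv(&buf, 1, MPI_INT, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        buf = rank;
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    /* all output appears; the hang happens afterwards, around finalize/teardown */
    MPI_Finalize();
    return 0;
}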

 

James Tullos (Intel):

Does it work with I_MPI_FABRICS=shm:dapl or I_MPI_FABRICS=shm:tcp?  Please attach output with I_MPI_DEBUG=5.

Are you running more ranks than slots available?  If so, were you enabling pinning on OpenMPI, and does it help to turn it off for Intel MPI?
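For example, adapting your original command line (a sketch, not the exact invocation to use):

mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:dapl -genv I_MPI_DEBUG 5 -f hostfile ./testc

mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:tcp -genv I_MPI_DEBUG 5 -f hostfile ./testc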

1. DAPL UD works with > 2000 ranks.

2. Attached is the output with I_MPI_FABRICS=shm:ofa; the run stalls after rank 0 receives a single message from ranks 1-2000.

3. The cluster has more than 2000 slots. For OpenMPI/OFA I use --map-by socket, with no oversubscription, to spread the ranks across all the nodes.

I am using Mellanox OFED 2.2-1.0.1 on an mlx4 card.

The problem seems to be an MPI -> OFED interaction.
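For reference, the DAPL UD run in item 1 was launched roughly like this (a sketch, not the exact command; I_MPI_DAPL_UD is assumed here as the knob used to enable UD in Intel MPI 4.1):

mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:dapl -genv I_MPI_DAPL_UD enable -f hostfile ./testc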

 

Attachment: output_3.txt (417.05 KB)
