Is Intel MPI torus aware?

rcummins64's picture

I am working on a cluster with a 3D torus InfiniBand topology. I am trying to scale a job beyond 300 nodes, and my jobs fail with what appears to be congestion on the fabric. While searching for the cause, I found that the "application" apparently needs to query the subnet manager (SM) for the service levels (SLs) in order to manage fabric traffic properly. From what I can tell, my version of Intel MPI, 4.0.2.003, is not "torus" aware and does not query the SM for the SLs needed to route traffic properly. Can someone confirm or refute my findings? If they are wrong, please tell me how to make mpiexec query the SM for SLs at runtime.

James Tullos (Intel)'s picture
Best Reply

Hi Robert,

Unfortunately, the Intel MPI Library currently does not support network topology awareness. Some of the collective operations can use topology aware algorithms, and this capability might help you. The full list of algorithms and details of how to set them are in section 3.5.1 of the Intel MPI Library for Linux* OS Reference Manual. For example, setting I_MPI_ADJUST_BCAST=4 selects the topology aware binomial algorithm for MPI_Bcast at all message sizes.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
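
As an illustration of the setting described in the reply above, here is a minimal Python sketch that launches a job with I_MPI_ADJUST_BCAST applied through the process environment. The helper names `bcast_env` and `launch` are hypothetical, and the command you pass in (for example an mpiexec invocation for your own application) is an assumption about your setup, not something taken from the thread.

```python
import os
import subprocess


def bcast_env(algorithm_id, base_env=None):
    """Return a copy of the environment with I_MPI_ADJUST_BCAST set.

    algorithm_id=4 is the topology aware binomial algorithm named in the
    reply above; other IDs should be checked against the Reference Manual.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_ADJUST_BCAST"] = str(algorithm_id)
    return env


def launch(command, algorithm_id):
    """Run `command` (a list of arguments) with the setting applied.

    Returns the exit code of the launched process.
    """
    return subprocess.run(command, env=bcast_env(algorithm_id)).returncode
```

A call such as `launch(["mpiexec", "-n", "4", "./my_app"], 4)` would then run the hypothetical `./my_app` with the topology aware binomial broadcast selected, assuming mpiexec passes the caller's environment on to the ranks on your cluster.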

rcummins64's picture

Thanks for the prompt reply. One final question: is it safe to assume that, even with I_MPI_ADJUST_BCAST=4 set, this is unlikely to scale to 1000 nodes? I could see it getting me over the next "hump", but to 3x the node count where I fall over now?

James Tullos (Intel)'s picture

Hi Robert,

The only answer I have for that is to try it and see. I don't have the details of the collective algorithms used and thus can't comment on their scalability. There are other topology aware algorithms (MPI_Bcast has 3). My recommendation is to try them on your system with your application and use the combination that works best for you.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
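
The "try them and see" advice above could be automated roughly as follows. This is only a sketch: the thread confirms ID 4 alone, so the IDs 5 and 6 in `CANDIDATE_IDS` are an assumption to verify against section 3.5.1 of the Reference Manual, and the names `time_run` and `sweep` are hypothetical. Wall-clock time of one run is a crude metric; repeated runs would give a fairer comparison.

```python
import os
import subprocess
import time

# Assumed IDs of the three topology aware MPI_Bcast algorithms. Only 4 is
# confirmed in this thread; check the Reference Manual for the others.
CANDIDATE_IDS = (4, 5, 6)


def time_run(command, algorithm_id):
    """Run `command` once with the given I_MPI_ADJUST_BCAST value.

    Returns elapsed wall-clock seconds, or None if the run failed.
    """
    env = dict(os.environ, I_MPI_ADJUST_BCAST=str(algorithm_id))
    start = time.perf_counter()
    result = subprocess.run(command, env=env)
    elapsed = time.perf_counter() - start
    return elapsed if result.returncode == 0 else None


def sweep(command, ids=CANDIDATE_IDS, timer=time_run):
    """Time `command` under each algorithm ID.

    Returns (best_id, timings), where best_id is the fastest successful
    ID, or None if every run failed.
    """
    timings = {i: timer(command, i) for i in ids}
    succeeded = {i: t for i, t in timings.items() if t is not None}
    best = min(succeeded, key=succeeded.get) if succeeded else None
    return best, timings
```

The `timer` parameter is there so the sweep logic can be exercised without a cluster; in real use the default `time_run` would be given your own launch command.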
