I am working on a cluster with a 3D torus InfiniBand topology. I am trying to scale a job to more than 300 nodes, and my jobs fail with what appears to be congestion on the fabric. While searching for the cause of the congestion, I found that the "application" apparently needs to query the subnet manager (SM) for the service levels (SLs) in order to manage fabric traffic properly. I did some checking, and it appears that my version of Intel MPI, 4.0.2.003, is not "torus" aware and does not actually query the SM for the SLs to route traffic properly. Can someone either confirm or refute my findings? If you refute them, please tell me how to make mpiexec query the SM for SLs at runtime.