When running large-scale Intel® MPI applications on Omni-Path or InfiniBand* clusters, one might have noticed an increasing amount of time spent in the MPI_Init() routine. The cause is a set of MPI runtime infrastructure management operations that are needed to ensure that all MPI ranks share a common environment. In large MPI runs with many thousands of ranks, these operations can account for a large share of the MPI initialization time.
Several factors contribute to the increased startup time. One is the extra communication over the PMI (Process Management Interface) that takes place before the fabric is available. Another is the initial global collective operations, which may put a high load on the fabric during the startup phase. As the MPI rank count grows, the number of messages passed across the fabric grows much faster than the rank count itself (on the order of the square of the rank count for all-to-all patterns), which may cause long startup times, especially at scale.
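To make the scaling argument concrete, the following minimal sketch counts messages under two simplified models: a naive all-to-all exchange, in which every rank sends one message to every other rank, and a tree-based exchange, in which data is gathered up a tree and broadcast back down. Both models and both function names are illustrative assumptions, not a description of what the Intel MPI runtime does internally.

```python
def all_to_all_messages(ranks: int) -> int:
    """Messages when every rank sends one message to every other rank."""
    return ranks * (ranks - 1)


def tree_messages(ranks: int) -> int:
    """Messages for a gather up a tree followed by a broadcast back down.

    A tree over `ranks` nodes has ranks - 1 edges; each edge carries one
    message on the way up and one on the way down.
    """
    return 2 * (ranks - 1)


if __name__ == "__main__":
    # Compare the two models at a few illustrative rank counts.
    for ranks in (1_000, 10_000, 80_000):
        print(f"{ranks:>6} ranks: "
              f"all-to-all {all_to_all_messages(ranks):>13,}  "
              f"tree {tree_messages(ranks):>7,}")
```

Under these assumptions, going from 1,000 to 80,000 ranks multiplies the tree message count by about 80, but the all-to-all message count by about 6,400.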
To address the problem, one can reduce startup times by switching off certain startup checks, provided that the environment is known to be consistent across the individual ranks.
Before doing so, however, one should make sure to use the latest Intel MPI Library as well as the latest fabric library. The latest Intel MPI Library can be found at https://software.intel.com/en-us/intel-mpi-library.
With the latest libraries in place, one can start switching off individual environment checks, while making sure that all ranks really do share the same environment with respect to the check being disabled.
These settings, together with further fine-tuning, have enabled a simple init-finalize Intel MPI application to run successfully on 80k Haswell cores (3k nodes) in less than 2 minutes.
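One way to check whether a change of settings actually shortens startup is to time the complete launch of an init-finalize program from the outside. The sketch below is a hypothetical helper, not part of Intel MPI: `time_launch` and its parameters are names introduced here for illustration, and it deliberately does not name any specific Intel MPI variable. It runs a command several times with optional extra environment variables and returns the wall-clock times.

```python
import os
import subprocess
import sys
import time


def time_launch(command, extra_env=None, repeats=3):
    """Run `command` `repeats` times and return the wall-clock times in seconds.

    `extra_env` is an optional dict of environment variables layered on top
    of the current environment, so that two configurations can be compared.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the launch fails.
        subprocess.run(command, env=env, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace with the real launch.
    timings = time_launch([sys.executable, "-c", "pass"])
    print(f"min {min(timings):.3f} s, max {max(timings):.3f} s "
          f"over {len(timings)} runs")
```

For an actual measurement, the stand-in command would be replaced by the real launch line, for example `["mpirun", "-n", "1024", "./init_finalize"]`, where the rank count and `./init_finalize` are placeholders for a run size of interest and a program that only calls MPI_Init() and MPI_Finalize(). Comparing the timings with and without a given setting in `extra_env` shows whether that setting helps on the cluster at hand.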