Using the Intel® MPI Library along with the offload capabilities of the Intel® Xeon Phi™ coprocessor allows a user to access the capabilities of the coprocessor without the need for direct filesystem access on the coprocessor. This can lead to administrative benefits by not requiring additional security on the coprocessor's filesystem.
Offloading of MPI functions
Calling MPI functions within an offload region is not supported.
Offloading within MPI applications
The Intel® MPI Library supports the offload programming model. However, it makes no attempt to coordinate coprocessor resource usage among the MPI ranks. For example, if you run 12 ranks on a node and each rank offloads 16 threads to the coprocessor, by default every rank offloads to the same first 16 threads of the first coprocessor, so the ranks compete for the same resources. Additionally, there is a performance penalty when multiple ranks offload simultaneously to a single coprocessor. There are two approaches to avoiding these conflicts.
By running at most one offloading rank per coprocessor, there is no chance of multiple ranks offloading to the same coprocessor. For example, on a system with only one coprocessor, only one rank should offload. If a system has two coprocessors, one or two ranks can offload (as long as the ranks offload to different coprocessors). This requires the code either to run with only one rank per coprocessor or to be heterogeneous, with processes arranged to avoid multiple offloads to a single coprocessor. This method is more restrictive, but it is easy to implement.
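The rule of at most one offloading rank per coprocessor can be sketched as a small mapping function. This is a minimal illustration, not part of the Intel® MPI Library: the helper name offload_target_for_rank and the NO_OFFLOAD sentinel are hypothetical. It assumes the caller already knows the rank's index within its node (for example, from a communicator created with MPI_Comm_split_type and MPI_COMM_TYPE_SHARED) and the number of coprocessors on that node.

```c
/* Sentinel meaning "this rank should stay on the host and not offload". */
#define NO_OFFLOAD (-1)

/*
 * Hypothetical helper: pick a coprocessor for one MPI rank.
 *
 *   local_rank  - 0-based index of the rank within its node (assumed known)
 *   num_devices - number of coprocessors installed on the node
 *
 * The first num_devices ranks on a node each get their own coprocessor;
 * every other rank is told not to offload, so no two ranks ever share
 * a coprocessor.
 */
int offload_target_for_rank(int local_rank, int num_devices)
{
    if (local_rank < 0 || num_devices <= 0)
        return NO_OFFLOAD;

    return (local_rank < num_devices) ? local_rank : NO_OFFLOAD;
}
```

A rank that receives a device index could pass it to the target clause of its offload directive, while a rank that receives NO_OFFLOAD would run the same work on the host. With 12 ranks on a node that has two coprocessors, only local ranks 0 and 1 would offload.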
Setting the pinning on a per-process basis allows control of where each rank's threads are offloaded. This method can completely prevent core oversubscription, but it also requires significant manual setup specific to the run configuration. For more information on how to do this, see http://software.intel.com/en-us/articles/openmp-thread-affinity-control. Keep in mind that this method can still incur the performance penalty from multiple ranks offloading simultaneously to a single coprocessor.
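As a rough sketch of per-rank pinning, the function below builds a distinct explicit KMP_AFFINITY value for each rank so that the ranks receive non-overlapping ranges of logical processors. The helper name build_affinity is hypothetical, and two things are assumptions rather than statements from this article: numbering starts at 1 on the assumption that logical processor 0 is left to the coprocessor's operating system, and the resulting string is assumed to reach the coprocessor through the offload environment (for example, as MIC_KMP_AFFINITY with MIC_ENV_PREFIX=MIC). The sketch does not check that the coprocessor actually has enough logical processors for all ranks.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an explicit KMP_AFFINITY value for one rank.
 *
 *   buf, len         - output buffer and its size in bytes
 *   local_rank       - 0-based index of the rank within its node
 *   threads_per_rank - number of threads each rank offloads
 *
 * Rank r is given logical processors r*T+1 through (r+1)*T, where T is
 * threads_per_rank, so the ranges of different ranks never overlap.
 * Returns 0 on success, -1 on bad input or if the buffer is too small.
 */
int build_affinity(char *buf, size_t len, int local_rank, int threads_per_rank)
{
    if (buf == NULL || len == 0 || local_rank < 0 || threads_per_rank <= 0)
        return -1;

    int first = local_rank * threads_per_rank + 1; /* skip logical CPU 0 (assumption) */
    int last  = first + threads_per_rank - 1;

    int n = snprintf(buf, len,
                     "granularity=fine,proclist=[%d-%d],explicit",
                     first, last);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the 12-rank, 16-thread example above, local rank 0 would get proclist=[1-16] and local rank 1 would get proclist=[17-32]. Each rank would need to have its own value in place before its first offload, which is the manual, configuration-specific setup this approach requires.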
For more information about how to use offloading, see the Intel® Compiler Documentation.