question about native MPI

I am planning to port some programs to Xeon Phi using native MPI. I have not tried Intel MPI before, and one question concerns the main memory shared among the cores. MPI is traditionally designed for distributed memory; does Intel MPI provide special features to support direct main memory access? For example, if rank 0 wants to access data stored in rank 1 (where the data essentially lives in the same main memory), does Intel MPI have APIs that let rank 0 access the data directly, avoiding message passing?



Nearly all MPI implementations (including all of those for Intel® Xeon Phi™) use shared-memory communication by default among ranks that are eligible for it. Needless to say, without this, MPI would be fairly useless on this or many other multi-core platforms.

It would be useful to have documentation on which library function calls visible in profiles (e.g. in VTune) are associated with message passing, and on options to improve cache behavior. For Intel MPI on the host, for example, there are documented options to change the default thresholds for switching between cache filling and bypassing.

I haven't seen a full set of reasons written down for why multiple MPI processes per core aren't as effective as using the MPI_THREAD_FUNNELED model to support multiple threads per MPI process. Keeping multiple copies of a message in the same L2 but different L1 caches might be among them.

In practice, optimizing performance usually requires combining multiple MPI ranks communicating over shared memory with multiple threads per rank, where each rank has exclusive use of one or more cores. We may use anything from 5 MPI ranks of 36 OpenMP threads each to 30 ranks of 2-6 threads.
