I have been using the KMP_AFFINITY environment variable to display and set the affinity of OpenMP threads in an MPI code whose tasks each use several OpenMP threads.
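For reference, a typical invocation looks like this (the application name is just a placeholder):

```shell
# Print the runtime's binding decisions and pin threads compactly
# (Intel OpenMP runtime reads KMP_AFFINITY)
export KMP_AFFINITY="verbose,granularity=core,compact"
mpirun -np 2 ./my_hybrid_app    # ./my_hybrid_app is a placeholder name
```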
I have noticed that when Intel MPI is used (and the KMP_AFFINITY setting requires pinning of OpenMP threads), the OpenMP library "knows" to pin the OpenMP threads belonging to different MPI tasks onto disjoint sets of cores. However, when the same code is compiled against a non-Intel MPI stack, the OpenMP runtime pins the OpenMP threads of every MPI task on a node to the same cores.
Is there any way to instruct the OpenMP runtime to pin threads in a more reasonable way in the non-Intel MPI case? How could I replicate the behavior the OpenMP runtime shows under Intel MPI when running under a non-Intel MPI?
For example, on a two-socket SMP node with 4 or 6 cores per socket, how would I ask the OpenMP runtime to bind the OpenMP threads used by task k only to the socket (or cores) that task is supposed to run on?
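Something along these lines is what I have in mind: a per-rank wrapper script that builds a disjoint proclist for each local rank before exec'ing the application. This is only a sketch; it assumes Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK) and one rank per socket, and the core numbering is illustrative:

```shell
#!/bin/sh
# wrapper.sh -- hypothetical sketch: give each local MPI rank its own socket.
# Assumes Open MPI's OMPI_COMM_WORLD_LOCAL_RANK; other stacks expose the
# local rank under a different variable (e.g. MV2_COMM_WORLD_LOCAL_RANK).
CORES_PER_SOCKET=4                      # 4 (or 6) cores per socket on my nodes
RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}   # this task's rank on the node
FIRST=$((RANK * CORES_PER_SOCKET))      # first core of this rank's socket
LAST=$((FIRST + CORES_PER_SOCKET - 1))  # last core of this rank's socket
# Restrict this rank's OpenMP threads to its own core range
export KMP_AFFINITY="verbose,granularity=fine,proclist=[${FIRST}-${LAST}],explicit"
exec "$@"
```

which I would then launch as `mpirun -np 2 ./wrapper.sh ./my_hybrid_app` (app name is a placeholder). Is this the recommended approach, or is there a cleaner built-in mechanism?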
Another question: KMP_AFFINITY also directly affects MKL's threading behavior, correct?
Our Intel MPI version is 4.0.0.028.