Using KMP_AFFINITY to map OpenMP* threads to OS proc IDs



The Intel® Compiler's OpenMP* runtime library has the ability to bind OpenMP* threads to physical processing units.

This article shows how to use the KMP_AFFINITY environment variable, the Intel Compiler's high-level affinity interface. This interface determines the machine topology and assigns OpenMP* threads to processors based on their physical location in the machine.


Example 1:

Suppose you have a single quad-core system with Intel® Hyper-Threading Technology (Intel® HT Technology) enabled. By default, the Intel compiler's OpenMP* runtime library creates 8 threads, which run freely on the 8 logical processors provided by the operating system.

Now you want OpenMP* thread ID 0 to run exclusively on OS proc ID 0, and OpenMP* thread ID 1 to run only on OS proc ID 1. The remaining OpenMP* threads (IDs 2 to 7) may run on any OS proc ID from 2 to 7.

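To do this, add the proclist modifier, together with the explicit affinity type, to the Intel-defined KMP_AFFINITY environment variable before the program is executed. The granularity=fine modifier makes each binding apply to individual thread contexts (OS procs) rather than to whole cores.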

Another modifier, verbose, tells the Intel OpenMP* runtime library to print messages about the supported affinity, including the number of packages, the number of cores in each package, the number of thread contexts for each core, and the bindings of OpenMP* threads to physical thread contexts.

Here are the commands we used on Windows*:


> icl /Qopenmp testcase.cpp

> set KMP_AFFINITY=verbose,granularity=fine,proclist=[0,1,{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7}],explicit

> testcase.exe
......
OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {2,3,4,5,6,7}
......


On Linux*, use the export command instead, and enclose the value in quotation marks so that the shell does not interpret the braces and brackets:

]$ export KMP_AFFINITY="verbose,granularity=fine,proclist=[0,1,{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7},{2,3,4,5,6,7}],explicit"



From the runtime's verbose output, you can see that the OS processors specified in the list are assigned to OpenMP* threads in order of OpenMP* global thread ID. If more OpenMP* threads are created than there are elements in the list, the assignment wraps around modulo the size of the list. That is, OpenMP* global thread ID n is bound to list element n mod <list_size>.

Note that the binding of OpenMP* threads to physical thread contexts is shown only indirectly. The verbose output first maps each operating system (OS) processor (proc) ID to a hardware thread context (package, core, thread), and then prints the affinity mask of each OpenMP* thread as a set of OS proc IDs.


More information is available in the "Thread Affinity Interface (Linux* and Windows*)" section of the Intel C/C++ Compiler and Intel Fortran Compiler user and reference guides.

For details on compiler optimization, see our Optimization Notice.