Developer Guide

Managing Multi-core Performance

You can obtain the best performance on systems with multi-core processors by requiring that
threads do not migrate from core to core. To do this, bind threads to the CPU cores by
setting an affinity mask for each thread. Use one of the following options:
  • OpenMP facilities (if available), for example, the KMP_AFFINITY environment variable of the Intel OpenMP library
  • A system function, as explained below
  • Intel TBB facilities (if available), for example, the tbb::affinity_partitioner class (for details, see https://www.threadingbuildingblocks.org/documentation)
Consider the following performance issue:
  • The system has two sockets with two cores each, for a total of four cores (CPUs).
  • The application sets the number of OpenMP threads to four and calls an Intel® MKL LAPACK routine. This call takes considerably different amounts of time from run to run.
To resolve this issue, before calling Intel® MKL, set an affinity mask for each OpenMP thread using the KMP_AFFINITY environment variable or the SetThreadAffinityMask system function. The following code example shows how to resolve the issue by setting the affinity mask through the operating system in an application built with the Intel compiler. The code calls the SetThreadAffinityMask function to bind each thread to an appropriate core, which prevents the threads from migrating. The Intel® MKL LAPACK routine is then called:
// Set affinity mask
#include <windows.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel default(shared)
    {
        int tid = omp_get_thread_num();
        // 2 packages x 2 cores/pkg x 1 thread/core (4 total cores):
        // bind OpenMP thread tid to core tid so that no two threads share a core
        DWORD_PTR mask = (DWORD_PTR)1 << tid;
        SetThreadAffinityMask(GetCurrentThread(), mask);
    }
    // Call Intel MKL LAPACK routine
    return 0;
}
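The mask arithmetic can be isolated into a small portable helper, which makes it easier to check before it is passed to the operating system. The following is a minimal sketch, assuming one OpenMP thread per logical CPU and at most 64 CPUs. The name affinity_mask_for_thread is hypothetical and is not part of Intel MKL, OpenMP, or the Windows API. On Windows the result would be cast to DWORD_PTR before the call to SetThreadAffinityMask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not a library call).
 * Returns a mask with exactly one bit set, selecting the logical CPU for
 * OpenMP thread `tid`. If there are more threads than CPUs, the assignment
 * wraps around. Requires tid >= 0 and 1 <= num_cpus <= 64. */
uint64_t affinity_mask_for_thread(int tid, int num_cpus)
{
    assert(tid >= 0);
    assert(num_cpus >= 1 && num_cpus <= 64);
    return (uint64_t)1 << (tid % num_cpus);
}
```

For the four-core system described above, threads 0 through 3 receive the masks 0x1, 0x2, 0x4, and 0x8, so each thread is bound to its own core.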
Compile the application with the Intel compiler using the following command:
icl /Qopenmp test_application.c
where test_application.c is the filename for the application.
After building the application, run it with four threads, for example, by using the OMP_NUM_THREADS environment variable to set the number of threads:
set OMP_NUM_THREADS=4
test_application.exe
See the Windows API documentation at msdn.microsoft.com/ for the restrictions on the usage of Windows API routines and the particulars of the SetThreadAffinityMask function used in the above example.
See also a similar example at en.wikipedia.org/wiki/Affinity_mask.
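One restriction of SetThreadAffinityMask is that the thread mask must be a non-empty subset of the process affinity mask; otherwise the call fails and returns zero. The following is a minimal sketch of that check as a portable helper. The name is_valid_thread_mask is hypothetical, and the process mask is assumed to have been obtained separately, for example from the Windows GetProcessAffinityMask function.

```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not a Windows API call).
 * Returns 1 if `thread_mask` is non-empty and selects only CPUs that are also
 * present in `process_mask`, and 0 otherwise. */
int is_valid_thread_mask(uint64_t thread_mask, uint64_t process_mask)
{
    return thread_mask != 0 && (thread_mask & ~process_mask) == 0;
}
```

With a process mask of 0xF (four CPUs), the mask 0x4 passes the check, while 0x10 (a fifth CPU) and 0 are rejected. Checking the return value of SetThreadAffinityMask itself remains necessary, because this sketch does not cover every failure condition.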
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE3, and SSSE3.
