Developer Guide and Reference

Controlling Thread Allocation

The KMP_HW_SUBSET and KMP_AFFINITY environment variables allow you to control how the OpenMP* runtime uses the hardware threads on the processors. With them, you can try different thread distributions across the cores of the processors and choose how the threads are bound to those cores, so that you can work out what is optimal for your application.
The KMP_HW_SUBSET variable controls the allocation of hardware resources, and the KMP_AFFINITY variable controls how the OpenMP threads are bound to those resources.

Controlling Thread Distribution

The KMP_HW_SUBSET variable controls the hardware resources that will be used by the program. This variable specifies the number of sockets to use, how many cores to use per socket, and how many threads to assign per core. While specifying two threads per core often yields better performance than one thread per core, specifying three or four threads per core may or may not improve the performance. This variable enables you to conveniently measure the performance of up to four threads per core.
For example, you can determine the effects of assigning 24, 48, 72, or the maximum 96 OpenMP threads in a system with 24 cores by specifying the following variable settings:
To Assign This Number of Threads    Use This Setting
24                                  KMP_HW_SUBSET=24c,1t
48                                  KMP_HW_SUBSET=24c,2t
72                                  KMP_HW_SUBSET=24c,3t
96                                  KMP_HW_SUBSET=24c,4t
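
The settings above follow a simple pattern: the total thread count is the number of cores multiplied by the threads per core. The following Python sketch generates the same settings for any core count. The function name hw_subset_settings and its parameters are illustrative and are not part of the OpenMP runtime; the sketch only builds strings and does not query the hardware.

```python
def hw_subset_settings(cores, max_threads_per_core=4):
    """Return (total_threads, setting) pairs for 1..max_threads_per_core threads per core.

    Hypothetical helper: it only formats KMP_HW_SUBSET strings of the
    "<cores>c,<threads>t" form used in this section.
    """
    settings = []
    for threads_per_core in range(1, max_threads_per_core + 1):
        total_threads = cores * threads_per_core
        settings.append(
            (total_threads, f"KMP_HW_SUBSET={cores}c,{threads_per_core}t")
        )
    return settings


# Print the settings for the 24-core system used in the example above.
for total_threads, setting in hw_subset_settings(24):
    print(f"{total_threads:>3} threads: {setting}")
```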
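Take care when using the OMP_NUM_THREADS variable along with this variable. If the number of threads requested with OMP_NUM_THREADS does not match the number of hardware threads selected by KMP_HW_SUBSET, the result can be oversubscription or undersubscription.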
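
As a rough check for this mismatch, the following Python sketch compares a requested thread count with the number of hardware threads that a cores-and-threads setting selects. It assumes that oversubscription means more OpenMP threads than selected hardware threads, and undersubscription means fewer. The function name subscription_status is hypothetical, and the sketch does not read the actual environment or query the runtime.

```python
def subscription_status(num_threads, cores, threads_per_core):
    """Classify a requested OpenMP thread count against the selected hardware threads.

    Assumption: the hardware threads selected by KMP_HW_SUBSET=<cores>c,<threads>t
    number cores * threads_per_core (single socket, for simplicity).
    """
    available = cores * threads_per_core
    if num_threads > available:
        return "oversubscribed"
    if num_threads < available:
        return "undersubscribed"
    return "matched"


# OMP_NUM_THREADS=96 with KMP_HW_SUBSET=24c,2t requests more threads than the 48 selected.
print(subscription_status(96, cores=24, threads_per_core=2))
```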

Controlling Thread Bindings

The KMP_AFFINITY variable controls how the OpenMP threads are bound to the hardware resources allocated by the KMP_HW_SUBSET variable. Although this variable can be set to several binding (affinity) types, the following are the recommended affinity types for running your OpenMP threads on the processor:
  • compact: sequentially distribute the threads among the cores that share the same cache.
  • scatter: distribute the threads among the cores without regard to the cache.
The following table shows how the threads are bound to the cores when you want to use three threads per core on two cores by specifying KMP_HW_SUBSET=2c,3t:
Affinity                OpenMP Threads on Core 0    OpenMP Threads on Core 1
KMP_AFFINITY=compact    0, 1, 2                     3, 4, 5
KMP_AFFINITY=scatter    0, 2, 4                     1, 3, 5
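
The two placements in the table can be reproduced with a simplified model: compact fills one core before moving to the next, while scatter deals the threads out to the cores in round-robin order. The Python sketch below is only that model, assuming a single socket with cores numbered from 0 and ignoring the cache hierarchy. The function name place_threads is hypothetical, and the placement the runtime actually chooses depends on the machine topology.

```python
def place_threads(cores, threads_per_core, affinity):
    """Map OpenMP thread numbers to cores under a simplified affinity model.

    Assumptions: one socket, cores numbered 0..cores-1, cache topology ignored.
    """
    placement = {core: [] for core in range(cores)}
    for thread in range(cores * threads_per_core):
        if affinity == "compact":
            # Fill each core completely before moving on to the next one.
            core = thread // threads_per_core
        elif affinity == "scatter":
            # Deal threads out across the cores in round-robin order.
            core = thread % cores
        else:
            raise ValueError(f"unsupported affinity type: {affinity}")
        placement[core].append(thread)
    return placement


# Model of KMP_HW_SUBSET=2c,3t under both affinity types.
for affinity in ("compact", "scatter"):
    print(affinity, place_threads(2, 3, affinity))
```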

Determining the Best Setting

To determine the best thread distribution and bindings using these variables, follow these steps:
  1. Ensure that your OpenMP code is working properly before using these environment variables.
  2. Establish a baseline with your current OpenMP code to compare against the performance you measure when you allocate the threads to a processor.
  3. Measure the performance of distributing one, two, three, or four threads per core by using the KMP_HW_SUBSET variable.
  4. Measure the performance of binding the threads to the cores by using the KMP_AFFINITY variable.
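
The measurement steps above could be automated along the following lines. This Python sketch times a baseline run with the current environment, then times one run for each combination of threads per core and affinity type. Several things here are assumptions rather than part of the guide: the function names timed_run and sweep, the use of wall-clock time from a single run as the performance measure, the choice to combine steps 3 and 4 into one sweep, and the removal of OMP_NUM_THREADS from the swept runs so that it does not override the distribution being measured.

```python
import itertools
import os
import subprocess
import time


def timed_run(command, env):
    """Run the command once with the given environment; return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


def sweep(command, cores, threads_per_core_options=(1, 2, 3, 4),
          affinities=("compact", "scatter")):
    """Return (baseline_seconds, results), with results sorted fastest first.

    Each result is a (KMP_HW_SUBSET value, KMP_AFFINITY value, seconds) tuple.
    """
    base_env = dict(os.environ)
    # Step 2: baseline with the environment exactly as it is now.
    baseline = timed_run(command, base_env)

    results = []
    for threads_per_core, affinity in itertools.product(
            threads_per_core_options, affinities):
        env = dict(base_env)
        # Assumption: drop OMP_NUM_THREADS so it cannot cause over- or undersubscription.
        env.pop("OMP_NUM_THREADS", None)
        env["KMP_HW_SUBSET"] = f"{cores}c,{threads_per_core}t"  # step 3
        env["KMP_AFFINITY"] = affinity                           # step 4
        results.append(
            (env["KMP_HW_SUBSET"], affinity, timed_run(command, env))
        )
    results.sort(key=lambda result: result[2])
    return baseline, results


# Example call with a hypothetical binary name:
#   baseline, results = sweep(["./my_openmp_app"], cores=24)
```

A single timed run per setting is noisy, so in practice you would likely repeat each run several times and compare a median or minimum rather than one sample.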

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804