• 2019 Update 4
  • 03/20/2019

__private Memory

__private memory that is allocated to registers is typically very efficient to access. If the private memory does not fit in registers, however, performance can be very poor. Because each work-item has its own spill space for __private memory, accesses to spilled __private data have no locality: each work-item frequently touches a unique cache line on every access. For this reason, accesses to __private data that has not been allocated to registers are very slow.

In most cases, the compiler can map statically indexed private arrays into registers. In some cases, it can also map dynamically indexed private arrays into registers, but such code performs slightly worse than code that accesses statically indexed private arrays. A common optimization is therefore to modify code so that private arrays are statically indexed.
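As an illustration of the difference between dynamic and static indexing, here is a minimal sketch. The function names and the four-element window are hypothetical and chosen only for this example. The code is written so that it is valid both as OpenCL C helper functions and as plain C99. Whether a given compiler actually keeps either array in registers is not guaranteed and depends on the compiler and device.

```c
/* Sketch only: hypothetical helper functions, not taken from any Intel sample.
   In OpenCL C, variables declared inside a function reside in the __private
   address space by default, so window[] below is a private array. */

/* Dynamically indexed: n is a runtime value, so the loop trip count is not
   known at compile time and window[i] is read with a non-constant index. */
float sum_first_n_dynamic(float a, float b, float c, float d, int n)
{
    float window[4] = { a, b, c, d };   /* private array */
    float acc = 0.0f;
    for (int i = 0; i < n && i < 4; ++i)
        acc += window[i];
    return acc;
}

/* Statically indexed: every subscript is a compile-time constant. The runtime
   count n only decides whether each element contributes to the sum. */
float sum_first_n_static(float a, float b, float c, float d, int n)
{
    float window[4] = { a, b, c, d };   /* private array */
    float acc = 0.0f;
    acc += (n > 0) ? window[0] : 0.0f;
    acc += (n > 1) ? window[1] : 0.0f;
    acc += (n > 2) ? window[2] : 0.0f;
    acc += (n > 3) ? window[3] : 0.0f;
    return acc;
}
```

Both functions return the same value for any n. The second form trades a short loop for a fixed sequence of conditional adds, which gives the compiler constant subscripts to work with.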
