__private memory that is allocated to registers is typically very efficient to access. If the private memory doesn’t fit in registers, however, performance can be very poor. Since each work-item has its own spill space for __private memory, there is no locality for __private memory accesses, and each work-item frequently touches a unique cache line on every access to __private memory. For this reason, accesses to __private memory data that has not been allocated to registers are very slow. In most cases, the compiler can map statically indexed private arrays into registers. In some cases it can also map dynamically indexed private arrays into registers, but the performance of this code will be slightly lower than that of code accessing statically indexed private arrays. As such, a common optimization is to modify code so that private arrays are statically indexed.
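As a rough illustration of that rewrite, the sketch below contrasts a dynamically indexed small array with a statically indexed equivalent. It is written as plain C so it can be compiled on a host; inside an OpenCL kernel the local array `window` would live in __private memory. The names `pick_dynamic`, `pick_static`, and `TAPS` are hypothetical and chosen only for this example, and whether a given compiler actually keeps either array in registers is an assumption that should be checked against the generated code.

```c
#include <stddef.h>

#define TAPS 4  /* hypothetical, small fixed array size */

/* Dynamically indexed: the final read uses an index known only at run time,
 * so the compiler may have to keep `window` in addressable (spill) memory. */
float pick_dynamic(const float *in, size_t base, unsigned sel)
{
    float window[TAPS];
    for (int i = 0; i < TAPS; ++i) {
        window[i] = in[base + i];
    }
    return window[sel % TAPS];
}

/* Statically indexed: every access to `window` uses a compile-time constant
 * index, and the run-time choice is made by comparisons on the value instead.
 * This form is easier for a compiler to map onto registers. */
float pick_static(const float *in, size_t base, unsigned sel)
{
    float window[TAPS];
    window[0] = in[base + 0];
    window[1] = in[base + 1];
    window[2] = in[base + 2];
    window[3] = in[base + 3];

    unsigned s = sel % TAPS;
    float r = window[0];
    r = (s == 1) ? window[1] : r;
    r = (s == 2) ? window[2] : r;
    r = (s == 3) ? window[3] : r;
    return r;
}
```

Both functions return the same value for the same inputs; only the indexing pattern differs. The comparison chain costs a few extra select operations, so this trade is usually only worthwhile for small arrays.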