• 2019 Update 4
  • 03/20/2019
  • Public Content

Kernel Memory Access Optimization Summary

A kernel should access at least 32 bits of data at a time, from addresses that are aligned to 32-bit boundaries. A char4, short2, int, or float counts as 32 bits of data. If you can, load two, three, or four 32-bit quantities at a time, which may improve performance. Loading more than four 32-bit quantities at a time may reduce performance.
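As a rough host-side illustration in plain C (not OpenCL kernel code), the sketch below mimics the sizes involved. The names char4_like, is_32bit_aligned, and loads_needed are hypothetical helpers introduced here, and the struct only approximates the OpenCL char4 type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's char4: four 8-bit lanes, 32 bits total. */
typedef struct { int8_t x, y, z, w; } char4_like;

/* Returns 1 when a byte offset falls on a 32-bit (4-byte) boundary. */
static int is_32bit_aligned(size_t byte_offset) {
    return (byte_offset % 4u) == 0u;
}

/* Number of load operations needed to fetch total_bytes when each load
   moves bytes_per_load bytes (rounded up). */
static size_t loads_needed(size_t total_bytes, size_t bytes_per_load) {
    return (total_bytes + bytes_per_load - 1u) / bytes_per_load;
}
```

For 64 bytes of data, loads_needed(64, 1) gives 64 byte-sized loads, loads_needed(64, 4) gives 16 loads of one 32-bit quantity each, and loads_needed(64, 16) gives 4 loads of four 32-bit quantities each (a float4-sized access).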
Optimize __global memory and __constant memory accesses to minimize the number of cache lines read from the L3 cache. This typically involves carefully choosing your work-group dimensions, and how your array indices are computed from the work-item local or global id.
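To see how index computation affects the number of cache lines read, the following plain C model counts the distinct lines touched by a group of work-items. It is a sketch under stated assumptions: the 64-byte line size is assumed for illustration (check the documentation for your device), and cache_lines_touched is a hypothetical helper, not an OpenCL API.

```c
#include <stddef.h>

/* Assumption for illustration only: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64u

/* Counts the distinct cache lines touched when num_items work-items each
   read one elem_bytes-sized element at index (id * stride) of an array
   that starts on a cache-line boundary. Elements are assumed not to
   straddle a line. At most 64 work-items are modelled. */
static size_t cache_lines_touched(size_t num_items, size_t stride,
                                  size_t elem_bytes) {
    size_t lines[64];
    size_t count = 0;
    if (num_items > 64) num_items = 64;
    for (size_t id = 0; id < num_items; ++id) {
        size_t line = (id * stride * elem_bytes) / CACHE_LINE_BYTES;
        int seen = 0;
        for (size_t k = 0; k < count; ++k) {
            if (lines[k] == line) { seen = 1; break; }
        }
        if (!seen) lines[count++] = line;
    }
    return count;
}
```

Under these assumptions, 16 work-items reading consecutive floats (stride 1) touch a single cache line, while the same 16 work-items reading with a stride of 16 floats touch 16 different lines for the same amount of useful data.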
If you cannot access __global memory or __constant memory in an optimal manner, consider moving part of your data to __local memory, where more access patterns can execute with full performance.
Local memory is most beneficial when the access pattern favors the banked nature of the SLM hardware.
Optimize __local memory accesses to minimize the number of bank conflicts. Reading the same address from the same bank is OK, but reading different addresses from the same bank results in a bank conflict. Writes to the same bank always result in a bank conflict, even if the writes are going to the same address. Consider adding a column to two-dimensional local memory arrays if it avoids bank conflicts when accessing columns of data.
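The effect of adding a padding column can be checked with a small plain C model of bank assignment. This is a sketch only: the 16 banks of 4-byte width are assumed for illustration, and worst_bank_load is a hypothetical helper rather than anything provided by OpenCL.

```c
#include <stddef.h>

/* Assumption for illustration only: 16 banks, each 4 bytes (one float) wide. */
#define NUM_BANKS 16u

/* Sixteen work-items read one column of a row-major 2D float array:
   work-item r reads element [r][col] of an array with row_width floats
   per row. Returns the largest number of distinct addresses that land in
   a single bank; 1 means the column read is conflict-free in this model. */
static unsigned worst_bank_load(unsigned row_width, unsigned col) {
    unsigned per_bank[NUM_BANKS] = {0};
    unsigned worst = 0;
    for (unsigned r = 0; r < 16u; ++r) {
        unsigned word_index = r * row_width + col;
        unsigned bank = word_index % NUM_BANKS;
        per_bank[bank]++;
        if (per_bank[bank] > worst) worst = per_bank[bank];
    }
    return worst;
}
```

With a row width of 16 floats, every element of a column maps to the same bank, so worst_bank_load(16, 3) reports 16 distinct addresses in one bank. Padding each row to 17 floats shifts successive rows by one bank, and worst_bank_load(17, 3) reports 1.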
Avoid dynamically-indexed __private arrays if possible.
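One possible way to remove a dynamic index is to replace the array lookup with comparisons against constants. The plain C sketch below shows the rewrite pattern only; pick_dynamic and pick_static are hypothetical names, the coefficient values are made up for illustration, and whether the rewrite helps in a real kernel depends on the compiler and the device.

```c
/* Dynamic indexing: sel is known only at run time, so the small
   array must be indexed with a run-time value. */
static float pick_dynamic(int sel) {
    const float taps[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    return taps[sel & 3];
}

/* Same result without a run-time array index: each candidate value is a
   compile-time constant chosen by a comparison. */
static float pick_static(int sel) {
    int s = sel & 3;
    float v = 0.1f;
    v = (s == 1) ? 0.2f : v;
    v = (s == 2) ? 0.3f : v;
    v = (s == 3) ? 0.4f : v;
    return v;
}
```

Both functions return the same value for every sel, so the second form can stand in for the first wherever the set of candidates is small and fixed.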
