• 2019 Update 4
  • 03/20/2019
  • Public Content

Memory Access Overview

Optimizing memory accesses is the first step to achieving high performance with OpenCL™ on the Intel® Graphics. Tune your kernel to access memory at an optimal granularity and with optimal addresses.
The OpenCL™ implementation for the Intel® Graphics primarily accesses global and constant memory through the following caches:
  • GPU-specific L3 cache
  • CPU and GPU shared Last Level Cache (LLC).
Of these two caches, the L3 cache is the more important one to optimize memory accesses for. The L3 cache line size is 64 bytes.
Finally, there are L1 and L2 caches that are specific to the sampler and renderer.
Accesses to global memory and constant memory go through the L3 cache and LLC. In addition, private memory that spills from registers does the same. If multiple OpenCL work-items in the same hardware thread make requests to the same L3 cache line, these requests are collapsed into a single request. This means that the effective global memory, constant memory, and private memory bandwidth is determined by the number of L3 cache lines that are accessed.
For example, if different work-items in the same hardware thread access two L3 cache lines, the memory bandwidth is one half of the bandwidth achieved when only one L3 cache line is accessed.
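The collapsing rule above can be illustrated with a small host-side model. This is only a sketch, not part of the OpenCL implementation: it assumes a 64-byte L3 cache line and 16 work-items per hardware thread, and the names `cache_lines_touched` and `relative_bandwidth` are hypothetical helpers introduced here for illustration.

```python
# Back-of-the-envelope model of L3 request collapsing.
# Assumptions (not taken from any API): 64-byte cache lines and
# 16 work-items per hardware thread.
CACHE_LINE_BYTES = 64


def cache_lines_touched(base, stride_bytes, access_bytes, work_items=16):
    """Count the distinct cache lines touched by one hardware thread.

    Work-item i accesses `access_bytes` bytes starting at
    base + i * stride_bytes. Requests that fall into the same
    cache line are counted once, mirroring the collapsing rule.
    """
    lines = set()
    for i in range(work_items):
        first = base + i * stride_bytes
        last = first + access_bytes - 1
        # An access may straddle a line boundary, so add every line it covers.
        lines.update(range(first // CACHE_LINE_BYTES,
                           last // CACHE_LINE_BYTES + 1))
    return len(lines)


def relative_bandwidth(lines):
    """Effective bandwidth relative to the single-cache-line case."""
    return 1.0 / lines
```

Under these assumptions, 16 work-items that read consecutive 4-byte values from a 64-byte-aligned base touch a single line, while the same reads with an 8-byte stride, or from a base offset by 32 bytes, touch two lines and therefore reach about half the effective bandwidth.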
Local memory is allocated directly from the L3 cache, and is divided into 16 banks at a 32-bit granularity. Because it is so highly banked, it is more important to minimize bank conflicts when accessing local memory than to minimize the number of L3 cache line accesses.
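To reason about bank conflicts before writing a kernel, a simple model can help. The sketch below assumes 16 banks of 32-bit words, that consecutive 32-bit words map to consecutive banks, and that work-items reading the same word do not conflict with each other. The bank mapping and the same-word behavior are assumptions of this model, and `bank_conflict_degree` is a hypothetical helper, not an OpenCL or driver function.

```python
# Simple local-memory bank model.
# Assumptions: 16 banks, 32-bit (4-byte) bank width, and
# bank index = (word index) mod 16.
NUM_BANKS = 16
BANK_BYTES = 4


def bank_conflict_degree(byte_addresses):
    """Return the largest number of distinct words requested from one bank.

    A result of 1 means the access pattern is conflict-free in this model;
    a result of N means some bank would have to serve N different words.
    """
    words_per_bank = {}
    for addr in byte_addresses:
        word = addr // BANK_BYTES
        bank = word % NUM_BANKS
        # Accesses to the same word are counted once (assumed not to conflict).
        words_per_bank.setdefault(bank, set()).add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)
```

In this model, 16 work-items reading consecutive 32-bit words are conflict-free, whereas a stride of 16 words sends every request to the same bank. Padding the stride to 17 words spreads the requests across all 16 banks again, which is the usual motivation for padding rows of a local-memory tile.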
All memory can be accessed in 8-bit, 16-bit, or 32-bit quantities. 32-bit quantities can be accessed as vectors of one, two, three, or four components.
