• 10/30/2018

Prefer Row-Wise Data Accesses

OpenCL™ enables you to submit kernels on a one-, two-, or three-dimensional index space. Consider using one-dimensional ranges to improve cache locality and save index computations.
If a two- or three-dimensional range naturally fits your data dimensions, try to keep work-items scanning along rows, not columns. For example, the following code is not optimized (it might trigger gather instructions):
__kernel void smooth(__constant float* input,
                     uint image_width, uint image_height,
                     __global float* output) {
  int myX = get_global_id(1);
  int myY = get_global_id(0);
  int myPixel = myY * image_width + myX;
  float data = input[myPixel];
  …
}
In this code example, the image height is the first dimension and the image width is the second dimension. The resulting column-wise data access is inefficient, because the Intel® OpenCL™ implementation iterates over the first dimension first.
The version below is more efficient because its memory access is sequential and therefore more memory-friendly.
__kernel void smooth(__constant float* input,
                     uint image_width, uint image_height,
                     __global float* output) {
  int myX = get_global_id(0);
  int myY = get_global_id(1);
  int myPixel = myY * image_width + myX;
  float data = input[myPixel];
  …
}
In the example above, the first dimension is the image width and the second is the image height.
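To make the difference between the two kernels concrete, the following host-side C sketch reproduces only their index arithmetic and measures how far apart, in elements, the reads of two work-items that are neighbors along dimension zero land. It is an illustration, not part of the OpenCL™ API: the function names `index_column_wise`, `index_row_wise`, and `stride_dim0` are hypothetical, and `gid0` and `gid1` stand in for the values returned by `get_global_id(0)` and `get_global_id(1)`.

```c
#include <assert.h>
#include <stddef.h>

/* Element index read by work-item (gid0, gid1) in the first kernel,
   where dimension 0 is mapped to the image row (myY). */
size_t index_column_wise(size_t gid0, size_t gid1, size_t image_width) {
    size_t myX = gid1;
    size_t myY = gid0;
    return myY * image_width + myX;
}

/* Element index read by work-item (gid0, gid1) in the second kernel,
   where dimension 0 is mapped to the image column (myX). */
size_t index_row_wise(size_t gid0, size_t gid1, size_t image_width) {
    size_t myX = gid0;
    size_t myY = gid1;
    return myY * image_width + myX;
}

/* Distance, in elements, between the reads of two work-items that
   differ by one in dimension 0 (hypothetical helper). */
size_t stride_dim0(size_t (*index_fn)(size_t, size_t, size_t),
                   size_t image_width) {
    assert(image_width > 0);
    return index_fn(1, 0, image_width) - index_fn(0, 0, image_width);
}
```

Under these assumptions, `stride_dim0(index_row_wise, w)` is 1 for any width `w`, while `stride_dim0(index_column_wise, w)` equals `w`: neighboring work-items in the first kernel read addresses a full row apart.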
The same rule applies if each work-item calculates several elements. To optimize performance, make sure work-items read from consecutive memory addresses.
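One way to read this rule, assuming that work-items adjacent along dimension zero are processed together, is that at every step of the per-work-item loop, neighboring work-items should touch neighboring addresses. The C sketch below compares two hypothetical ways of assigning several elements to each work-item in a one-dimensional range. The names `index_blocked` and `index_interleaved` and their parameters are illustrative and do not come from the original text.

```c
#include <assert.h>
#include <stddef.h>

/* Blocked assignment: work-item gid owns a contiguous chunk of
   elems_per_item elements. At a fixed step k, neighboring work-items
   read addresses that are elems_per_item elements apart. */
size_t index_blocked(size_t gid, size_t k, size_t elems_per_item) {
    assert(k < elems_per_item);
    return gid * elems_per_item + k;
}

/* Interleaved assignment: at step k, work-item gid reads element
   k * global_size + gid. At a fixed step k, neighboring work-items
   read consecutive addresses. */
size_t index_interleaved(size_t gid, size_t k, size_t global_size) {
    assert(gid < global_size);
    return k * global_size + gid;
}
```

Both schemes cover the same elements. With 4 elements per work-item, work-items 0 and 1 are 4 elements apart at each step in the blocked scheme and 1 element apart in the interleaved scheme. Which layout performs better on a given device is something to measure rather than assume.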
Finally, if you run a two-dimensional NDRange, prefer data accesses that are consecutive along dimension zero.
