• 2019 Update 4
  • 03/20/2019

Use Row-Wise Data Accesses

OpenCL™ enables you to submit kernels over a one-, two-, or three-dimensional index space. Consider using one-dimensional ranges to improve cache locality and to save index computations.
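As a rough illustration of the saved index computations, the following plain C sketch emulates a one-dimensional range on the host. It is not OpenCL code: the loop stands in for the work-items, and the name scale_pixels_1d and the scaling operation are hypothetical, chosen only because a per-pixel operation needs no x/y coordinates.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation of a 1D range with one work-item per pixel.
 * pixel_count is assumed to be image_width * image_height.
 * The loop variable gid0 plays the role of get_global_id(0). */
void scale_pixels_1d(const float *input, float *output,
                     size_t pixel_count, float factor)
{
    assert(input != NULL && output != NULL);
    for (size_t gid0 = 0; gid0 < pixel_count; ++gid0) {
        /* The global id is already the buffer offset, so no
         * myY * image_width + myX computation is needed. */
        output[gid0] = input[gid0] * factor;
    }
}
```

Consecutive ids map to consecutive addresses here, which is the access pattern the rest of this section tries to preserve for higher-dimensional ranges.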
If a two- or three-dimensional range naturally fits your data dimensions, try to keep work-items scanning along rows, not columns. For example:
__kernel void smooth(const __global float* input,
                     uint image_width, uint image_height,
                     __global float* output)
{
    int myX = get_global_id(0);
    int myY = get_global_id(1);
    int myPixel = myY * image_width + myX;
    float data = input[myPixel];
    …
}
In the example above, the first dimension is the image width and the second is the image height. The following code is less effective:
__kernel void smooth(const __global float* input,
                     uint image_width, uint image_height,
                     __global float* output)
{
    int myY = get_global_id(0);
    int myX = get_global_id(1);
    int myPixel = myY * image_width + myX;
    float data = input[myPixel];
    …
}
In the second code example, the image height is the first dimension and the image width is the second dimension. The resulting column-wise data access is inefficient because the CPU OpenCL™ framework iterates over the first dimension first, so consecutive work-items read addresses that are image_width elements apart.
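The difference between the two mappings can be made concrete by computing how far apart the accesses of two work-items that are adjacent in the first dimension land. The sketch below is plain C run on the host, not OpenCL code. The function names are hypothetical, and the gid0 and gid1 parameters stand in for get_global_id(0) and get_global_id(1).

```c
#include <assert.h>
#include <stddef.h>

/* Row-wise mapping: dimension 0 is x (image width), dimension 1 is y. */
size_t row_wise_index(size_t gid0, size_t gid1, size_t image_width)
{
    size_t my_x = gid0;
    size_t my_y = gid1;
    return my_y * image_width + my_x;
}

/* Column-wise mapping: dimension 0 is y (image height), dimension 1 is x. */
size_t column_wise_index(size_t gid0, size_t gid1, size_t image_width)
{
    size_t my_y = gid0;
    size_t my_x = gid1;
    return my_y * image_width + my_x;
}

typedef size_t (*index_fn)(size_t, size_t, size_t);

/* Distance, in elements, between the accesses of work-items
 * (0, 0) and (1, 0), which are neighbors in dimension 0. */
size_t stride_along_dim0(index_fn map, size_t image_width)
{
    assert(image_width > 0);
    return map(1, 0, image_width) - map(0, 0, image_width);
}
```

For a hypothetical 640-pixel-wide image, the row-wise mapping gives a stride of 1 element, while the column-wise mapping gives a stride of 640 elements, so each neighboring work-item touches a different row of the image.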
The same rule applies if each work-item calculates several elements. To optimize performance, make sure work-items read from consecutive memory addresses.
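One possible reading of this rule, sketched in plain C on the host rather than in OpenCL, is to let each work-item walk a short run of pixels along a row. The names ELEMS_PER_ITEM and sum_row_segment are hypothetical, the summation is only a placeholder for real work, and image_width is assumed to be a multiple of ELEMS_PER_ITEM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of pixels handled by one work-item. */
enum { ELEMS_PER_ITEM = 4 };

/* Emulates one work-item at (gid0, gid1): dimension 1 selects the row,
 * dimension 0 selects a run of ELEMS_PER_ITEM pixels within that row.
 * Assumes image_width is a multiple of ELEMS_PER_ITEM. */
float sum_row_segment(const float *input, size_t image_width,
                      size_t gid0, size_t gid1)
{
    assert(input != NULL);
    size_t first = gid1 * image_width + gid0 * ELEMS_PER_ITEM;
    float sum = 0.0f;
    /* The inner loop advances along the row, so the reads
     * hit consecutive memory addresses. */
    for (size_t i = 0; i < ELEMS_PER_ITEM; ++i)
        sum += input[first + i];
    return sum;
}
```

Walking down a column instead (adding image_width to the offset on each iteration) would make every read land on a different row, which is the pattern this section advises against.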
