• 2019 Update 4
  • 03/20/2019

Use Row-Wise Data Accesses

OpenCL™ enables you to submit kernels over a one-, two-, or three-dimensional index space. Consider using one-dimensional ranges to improve cache locality and to save index computations.
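As a minimal sketch of the index computation that a one-dimensional range saves, the plain C helpers below model, outside of OpenCL, how a work-item locates its pixel in each case. The function names are hypothetical and used only for illustration; in a real kernel the arguments would come from get_global_id.

```c
#include <stddef.h>

/* Hypothetical helper: with a two-dimensional range, every work-item
   needs a multiply and an add to locate its pixel in a row-major image. */
size_t pixel_index_2d(size_t x, size_t y, size_t image_width)
{
    return y * image_width + x;
}

/* Hypothetical helper: with a one-dimensional range of
   image_width * image_height work-items, the global ID already is
   the linear pixel index, so no further arithmetic is needed. */
size_t pixel_index_1d(size_t global_id)
{
    return global_id;
}
```

Both helpers address the same pixels; the one-dimensional form simply skips the per-work-item multiply-add.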
If a two- or three-dimensional range naturally fits your data dimensions, try to keep work-items scanning along rows, not columns. For example:
__kernel void smooth(const __global float* input,
                     uint image_width, uint image_height,
                     __global float* output)
{
    int myX = get_global_id(0);
    int myY = get_global_id(1);
    int myPixel = myY * image_width + myX;
    float data = input[myPixel];
    …
}
In the example above, the first dimension is the image width and the second is the image height. The following code is less effective:
__kernel void smooth(const __global float* input,
                     uint image_width, uint image_height,
                     __global float* output)
{
    int myY = get_global_id(0);
    int myX = get_global_id(1);
    int myPixel = myY * image_width + myX;
    float data = input[myPixel];
    …
}
In the second code example, the image height is the first dimension and the image width is the second dimension. The resulting column-wise data access is inefficient, because the CPU OpenCL™ framework iterates over the first dimension first, so neighboring work-items read addresses that are a full image row apart.
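The difference between the two kernels can be sketched in plain C by computing how far apart the addresses of two work-items that are adjacent in the first dimension are. This is a simplified model, assuming a row-major image and work-items that are visited along dimension 0 first; the function names are hypothetical and not part of any OpenCL API.

```c
#include <stddef.h>

/* First kernel: dimension 0 is the column (myX), dimension 1 the row (myY). */
size_t row_wise_index(size_t id0, size_t id1, size_t image_width)
{
    size_t myX = id0, myY = id1;
    return myY * image_width + myX;
}

/* Second kernel: dimension 0 is the row (myY), dimension 1 the column (myX). */
size_t column_wise_index(size_t id0, size_t id1, size_t image_width)
{
    size_t myY = id0, myX = id1;
    return myY * image_width + myX;
}

/* Distance, in elements, between the addresses read by two work-items
   that are neighbors in dimension 0 (the dimension iterated first). */
size_t stride_dim0(size_t (*index_of)(size_t, size_t, size_t),
                   size_t image_width)
{
    return index_of(1, 0, image_width) - index_of(0, 0, image_width);
}
```

For a hypothetical 640-pixel-wide image, stride_dim0(row_wise_index, 640) is 1 element, while stride_dim0(column_wise_index, 640) is 640 elements, which is why the second mapping makes poor use of each cache line.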
The same rule applies if each work-item calculates several elements. To optimize performance, make sure work-items read from consecutive memory addresses.
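As an illustration of the multi-element case, the plain C function below models a single work-item that processes several neighboring pixels of one row. The per-item element count, the summation, and the function name are assumptions chosen for the sketch, and the image width is assumed to be a multiple of the element count.

```c
#include <stddef.h>

/* Hypothetical number of elements handled by each work-item. */
#define ELEMS_PER_ITEM 4

/* Models one work-item with global IDs (id0, id1). Dimension 0 walks
   along a row in steps of ELEMS_PER_ITEM, so the loop below reads
   consecutive addresses. Assumes image_width % ELEMS_PER_ITEM == 0. */
float sum_row_span(const float *input, size_t image_width,
                   size_t id0, size_t id1)
{
    size_t first = id1 * image_width + id0 * ELEMS_PER_ITEM;
    float sum = 0.0f;
    for (size_t i = 0; i < ELEMS_PER_ITEM; ++i)
        sum += input[first + i];  /* stride of one element */
    return sum;
}
```

Letting each work-item step down a column instead (adding image_width on every iteration) would reintroduce the strided access pattern described above.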
