• 2019 Update 4
  • 03/20/2019

Loading and Storing Data in Greatest Chunks

“Saturating” the available memory bandwidth is very important. Accesses to byte data types are actually performed as loads of integer data types (DWORDs), and they also trigger extra instructions to pack and unpack the data. Using (u)int4 or float4 for buffers saves a lot of compute, even if you unpack the data manually afterward. In other words, you should avoid using uchar4 or char4. See the example below:
__kernel void amp (__constant uchar4* src, __global uchar4* dst)
…
  uint4 tempSrc = convert_uint4(src[offset]); // Load one RGBA8 pixel
  …
  // some processing
  …
  dst[offset] = convert_uchar4(tempDst);
}
Consider accessing the same data through the uint4 data type instead:
__kernel void amp (__constant uint4* src, __global uint4* dst)
…
  uint4 tempSrc = src[offset]; // Load 4 RGBA8 pixels
  …
  // some processing in uint4
  uint r0 = (tempSrc.x & 0xff); // Red component of 1st pixel
  uint r1 = (tempSrc.y & 0xff); // Red component of 2nd pixel
  …
  tempSrc.x >>= 8;
  tempSrc.y >>= 8;
  …
  tempSrc.x >>= 8;
  tempSrc.y >>= 8;
  …
  uint a0 = (tempSrc.x & 0xff); // Alpha component of 1st pixel
  uint a1 = (tempSrc.y & 0xff); // Alpha component of 2nd pixel
  // any calculations on the individual components
  …
  uint4 final = 0;
  // repack them (alpha occupies bits 24..31, so it is shifted by 24):
  final.x = (r0) | ((g0) << 8) | ((b0) << 16) | ((a0) << 24); // first pixel
  final.y = (r1) | ((g1) << 8) | ((b1) << 16) | ((a1) << 24); // second pixel
  …
  dst[offset] = final;
}
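The manual unpack and repack step can be checked on the host with a small scalar sketch. The following plain C code is only an illustration, not part of the original kernel: the names rgba_t, unpack_rgba8, and pack_rgba8 are hypothetical, and the layout assumed here is red in the lowest byte and alpha in the highest byte of each 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one RGBA8 pixel with each component widened to 32 bits. */
typedef struct { uint32_t r, g, b, a; } rgba_t;

/* Split a 32-bit word into its four 8-bit components (assumed layout: R in bits 0..7). */
static rgba_t unpack_rgba8(uint32_t px) {
    rgba_t c;
    c.r = px & 0xffu;
    c.g = (px >> 8) & 0xffu;
    c.b = (px >> 16) & 0xffu;
    c.a = (px >> 24) & 0xffu;
    return c;
}

/* Recombine the components; each one is shifted to its own byte: 0, 8, 16, 24. */
static uint32_t pack_rgba8(rgba_t c) {
    /* A component wider than 8 bits would spill into its neighbor. */
    assert(c.r <= 0xffu && c.g <= 0xffu && c.b <= 0xffu && c.a <= 0xffu);
    return c.r | (c.g << 8) | (c.b << 16) | (c.a << 24);
}
```

Packing the result of an unpack should return the original word, which is a quick way to confirm that no two components share a shift amount.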
Note: In the second example above (the uint4 kernel), the global size is one quarter of the original size, because each work-item now handles four pixels.
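As a rough host-side sketch of how the global size changes, the hypothetical helper below (global_size_for is not an OpenCL API) computes the number of work-items needed when each one handles a given number of pixels. It assumes a one-dimensional range and rounds up, so a pixel count that is not a multiple of four would need a bounds check or padding in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-items required when each work-item processes
 * `pixels_per_item` pixels. Rounds up to cover a partial last group. */
static size_t global_size_for(size_t pixel_count, size_t pixels_per_item) {
    assert(pixels_per_item > 0);
    return (pixel_count + pixels_per_item - 1) / pixels_per_item;
}
```

For example, a 1024-pixel buffer needs 1024 work-items with the uchar4 kernel (one pixel each) and 256 work-items with the uint4 kernel (four pixels each).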
If your kernel operates on floating-point data, consider using the float4 data type, which gets four times as much data in one load. It also helps to ensure that the kernel has enough work to do, amortizing the work-item scheduling overheads.
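The effect can be pictured with a plain C emulation, given here only as a sketch: float4_t is a hypothetical stand-in for the OpenCL float4 type, the loop index plays the role of the work-item ID, and the scaling operation is an arbitrary placeholder for real processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL float4 vector type. */
typedef struct { float x, y, z, w; } float4_t;

/* Emulates n_vec work-items, each loading, scaling, and storing four floats as one unit. */
static void scale_float4(const float4_t *src, float4_t *dst, size_t n_vec, float gain) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n_vec; ++i) {  /* i stands in for the work-item ID */
        float4_t v = src[i];              /* one four-wide load */
        v.x *= gain;
        v.y *= gain;
        v.z *= gain;
        v.w *= gain;
        dst[i] = v;                       /* one four-wide store */
    }
}
```

Each iteration does four times the work of a scalar version, so the same buffer is covered by a quarter as many work-items.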
For the CPU device, this optimization is equivalent to explicit (manual) vectorization; see the “Using Vector Data Types” section for more information.
Accessing data in larger chunks can improve data throughput on the Intel® Graphics device, but it might slightly reduce CPU device performance, as the “Using Vector Data Types” section also explains.
