Use Floating Point for Calculations
Intel® Xeon® processors significantly accelerate floating-point calculations
on the device.
Consider the following code snippet that performs calculations in int:

    __kernel void scale(__constant uchar* srcA,
                        __constant uchar* srcB,
                        uchar nSaturation,
                        __global uchar* dst)
    {
        int offset = get_global_id(0);
        // Load one 8-bit color component from each source and widen it to uint
        uint tempSrcA = convert_uint(srcA[offset]);
        uint tempSrcB = convert_uint(srcB[offset]);
        // Some processing, done in integer arithmetic
        uint tempDst = (tempSrcA - tempSrcB) * nSaturation;
        // Store the result
        dst[offset] = convert_uchar(tempDst);
    }

The following example uses the float equivalent:

    __kernel void scale(__constant uchar* srcA,
                        __constant uchar* srcB,
                        uchar nSaturation,
                        __global uchar* dst)
    {
        int offset = get_global_id(0);
        // Load one 8-bit color component from each source and convert it to float
        float tempSrcA = convert_float(srcA[offset]);
        float tempSrcB = convert_float(srcB[offset]);
        // Some processing, done in floating-point arithmetic
        float tempDst = (tempSrcA - tempSrcB) * nSaturation;
        // Store the result
        dst[offset] = convert_uchar(tempDst);
    }

Using built-in functions improves performance. See the Use
Built-In Functions section for more information.
NOTE: The compiler can automatically fuse multiplies and adds. Use the
-cl-mad-enable compiler flag to enable this optimization when compiling.
Still, using the explicit mad built-in ensures that the operation is mapped
directly to the efficient instruction.
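
As an illustration of the note above, here is a minimal OpenCL C sketch of the float kernel rewritten around the mad built-in. It is not taken from the original guide: the kernel name scale_bias and the nBias argument are hypothetical, added only so that the expression contains an add that can be fused with the multiply.

```c
// OpenCL C device code. A sketch under stated assumptions, not the guide's own kernel.
// ASSUMPTION: scale_bias and nBias are hypothetical names introduced for this example.
__kernel void scale_bias(__constant uchar* srcA,
                         __constant uchar* srcB,
                         uchar nSaturation,
                         uchar nBias,
                         __global uchar* dst)
{
    int offset = get_global_id(0);

    // Convert the 8-bit inputs to float once, up front
    float a    = convert_float(srcA[offset]);
    float b    = convert_float(srcB[offset]);
    float sat  = convert_float(nSaturation);
    float bias = convert_float(nBias);

    // mad(x, y, z) approximates x * y + z and favors speed over accuracy,
    // so this requests the multiply-add explicitly instead of relying on
    // the compiler to fuse (a - b) * sat + bias.
    float result = mad(a - b, sat, bias);

    // Store the result
    dst[offset] = convert_uchar(result);
}
```

To let the compiler apply the same fusion to ordinary a * b + c expressions elsewhere in the program, -cl-mad-enable goes in the options string passed to clBuildProgram.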