• 2019 Update 4
  • 03/20/2019

Using Floating Point for Calculations

Intel® Graphics devices execute floating-point add, sub, mul, and similar operations much faster than the same operations on the int type.
For example, consider the following code, which performs its calculations in type uint4:
__kernel void amp (__constant uchar4* src, __global uchar4* dst)
…
    uint4 tempSrc = convert_uint4(src[offset]); //Load one RGBA8 pixel
    //some processing
    uint4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
    uint4 tempDst = value + (tempSrc - value) * nSaturation;
    //store
    dst[offset] = convert_uchar4(tempDst);
}
Below is its float4 equivalent:
__kernel void amp (__constant uchar4* src, __global uchar4* dst)
…
    float4 tempSrc = convert_float4(src[offset]); //Load one RGBA8 pixel as floats
    //some processing
    float4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
    float4 tempDst = mad(tempSrc - value, fSaturation, value);
    //store
    dst[offset] = convert_uchar4(tempDst);
}
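To make the arithmetic of the two kernels easier to check, here is a minimal host-side sketch in plain C for a single color channel. It is not taken from the original kernels: the helper names (mad_ref, blend_int, blend_float, to_uchar_sat) are hypothetical, mad_ref only stands in for the OpenCL mad built-in as an unfused a * b + c, and the clamping store is an assumption about how out-of-range results might be handled, roughly in the spirit of a saturating conversion such as convert_uchar4_sat.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL mad built-in: computes a * b + c.
   The real built-in may be mapped to a single hardware instruction. */
static float mad_ref(float a, float b, float c)
{
    return a * b + c;
}

/* Integer path for one channel, mirroring
   value + (tempSrc - value) * nSaturation from the integer kernel. */
static int32_t blend_int(int32_t src, int32_t value, int32_t n_saturation)
{
    return value + (src - value) * n_saturation;
}

/* Floating-point path for one channel, mirroring
   mad(tempSrc - value, fSaturation, value) from the float kernel. */
static float blend_float(float src, float value, float f_saturation)
{
    return mad_ref(src - value, f_saturation, value);
}

/* Assumed store step: clamp to the 8-bit range, then truncate toward zero. */
static uint8_t to_uchar_sat(float x)
{
    assert(x == x); /* NaN has no meaningful pixel value */
    if (x <= 0.0f) {
        return 0;
    }
    if (x >= 255.0f) {
        return 255;
    }
    return (uint8_t)x;
}
```

With a source value of 200, a base value of 100, and a saturation factor of 2, both paths give 300, which the clamping store reduces to 255. Only the floating-point path can express a fractional factor: blend_float(200.0f, 100.0f, 0.5f) gives 150.0f.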
Intel® Advanced Vector Extensions (Intel® AVX) support, where available, accelerates floating-point calculations on modern CPUs, so floating-point data types are preferable for the CPU OpenCL device as well.
Note
The compiler can automatically fuse multiplies and additions. Use the -cl-mad-enable compiler flag to enable this optimization when compiling for both Intel® Graphics and CPU devices. However, calling the mad built-in explicitly ensures that the operation is mapped directly to the efficient instruction.
