Using Floating Point for Calculations

The Intel® Graphics device executes floating-point add, sub, mul, and similar operations much faster than their int counterparts.

For example, consider the following code, which performs its calculations in the uint4 type:

__kernel void amp (__constant uchar4* src, __global uchar4* dst)
{
        …
        // offset and nSaturation come from the elided code above
        uint4 tempSrc = convert_uint4(src[offset]); // Load one RGBA8 pixel
        // some processing
        uint4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
        uint4 tempDst = value + (tempSrc - value) * nSaturation;
        // store
        dst[offset] = convert_uchar4(tempDst);
}

Below is its float4 equivalent:

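__kernel void amp (__constant uchar4* src, __global uchar4* dst)
{
        …
        // offset and fSaturation come from the elided code above
        float4 tempSrc = convert_float4(src[offset]); // Load one RGBA8 pixel as float4
        // some processing
        float4 value = (tempSrc.z + tempSrc.y + tempSrc.x);
        // mad(a, b, c) computes a * b + c; the cast replicates a scalar factor to all lanes
        float4 tempDst = mad(tempSrc - value, (float4)fSaturation, value);
        // store, clamping to [0, 255] so out-of-range floats convert predictably
        dst[offset] = convert_uchar4_sat(tempDst);
}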

Intel® Advanced Vector Extensions (Intel® AVX) support, where available, accelerates floating-point calculations on modern CPUs, so the floating-point data type is preferable for the CPU OpenCL device as well.

Note

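The compiler can automatically fuse multiplications and additions. Use the -cl-mad-enable compiler flag to enable this optimization when compiling for both Intel® Graphics and CPU devices. The flag allows a * b + c to be replaced by mad, which may compute the result with reduced accuracy, so the compiler does not apply it by default. However, explicitly calling the mad built-in ensures that the operation maps directly to the efficient instruction.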

For more complete information about compiler optimizations, see our Optimization Notice.