• 2019 Update 4
  • 03/20/2019
  • Public Content

Using Specialization in Branching

You can improve the performance of both CPU and Intel® Graphics devices by converting uniform conditions, which are equal across all work-items, into compile-time branches, a technique known as specialization.
The approach, sometimes referred to as an Uber-Shader in the pixel shader context, is to have a single kernel that implements all needed behaviors and to let the host logic disable the paths that are not currently required. However, branching on constants that the host sets at run time wastes device resources, because the data is still calculated before it is thrown away. Consider a preprocessor approach instead, using #ifdef blocks.
Original kernel that uses constants to branch:
__kernel void foo(__constant int* src, __global int* dst,
                  unsigned char bFullFrame, unsigned char bAlpha)
{
    …
    if (bFullFrame) // uniform condition (equal for all work-items)
    {
        …
        if (bAlpha) // uniform condition
        {
            …
        }
        else
        {
            …
        }
    }
    else
    {
        …
    }
}
The same kernel with compile-time branches:
__kernel void foo(__constant int* src, __global int* dst)
{
    …
#ifdef bFullFrame
    {
        …
#ifdef bAlpha
        {
            …
        }
#else
        {
            …
        }
#endif
    }
#else
    {
        …
    }
#endif
}
Also consider similar optimization for other constants.
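The compile-time branches above only take effect if the host defines bFullFrame and bAlpha when it builds the program. A minimal host-side sketch in plain C follows, assuming the macros are passed as -D preprocessor options in the options string of clBuildProgram. The helper name make_build_options and its parameters are hypothetical and are not part of the OpenCL API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not an OpenCL API call): writes into
 * buf the preprocessor options that enable the compile-time branches of the
 * specialized kernel. The resulting string is intended to be passed as the
 * `options` argument of clBuildProgram.
 * Returns 0 on success, -1 if buf is too small. */
int make_build_options(char *buf, size_t size, int full_frame, int alpha)
{
    int n = snprintf(buf, size, "%s%s",
                     full_frame ? "-D bFullFrame " : "",
                     alpha ? "-D bAlpha " : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this approach the host would build one program variant per combination of flags it actually needs and select the matching variant at run time, rather than passing the flags as kernel arguments.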
Finally, avoid or minimize branching in short computations by using the min, max, clamp, or select built-ins instead of if and else.
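As an illustration of that rewrite, the sketch below contrasts an if/else version of two short computations with a built-in style version. It is written in plain C so that it can be compiled on the host: min_int, max_int, clamp_int, and select_int are hypothetical scalar stand-ins for the OpenCL C built-ins min, max, clamp, and select, and in a real kernel you would call the built-ins directly. The function names to_byte_* and pick_* are also assumptions made for this example.

```c
/* Plain-C stand-ins for OpenCL C built-ins (scalar int only, hypothetical
 * names). In a real kernel, call min, max, clamp, and select directly. */
int min_int(int a, int b) { return a < b ? a : b; }
int max_int(int a, int b) { return a > b ? a : b; }
int clamp_int(int x, int lo, int hi) { return min_int(max_int(x, lo), hi); }
/* Scalar select(a, b, c) yields b when c is nonzero, otherwise a. */
int select_int(int a, int b, int c) { return c ? b : a; }

/* Branching version of a short computation: saturate x to [0, 255]. */
int to_byte_branching(int x)
{
    if (x < 0)
        return 0;
    else if (x > 255)
        return 255;
    else
        return x;
}

/* Same result expressed as a single clamp call. */
int to_byte_clamp(int x) { return clamp_int(x, 0, 255); }

/* Branching version of a two-way choice. */
int pick_branching(int a, int b, int use_b)
{
    if (use_b)
        return b;
    else
        return a;
}

/* Same result expressed as a single select call. */
int pick_select(int a, int b, int use_b) { return select_int(a, b, use_b); }
```

The two versions of each function return the same values; the difference is that the built-in form gives the device compiler a single call to map onto its instructions instead of control flow.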
Also, when optimizing specifically for the OpenCL™ Intel Graphics device, ensure that all conditionals are evaluated outside of code branches (for the CPU device, this makes no difference).
For example, the following code demonstrates conditional evaluation in the conditional blocks:
if (x && y || (z && functionCall(x, y, z)))
{
    // do something
}
else
{
    // do something else
}
The following code demonstrates the conditional evaluation moved outside of the conditional blocks:
// improves compilation time for Intel® Graphics device
bool comparison = x && y || (z && functionCall(x, y, z));
if (comparison)
{
    // do something
}
else
{
    // do something else
}