• 10/30/2018

Use Branching Accurately

You can improve performance on Intel® Core™ and Intel® Xeon® processors by converting uniform conditions, that is, conditions that evaluate identically across all work-items, into compile-time branches.
With uniform conditions, you have a single kernel that implements all desired behaviors, and the host logic disables the paths that are not currently required. However, branching on constants that the host sets at run time wastes device resources, because the data is still calculated before it is discarded. Consider an approach based on preprocessor directives instead: use #ifdef or #ifndef blocks.
Consider the example where the original kernel uses constants for branching:
    __kernel void foo(__constant int* src,
                      __global int* dst,
                      unsigned char bFullFrame, unsigned char bAlpha)
    {
        …
        if (bFullFrame) // uniform condition (equal for all work-items)
        {
            …
            if (bAlpha) // uniform condition
            {
                …
            }
            else
            {
                …
            }
        }
        else
        {
            …
        }
    }
Now consider the same kernel rewritten with compile-time branches (the “specialization” technique):
    __kernel void foo(__constant int* src,
                      __global int* dst)
    {
        …
    #ifdef bFullFrame
        {
            …
    #ifdef bAlpha
            {
                …
            }
    #else
            {
                …
            }
    #endif
        }
    #else
        {
            …
        }
    #endif
    }
Consider applying the same optimization to other constants.
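One way the host might select these compile-time branches is through the options string of clBuildProgram, which accepts -D name preprocessor definitions. The following is a minimal sketch in plain C, not code from the original guide: make_build_options is a hypothetical helper that only assembles the string, and the clBuildProgram call appears in a comment because it needs a real program object and device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of OpenCL): writes the -D definitions
 * that enable the bFullFrame / bAlpha paths of the specialized kernel.
 * Returns the string length, or -1 if the buffer is too small. */
int make_build_options(char *buf, size_t size, int full_frame, int alpha)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "%s%s",
                     full_frame ? "-D bFullFrame " : "",
                     alpha      ? "-D bAlpha "     : "");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Assumed usage on the host, with program and device created elsewhere:
 *
 *     char opts[64];
 *     if (make_build_options(opts, sizeof opts, 1, 0) >= 0)
 *         clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Each combination of definitions produces a separately compiled program, so the host would typically build one program per configuration it actually needs.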
In short computations, minimize or, in the best case, avoid branching by using the min, max, clamp, and select built-ins instead of if and else clauses.
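As an illustration of replacing a short if/else chain, the sketch below contrasts a branchy saturation with a min/max formulation. It is written in plain C so that it can be checked on the host. The names min_i, max_i, clamp_i, and clamp_branchy are hypothetical stand-ins; in an OpenCL C kernel you would call the built-in clamp(x, lo, hi) directly.

```c
#include <assert.h>

/* Plain C stand-ins for the OpenCL C min and max built-ins. */
int min_i(int a, int b) { return a < b ? a : b; }
int max_i(int a, int b) { return a > b ? a : b; }

/* Mirrors the OpenCL definition clamp(x, lo, hi) = min(max(x, lo), hi). */
int clamp_i(int x, int lo, int hi)
{
    assert(lo <= hi);
    return min_i(max_i(x, lo), hi);
}

/* The style to avoid in a kernel: an if/else chain for a short computation. */
int clamp_branchy(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    else if (x > hi)
        return hi;
    else
        return x;
}

/* In a kernel, the whole chain would collapse to a single line such as:
 *     dst[i] = clamp(src[i], lo, hi);
 */
```

Whether a host C compiler emits branch-free code for the ternaries is not guaranteed. The point of the sketch is only that the two formulations return the same values.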
Move memory accesses that are common to the then and else blocks outside of the conditional code.
Consider the original code, which uses if and else clauses:
    if (…) { // condition
        x = A[i1];  // reading from A
        …           // calculations
        B[i2] = y;  // storing into B
    } else {
        q = A[i1];  // reading from A with same index as in first clause
        …           // different calculations
        B[i2] = w;  // storing into B with same index as in first clause
    }
Now consider the optimized code that uses temporary variables:
    temp1 = A[i1];  // reading from A in advance
    if (…) { // condition
        x = temp1;
        …           // some calculations
        temp2 = y;  // storing into temporary variable
    } else {
        q = temp1;
        …           // some calculations
        temp2 = w;  // storing into temporary variable
    }
    B[i2] = temp2;  // storing to B once
