Perform Initialization in a Separate Task

Consider the following code snippet:

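__kernel void something(const __global int* data)
{
  int tid = get_global_id(0);
  // Every work-item evaluates this condition, but only work-item 0 takes the branch.
  if (0 == tid)
  {
    // Do some one-shot work
  }
  // Note: barrier() synchronizes only the work-items within a single work-group.
  barrier(CLK_GLOBAL_MEM_FENCE);
  // Regular kernel code
}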

In this example, every work-item evaluates the first branch, although only one of them (the work-item with global ID 0) actually takes it. A better solution is to move the initialization phase out of the kernel, either into a separate kernel or into the host code.

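If you need to run a kernel only once (for a single work-item), use clEnqueueTask, which is designed for this purpose. It is equivalent to calling clEnqueueNDRangeKernel with work_dim set to 1 and the global and local work sizes both set to 1. Note that clEnqueueTask is deprecated as of OpenCL 2.0, so newer code may prefer the clEnqueueNDRangeKernel form.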
