• 10/30/2018

Common Mistakes in OpenCL™ Applications

This topic describes several cases of undefined behavior that you might encounter in OpenCL™ applications that do not follow the OpenCL™ specification.

Infinite Execution of OpenCL™ Kernels

  • Independent forward progress between work items within a workgroup
    Kernels that assume independent forward progress of work items may compile to non-terminating code.
    Consider the following example:
    __kernel void kern(__local uint *flag) {
        size_t id = get_local_id(0);
        if (id == 0) {
            flag[0] = 0;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        while (flag[0] < 1) {
            if (id == 0) {
                atomic_inc(&flag[0]);
            }
        }
    }
    According to the OpenCL 2.0 specification, section 3.2.2:
    "In the absence of work-group functions (e.g. a barrier), work-items within a workgroup may be serialized. In the presence of work-group functions, work-items within a workgroup may be serialized before any given work-group function, between dynamically encountered pairs of work-group functions and between a work-group function and the end of the kernel."
    For example, if the work items are serialized and execute in the order 3, 2, 1, 0, the work item with id = 3 runs first and never exits the loop, because work item 0, the only one that increments the flag, never gets to run.
    Section 3.2.2 also states:
    "The work-items within a single work-group execute concurrently but not necessarily in parallel (i.e. they are not guaranteed to make independent forward progress)."
    Redesign your code to comply with the specification, so that it does not rely on independent forward progress of work items within a work group.
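    The serialization argument above can be checked with a small host-side model written in plain C. This is a sketch, not OpenCL code and not how any particular implementation schedules work items. The names `run_spin_kernel`, `run_redesigned_kernel`, `WG_SIZE`, and `MAX_SPINS` are assumptions introduced for illustration, and `MAX_SPINS` only stands in for "forever" so that the model itself terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WG_SIZE   4     /* hypothetical work-group size */
#define MAX_SPINS 1000  /* stand-in for "forever", so the model itself terminates */

/* Models the kernel above on an implementation that, after the barrier,
 * runs one work item to completion before starting the next.
 * order[] holds the local IDs in the sequence the implementation picks.
 * Returns true if every work item leaves the while loop. */
bool run_spin_kernel(const size_t order[WG_SIZE])
{
    assert(order != NULL);
    unsigned flag = 0;  /* before the barrier: work item 0 clears the flag */

    for (size_t i = 0; i < WG_SIZE; ++i) {
        size_t id = order[i];
        unsigned spins = 0;
        while (flag < 1) {
            if (id == 0) {
                ++flag;        /* atomic_inc(&flag[0]) in the kernel */
            } else if (++spins == MAX_SPINS) {
                return false;  /* still waiting for work item 0, which has not started */
            }
        }
    }
    return true;
}

/* Models one possible redesign: work item 0 writes the flag before the
 * barrier, and the barrier orders that write before everyone's reads. */
bool run_redesigned_kernel(const size_t order[WG_SIZE])
{
    assert(order != NULL);
    unsigned flag = 0;

    /* Region before the barrier: each work item runs it, in the given order. */
    for (size_t i = 0; i < WG_SIZE; ++i) {
        if (order[i] == 0) {
            flag = 1;
        }
    }

    /* Region after the barrier: starts only once all work items have arrived. */
    for (size_t i = 0; i < WG_SIZE; ++i) {
        if (flag < 1) {
            return false;  /* would mean a work item saw a stale flag */
        }
    }
    return true;
}
```

    In this model, `run_spin_kernel` fails for the order 3, 2, 1, 0 and succeeds only when work item 0 happens to run first, while `run_redesigned_kernel` succeeds for any permutation of the local IDs, because the barrier rather than a spin loop provides the ordering.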
  • barrier() divergence
    Kernels that contain divergent barriers may compile to non-terminating code.
    A branch is considered divergent when some work items within a work group take it and others do not.
    A simplified code looks as follows:
    __kernel void kern() {
        size_t lid = get_local_id(0);
        if (lid % 2) { // any condition that makes only some work items take the branch
            ...
            barrier(CLK_LOCAL_MEM_FENCE); // reached only by work items with an odd local ID
        }
    }
    According to the OpenCL™ 1.2 specification, section 3.4.3:
    "Note that the work-group barrier must be encountered by all workitems of a work-group executing the kernel or by none at all".
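    This rule is harder to follow when a barrier sits inside a loop, because every work item in the work group must then execute the barrier on every iteration, which requires all of them to run the loop the same number of times. Redesign your code so that a barrier is never reached by only a subset of the work items within a work group.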
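    The all-or-none rule can be expressed as a simple check on how often each work item executes a given barrier call site. The following host-side C sketch is an illustration only; `barrier_is_uniform`, the `count_*` helpers, and `WG_SIZE` are hypothetical names, and the hoisted variant assumes that the work inside the branch can be separated from the synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WG_SIZE 8  /* hypothetical work-group size */

/* hits[lid] is how many times work item lid executes one barrier() call site.
 * The rule holds when all counts are equal (all zero means "none at all"). */
bool barrier_is_uniform(const unsigned hits[WG_SIZE])
{
    assert(hits != NULL);
    for (size_t lid = 1; lid < WG_SIZE; ++lid) {
        if (hits[lid] != hits[0]) {
            return false;
        }
    }
    return true;
}

/* Models: if (lid % 2) { ...; barrier(...); } */
void count_branch_divergent(unsigned hits[WG_SIZE])
{
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        hits[lid] = (lid % 2) ? 1u : 0u;
    }
}

/* Models one possible redesign: if (lid % 2) { ... } barrier(...); */
void count_branch_hoisted(unsigned hits[WG_SIZE])
{
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        hits[lid] = 1u;
    }
}

/* Models: for (i = 0; i < lid; ++i) { ...; barrier(...); } */
void count_loop_divergent(unsigned hits[WG_SIZE])
{
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        hits[lid] = (unsigned)lid;
    }
}
```

    Only the hoisted variant passes `barrier_is_uniform`. The branch-divergent and loop-divergent variants both leave some work items waiting at a barrier that the others never reach.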
