• 10/30/2018

Avoid Needless Synchronization

For better results, avoid explicit command synchronization primitives, such as clEnqueueMarker and clEnqueueBarrier. Explicit synchronization commands and event tracking result in cross-module round trips, which decrease performance. The fewer explicit synchronization commands you use, the better the performance.
Use the following techniques to reduce explicit synchronization:
  • Merge kernels whenever possible. Merging also improves data locality.
  • If you need to wait for a kernel to complete before reading its result buffer, keep the host executing other work and block only at the point where you first need a buffer with results.
  • If an in-order queue expresses the dependency chain correctly, use it to define a string of dependent kernels. In the in-order execution model, the commands in a command queue are executed in the order of submission, with each command running to completion before the next one begins. This is a typical case for a straightforward processing pipeline. Consider the following:
    • Using the blocking OpenCL™ API is more effective than explicit synchronization schemes based on OS synchronization primitives.
    • If you are optimizing the kernel pipeline, first measure the kernels separately to find the most time-consuming one. In the final pipeline version, avoid calling clFinish or clWaitForEvents frequently, for example, after each kernel invocation. Instead, submit the whole sequence to the in-order queue and call clFinish once, or wait on a single OpenCL event object, which reduces host-device round trips.
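
The techniques above can be illustrated with a small model in plain C. This is a sketch under stated assumptions, not OpenCL host code: toy_queue, toy_enqueue, and toy_finish are hypothetical stand-ins for an in-order command queue, clEnqueueNDRangeKernel, and clFinish, and the three kernel functions are invented for illustration. The model assumes that every toy_finish call costs one host-device round trip and, for simplicity, that queued commands run only when the host drains the queue (a real runtime may start them earlier).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two dependent kernels and their merged form. */
static void scale_kernel(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f;
}
static void offset_kernel(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = in[i] + 1.0f;
}
static void scale_offset_kernel(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f + 1.0f;
}

typedef void (*kernel_fn)(const float *in, float *out, size_t n);

typedef struct {
    kernel_fn fn;
    const float *in;
    float *out;
    size_t n;
} toy_command;

#define TOY_QUEUE_CAPACITY 16

/* Toy in-order queue: commands run strictly in submission order. */
typedef struct {
    toy_command pending[TOY_QUEUE_CAPACITY];
    size_t count;
    int round_trips; /* model assumption: one per toy_finish call */
} toy_queue;

/* Stand-in for clEnqueueNDRangeKernel: records the command, does not block. */
static void toy_enqueue(toy_queue *q, kernel_fn fn,
                        const float *in, float *out, size_t n) {
    assert(q->count < TOY_QUEUE_CAPACITY);
    q->pending[q->count++] = (toy_command){fn, in, out, n};
}

/* Stand-in for clFinish: drains the queue in order, counts one round trip. */
static void toy_finish(toy_queue *q) {
    for (size_t i = 0; i < q->count; ++i) {
        const toy_command *c = &q->pending[i];
        c->fn(c->in, c->out, c->n);
    }
    q->count = 0;
    q->round_trips += 1;
}

/* Discouraged: synchronize after every kernel. Returns the round-trip count. */
int run_sync_each(const float *in, float *tmp, float *out, size_t n) {
    toy_queue q = {0};
    toy_enqueue(&q, scale_kernel, in, tmp, n);
    toy_finish(&q);
    toy_enqueue(&q, offset_kernel, tmp, out, n);
    toy_finish(&q);
    return q.round_trips;
}

/* Preferred: submit the whole chain, rely on in-order execution, sync once. */
int run_sync_once(const float *in, float *tmp, float *out, size_t n) {
    toy_queue q = {0};
    toy_enqueue(&q, scale_kernel, in, tmp, n);
    toy_enqueue(&q, offset_kernel, tmp, out, n);
    toy_finish(&q);
    return q.round_trips;
}

/* Merged kernel: one submission, one sync, and no intermediate buffer. */
int run_merged(const float *in, float *out, size_t n) {
    toy_queue q = {0};
    toy_enqueue(&q, scale_offset_kernel, in, out, n);
    toy_finish(&q);
    return q.round_trips;
}
```

In this model, run_sync_each reports two round trips while run_sync_once and run_merged report one, and all three write the same values to out. The merged variant also drops the tmp buffer, which is where the data locality benefit of merging kernels comes from.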
