• 10/30/2018

Avoid Needless Synchronization

For better results, avoid explicit command synchronization primitives, such as clEnqueueMarker and clEnqueueBarrier. Explicit synchronization commands and event tracking result in cross-module round trips, which decrease performance. The fewer explicit synchronization commands you use, the better the performance.
Use the following techniques to reduce explicit synchronization:
  • Merge kernels whenever possible. This also improves data locality.
  • If you need to wait for a kernel to complete execution before reading the resulting buffer, defer the wait: continue other work and wait only when you first need a buffer with results.
  • If an in-order queue expresses the dependency chain correctly, use it to define a chain of dependent kernels. In the in-order execution model, the commands in a command queue are executed in the order of submission, and each command runs to completion before the next one begins. This is typical of a straightforward processing pipeline. Consider the following:
    • Using the blocking OpenCL™ API is more effective than explicit synchronization schemes based on OS synchronization primitives.
    • If you are optimizing the kernel pipeline, first measure the kernels separately to find the most time-consuming one. In the final pipeline version, avoid calling clFinish or clWaitForEvents frequently, for example after each kernel invocation. Instead, submit the whole sequence to the in-order queue and call clFinish once, or wait on a single OpenCL event object; this reduces host-device round trips.
