For best results, try to avoid explicit command synchronization primitives,
such as barriers (for example, clEnqueueBarrier). Explicit synchronization
commands and event tracking result in cross-module round trips, which
decrease performance. The fewer explicit synchronization commands you use,
the better the performance.
Use the following techniques to reduce explicit synchronization:
- Avoid clWaitForEvents calls between command submissions.
- Use clFlush to issue all previously queued commands in a command queue, and do something useful in the host thread rather than blocking on clFinish and waiting for results.
- For the CPU OpenCL* device, calling clFinish is more effective than calling clWaitForEvents: clWaitForEvents blocks the underlying thread, whereas clFinish enables the thread to participate in kernel execution.
- Avoid calling clFinish or clWaitForEvents frequently (for example, after each kernel invocation) in the final pipeline version. Instead, submit the whole sequence to the in-order queue and issue clFinish (or wait on the final event) once. This reduces host-device round trips.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804