Developer Guide

Synchronization among Threads in a Kernel

There are a variety of ways in which the work-items in a kernel can synchronize to exchange data, update data, or cooperate with each other to accomplish a task in a specific order. These are:
Accessor classes
Accessor classes specify acquisition and release of buffer and image data structures. Depending on where they are created and destroyed, the runtime generates appropriate data transfers and synchronization primitives.
Atomic operations
DPC++ devices support a restricted subset of C++ atomics.
Fences
Fence primitives are used to order loads and stores. Fences can have acquire semantics, release semantics, or both.
Barriers
Barriers are used to synchronize sets of work-items within individual groups.
Hierarchical parallel dispatch
In the hierarchical parallelism model of describing computations, synchronization within the work-group is made explicit through multiple instances of the parallel_for_work_item function call, rather than through the use of explicit work-group barrier operations.
Device event
Events are used inside kernel functions to wait for asynchronous operations to complete.
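The acquire and release semantics mentioned under Fences can be illustrated with a small host-side sketch. This is standard C++ running on two std::thread objects, not DPC++ kernel code; the function name fence_handoff and its variables are hypothetical and chosen only for illustration. It assumes that the ordering idea carries over conceptually to work-items, while the actual device-side calls differ.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not a DPC++ API): hands `value` from a producer
// thread to a consumer thread using a release fence and an acquire fence.
int fence_handoff(int value) {
    int payload = 0;                 // plain, non-atomic data being published
    std::atomic<bool> ready{false};  // flag the two fences synchronize through

    std::thread producer([&] {
        payload = value;                                      // write the data
        std::atomic_thread_fence(std::memory_order_release);  // keep the write before the flag store
        ready.store(true, std::memory_order_relaxed);         // publish the flag
    });

    int observed = 0;
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();                        // wait until the flag is set
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // keep the data read after the flag load
        observed = payload;                                   // sees the producer's write
    });

    producer.join();
    consumer.join();
    assert(observed == value);  // follows from the release/acquire fence pairing
    return observed;
}
```

Without the two fences, the relaxed flag operations alone would not order the write and read of payload, so the consumer would have no guarantee of seeing the published value.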
In many cases, any of the above synchronization mechanisms can be used to achieve the same functionality, but with significant differences in efficiency and performance.
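As a rough illustration of how two approaches can produce the same result at different cost, the sketch below sums an array in two ways. It is a host-side analogy in standard C++, with std::thread standing in for work-items; it is not DPC++ kernel code, and the function names are hypothetical. The first variant issues one atomic update per element, while the second accumulates privately and issues one atomic update per thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Variant 1 (hypothetical name): every element is added to the shared
// total with its own atomic operation, so all threads contend constantly.
long sum_atomic_per_element(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles elements t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                total.fetch_add(data[i], std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Variant 2 (hypothetical name): each thread sums into a private variable
// and touches the shared total exactly once.
long sum_atomic_per_thread(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            long partial = 0;  // private accumulator, no synchronization needed
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                partial += data[i];
            }
            total.fetch_add(partial, std::memory_order_relaxed);  // single shared update
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Both functions return the same sum. The second performs far fewer operations on the shared counter, which is the kind of difference to weigh when several synchronization mechanisms would all be functionally correct.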
