Introduction for OpenCL™ Coding on Intel® Architecture Processors
Each work-group is assigned to one thread, which loops over all work-items
within the work-group using SIMD instructions. So you have parallelism within
each work-group (vector instructions) and parallelism between work-groups
(threading).
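The execution model described above can be pictured as two nested loops.
The following C sketch is a conceptual model only, not the actual runtime
implementation: the names run_nd_range, run_work_group, and scale_by_two are
hypothetical, the outer loop runs serially here where a real runtime would
spread work-groups across threads, and a real compiler would inline the kernel
body so the inner loop can be vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: handles one work-item. */
typedef void (*work_item_fn)(size_t global_id, float *data);

static void scale_by_two(size_t global_id, float *data)
{
    data[global_id] *= 2.0f;
}

/* Models what one thread does for one work-group. */
static void run_work_group(size_t group_id, size_t work_group_size,
                           work_item_fn kernel, float *data)
{
    /* Inner loop over work-items: the loop a vectorizing compiler
       would map onto SIMD instructions. */
    for (size_t local_id = 0; local_id < work_group_size; ++local_id) {
        size_t global_id = group_id * work_group_size + local_id;
        kernel(global_id, data);
    }
}

/* Models the whole launch. Assumption: work-groups run serially here;
   a real runtime would distribute these iterations across threads. */
static void run_nd_range(size_t num_work_groups, size_t work_group_size,
                         work_item_fn kernel, float *data)
{
    assert(work_group_size > 0);
    for (size_t group_id = 0; group_id < num_work_groups; ++group_id) {
        run_work_group(group_id, work_group_size, kernel, data);
    }
}
```

For example, calling run_nd_range(2, 4, scale_by_two, data) on an array of
8 floats visits every element exactly once, as two work-groups of four
work-items each.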
Generally, you think in terms of “total work” first, which is the “global
work size” in OpenCL™ terminology. Recall that
global-size = work-group-size * number-of-work-groups.
Thus, understanding the trade-offs between work-group size and the number
of work-groups is important for both types of parallelism just discussed.
This section provides some general recommendations.
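To make the relation between the three quantities concrete, here is a small
C sketch. The helper num_work_groups is a hypothetical name used only for
illustration. It assumes the global size is an exact multiple of the
work-group size. In a real host program these two values would be the
global_work_size and local_work_size arguments of clEnqueueNDRangeKernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups for a given split.
   Assumes global_size is an exact multiple of work_group_size. */
static size_t num_work_groups(size_t global_size, size_t work_group_size)
{
    assert(work_group_size > 0);
    assert(global_size % work_group_size == 0);
    return global_size / work_group_size;
}
```

For a fixed global size of 1024, a work-group size of 64 gives 16
work-groups, while a work-group size of 16 gives 64 work-groups. The total
work is the same. Only the balance between work done inside each work-group
and the number of work-groups available for threading changes.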