Each work-group is assigned to one thread, which loops over all work-items within that work-group using SIMD instructions. So you have parallelism within a work-group (vector instructions) and parallelism between work-groups (threading).
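As a rough mental model of this mapping, the following C sketch shows the two loop levels. It is a conceptual illustration only, not the actual runtime implementation: the names `run_ndrange` and `work_item` are hypothetical, the kernel body is a placeholder, and the global size is assumed to be an exact multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: one work-item squares one element. */
static void work_item(size_t gid, const float *in, float *out)
{
    out[gid] = in[gid] * in[gid];
}

/* Conceptual model of a 1-D range on the CPU (an assumption for
   illustration, not the real runtime code). */
void run_ndrange(size_t global_size, size_t local_size,
                 const float *in, float *out)
{
    /* Assumed here: the global size divides evenly into work-groups. */
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;

    /* Outer loop: one iteration per work-group. In the model described
       above, these iterations are distributed across threads. */
    for (size_t group = 0; group < num_groups; ++group) {
        /* Inner loop: the work-items of one work-group. In the model
           described above, this is the loop executed with SIMD. */
        for (size_t lid = 0; lid < local_size; ++lid) {
            work_item(group * local_size + lid, in, out);
        }
    }
}
```

In this sketch the sequential loops only stand in for the two levels; the outer loop corresponds to threading and the inner loop to vector instructions.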
Generally, you think in terms of “total work” first, which is the “global work size” in OpenCL™ terminology. Recall that
global-size = work-group-size * number-of-work-groups. Thus, understanding the trade-off between work-group size and the number of work-groups is very important for both types of parallelism just discussed. This section provides some general recommendations.
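The relation above can be made concrete with a little arithmetic. The sketch below is a minimal illustration for a one-dimensional range; the helper names `num_work_groups` and `round_up_global_size` are hypothetical and are not part of the OpenCL API. It assumes the global size is an exact multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups for a 1-D range.
   Assumes global_size is an exact multiple of work_group_size. */
size_t num_work_groups(size_t global_size, size_t work_group_size)
{
    assert(work_group_size > 0 && global_size % work_group_size == 0);
    return global_size / work_group_size;
}

/* Round a problem size n up to the next multiple of work_group_size,
   so that it can be used as a global size under the assumption above. */
size_t round_up_global_size(size_t n, size_t work_group_size)
{
    assert(work_group_size > 0);
    return ((n + work_group_size - 1) / work_group_size) * work_group_size;
}
```

For a global size of 1024, a work-group size of 64 gives 16 work-groups, while a work-group size of 256 gives only 4. Under the execution model described earlier, the first choice offers more work-groups to spread across threads and the second gives each thread a longer loop to vectorize. If the global size is rounded up (for example, 1000 to 1024), the kernel needs a bounds check so that the extra work-items do nothing.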