I wanted to clarify how work groups are mapped to HD 4000 hardware. My understanding is that a workgroup maps to a single EU, and each EU can run multiple workgroups in parallel.
A workgroup is executed within a half-slice (a collection of EUs), and multiple workgroups can be executed on the same half-slice. So your assumption holds only in some cases: a workgroup is scheduled onto a half-slice, not pinned to a single EU.
The order in which the work items of a work group get distributed is: first they fill a SIMD unit, then they are spread across EUs, then they are spread across threads.
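To make that order concrete, here is a rough sketch of one way to read it. It is only a toy model, not how the driver or hardware actually schedules work. The constants `SIMD_WIDTH`, `EUS_PER_HALF_SLICE` and `THREADS_PER_EU` are assumed values picked for illustration, the helper `place` is hypothetical, and the round-robin reading of "spread across EUs" is my assumption as well.

```python
from collections import namedtuple

# Illustrative assumptions only; none of these values come from this thread.
SIMD_WIDTH = 16          # work items packed into one hardware thread
EUS_PER_HALF_SLICE = 8   # EUs available to a single work group
THREADS_PER_EU = 8       # hardware threads per EU

Slot = namedtuple("Slot", "eu thread lane")

def place(local_id):
    """Map a flattened local work item ID to a hypothetical (EU, thread, lane)."""
    lane = local_id % SIMD_WIDTH           # 1. fill the lanes of a SIMD unit
    chunk = local_id // SIMD_WIDTH         # index of the SIMD-wide chunk
    eu = chunk % EUS_PER_HALF_SLICE        # 2. spread chunks across EUs
    thread = chunk // EUS_PER_HALF_SLICE   # 3. then across threads on each EU
    if thread >= THREADS_PER_EU:
        raise ValueError("work group does not fit in one half-slice in this model")
    return Slot(eu, thread, lane)

def eus_used(work_group_size):
    """Set of EUs touched by a work group of the given size in this model."""
    return {place(i).eu for i in range(work_group_size)}
```

In this model a 16-item work group stays on a single EU, while a 256-item work group touches all eight EUs of the half-slice with two threads on each, which is why the single-EU assumption only holds some of the time.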
Interesting. It is not 100% clear to me right now, but I will look at the docs again and attend the webinar and potentially come back to this question later if it is still not clear :)