I am trying to work out how many instructions execute in parallel in an OpenCL kernel, as a function of the kernel launch parameters. For instance, on a 4-core Xeon, I launch 8 workgroups of 32 threads each (1 workgroup per hardware thread). The overall degree of parallelism is therefore 8 x the degree of parallelism of a single workgroup.
What is the degree of parallelism of a single workgroup? I know that the code is scalarized and then vectorized to fit the width of the xmm registers, and that pipelining must also be taken into account.
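
To make the counting concrete, here is a rough sketch of the estimate I have in mind. It assumes 128-bit xmm registers and 32-bit float operands; the function names are my own, not part of any OpenCL API. It only counts SIMD lanes, so it ignores pipelining and the fact that two hardware threads on the same core share execution units, and should be read as an upper bound at best.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def peak_parallel_ops(workgroups_in_flight: int,
                      register_bits: int = 128,   # assumption: xmm (SSE) width
                      element_bits: int = 32) -> int:  # assumption: float operands
    """Naive estimate: one vector instruction per workgroup in flight,
    each processing simd_lanes() elements at once."""
    return workgroups_in_flight * simd_lanes(register_bits, element_bits)


# 8 workgroups (one per hardware thread), 4 float lanes per xmm register
print(peak_parallel_ops(8))  # prints 32
```

With these assumptions each workgroup would process 4 of its 32 threads per vector instruction, so what I am really asking is whether this lane count is the right way to think about the per-workgroup parallelism.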