I have a question about how work-items are mapped to hardware threads. My understanding was that each work-item is mapped to one hardware thread, but when I read the Optimization Guide I found this note:
Work-group size of 16 work-items is enough if you do not ask for SLM. Then each work-group maps to each hardware thread.
In this case an entire work-group is mapped to a single hardware thread. Here I am assuming that all computations in the kernel are scalar. My question is: if I use vector operations, is this mapping still correct? If yes, how is it done? (I guess the compiler scalarizes all vector operations.)
I'm using OpenCL on Intel HD Graphics, not on the CPU.
Thanks in advance,