I originally tried the SDK on Linux on a dual-socket Harpertown, where CL reports the preferred & native width for all datatypes as 16 bytes, i.e. from 16 chars down to 2 doubles. That's what fits inside an XMM register, which is what I expected.
But after checking the values on a Sandy Bridge CPU (i5-2400), I get the same preferred & native sizes. This seems strange to me: my understanding of the architecture is that for a large enough floating-point dataset, one should use AVX instead of SSE. Since AVX has very little integer support in the YMM registers, I understand why char/short/int/long stay at 16/8/4/2, but shouldn't float/double be 8/4 rather than 4/2?
Is this deliberate, and if so, why? Or is it just a case of "we haven't had time to implement it yet"?