I have been trying some OpenCL-related performance optimizations for Intel devices. I want to use vectorization with the optimal vector data type length for a given device. I called clGetDeviceInfo(.., CL_DEVICE_PREFERRED_VECTOR_WIDTH_&lt;type&gt;, ..), but it returns values that are not really optimal:

uchar - 1
short - 1
int - 1
float - 1

I checked this on an Intel HD Graphics 4600 GPU and an Intel Core i5-4570 CPU.
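For reference, the query looks roughly like this (a minimal sketch with error handling omitted; the preferred-width parameters are per-type: CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, _SHORT, _INT, _FLOAT):

```c
#include <stdio.h>
#include <CL/cl.h>

/* Print the preferred vector widths reported by the first device
   of the first platform. Error handling is omitted for brevity. */
int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);

    struct { cl_device_info param; const char *name; } widths[] = {
        { CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,  "uchar" },
        { CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, "short" },
        { CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,   "int"   },
        { CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, "float" },
    };
    for (int i = 0; i < 4; ++i) {
        cl_uint w;
        clGetDeviceInfo(device, widths[i].param, sizeof(w), &w, NULL);
        printf("%s - %u\n", widths[i].name, w);
    }
    return 0;
}
```

Running this on the devices above prints 1 for all four types.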

I tried to find the optimal vector length for my problem empirically and got the following values:

uchar - 16
short - 8
int - 1
float - 1

If I use uchar16 instead of uchar, I get a 3x speedup.
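To show what I mean by the manual vectorization, here is a simplified sketch of the two kernel variants (a hypothetical example, not my exact code): the scalar version processes one uchar per work-item, while the uchar16 version processes 16 per work-item and is launched with a global size 16x smaller.

```c
// Scalar version: one uchar per work-item.
__kernel void process_scalar(__global const uchar *src,
                             __global uchar *dst)
{
    size_t i = get_global_id(0);
    dst[i] = src[i] + 1;
}

// Vectorized version: 16 uchars per work-item.
// Launch with a global work size 16x smaller than the scalar version.
__kernel void process_vec16(__global const uchar16 *src,
                            __global uchar16 *dst)
{
    size_t i = get_global_id(0);
    dst[i] = src[i] + (uchar16)(1);
}
```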

I have two questions:

1. Why does clGetDeviceInfo(.., CL_DEVICE_PREFERRED_VECTOR_WIDTH_&lt;type&gt;, ..) return these values?

2. Is it possible to change these values in future releases? That would make cross-platform optimization possible.



Best Reply

Hi Alexander,

You are right - Intel OpenCL devices prefer scalar values because they assume that internal autovectorization will produce better results in most cases. And you are right once more - there are cases where internal autovectorization fails and manual tuning produces better results.

Please check for more info


Hi Dmitry,

Thanks for clarification!

> Intel OpenCL devices prefer scalar values as they assume that internal autovectorization

The caveat: the compiler does its best job when vectorizing 32-bit types (like int and float). In contrast, for char/uchar, explicitly using short vectors like uchar4 might be more performant, as it better coalesces memory accesses (since with uchar4/uchar8/etc. you operate on aligned data chunks) and also better amortizes work-item scheduling costs (since you process multiple pixels simultaneously).
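A sketch of that uchar4 pattern (hypothetical kernel; vload4/vstore4 are the standard OpenCL C vector load/store built-ins, which handle the element addressing when the buffer argument itself stays uchar):

```c
// Each work-item loads, processes, and stores 4 adjacent pixels,
// so memory accesses are grouped into 4-byte chunks and the
// per-work-item scheduling cost is amortized over 4 pixels.
__kernel void brighten4(__global const uchar *src,
                        __global uchar *dst)
{
    size_t i = get_global_id(0);
    uchar4 px = vload4(i, src);      // read 4 consecutive uchars
    px = add_sat(px, (uchar4)(16));  // saturating add, no overflow wrap
    vstore4(px, i, dst);             // write them back
}
```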
