OpenCL equivalent of CUDA warp vote functions

Are there OpenCL equivalents of the CUDA __all and __any warp vote functions?


As far as I can tell, for OpenCL 1.x on the CPU you can use an explicit vector type coding style (see the Optimization Guide) and then use the any(vecn) and all(vecn) relational functions. I see no equivalent to the ballot() function.
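A minimal sketch of the vector style described above, in OpenCL C. The kernel name, buffer names and the threshold test are made up for illustration. Note that this votes across the four lanes of a single work-item, not across work-items, so it only matches a CUDA warp vote if the data is laid out so that one vector holds the values you want to vote over.

```c
// OpenCL C 1.x sketch: each work-item loads one float4 and votes over its 4 lanes.
// vote_vec4, in, any_out, all_out and threshold are hypothetical names.
__kernel void vote_vec4(__global const float4 *in,
                        const float threshold,
                        __global int *any_out,
                        __global int *all_out)
{
    const size_t gid = get_global_id(0);

    // A component-wise compare on vectors yields an int4 whose lanes are
    // -1 (all bits set) where the test is true and 0 where it is false.
    const int4 pred = in[gid] > (float4)(threshold);

    // any() is 1 if the most significant bit is set in at least one lane,
    // all() is 1 if it is set in every lane.
    any_out[gid] = any(pred);
    all_out[gid] = all(pred);
}
```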

For OpenCL 2.x there are both work_group and sub_group variants of the any/all/broadcast/reduce/scan functions. Sub-groups don't appear to be supported yet in the CPU driver.
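For reference, a rough sketch of the work-group flavour, assuming the program is built with -cl-std=CL2.0 and that the input buffer has one element per work-item. The kernel and buffer names are hypothetical. The vote spans the whole work-group rather than a warp-sized group.

```c
// OpenCL C 2.0 sketch: one vote per work-group.
// vote_wg, in, any_out, all_out and threshold are hypothetical names.
__kernel void vote_wg(__global const float *in,
                      const float threshold,
                      __global int *any_out,
                      __global int *all_out)
{
    // Scalar comparison gives 1 for true, 0 for false.
    const int pred = in[get_global_id(0)] > threshold;

    // Both built-ins must be reached by every work-item in the work-group,
    // so they are called outside of any divergent branch.
    const int wg_any = work_group_any(pred);
    const int wg_all = work_group_all(pred);

    // Every work-item receives the same result; one of them writes it out.
    if (get_local_id(0) == 0) {
        any_out[get_group_id(0)] = wg_any;
        all_out[get_group_id(0)] = wg_all;
    }
}
```

The sub-group versions (sub_group_any and sub_group_all) are called the same way, but in OpenCL 2.0 they depend on the cl_khr_subgroups extension, so they are only usable where the driver reports it.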

For the HD Graphics IGP I don't see a high-performance way to do this without support for a vector type coding style. Using shared local memory to implement ANY and ALL is probably the approach everyone is taking. I haven't tested it, but it might be safe to implement a fast ANY/ALL within a SIMD8/16/32 group of work-items, as they're probably executed in lock-step. For anything wider (or if relying on lock-step execution turns out to be illegal), you can implement it the old-fashioned way.
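One way the shared local memory approach could look, as an untested sketch. It assumes OpenCL 1.1 or later (for the built-in atomics on __local ints) and one input element per work-item. All names are hypothetical. It uses barriers and atomics rather than relying on lock-step execution, so it should not depend on the SIMD width.

```c
// OpenCL C 1.1 sketch: work-group-wide ANY/ALL through shared local memory.
// vote_local, in, any_out, all_out and threshold are hypothetical names.
__kernel void vote_local(__global const float *in,
                         const float threshold,
                         __global int *any_out,
                         __global int *all_out)
{
    __local int any_flag;
    __local int all_flag;

    const size_t lid = get_local_id(0);

    // One work-item sets the neutral values: 0 for OR, 1 for AND.
    if (lid == 0) {
        any_flag = 0;
        all_flag = 1;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    // pred is exactly 0 or 1, so it can be folded in with bitwise atomics.
    const int pred = in[get_global_id(0)] > threshold;
    atomic_or(&any_flag, pred);
    atomic_and(&all_flag, pred);

    // Wait until every work-item in the group has contributed its vote.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Every work-item can now read the flags; here one of them writes them out.
    if (lid == 0) {
        any_out[get_group_id(0)] = any_flag;
        all_out[get_group_id(0)] = all_flag;
    }
}
```

The two barriers are the main cost here, which is why a version restricted to a single SIMD group would be attractive if it can be shown to be legal.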

I would be interested if anyone else has suggestions, tips, or tricks for writing fast HD Graphics IGP code!

Thanks, allanmac. I didn't know about the OpenCL 2.0 functions.

I think that I will wait for the 2.0 release to try this feature. AMD is planning on releasing 2.0 support later this year.

