Do equivalents exist for CUDA __all and __any methods?
As far as I can tell, for OpenCL 1.x on the CPU you can use an explicit vector type coding style (see the Optimization Guide) and then apply the any() and all() relational functions to the integer vector that a vector comparison produces. I see no equivalent to CUDA's __ballot() function.
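To illustrate the explicit vector style mentioned above, here is a minimal OpenCL C sketch. The kernel name vote_vec8, its arguments, and the threshold test are made up for illustration; only the any() and all() built-ins and the vector comparison come from the OpenCL C language. A vector comparison yields -1 in every lane where it holds, and any()/all() look at the most significant bit of each lane.

```c
// Hypothetical kernel: each work-item votes across the 8 lanes of one float8.
// Assumes `in` and `out` each hold at least get_global_size(0) elements.
__kernel void vote_vec8(__global const float8 *in,
                        float threshold,
                        __global int2 *out)
{
    size_t gid = get_global_id(0);

    // Lane-wise comparison: -1 (all bits set) where true, 0 where false.
    int8 pred = in[gid] > (float8)(threshold);

    // any() returns 1 if at least one lane has its sign bit set,
    // all() returns 1 only if every lane does.
    out[gid] = (int2)(any(pred), all(pred));
}
```

Note that this only votes across the lanes held by a single work-item, so it is not a drop-in replacement for a warp-wide __any/__all.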
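For OpenCL 2.x there are both work_group and sub_group any/all/broadcast/reduce/scan functions (for example work_group_any() and work_group_all()); in 2.0 the sub_group variants come from the cl_khr_subgroups extension. Sub-groups don't appear to be supported yet in the CPU driver.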
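A rough sketch of the OpenCL 2.0 work-group variants might look like the following. The kernel name vote_wg and its arguments are hypothetical, and it assumes the program is built with -cl-std=CL2.0 on a driver that supports it. Every work-item in the group has to reach the work_group_any()/work_group_all() calls, so they should not sit inside divergent branches.

```c
// Hypothetical kernel: one ANY and one ALL result per work-group.
// Assumes the global size is an exact multiple of the work-group size,
// so there is no bounds check on `in`.
__kernel void vote_wg(__global const float *in,
                      float threshold,
                      __global int *any_out,
                      __global int *all_out)
{
    int pred = in[get_global_id(0)] > threshold;   // 1 or 0 per work-item

    // Collective calls: all work-items in the group must execute them.
    int wg_any = work_group_any(pred);
    int wg_all = work_group_all(pred);

    // Every work-item now holds the same result; one of them stores it.
    if (get_local_id(0) == 0) {
        any_out[get_group_id(0)] = wg_any;
        all_out[get_group_id(0)] = wg_all;
    }
}
```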
For the HD Graphics IGP I don't see a high-performance way to do this without support for a vector type coding style. Using shared local memory to implement ANY and ALL is probably the approach everyone is taking. I haven't tested it, but it might be safe to implement a fast ANY/ALL within a SIMD8/16/32 group of work-items, as they're probably executed in lock-step. For anything wider than that (or if relying on lock-step execution turns out to be illegal), you can implement it the old-fashioned way through shared local memory.
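For reference, the shared local memory approach could be sketched roughly like this in OpenCL C 1.1 or later. The kernel name vote_slm and its arguments are hypothetical, and I have not benchmarked it on HD Graphics. It uses the local atomic_or()/atomic_and() built-ins rather than relying on many work-items storing the same value to one location, and it makes no assumption about lock-step execution.

```c
// Hypothetical kernel: work-group-wide ANY/ALL through local memory.
// Assumes the global size is an exact multiple of the work-group size.
__kernel void vote_slm(__global const float *in,
                       float threshold,
                       __global int *any_out,
                       __global int *all_out)
{
    volatile __local int any_flag;
    volatile __local int all_flag;

    size_t lid = get_local_id(0);

    // One work-item sets the identity values: 0 for OR, 1 for AND.
    if (lid == 0) {
        any_flag = 0;
        all_flag = 1;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    int pred = in[get_global_id(0)] > threshold;   // 1 or 0 per work-item

    // Fold every work-item's vote into the shared flags.
    atomic_or(&any_flag, pred);
    atomic_and(&all_flag, pred);
    barrier(CLK_LOCAL_MEM_FENCE);

    // After the barrier every work-item can read the group-wide result.
    if (lid == 0) {
        any_out[get_group_id(0)] = any_flag;
        all_out[get_group_id(0)] = all_flag;
    }
}
```

The two barriers are the main cost here, which is why a lock-step trick within a single SIMD group would be attractive if it were guaranteed to be legal.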
I would be interested if anyone else had suggestions, tips, tricks for writing fast HD Graphics IGP code!
Thanks, allanmac. I didn't know about the OpenCL 2.0 functions.
I think that I will wait for the 2.0 release to try this feature. AMD is planning on releasing 2.0 support later this year.