OpenCL equivalent of CUDA warp vote functions


Do OpenCL equivalents exist for the CUDA __all() and __any() warp vote functions?



As far as I can tell, for OpenCL 1.x on the CPU you can use an explicit vector-type coding style (see the Optimization Guide) and then use the any(vecn) and all(vecn) relational functions. I see no equivalent to CUDA's __ballot() function.
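A minimal sketch of the vector-style idea, written as plain C so it runs anywhere. Each of the 8 array slots stands in for one component of an OpenCL C `int8`, which plays the role of one CUDA thread. In a kernel the equivalent would be roughly `int8 p = a > b; if (any(p)) { ... }`: a relational operator on a vector yields -1 (all bits set) for each true component and 0 for each false one, and `any()`/`all()` test the most significant bit of each component. `vec_gt`, `vec_any`, `vec_all` and `vec_ballot` are hypothetical names for this model; `vec_ballot` in particular is a hand-rolled mask, not an OpenCL built-in.

```c
#include <assert.h>
#include <stdint.h>

/* Models one OpenCL C int8 value: one component per "lane" (work-item). */
#define LANES 8

/* Component-wise a > b with OpenCL C vector semantics:
   -1 (all bits set) where true, 0 where false. */
static void vec_gt(const int32_t a[LANES], const int32_t b[LANES],
                   int32_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] > b[i]) ? -1 : 0;
}

/* Returns nonzero if the most significant bit of x is set. */
static int msb_set(int32_t x) {
    return ((uint32_t)x & 0x80000000u) != 0;
}

/* Models any(int8): 1 if the MSB of any component is set, else 0. */
static int vec_any(const int32_t v[LANES]) {
    for (int i = 0; i < LANES; ++i)
        if (msb_set(v[i])) return 1;
    return 0;
}

/* Models all(int8): 1 if the MSB of every component is set, else 0. */
static int vec_all(const int32_t v[LANES]) {
    for (int i = 0; i < LANES; ++i)
        if (!msb_set(v[i])) return 0;
    return 1;
}

/* Hypothetical ballot-style helper (NOT an OpenCL built-in):
   bit i of the result is set when component i is "true". */
static uint32_t vec_ballot(const int32_t v[LANES]) {
    assert(LANES <= 32); /* the mask must fit in a uint32_t */
    uint32_t mask = 0;
    for (int i = 0; i < LANES; ++i)
        if (msb_set(v[i])) mask |= 1u << i;
    return mask;
}
```

For example, with a = {1, ..., 8} and b = {0, 0, 0, 0, 0, 0, 0, 9}, `vec_gt` marks lanes 0 to 6 as true, so `vec_any` returns 1, `vec_all` returns 0 and `vec_ballot` returns 0x7F.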

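For OpenCL 2.x there are both work_group_ and sub_group_ any/all/broadcast/reduce/scan functions (e.g. work_group_any() and sub_group_all()); in OpenCL 2.0 the sub-group versions come from the cl_khr_subgroups extension. Sub-groups don't appear to be supported yet in the CPU driver.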
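To illustrate what those OpenCL 2.0 built-ins return, here is a small sequential reference model in C. In OpenCL C 2.0 the real calls look like `int work_group_any(int predicate)`, `int work_group_all(int predicate)`, `work_group_broadcast(x, local_id)`, `work_group_reduce_add(x)` and `work_group_scan_inclusive_add(x)`; every work-item in the group must call them, and each one receives the value computed below. The `wg_*` names are hypothetical stand-ins, and each takes the whole group's inputs as an array because plain C has no work-groups.

```c
#include <assert.h>
#include <stddef.h>

/* Each function receives the per-work-item inputs of one work-group
   (array index = local id) and returns what every work-item would observe. */

/* Like work_group_all: nonzero only if every predicate is nonzero. */
static int wg_all(const int *pred, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (!pred[i]) return 0;
    return 1;
}

/* Like work_group_any: nonzero if at least one predicate is nonzero. */
static int wg_any(const int *pred, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (pred[i]) return 1;
    return 0;
}

/* Like work_group_broadcast: the value held by work-item local_id. */
static int wg_broadcast(const int *x, size_t n, size_t local_id) {
    assert(local_id < n);
    return x[local_id];
}

/* Like work_group_reduce_add: the sum over the whole group. */
static int wg_reduce_add(const int *x, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* Like work_group_scan_inclusive_add: out[i] = x[0] + ... + x[i]. */
static void wg_scan_inclusive_add(const int *x, size_t n, int *out) {
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += x[i];
        out[i] = running;
    }
}
```

The sub_group_* counterparts behave the same way, only over the (implementation-sized) sub-group instead of the whole work-group.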

For the HD Graphics IGP I don't see a high-performance way to do this without support for a vector-type coding style. Implementing ANY and ALL in shared local memory is probably the approach everyone is taking. I haven't tested it, but it might be safe to implement a fast ANY/ALL within a SIMD8/16/32 group of work-items, since those are probably executed in lock-step. For anything wider (or if relying on lock-step execution is illegal), you can implement it the old-fashioned way with local memory and barriers.
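The old-fashioned local-memory approach can be sketched as follows. The C below simulates one work-group sequentially: each loop is one barrier-delimited phase that every work-item executes, and the comments show the OpenCL C statements a kernel would use (`__local int flag;`, `get_local_id(0)`, `barrier(CLK_LOCAL_MEM_FENCE)`). This is an untested sketch under assumptions, not a verified kernel: every work-item must reach both barriers, and because several work-items may write `flag` in the same phase, you may prefer `atomic_or`/`atomic_and` on the local int over a plain store. `local_any` and `local_all` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of a local-memory ANY over one work-group of n items.
   result[lid] receives the value work-item lid would read. */
static void local_any(const int *pred, size_t n, int *result) {
    int flag;                              /* kernel: __local int flag;         */
    assert(n > 0);

    flag = 0;                              /* kernel: if (get_local_id(0) == 0) */
                                           /*             flag = 0;             */
    /* kernel: barrier(CLK_LOCAL_MEM_FENCE); */

    for (size_t lid = 0; lid < n; ++lid)   /* every work-item:                  */
        if (pred[lid])                     /*   if (pred) flag = 1;             */
            flag = 1;                      /*   (or atomic_or(&flag, 1))        */
    /* kernel: barrier(CLK_LOCAL_MEM_FENCE); */

    for (size_t lid = 0; lid < n; ++lid)   /* every work-item reads the result  */
        result[lid] = flag;
}

/* Same pattern for ALL: start from 1 and let any false predicate clear it. */
static void local_all(const int *pred, size_t n, int *result) {
    int flag;                              /* kernel: __local int flag;         */
    assert(n > 0);

    flag = 1;                              /* kernel: work-item 0 sets flag = 1 */
    /* kernel: barrier(CLK_LOCAL_MEM_FENCE); */

    for (size_t lid = 0; lid < n; ++lid)
        if (!pred[lid])                    /* kernel: if (!pred) flag = 0;      */
            flag = 0;                      /*   (or atomic_and(&flag, 0))       */
    /* kernel: barrier(CLK_LOCAL_MEM_FENCE); */

    for (size_t lid = 0; lid < n; ++lid)
        result[lid] = flag;
}
```

If the same `flag` is reused later in the kernel, add another barrier after the read so no work-item re-initializes it while others are still reading.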

I would be interested to hear if anyone else has suggestions, tips, or tricks for writing fast HD Graphics IGP code!

Thanks, allanmac. I didn't know about the OpenCL 2.0 functions.

I think that I will wait for the 2.0 release to try this feature. AMD is planning on releasing 2.0 support later this year.

