I'm running a kernel on an Intel GPU (Intel® HD Graphics 630, latest driver) with such settings.
enqueue_kernel can fail due to a lack of resources. That's fine, but I don't understand why it fails so early. It fails as soon as the number of global work-items is more than 256, and that is with only one dimension. Why not 256 * 256 * 256?
The second problem is that I don't know how to handle the failure. I tried checking the returned value in a loop like this: while(0 != enqueue_kernel(.. but the program hangs.
Here is my experiment: https://github.com/OmegaDoom/enque_kernel_test