I am trying to run an OpenCL kernel on a node with multiple CPU and GPU devices. As part of the scheduling, I use an event callback to notify when a kernel has finished, and a new kernel should be sent to the device.
events[queue_num] = cl::Event();
error = queues_all_[queue_num]->enqueueNDRangeKernel(*(stereoMatchKernels_all_[queue_num]), cl::NDRange(0), cl::NDRange(global_sizes[queue_num]), cl::NDRange(wkgrp_sizes[queue_num]), NULL, events+queue_num);
events[queue_num].setCallback(CL_COMPLETE, &signal_complete, (void *)(available_devices + queue_num));
void CL_CALLBACK OpenCLPreparation::signal_complete(cl_event event, cl_int status, void* data)
/*Do something */
This works without a problem on the GPU devices, but the callback is only reached for a kernel sent to the CPU after a queue.finish() has been called. Until then, the CPU sits unused. This has been tested with both an i7 930, and dual Xeon 5600 under 64 bit Linux.
Is there some problem with either the event callbacks, or the command queues in the Intel OpenCL SDK?