Profiling Operations Using OpenCL* Profiling Events

The following code snippet measures kernel execution time using OpenCL* profiling events (error handling is omitted):

cl_event perf_event = NULL;
g_cmd_queue = clCreateCommandQueue(…CL_QUEUE_PROFILING_ENABLE, NULL);
clEnqueueNDRangeKernel(g_cmd_queue,…, &perf_event);
clWaitForEvents(1, &perf_event);
cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(perf_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(perf_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
// END - START approximates the "pure" hardware execution time of the command
// The counter values are in nanoseconds (1e-09 sec), so multiply by 1e-06 to get milliseconds
g_NDRangePureExecTimeMs = (cl_double)(end - start)*(cl_double)(1e-06);
// Release the event once its profiling data has been read
clReleaseEvent(perf_event);
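Besides START and END, an event also carries the CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT counters, so the same conversion can split a command's lifetime into queueing, launch, and execution phases. The following is a minimal sketch of that arithmetic, assuming the four values were already read with clGetEventProfilingInfo. The profile_ns type and the helper functions are hypothetical, and uint64_t stands in for cl_ulong (a 64-bit unsigned integer) so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four profiling counters of one command,
   as read with clGetEventProfilingInfo. Values are in nanoseconds;
   uint64_t stands in for cl_ulong. */
typedef struct {
    uint64_t queued; /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit; /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;  /* CL_PROFILING_COMMAND_START  */
    uint64_t end;    /* CL_PROFILING_COMMAND_END    */
} profile_ns;

/* Converts the interval [from, to] from nanoseconds to milliseconds. */
static double ns_to_ms(uint64_t from, uint64_t to)
{
    assert(to >= from); /* the counters of one command never decrease */
    return (double)(to - from) * 1e-06;
}

/* Time the command spent in the host queue before submission to the device. */
static double queue_ms(const profile_ns *p) { return ns_to_ms(p->queued, p->submit); }

/* Time between submission and the start of execution on the device. */
static double launch_ms(const profile_ns *p) { return ns_to_ms(p->submit, p->start); }

/* Execution time: the same quantity as g_NDRangePureExecTimeMs above. */
static double exec_ms(const profile_ns *p) { return ns_to_ms(p->start, p->end); }
```

A large queue_ms or launch_ms relative to exec_ms suggests that the command waited before running, rather than that the kernel itself is slow.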

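Important caveats:

- The command queue must be created with the CL_QUEUE_PROFILING_ENABLE property; otherwise clGetEventProfilingInfo returns CL_PROFILING_INFO_NOT_AVAILABLE.
- Query the profiling counters only after the command has completed, for example after clWaitForEvents returns; before that, the information is not available either.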

This way you can profile operations on both memory objects and kernels. Refer to section 5.12 of the OpenCL* 1.1 Specification for a detailed description of profiling events.
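Because buffer reads and writes return events with the same counters as kernels, a whole transfer-compute-transfer sequence can be profiled together. The following sketch assumes the START and END values of each command were already read with clGetEventProfilingInfo from commands on the same device. The op_profile type and both functions are hypothetical helpers, and uint64_t stands in for cl_ulong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one profiled command, such as a buffer write,
   a kernel, or a buffer read. Counters are in nanoseconds. */
typedef struct {
    const char *label;
    uint64_t start; /* CL_PROFILING_COMMAND_START */
    uint64_t end;   /* CL_PROFILING_COMMAND_END   */
} op_profile;

/* Sum of the individual execution times, in milliseconds. */
static double busy_ms(const op_profile *ops, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(ops[i].end >= ops[i].start);
        total += ops[i].end - ops[i].start;
    }
    return (double)total * 1e-06;
}

/* Elapsed time from the earliest START to the latest END, in milliseconds. */
static double span_ms(const op_profile *ops, size_t n)
{
    if (n == 0)
        return 0.0;
    uint64_t first = ops[0].start, last = ops[0].end;
    for (size_t i = 1; i < n; ++i) {
        if (ops[i].start < first) first = ops[i].start;
        if (ops[i].end > last)    last = ops[i].end;
    }
    return (double)(last - first) * 1e-06;
}
```

For commands from a single in-order queue, which do not overlap, span_ms minus busy_ms approximates the time the device spent idle between the commands.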

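NOTE: Host-side wall-clock measurements might differ from the profiling-event results, since they also include host overhead such as enqueueing the command and waiting for it. For the CPU device, the difference is typically negligible.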
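To check how large the difference is on a given device, you can time the same enqueue-and-wait sequence on the host and subtract the profiling result. The following is a minimal sketch assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC; host_now, timespec_diff_ms, and host_overhead_ms are hypothetical helpers, not part of the OpenCL API.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples a and b. */
static double timespec_diff_ms(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e3
         + (double)(b.tv_nsec - a.tv_nsec) * 1e-06;
}

/* Reads the host monotonic clock (unaffected by system time changes). */
static struct timespec host_now(void)
{
    struct timespec t;
    int rc = clock_gettime(CLOCK_MONOTONIC, &t);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
    return t;
}

/* Host-side overhead: wall-clock time minus device execution time. */
static double host_overhead_ms(double wall_ms, double device_ms)
{
    return wall_ms - device_ms;
}
```

In use, you would call host_now() before clEnqueueNDRangeKernel and again after clWaitForEvents, then pass the elapsed time from timespec_diff_ms together with g_NDRangePureExecTimeMs to host_overhead_ms.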

See Also

Comparing OpenCL* Kernel Performance with Performance of Native Code
Related Documents
The OpenCL* 1.1 Specification, on the Khronos portal at http://www.khronos.org [PDF]
Overview presentations of the OpenCL* standard, on the Khronos portal at http://www.khronos.org [Online Article]


Copyright © 2010-2012, Intel Corporation. All rights reserved.