Modification of gen assembly and querying the time stamp register


I am using OpenCL on the Gen9.5 architecture with an i7-7700K processor. I want something similar to a clock function inside my kernel. What I have in mind is along these lines:

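__kernel void TimeMeasurementKernel() {
  // clock() is hypothetical here: OpenCL C has no such built-in
  ulong t0 = clock();  // timestamp before the operations
  // ... operations to be timed ...
  ulong t1 = clock();  // timestamp after the operations
  ulong time_elapsed_to_execute_operations = t1 - t0;
}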

It seems that Intel doesn't provide any means of getting the clock value from inside an OpenCL kernel. However, there is a timestamp architecture register, tm0, as mentioned in here. My goal is to query this timestamp register from inside the kernel. One approach I thought of, if it is possible at all, is the following: generate the .gen assembly file with the ioc64 offline compiler using the -asm option, modify the assembly by inserting an instruction that reads the timestamp register, and then generate a binary from the modified assembly. I could then load the modified binary through clCreateProgramWithBinary. I am not sure whether this is possible. If it is doable, I would very much appreciate a working example showing how to do it, especially how to introduce the timestamp register read into the assembly generated by ioc64 and how to create the binary file from the modified assembly.
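For reference, the final step of that plan (handing a binary from disk to clCreateProgramWithBinary) could look roughly like the sketch below. This is only a minimal illustration: read_binary_file is a hypothetical helper name, the file name in the comment is made up, and nothing here establishes that ioc64 can actually turn a modified assembly file back into a binary. The OpenCL calls are shown only in a comment so that the helper compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenCL): reads a whole file into a
 * heap buffer. Returns NULL on failure or for an empty file; on success
 * stores the byte count in *size_out. The caller frees the buffer. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);

    unsigned char *buf = malloc((size_t)len);
    if (buf == NULL) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }

    fclose(f);
    *size_out = (size_t)len;
    return buf;
}

/* Intended use on the host, assuming a valid context `ctx` and device
 * `dev` already exist ("modified_kernel.bin" is a made-up file name):
 *
 *   size_t len;
 *   unsigned char *bin = read_binary_file("modified_kernel.bin", &len);
 *   cl_int status, err;
 *   cl_program prog = clCreateProgramWithBinary(
 *       ctx, 1, &dev, &len, (const unsigned char **)&bin, &status, &err);
 *   err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
 */
```

Checking both binary_status and errcode_ret after the call is worthwhile, since a binary that does not match the device is rejected at this point.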

As I have stated, my goal is to read the timestamp register, but if the same result can be achieved by any other method, that would do as well. All I want is to measure the execution time of a portion of my code at runtime. Please let me know if any further information is required. Thank you.
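Once two raw counter readings are available, the t1 - t0 step itself is simple. The sketch below is plain C for illustration and assumes, without confirmation from this thread, that the timestamp is read as two 32-bit halves (low and high); the function names are hypothetical. It also shows that unsigned subtraction still gives the right delta if a 32-bit counter wraps around once between the two readings.

```c
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit timestamp.
 * The low/high layout is an assumption, not taken from the thread. */
uint64_t make_timestamp(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Elapsed ticks between two 64-bit readings. */
uint64_t elapsed_ticks(uint64_t t0, uint64_t t1)
{
    return t1 - t0;
}

/* Elapsed ticks when only a 32-bit counter is available. Unsigned
 * arithmetic is modular, so a single wraparound between t0 and t1
 * still yields the correct difference. */
uint32_t elapsed_ticks32(uint32_t t0, uint32_t t1)
{
    return t1 - t0;
}
```

The result is in counter ticks; converting it to seconds needs the counter frequency, which is not given here.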


Hi Sankha,

This is functionality we are looking to enable, but it's not quite ready yet - stay tuned.

In the meantime, are you on Windows or Linux?  If you're on Linux you may be able to generate code that accesses the timestamp register directly, so long as you are willing to compile a "debug" or "internal" version of the compiler.


One minor word of caution is that this might not quite do what you want it to do since the GPU Execution Units are multi-threaded, so be careful how you interpret the results.

Hope this helps!

Hello Ben,

Thank you so much for your reply. I am using Linux (Ubuntu 16.04), and working in debug mode is fine with me. Could you elaborate on how to generate code that accesses the timestamp register? In particular, how do I compile the "debug" or "internal" version of the compiler, and how do I then use it to generate code that reads the timestamp register? I would very much appreciate a more detailed explanation. Thank you again for all the information.
