RDTSC() for measuring latency of an operation

I am trying to measure the latency of an operation using rdtsc().
The problem I am facing is that the latency of that operation, i.e. the number of cycles it takes, remains the same even when I change the frequency of the processor core from 3 GHz to 2 GHz. In other words, changing the frequency has no effect on the output of rdtsc.
Can anyone please tell me why this is happening?

Thank You.


RDTSC does count cycles. If the frequency is 2 GHz, a cycle lasts 0.5 ns; at 3 GHz it lasts 0.33 ns.

When you measure the latency in cycles, as you do when using RDTSC, it must remain the same. To convert the result into seconds you need to divide the cycle count by the clock frequency.

Thanks for your reply.
Actually I am trying to measure the latency of a DVFS switch. I read rdtsc() before and after changing the frequency and subtract the two values to get the number of cycles. The problem is that the number of cycles does not change. I mean the rdtsc() rate is fixed at 3 GHz and does not change with frequency. I am not able to understand why this is happening.

rdtsc counts front side or QPI bus clock ticks, and multiplies those by the default multiplier (presumably in accordance with 3 GHz in your case). The last Intel CPU where it measured CPU clock ticks rather than bus clock time was Northwood (ia32 only). rdtsc latency is at least 6 CPU clock ticks, depending on CPU model. If you serialize so as to control exactly what you are measuring, you will be measuring a significant additional time for the serialization.
You should at least check the granularity you are seeing for rdtsc by writing a tight loop which does rdtsc repeatedly and finds the smallest time interval by which it changes reliably. According to the tests in the classic Livermore Fortran Kernel, the granularity is nearly 10^-7 sec (100 nsec) on most IA CPUs.

I understand what you are saying. But I am just trying to measure the latency of a DVFS frequency switch by using rdtsc(). When I use sleep(1) and use rdtsc() to measure the cycles spent in 1 second, the result is the same at 3 GHz and at 2 GHz. The result I get is around 2.97*10^9 cycles, which I would expect for 3 GHz. But why this also shows up at 2 GHz I don't understand. For 2 GHz shouldn't it be approximately 2*10^9 cycles? Or does rdtsc() always count cycles in terms of the highest frequency of the system?

You should interpret it as measuring elapsed time in terms of FSB or QPI with the built-in nominal multiplier, e.g. the one in the cpuid text string, if any. If the events aren't affected by clock multiplier, the reported "cycles" should be the same.

Best Reply

Please see chapter 16.11 of Intel SDM vol. 3A. For a range of Intel CPUs (please see the document) and moving forward, the TSC increments at a constant rate (there are two ways how that rate is set). Therefore your results are the expected ones, since the TSC is incremented at a constant rate.
Anyway, please note that rdtsc() is not serializing or ordered with respect to other instructions, therefore you might end up measuring less or more than what you really want. You could try rdtscp(), which is a serializing instruction.


While reading that section, I came across section 16.11.1, which is about "Invariant TSC" in newer processors. How is this different from "Constant TSC" as per section 16.11?


Invariant TSC (available from Nehalem onwards) will count at a constant rate no matter what state (P, C or T) the CPU is in.
Previous processors, while also incrementing at a constant rate, will stop counting in deep sleep states (for example the C6 state).
I hope it is clearer now.
