How to get L1, L2 and L3 cache misses by reading performance counters using the rdpmc instruction?

For example, my sample code looks like this:

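/* Placeholder for the code whose cache misses should be measured. */
void some_function_call(void)
{
}

long long get_L3_misses(void)
{
  unsigned int a = 0, d = 0, c;

  c = (1 << 30); // what is the counter number for L3 cache misses?
  asm volatile(
        "rdpmc"
            : "=a" (a), "=d" (d)   // low 32 bits in eax, high bits in edx
            : "c" (c)              // counter index goes in ecx
        );

  return ((long long)a) | (((long long)d) << 32);
}

int main(int argc, char* argv[])
{
  long long start, stop;
  double result;

  start = get_L3_misses();
  some_function_call();
  stop = get_L3_misses();

  // difference between the two counter readings
  result = (double)(stop - start);

  return 0;
}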

Thank you.

Shailja,

The counters first need to be programmed with the event that they should count. Sample code for doing so can be found here:

https://github.com/opcm/pcm/blob/c21fbce6af8fb2435d390a56c7db75191d1df34f/cpucounters.cpp#L1717

The counter is then read later on in this location:

https://github.com/opcm/pcm/blob/c21fbce6af8fb2435d390a56c7db75191d1df34f/cpucounters.cpp#L2991
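
To illustrate what "programming the counter" means, here is a minimal C sketch. It is not taken from the PCM sources linked above. It assumes the architectural "LLC Misses" event (event select 0x2E, unit mask 0x41) on general-purpose counter 0, and it assumes a Linux system where the msr driver exposes /dev/cpu/N/msr to root. The helper names (perfevtsel_encode, rdpmc_index_gp, rdpmc_index_fixed, write_msr) and the sanity bounds on the counter index are made up for this example. Events for L1 and L2 misses are model-specific and are not covered here.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* MSR addresses of the architectural performance monitoring registers. */
#define IA32_PERFEVTSEL0 0x186u /* event select register for counter 0 */
#define IA32_PMC0        0x0C1u /* general-purpose counter 0 */

/* Architectural "LLC Misses" event: event select 0x2E, unit mask 0x41. */
#define EVT_LLC_MISSES   0x2Eu
#define UMASK_LLC_MISSES 0x41u

/* Build an IA32_PERFEVTSELx value: event in bits 0-7, unit mask in bits 8-15,
 * USR (count in ring 3) is bit 16, OS (count in ring 0) is bit 17,
 * EN (enable the counter) is bit 22. */
uint64_t perfevtsel_encode(uint8_t event, uint8_t umask, int usr, int os)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8);
    if (usr) v |= 1ULL << 16;
    if (os)  v |= 1ULL << 17;
    v |= 1ULL << 22;
    return v;
}

/* ecx value for rdpmc: general-purpose counter n is selected by n itself.
 * The bound of 8 is an arbitrary sanity check for this sketch. */
uint32_t rdpmc_index_gp(unsigned n)
{
    assert(n < 8);
    return n;
}

/* ecx value for rdpmc: bit 30 selects the fixed-function counters instead.
 * The bound of 4 is an arbitrary sanity check for this sketch. */
uint32_t rdpmc_index_fixed(unsigned n)
{
    assert(n < 4);
    return (1u << 30) | n;
}

/* Write one MSR on the given logical CPU through the Linux msr driver
 * (assumption: msr module loaded, process runs as root).
 * Returns 0 on success, -1 on failure. */
int write_msr(int cpu, uint32_t reg, uint64_t value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, reg); /* offset = MSR address */
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

Under these assumptions, counter 0 on a given CPU would be programmed with write_msr(cpu, IA32_PERFEVTSEL0, perfevtsel_encode(EVT_LLC_MISSES, UMASK_LLC_MISSES, 1, 1)), and the rdpmc in the question would then take rdpmc_index_gp(0) in ecx. The value (1<<30) used in the question selects fixed-function counter 0, which counts retired instructions rather than cache misses. The measuring thread has to stay pinned to the CPU whose counter was programmed. On newer processors the counter may also have to be enabled in IA32_PERF_GLOBAL_CTRL, and rdpmc from user space only works if the kernel has set CR4.PCE.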

If you are not interested in the details and just want the number of cache misses, consider using the PCM library as is and calling its high-level functions. An example of calling the library can be found here:

https://software.intel.com/en-us/articles/intel-performance-counter-monitor#calling_pcm

Kind regards

Thomas

 

Thank you Thomas for the reply. Really helpful.
