L1D_CACHE_LD.I_STATE, L1D_REPL, and the L1 data cache miss rate

Hi,

I am trying to measure L1 cache misses for my program, so I turned on the L1 data cache miss rate for sampling. When I checked, I noticed that it had turned on the L1D_REPL event. The L1 data cache miss rate is defined as

L1 data cache miss rate = L1D_REPL / INST_RETIRED.ANY

L1D_REPL: This event counts the number of lines brought into the L1 data cache.

I was expecting that, for L1 data cache misses, it would be

L1 data cache miss rate = L1D_CACHE_LD.I_STATE / INST_RETIRED.ANY

because

L1D_CACHE_LD.I_STATE: Counts how many times load requests miss the cache.

I then measured both the L1D_CACHE_LD.I_STATE and L1D_REPL events. Although the counts are not exactly the same, they are not far apart either.

So I am trying to understand why the L1 data cache miss rate uses L1D_REPL instead of L1D_CACHE_LD.I_STATE.

As its counterpart, the L2 cache miss rate seems to use L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY.

L2_LINES_IN.SELF.ANY: This event counts the number of cache lines allocated in the L2 cache.

So both the L1 and L2 cache miss rates are based on cache lines allocated. Why aren't they based on cache misses, such as L1D_CACHE_LD.I_STATE? Also, for the L2 cache, I don't see an event similar to L1D_CACHE_LD.I_STATE.

Any insight will help.

Thanks,

- Milind


Quoting kmilindatintel


Hi Milind,

You talked about the "L1 data cache miss rate" - I prefer to use the MEM_LOAD_RETIRED.L1D_MISS event, or the L1D_CACHE_LD.I_STATE event.

It doesn't make sense to use L1D_REPL event to measure L1 data cache misses.

For L2 data cache misses, please use the MEM_LOAD_RETIRED.L2_MISS event. (L2_LINES_IN measures both L2 instruction cache misses and L2 data cache misses.)

Regards, Peter

Thanks Peter.

I am not sure why the built-in L1 data cache miss rate is using L1D_REPL.

I am not very clear on the difference between MEM_LOAD_RETIRED.L1D/L2_MISS and MEM_LOAD_RETIRED.L1D/L2_LINE_MISS. Any thoughts?

Another related query: if I add multiple events to be counted, VTune needs multiple runs of the program. If I am counting cache-related events, I guess the first run of the program will have an impact on the second run, since the data will be cached by the first run. Is there any way to invalidate the whole cache before each run? That would give clean counts for cache-related events.

Thanks, - Milind

Quoting kmilindatintel


Milind,

If you have multiple events to be monitored in separate runs, note that each run is independent.

You are right! Sometimes the first run will leave data in the cache, and that will affect the second run :-(

So you can split one activity into two activities - before you run the second activity, run another program on your platform to invalidate the cache.

Does it help?

Regards, Peter

Hi Peter.

For that purpose I have written a function that churns through a lot of data before actually executing the function of interest; that way most of the cached data should have been flushed. But that the cache is actually flushed is only speculation. I have tried 'clflush' as well, but I do not see much impact with or without clflush.

Anyway, thanks for your responses. It's good to know that the multiple runs of the program used to collect different events are run independently.

Thanks, - Milind
