Performance events for L3 cache miss

Hello,

I'm trying to use VTune to understand the performance of my test program by measuring L3 cache misses. I'm using a Xeon E5-4620. The Intel Software Developer's Manual lists all performance events supported by the E5 family, but I can't find one that measures L3 cache misses. If I understand correctly, OFFCORE_REQUESTS and OFFCORE_RESPONSE measure the number of requests sent to the uncore and the number of responses from the uncore, respectively. I also tried MEM_LOAD_UOPS_RETIRED.LLC_MISS, but I'm not sure whether it is equivalent to the number of L3 cache misses.

My test program reads a 1GB array sequentially, so there should be a lot of cache misses (I have turned off prefetching in the BIOS), but MEM_LOAD_UOPS_RETIRED.LLC_MISS shows very few. I wonder whether the problem is in my test program or whether MEM_LOAD_UOPS_RETIRED.LLC_MISS is the wrong event for counting L3 cache misses. Any comments?
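
To put a number on "a lot of cache misses", here is a back-of-the-envelope sketch. It assumes 64-byte cache lines, 8-byte array elements, "1GB" meaning 2^30 bytes, and an array much larger than the L3 with prefetching disabled, so every line must be fetched from memory once per pass. None of these figures come from the thread; the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size for this estimate


def expected_llc_line_misses(array_bytes: int, passes: int = 1,
                             line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Distinct cache lines that must be fetched from DRAM.

    Assumes the array is much larger than the L3, so no line survives
    from one pass to the next, and that prefetching is disabled.
    """
    lines_per_pass = -(-array_bytes // line_bytes)  # ceiling division
    return lines_per_pass * passes


def loads_per_line(element_bytes: int,
                   line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Number of element-sized loads that fall in each cache line."""
    return line_bytes // element_bytes


if __name__ == "__main__":
    one_gib = 1 << 30  # reading "1GB" as 2**30 bytes (assumption)
    print(expected_llc_line_misses(one_gib))  # 16777216
    print(loads_per_line(8))                  # 8
```

Under these assumptions, one pass over the array touches 2^24 = 16,777,216 cache lines and issues 2^27 = 134,217,728 eight-byte loads, so only about one load in eight needs a new line.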

Thank you,
Da 

As a quick answer, you can use the predefined "memory access" analysis type, which includes all L1/L2/LLC cache metrics.

From reading the description, it seems that MEM_LOAD_UOPS_RETIRED.LLC_MISS counts retired memory load uops whose data source missed in the LLC.
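
Because this event counts load uops rather than cache lines, a per-uop count and a per-line count of misses are different quantities. The toy model below is only an illustration, assuming 64-byte lines, a cold cache, and no evictions within a single sequential pass; it ignores out-of-order execution, fill buffers, and the real data-source encodings. The first load to each line is labelled a miss and later loads to the same line are labelled hits.

```python
from collections import Counter

LINE_BYTES = 64  # assumed cache-line size


def classify_loads(addresses, line_bytes=LINE_BYTES):
    """Toy model: label each load uop by where its data came from.

    The first load to a cache line is labelled 'llc_miss' (the line must
    be fetched from memory); later loads to the same line are labelled
    'l1_hit'. No eviction is modelled, which is reasonable for a single
    sequential pass in which no line is revisited.
    """
    seen_lines = set()
    labels = Counter()
    for addr in addresses:
        line = addr // line_bytes
        if line in seen_lines:
            labels["l1_hit"] += 1
        else:
            seen_lines.add(line)
            labels["llc_miss"] += 1
    return labels


# Sequential read of 4096 eight-byte elements (32 KiB).
counts = classify_loads(range(0, 4096 * 8, 8))
print(counts["llc_miss"], counts["l1_hit"])  # 512 3584
```

Even in this idealized model, only 512 of the 4096 load uops would carry a miss.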

Thanks, Peter. The predefined memory access analysis also monitors MEM_LOAD_UOPS_RETIRED.LLC_MISS.

iliyapolak, does it mean MEM_LOAD_UOPS_RETIRED.LLC_MISS is the same as the number of L3 cache misses?

I monitored my test program with MEM_LOAD_UOPS_RETIRED.LLC_MISS and some other events: MEM_LOAD_UOPS_RETIRED.L2_HIT, MEM_LOAD_UOPS_RETIRED.LLC_HIT, and OFFCORE_REQUESTS.DEMAND_DATA_RD. Here are the results from one run:
MEM_LOAD_UOPS_RETIRED.L2_HIT=0
MEM_LOAD_UOPS_RETIRED.LLC_HIT=0
MEM_LOAD_UOPS_RETIRED.LLC_MISS=280,000
OFFCORE_REQUESTS.DEMAND_DATA_RD=134,400,000

If I understand correctly, MEM_LOAD_UOPS_RETIRED.LLC_HIT + MEM_LOAD_UOPS_RETIRED.LLC_MISS should equal OFFCORE_REQUESTS.DEMAND_DATA_RD, but that is clearly not the case here. It's unlikely that the loads bypass the cache, because my test program uses an add instruction to load the data and add it to a register. I suppose there are other events that count L3 cache hits and misses.
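
The expected relationship can be checked directly against the numbers from the run. This sketch only restates the poster's assumed identity and tests it; the 5% tolerance and the function name are arbitrary, hypothetical choices, and agreement or disagreement here says nothing about which event is "correct".

```python
# Counter values reported for one run.
observed = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 0,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT": 0,
    "MEM_LOAD_UOPS_RETIRED.LLC_MISS": 280_000,
    "OFFCORE_REQUESTS.DEMAND_DATA_RD": 134_400_000,
}


def check_identity(counts, rel_tol=0.05):
    """Compare LLC_HIT + LLC_MISS against OFFCORE_REQUESTS.DEMAND_DATA_RD.

    Returns (retired, offcore, ratio, holds), where ratio = retired / offcore
    and holds is True when the two agree within rel_tol (assumed 5%).
    """
    retired = (counts["MEM_LOAD_UOPS_RETIRED.LLC_HIT"]
               + counts["MEM_LOAD_UOPS_RETIRED.LLC_MISS"])
    offcore = counts["OFFCORE_REQUESTS.DEMAND_DATA_RD"]
    ratio = retired / offcore
    holds = abs(ratio - 1.0) <= rel_tol
    return retired, offcore, ratio, holds


retired, offcore, ratio, holds = check_identity(observed)
print(f"retired={retired:,} offcore={offcore:,} ratio={ratio:.4%} holds={holds}")
```

With the reported values the retired-uop sum is roughly 0.2% of the offcore demand reads, so the assumed identity is far from holding.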

I think that MEM_LOAD_UOPS_RETIRED.LLC_MISS counts the number of load uops whose data were not present in the LLC. Unfortunately, I cannot find any description of OFFCORE_REQUESTS.DEMAND_DATA_RD in the VTune manual. Can you point me to the source of that information?

All events for the E5 family are listed in the Software Developer's Manual, Volume 3, Section 19.4.

Regarding my previous post, I would like to add that the total number of L3 misses could also include instruction cache misses. Does your code operate on immediate values only?

I don't know what you mean by immediate values. But my test program reads 1GB data, so the cache misses of instructions should be negligible.

One thing that concerns me is that MEM_LOAD_UOPS_RETIRED.LLC_MISS excludes unknown data sources, as stated in the developer's manual. I don't know what counts as an unknown data source.

Immediate data is encoded in the instruction itself, as in mov eax,1000h; a non-immediate value is stored in memory, but that is not your case. Yes, I agree that instruction cache misses should be negligible, since the code executed would mainly be the rep movss instruction, which repeats at high frequency in your code (some kind of loop).

The question is whether the unknown data source is related to the currently executing hardware thread being profiled.