I'm trying to use VTune to understand the performance of my test program by measuring L3 cache misses. I'm using a Xeon E5-4620. The Intel Software Developer's Manual lists all the performance events supported by the E5 family, but I can't find one that measures L3 cache misses. If I understand correctly, OFFCORE_REQUESTS and OFFCORE_RESPONSE measure the number of requests sent to the uncore and the number of responses from the uncore, respectively. I also tried MEM_LOAD_UOPS_RETIRED.LLC_MISS, but I'm not sure whether it is equivalent to the number of L3 cache misses.
My test program reads a 1 GB array sequentially, so there should be a lot of cache misses (I have turned off prefetching in the BIOS), but MEM_LOAD_UOPS_RETIRED.LLC_MISS reports very few. I wonder whether the problem is my test program or whether MEM_LOAD_UOPS_RETIRED.LLC_MISS is the wrong event for counting L3 cache misses. Any comments?