I am trying to implement performance counting on my bare-metal hypervisor. In particular, I am interested in L3 cache misses on an Intel i7 06_1Eh processor. There are two methods that should give me the same result: the non-architectural MEM_LOAD_RETIRED.L3_MISS performance event and the uncore UNC_L3_MISS.ANY performance event. I either assign the first event to IA32_PERFEVTSEL0 with the USR, OS, and EN bits set, or assign the second to MSR_UNCORE_PERFEVTSEL0 with the EN bit set. I then set the corresponding enable bit in the respective global performance control MSR.

However, when I read the performance monitor counter twice back to back, the count increases, even though no loads were performed between the two reads.

How do I check the number of L3 cache misses reliably? Does anyone know of any examples? I want to read the counter back to back and see no increase in the count, then read more than 8 MB of addresses, check the counter again, and see that it registered an L3 cache miss.
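For reference, the core-counter setup described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a known-good solution: the event code 0xCB and unit mask 0x10 for MEM_LOAD_RETIRED.L3_MISS are my reading of the event tables and should be checked against the SDM for this model; `rdmsr_fn` and `wrmsr_fn` are hypothetical callbacks standing in for whatever RDMSR/WRMSR wrappers the hypervisor already has; and `perfevtsel_encode`, `l3_miss_counter_start`, and `touch_buffer` are names made up for illustration. The uncore path (MSR_UNCORE_PERFEVTSEL0) is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR addresses. */
#define IA32_PMC0             0x0C1u
#define IA32_PERFEVTSEL0      0x186u
#define IA32_PERF_GLOBAL_CTRL 0x38Fu

/* IA32_PERFEVTSELx control bits. */
#define EVTSEL_USR (1ull << 16) /* count at CPL > 0 */
#define EVTSEL_OS  (1ull << 17) /* count at CPL 0   */
#define EVTSEL_EN  (1ull << 22) /* enable counter   */

/* ASSUMPTION: MEM_LOAD_RETIRED.L3_MISS encoding for 06_1Eh; verify in the SDM. */
#define MEM_LOAD_RETIRED_EVENT   0xCBu
#define MEM_LOAD_RETIRED_L3_MISS 0x10u

/* Hypothetical hooks for the hypervisor's own RDMSR/WRMSR wrappers. */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t value);

/* Build an event-select value: event in bits 7:0, unit mask in bits 15:8. */
static inline uint64_t perfevtsel_encode(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8) |
           EVTSEL_USR | EVTSEL_OS | EVTSEL_EN;
}

/* Stop PMC0, clear it, program the event, then re-enable PMC0 globally. */
static inline void l3_miss_counter_start(rdmsr_fn rd, wrmsr_fn wr)
{
    wr(IA32_PERF_GLOBAL_CTRL, rd(IA32_PERF_GLOBAL_CTRL) & ~1ull);
    wr(IA32_PMC0, 0);
    wr(IA32_PERFEVTSEL0, perfevtsel_encode(MEM_LOAD_RETIRED_EVENT,
                                           MEM_LOAD_RETIRED_L3_MISS));
    wr(IA32_PERF_GLOBAL_CTRL, rd(IA32_PERF_GLOBAL_CTRL) | 1ull);
}

/* Load one byte every `stride` bytes; volatile keeps the loads from being
 * optimized away. Returns the sum so the result is observably used. */
static inline uint64_t touch_buffer(const volatile uint8_t *buf,
                                    size_t len, size_t stride)
{
    uint64_t sum = 0;
    if (stride == 0)
        return 0; /* avoid an infinite loop on a bad argument */
    for (size_t i = 0; i < len; i += stride)
        sum += buf[i];
    return sum;
}
```

The experiment I have in mind would then be: call `l3_miss_counter_start`, read IA32_PMC0 twice through the `rd` hook and compare, call `touch_buffer` over a buffer larger than 8 MB with a 64-byte stride (assuming 64-byte cache lines), and read IA32_PMC0 a third time.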