I am using the PCM tool for some experiments on a 4-socket (Westmere-EX) machine. I am puzzled by a discrepancy I observe when counting L3 cache misses in two different ways.
I started with the PCM tool as it is and used it to monitor L3 cache misses (L3MISS column) in 1-second intervals. Looking at the code, I see that it counts the event MEM_LOAD_RETIRED_L3_MISS. (The description of this event in the Software Developer's Manual: "Counts number of retired loads that miss the L3 cache. The load was satisfied by a remote socket, local memory or an IOH".)
I focused on the "per socket" statistics because I was interested in socket 0 only. I ran two workloads, and the rough result was that workload A has about 10 times as many L3MISS events as workload B.
Next, I was interested in the (MESIF) state that cache lines are in when they are read (shared or exclusive). As far as I can tell, this information is only available from the uncore counters, so I programmed several uncore counters following the guide (Intel Xeon Processor E7 Family (Westmere EX) Uncore Performance Monitoring Programming Guide). Specifically, I programmed all C-Boxes to count LLC_MISSES (event 0x14, page 2-25). The tool sums the LLC_MISSES counts of all 10 C-Boxes, and I expected the result to reflect the number of L3 misses for a given socket.
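For reference, this is roughly what the summation step looks like: a minimal C sketch, assuming the raw C-Box counter values have already been read at the start and end of each interval (the actual register access is left out). The 48-bit counter width and the names counter_delta and socket_llc_misses are my own assumptions for illustration, not PCM code; the real counter width should be checked in the uncore programming guide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 10 C-Boxes per socket (as on my machine) and 48-bit
   wide uncore counters; verify the width in the programming guide. */
#define NUM_CBOXES   10
#define COUNTER_BITS 48
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/* Increment of one counter between two raw readings.
   Unsigned subtraction plus masking tolerates a single wrap-around. */
static uint64_t counter_delta(uint64_t before, uint64_t after)
{
    assert(before <= COUNTER_MASK && after <= COUNTER_MASK);
    return (after - before) & COUNTER_MASK;
}

/* Sum the per-interval LLC_MISSES increments of all C-Boxes of one socket. */
static uint64_t socket_llc_misses(const uint64_t before[NUM_CBOXES],
                                  const uint64_t after[NUM_CBOXES])
{
    uint64_t total = 0;
    for (int i = 0; i < NUM_CBOXES; i++)
        total += counter_delta(before[i], after[i]);
    return total;
}
```

The masking matters only if a counter wraps within an interval; with 1-second intervals that is unlikely but cheap to handle.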
Before looking at any cache line states, I did a sanity check to see whether the L3 miss counts agree between the two setups. Surprisingly, they are quite different. Not only do they differ quantitatively (which I could understand, given the different ways of measuring), but they also differ qualitatively. My problem: workload A now has only about half as many L3 misses as workload B (with the core counters it had about 10 times as many).
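To make the sanity check explicit: I am comparing the ordering of the two workloads, not the absolute counts. A small C sketch (the function names are hypothetical) that computes the A/B ratio for each method and checks whether both methods agree on which workload has more misses; with my observed ratios (about 10 for the core event, about 0.5 for the C-Box sum) it reports a disagreement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ratio of L3 miss counts, workload A over workload B, for one method. */
static double miss_ratio(uint64_t misses_a, uint64_t misses_b)
{
    assert(misses_b > 0);
    return (double)misses_a / (double)misses_b;
}

/* True if both methods agree on which workload has more L3 misses
   (a ratio above 1.0 means workload A misses more). */
static bool same_ordering(double ratio_core, double ratio_uncore)
{
    return (ratio_core > 1.0) == (ratio_uncore > 1.0);
}
```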
Having thought about this for a while: am I missing something? Do these two ways of counting L3 misses actually count different events? Does one of them count a subset of the events counted by the other?
Any help will be appreciated! Thanks!
PS: I also compared L3 hits counted with the core counters and with the uncore (C-Box) counters, and although they differ by a factor of 5, they at least show the same qualitative results.