Just in time for the weekend!
I am trying to figure out the memory access problems with an FFT code
and can't quite decide what some of the Westmere EP (Xeon X5650, 06_2C)
performance monitor events actually mean....
L1 writeback events show up in four different performance monitor events
in this processor (Intel arch sw dev guide vol 3B, document 325384-042,
table 19.9):
(1) Event 28, Masks 1/2/4/8, L1D_WB_L2."*"_STATE: Counts number of L1 writebacks to the L2 where the cache line to be written is in the "*" state.
(2) Event B0, Mask 40, OFFCORE_REQUESTS.L1D_WRITEBACK: Counts number of L1D writebacks to the uncore.
(3) Event F0, Mask 10, L2_TRANSACTIONS.L1D_WB: Counts L1D writeback operations to the L2.
(4) Event 51, Masks 04/08, L1D.M_*_EVICT: Counts the number of modified lines evicted from the L1 data cache due to replacement (04) or snoop HITM intervention (08).
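For anyone who wants to program the same counters, here is a minimal sketch of how the event/mask pairs above pack into a raw event code. It assumes the usual IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15) and, as a further assumption, that the counters are read through Linux perf's raw `rUUEE` syntax. The post does not say which tool was used, and the dictionary labels are informal, not official event names.

```python
# Event select and unit mask values copied from the list above.
# The dictionary keys are informal labels, not official event names.
EVENTS = {
    "event 28, masks 1|2|4|8": (0x28, 0x0F),
    "event B0, mask 40": (0xB0, 0x40),
    "event F0, mask 10": (0xF0, 0x10),
    "event 51, mask 04": (0x51, 0x04),
    "event 51, mask 08": (0x51, 0x08),
}


def raw_code(event, umask):
    """Pack event select into bits 0-7 and unit mask into bits 8-15."""
    return ((umask & 0xFF) << 8) | (event & 0xFF)


def perf_raw_name(event, umask):
    """Format as an 'rUUEE' string (assumed Linux perf raw-event syntax)."""
    return f"r{raw_code(event, umask):04x}"


for label, (event, umask) in EVENTS.items():
    print(f"{label:26s} -> {perf_raw_name(event, umask)}")
```

OR-ing the four state masks (1|2|4|8 = 0F) is what gives the "all states" count for Event 28.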
For Event 28, it would probably be clearer if the text ended with
"...is in the * state in the L2 cache". The L1 line will usually be in
the M state (though I don't know how Intel handles "O" state lines).
For this code I sort of expect strange results because the power-of-2
strides in the FFT are likely to cause lots of cache conflicts, but I am
not sure enough about the meaning of the counters to know if I am seeing
evidence of this or not....
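To illustrate why power-of-2 strides are suspicious, here is a rough sketch that counts how many distinct L1 sets a strided access pattern touches. The cache geometry (32 KiB, 8-way, 64-byte lines, so 64 sets) is my assumption for this core, and the simple modulo indexing is a simplification. The function names are made up for the example.

```python
LINE_BYTES = 64   # assumed cache line size
WAYS = 8          # assumed L1D associativity
NUM_SETS = 64     # assumed: 32 KiB / (8 ways * 64 B per line)


def set_index(addr):
    """Set selected by an address under simple modulo indexing."""
    return (addr // LINE_BYTES) % NUM_SETS


def distinct_sets(stride_bytes, n_accesses=64, base=0):
    """Number of different sets touched by n_accesses strided accesses."""
    return len({set_index(base + i * stride_bytes) for i in range(n_accesses)})


for stride in (64, 1024, 4096, 4096 + 64):
    print(f"stride {stride:5d} B -> {distinct_sets(stride):2d} distinct sets")
```

With these assumptions, a 4096-byte stride maps every access to the same set, so only 8 lines (one per way) fit before replacements start, while padding the stride by one line spreads the accesses over all sets.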
Normalizing the events to "counts per FFT element per FFT" gives
reasonable numbers to look at. The raw values are in the range of
1 billion writebacks per execution of the code and are extremely stable
across runs.
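The normalization is just a division, sketched here for clarity. The element count and FFT count in the example call are placeholders I made up, not the actual problem size of this run.

```python
def per_element_per_fft(raw_count, n_elements, n_ffts):
    """Normalize a raw counter value to counts per FFT element per FFT."""
    if n_elements <= 0 or n_ffts <= 0:
        raise ValueError("n_elements and n_ffts must be positive")
    return raw_count / (n_elements * n_ffts)


# Placeholder inputs only: 1e9 raw events, 1e6 elements, 250 FFTs.
example = per_element_per_fft(1_000_000_000, 1_000_000, 250)
print(f"{example:.2f} events per element per FFT")
```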
Event F0, Mask 10 gives 3.79 writebacks to the L2
Event 28, Mask 0F gives 3.79 writebacks to the L2
    Mask 01 gives 0.36 writebacks to I state lines (9.5%)
    Mask 02 gives 0.00 writebacks to S state lines (0.0%)
    Mask 04 gives 2.82 writebacks to E state lines (74.4%)
    Mask 08 gives 0.59 writebacks to M state lines (15.7%)
Event B0, Mask 40 gives 3.19 writebacks to the uncore (84.1% of the WB to the L2 given by Event F0)
Event 51, Mask 04 gives 3.79 L1 M state evictions due to replacement
Event 51, Mask 08 gives 3.79 L1 M state evictions due to snoop HITM
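For reference, the percentages can be recomputed from the rounded per-element values as sketched below. Because the inputs are already rounded to two decimals, the last digit can differ slightly from the figures quoted above, and the four per-state values need not sum exactly to the total. The variable names are mine.

```python
# Normalized values copied from the results above.
l2_wb_total = 3.79                                   # Event F0/10 and Event 28/0F
by_l2_state = {"I": 0.36, "S": 0.00, "E": 2.82, "M": 0.59}  # Event 28, masks 01..08
uncore_wb = 3.19                                     # Event B0/40


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


shares = {state: percent_of(v, l2_wb_total) for state, v in by_l2_state.items()}
state_sum = sum(by_l2_state.values())
uncore_share = percent_of(uncore_wb, l2_wb_total)

for state, share in shares.items():
    print(f"L2 state {state}: {share:5.1f}% of writebacks to the L2")
print(f"sum of per-state values: {state_sum:.2f} (total reported: {l2_wb_total:.2f})")
print(f"uncore writebacks: {uncore_share:.1f}% of writebacks to the L2")
```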
Questions:
(a) What causes a writeback to an I state line in the L2?
(I am running a single threaded workload pinned to a single core with HT disabled)
(b) What causes a writeback to the uncore?
(c) Is an L1 WB to the uncore a subset of writebacks to L2 or is it additive?
(d) Do counts in Event 51, Mask 04 imply "L1 replacement" (i.e., a
"capacity" miss), or is the event more general (e.g., due to L2 or L3
replacements forcing L1 invalidation)?
(e) Do counts in Event 51, Mask 08 imply that there is something other
than an L1 capacity miss happening? (Note that this is a single threaded
workload pinned to a single core, so no interventions will come from
other processor cores, but interventions could come from the L2 or L3.)
I am hoping that these counters give information that can be used to
determine the number of L1 writebacks caused by L3 replacements causing
L2/L1 invalidation (perhaps Event 51/Mask 08) and the number of L1
writebacks caused by L2 replacements causing L1 invalidation (perhaps
Event B0/Mask 40).
It is more likely that the explanation is some combination of
misinterpretation on my part and counters that don't count exactly what
they are supposed to count, but I always like to learn. Maybe I can use
these counters to learn something even more interesting than what I was
originally looking for....


