I have a couple of questions about the performance monitoring events on Ivy Bridge.
(1) According to SDM Table 19-5 (Non-Architectural Performance Events In the Processor Core of 3rd Generation Intel® Core™ i7, i5, i3 Processors), Ivy Bridge has events named CYCLE_ACTIVITY.CYCLES_LDM_PENDING, CYCLE_ACTIVITY.CYCLES_L1D_PENDING and CYCLE_ACTIVITY.CYCLES_L2_PENDING. However, Appendix B.3.2.3 of the Intel 64 and IA-32 Architectures Optimization Reference Manual refers only to STALLS events rather than CYCLES events, and the SDM does not mention those STALLS events at all. Which are they supposed to be, CYCLES or STALLS? And which ones should I use for the memory-bound characterization described in B.3.2.3?
(2) Appendix B.3.2.3 gives formulas for calculating how much a workload is bound by each level of the memory subsystem. What confuses me is that when I measured the STALLS events with PAPI (which provides both the CYCLES and the STALLS events, adding to my confusion), I got a larger count for STALLS_L2_PENDING than for STALLS_L1D_PENDING, while that section gives the formula:
%L2 Bound = (CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING) / CLOCKS
Does this mean my measurement is wrong? If not, how should I calculate %L2 Bound, since with my counts it would come out below zero?
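To make the problem concrete, here is a minimal Python sketch of the %L2 Bound formula quoted above. The function name and all counter values are made up for illustration (they are not actual measurements); only the formula itself comes from B.3.2.3.

```python
def l2_bound(stalls_l1d_pending: int, stalls_l2_pending: int, clocks: int) -> float:
    """%L2 Bound as quoted from Appendix B.3.2.3:
    (STALLS_L1D_PENDING - STALLS_L2_PENDING) / CLOCKS
    """
    if clocks <= 0:
        raise ValueError("clocks must be positive")
    return (stalls_l1d_pending - stalls_l2_pending) / clocks


# Hypothetical counts, chosen only to show the sign of the result.
clocks = 1_000_000

# Ordering the formula seems to assume: L1D_PENDING >= L2_PENDING.
expected_case = l2_bound(300_000, 100_000, clocks)

# Ordering I actually see: L2_PENDING > L1D_PENDING.
observed_case = l2_bound(100_000, 300_000, clocks)

print(f"L1D >= L2: %L2 Bound = {expected_case:.1%}")
print(f"L2 > L1D:  %L2 Bound = {observed_case:.1%}")
```

Whenever STALLS_L2_PENDING exceeds STALLS_L1D_PENDING, the second case applies and the fraction is negative, which cannot be a valid share of cycles.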
Could someone help me with this? Thanks so much!