I was looking through the "Performance Tuning Techniques For Intel® Microarchitecture Code Name Sandy Bridge" section from the "Optimization Reference Manual" (July 2013) when I got a bit puzzled by the CYCLE_ACTIVITY.CYCLES_NO_EXECUTE monitoring event.
I could not find this event for Sandy Bridge in the SDM (my platform is a Xeon E5-1650, 06_2DH). However, CYCLE_ACTIVITY.CYCLES_NO_DISPATCH on Sandy Bridge seems to be the same as CYCLE_ACTIVITY.CYCLES_NO_EXECUTE on Ivy Bridge: both use the same event number (0xA3) and umask (0x04). Is this correct?
The next thing I am wondering about with respect to this event is what it actually counts. I would assume that it counts every cycle in which no execution port is busy. However, some of my measurements show CYCLES_NO_DISPATCH > CPU_CLK_UNHALTED.CORE.
This would suggest that either my assumption is wrong or the core is actually doing less than no work.
As an alternative, I tried UOPS_DISPATCHED with cmask 0x01 and the INV bit set. This gives a more realistic count of the cycles in which no uop was dispatched. Can this be used in place of CYCLES_NO_EXECUTE?
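For reference, here is a minimal sketch of how the two counter configurations above could be packed into a raw event-select value, assuming the architectural IA32_PERFEVTSELx layout (event select in bits 0-7, umask in bits 8-15, INV in bit 23, cmask in bits 24-31). The helper name `perfevtsel` is hypothetical, and the event/umask pair 0xB1/0x01 for UOPS_DISPATCHED is an assumption that does not appear above and should be checked against the SDM event tables for the platform. Setting USR, OS, and EN is just one possible choice.

```python
def perfevtsel(event, umask, cmask=0, inv=False, edge=False,
               usr=True, os=True, enable=True):
    """Pack fields into the IA32_PERFEVTSELx bit layout (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = event                 # bits 0-7:   event select
    value |= umask << 8           # bits 8-15:  unit mask
    value |= int(usr) << 16       # bit 16:     count in user mode
    value |= int(os) << 17        # bit 17:     count in kernel mode
    value |= int(edge) << 18      # bit 18:     edge detect
    value |= int(enable) << 22    # bit 22:     enable counter
    value |= int(inv) << 23       # bit 23:     invert the cmask comparison
    value |= cmask << 24          # bits 24-31: counter mask (threshold)
    return value


# Event 0xA3 / umask 0x04 exactly as quoted in the question (no cmask, no INV).
cycles_no_dispatch = perfevtsel(0xA3, 0x04)

# ASSUMPTION: 0xB1 / 0x01 is used here for UOPS_DISPATCHED; verify in the SDM.
# cmask=1 with INV set counts cycles in which fewer than 1 uop was dispatched.
no_uop_dispatched = perfevtsel(0xB1, 0x01, cmask=0x01, inv=True)

print(hex(cycles_no_dispatch))
print(hex(no_uop_dispatched))
```

With cmask=1 alone the counter increments in cycles with at least one dispatched uop; the INV bit flips the comparison, which is why the combination is read as "cycles with no dispatch".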
Note in particular the distinction I am making between a uop being dispatched and a port being busy: would a never-ending stream of DIVs be counted on every tick, or only in the cycles in which a uop is dispatched (so roughly 1/10 of all clock cycles)?
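To make the two interpretations concrete, here is a toy model, not a description of the real hardware. It assumes a single divider that is occupied for a fixed number of cycles per DIV (10, matching the rough figure above) and a back-to-back stream of DIVs. The function name and the occupancy value are illustrative assumptions.

```python
def count_cycles(total_cycles, div_occupancy=10):
    """Toy model: one divider fed by an endless stream of DIVs."""
    dispatch_cycles = 0   # cycles in which a DIV uop is dispatched
    busy_cycles = 0       # cycles in which the divider is occupied
    busy_until = 0        # first cycle at which the divider is free again

    for cycle in range(total_cycles):
        if cycle >= busy_until:
            # Divider is free, so the next DIV is dispatched this cycle.
            dispatch_cycles += 1
            busy_until = cycle + div_occupancy
        if cycle < busy_until:
            busy_cycles += 1

    return {
        "dispatch": dispatch_cycles,
        "busy": busy_cycles,
        "no_dispatch": total_cycles - dispatch_cycles,
        "not_busy": total_cycles - busy_cycles,
    }


result = count_cycles(1000)
print(result)
```

In this model a "no dispatch" counter would fire in 900 of 1000 cycles, while a "no port busy" counter would never fire; which of the two the hardware event corresponds to is exactly the open question.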