TITLE: Back End Bound Due To Latency Caused By L1 Data Cache
This metric shows how often the back end was stalled on the L1 data cache. The L1 cache typically has the shortest latency. However, in certain cases, such as a load blocked on an older store, a load can suffer high latency even though it is satisfied by the L1. Because no fill buffers are allocated for L1 hits, the metric instead uses the load matrix (LDM) stalls sub-event, which accounts for any non-completed load.
L1 Bound: (CYCLE_ACTIVITY.STALLS_LDM_PENDING - CYCLE_ACTIVITY.STALLS_L1D_PENDING) / CPU_CLK_UNHALTED.THREAD
The LDM_PENDING sub-event is new in the Intel microarchitecture code name Ivy Bridge. It not only identifies when these stalls matter but also supplies an upper bound on the total possible L1 stalls, in case there is a block type that legacy counters do not cover.
EQUATION: (CYCLE_ACTIVITY.STALLS_LDM_PENDING - CYCLE_ACTIVITY.STALLS_L1D_PENDING) / CPU_CLK_UNHALTED.THREAD
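As a rough illustration of the equation, the following Python sketch computes the ratio from three raw counter values that have already been collected. The function name l1_bound, its parameter names, and the sample counts are hypothetical and not part of any tool's API. Clamping a negative difference to zero is an added assumption to tolerate counter noise, not something the metric definition specifies.

```python
def l1_bound(stalls_ldm_pending, stalls_l1d_pending, clk_unhalted_thread):
    """Return L1 Bound as a fraction of unhalted thread cycles.

    Arguments are raw event counts for:
      CYCLE_ACTIVITY.STALLS_LDM_PENDING
      CYCLE_ACTIVITY.STALLS_L1D_PENDING
      CPU_CLK_UNHALTED.THREAD
    """
    if clk_unhalted_thread <= 0:
        raise ValueError("CPU_CLK_UNHALTED.THREAD must be positive")

    # Stall cycles with a load pending, minus those where the load missed L1D,
    # leaves the stall cycles attributed to loads served by the L1 itself.
    l1_stall_cycles = stalls_ldm_pending - stalls_l1d_pending

    # Assumption: clamp to zero in case sampling or multiplexing noise
    # makes the L1D-pending count exceed the LDM-pending count.
    l1_stall_cycles = max(l1_stall_cycles, 0)

    return l1_stall_cycles / clk_unhalted_thread


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    ratio = l1_bound(
        stalls_ldm_pending=500_000,
        stalls_l1d_pending=200_000,
        clk_unhalted_thread=1_000_000,
    )
    print(f"L1 Bound: {ratio:.1%}")
```

With these made-up counts, 300,000 of 1,000,000 cycles are attributed to L1-served loads, so the sketch reports 30.0%.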