November 18, 2008 11:00 PM PST
This topic shows how to identify the dominant sources of the performance bottlenecks counted by the BE_L1D_FPU_Bubble event. This event accumulates stall cycles caused when the micropipelines associated with the L1D and FPU stall the core pipeline at the DET stage.
The goal is first to determine the dominant source of the bottlenecks contributing to this event, and then to apply the appropriate optimizations to remove those execution inefficiencies. The stall cycles accumulated by this counter are dominated by memory access stalls, which have a different architectural basis than those accumulated by the BE_EXE_Bubble event: they result from blockages in the transfer of data, not just from latencies longer than the compiled code could absorb.
Use the Intel® VTune™ Performance Analyzer to analyze the subevents of BE_L1D_FPU_Bubble. Among the most common contributions is Data Cache Unit (DCU) recirculating (subevent L1D_DCURECIR).
The table below is organized as a hierarchy:
- The total, ALL, is divided into L1D and FPU components.
- The L1D component is divided into a large number of subevents, of which five dominate the contributions encountered by applications.
Keep in mind that subevents are not prioritized, and that a given cycle can be double-counted between subevents, but the sums working down the tree structure are usually close. The following table summarizes BE_L1D_FPU_Bubble subevents:
| Extension | PMC.umask(19:16) | Description |
| --- | --- | --- |
| ALL | b0000 | Back-end was stalled by L1D or FPU. |
| FPU | b0001 | Back-end was stalled by FPU. |
| L1D | b0010 | Back-end was stalled by L1D. This includes all stalls caused by the L1 pipeline (created in the L1D stage of the L1 pipeline which corresponds to the DET stage of the main pipe). |
| L1D_FULLSTBUF | b0011 | Back-end was stalled by L1D due to store buffer being full. |
| L1D_DCURECIR | b0100 | Back-end was stalled by L1D due to DCU recirculating. |
| L1D_HPW | b0101 | Back-end was stalled by L1D due to Hardware Page Walker. |
| --- | b0110 | (* count is undefined *) |
| L1D_FILLCONF | b0111 | Back-end was stalled by L1D due to a store in conflict with a returning fill. |
| L1D_DCS | b1000 | Back-end was stalled by L1D due to DCS requiring a stall. |
| L1D_L2BPRESS | b1001 | Back-end was stalled by L1D due to L2 Back Pressure. |
| L1D_TLB | b1010 | Back-end was stalled by L1D due to L2DTLB to L1DTLB transfer. |
| L1D_LDCONF | b1011 | Back-end was stalled by L1D due to architectural ordering conflict. |
| L1D_LDCHK | b1100 | Back-end was stalled by L1D due to load check ordering conflict. |
| L1D_NAT | b1101 | Back-end was stalled by L1D due to L1D data return needing recirculated NaT generation. |
| L1D_STBUFRECIR | b1110 | Back-end was stalled by L1D due to store buffer cancel needing recirculate. |
| L1D_NATCONF | b1111 | Back-end was stalled by L1D due to ld8.fill conflict with st8.spill not written to unat. |
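The umask column can be turned into a lookup when configuring a counter by hand. The following Python sketch is illustrative only: it copies the subevent values from the table and places them in bits 19:16 of a PMC value, as the column header indicates. The names `UMASKS`, `set_umask`, and `get_subevent` are hypothetical, and the sketch makes no assumption about the other PMC fields (such as the event select code), which are not given here and are left untouched.

```python
# Subevent name -> 4-bit umask value, copied from the table above.
# The value 0b0110 is omitted because its count is undefined.
UMASKS = {
    "ALL": 0b0000,
    "FPU": 0b0001,
    "L1D": 0b0010,
    "L1D_FULLSTBUF": 0b0011,
    "L1D_DCURECIR": 0b0100,
    "L1D_HPW": 0b0101,
    "L1D_FILLCONF": 0b0111,
    "L1D_DCS": 0b1000,
    "L1D_L2BPRESS": 0b1001,
    "L1D_TLB": 0b1010,
    "L1D_LDCONF": 0b1011,
    "L1D_LDCHK": 0b1100,
    "L1D_NAT": 0b1101,
    "L1D_STBUFRECIR": 0b1110,
    "L1D_NATCONF": 0b1111,
}

UMASK_SHIFT = 16                   # umask occupies bits 19:16
UMASK_FIELD = 0xF << UMASK_SHIFT   # mask covering only those four bits


def set_umask(pmc_value, subevent):
    """Return pmc_value with bits 19:16 replaced by the subevent's umask.

    All other bits of pmc_value are preserved unchanged.
    """
    umask = UMASKS[subevent]
    return (pmc_value & ~UMASK_FIELD) | (umask << UMASK_SHIFT)


def get_subevent(pmc_value):
    """Return the subevent name encoded in bits 19:16, or None if undefined."""
    umask = (pmc_value & UMASK_FIELD) >> UMASK_SHIFT
    for name, value in UMASKS.items():
        if value == umask:
            return name
    return None
```

For example, `set_umask(0, "L1D_DCURECIR")` sets only bit 18, and `get_subevent` returns `None` for the undefined value b0110.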
Many of the subevents of BE_L1D_FPU_Bubble usually contribute negligibly to the stall cycles of compiled code, because they deal with access conflicts associated with NaT bits, application registers, control registers, and ld.acq/st.rel memory fencing instructions. These conflicts have not been seen to contribute significantly for compiled high-level language code.
The L1D_DCS, L1D_LDCONF, L1D_LDCHK, L1D_NAT, and L1D_NATCONF subevents fall into these categories. As compiler technology advances, more aggressive use of speculation may result in these events accumulating significant values. If you notice that these events contribute significantly to stall cycles, contact your Intel representative or use the https://premier.intel.com support Web page to get assistance.
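Once subevent counts have been collected, finding the dominant source amounts to ranking each L1D subevent by its share of the L1D total and checking whether any of the normally negligible subevents stands out. The Python sketch below is one possible way to do this, not a feature of the VTune analyzer. The function names, the input dictionary layout, and the 5% threshold are assumptions chosen for illustration. Because a cycle can be double-counted between subevents, the shares need not sum to exactly 1.

```python
# Subevents that the text above describes as usually negligible for compiled code.
RARELY_SIGNIFICANT = {"L1D_DCS", "L1D_LDCONF", "L1D_LDCHK", "L1D_NAT", "L1D_NATCONF"}


def rank_l1d_subevents(counts):
    """Rank L1D subevents by their share of the L1D stall cycles.

    counts maps subevent names (as in the table) to stall-cycle counts and
    must contain the parent "L1D" entry. Returns (name, share) pairs sorted
    from largest to smallest share.
    """
    total = counts["L1D"]
    if total <= 0:
        return []
    shares = [
        (name, cycles / total)
        for name, cycles in counts.items()
        if name.startswith("L1D_")  # skips the "L1D", "FPU", and "ALL" entries
    ]
    return sorted(shares, key=lambda item: item[1], reverse=True)


def unexpected_contributors(counts, threshold=0.05):
    """List normally negligible subevents whose share meets the threshold.

    The 5% default is an arbitrary illustrative cutoff, not an Intel guideline.
    """
    return [
        name
        for name, share in rank_l1d_subevents(counts)
        if name in RARELY_SIGNIFICANT and share >= threshold
    ]
```

With hypothetical counts such as `{"L1D": 1000, "L1D_DCURECIR": 600, "L1D_L2BPRESS": 250, "L1D_HPW": 100, "L1D_NAT": 80}`, the ranking puts L1D_DCURECIR first, and `unexpected_contributors` reports L1D_NAT as a subevent worth raising with Intel support.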

