Pipeline Speak, Part 2: The Second Part of the Sandy Bridge Pipeline

Last week I posted a blog explaining the front-end of the pipeline on Intel® Microarchitecture Codename Sandy Bridge. Today's blog completes the discussion of the pipeline by explaining the back-end, and then covers why it's helpful to know this stuff in general.

The Back-End
The back-end of the pipeline is responsible for executing the micro-operations the front-end generates. In order to make the best use of its resources, the back-end uses its own bookkeeping system to keep track of each micro-operation, the pieces of data it requires, and its execution status. It can then execute the micro-operations out of program order - each one is dispatched when it has all its data ready and the execution resources it needs are available. The bookkeeping and scheduling of micro-operations is fairly complex, and requires many dedicated queuing structures. When some of these queuing structures are full, the back-end cannot accept new micro-operations from the front-end - a situation referred to as "back-end bound pipeline slots" in the Sandy Bridge tuning methodology we recommend (see more on this below).
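To make the scheduling idea concrete, here is a minimal toy sketch in Python. It is not a model of the real Sandy Bridge scheduler: the `Uop` class, the unit labels ("load", "alu"), the latencies, and the one-unit-of-each-kind layout are all assumptions made up for illustration. It only shows the rule described above: a micro-operation starts when its inputs are ready and a suitable execution unit is free, regardless of program order.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    unit: str            # kind of execution unit required (hypothetical labels)
    sources: tuple = ()  # names of earlier uops whose results this one needs
    latency: int = 1     # cycles spent executing (made-up values)


def schedule(uops, units):
    """Return {uop name: start cycle} for a toy out-of-order back-end.

    `uops` is in program order; `units` maps a unit kind to how many exist.
    """
    seen = set()
    for uop in uops:  # validate up front so the loop below always terminates
        if units.get(uop.unit, 0) < 1 or uop.latency < 1:
            raise ValueError(f"cannot execute {uop.name}")
        if any(src not in seen for src in uop.sources):
            raise ValueError(f"{uop.name} depends on a uop that is not earlier")
        seen.add(uop.name)

    done_at = {}   # uop name -> cycle at which its result is available
    started = {}   # uop name -> cycle at which it began executing
    busy_until = {kind: [0] * count for kind, count in units.items()}
    waiting = list(uops)
    cycle = 0
    while waiting:
        for uop in list(waiting):  # oldest first, but younger uops may pass
            data_ready = all(done_at.get(s, cycle + 1) <= cycle
                             for s in uop.sources)
            slots = busy_until[uop.unit]
            free = next((i for i, t in enumerate(slots) if t <= cycle), None)
            if data_ready and free is not None:
                slots[free] = cycle + uop.latency
                started[uop.name] = cycle
                done_at[uop.name] = cycle + uop.latency
                waiting.remove(uop)
        cycle += 1
    return started


program = [
    Uop("load_a", "load", latency=4),
    Uop("add_1", "alu", sources=("load_a",)),  # must wait for the load
    Uop("shift_1", "alu"),                     # independent, can run early
    Uop("add_2", "alu", sources=("shift_1",)),
]
start_cycles = schedule(program, {"load": 1, "alu": 1})
```

In this example `shift_1` and `add_2` start before `add_1`, even though `add_1` comes first in program order, because `add_1` is still waiting on the load.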

The execution resources that the back-end is keeping track of are called execution units. These are the pieces of logic on the processor that perform specific functions, such as adding, dividing, logical shifting, loading from memory, etc. Each microarchitecture might have a slightly different layout and number of execution units available. Once micro-operations (uops) have finished using the execution units and have all data loaded or stored, they are "retired" - meaning that they have finished their time in the pipeline. These uops are never explicitly converted back into instructions. An instruction is considered "retired" when all of its uops have retired, but this is really an abstraction of what happens - the back-end of the pipeline is dealing with uops only.
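The relationship between retired uops and retired instructions can be sketched in a few lines of Python. The instruction names, the uop ids, and the number of uops each instruction breaks into are hypothetical; the only point is that "instruction retired" is derived from the state of its uops.

```python
def retired_instructions(instruction_uops, retired_uops):
    """Return the instructions whose uops have all retired, in input order."""
    return [
        instr
        for instr, uops in instruction_uops.items()
        if all(u in retired_uops for u in uops)
    ]


# Hypothetical mapping from instructions to the uops they were broken into.
in_flight = {
    "add": ["u0"],
    "load_and_add": ["u1", "u2"],
    "store": ["u3", "u4"],
}

# u2 has not retired yet, so "load_and_add" does not count as retired.
done = retired_instructions(in_flight, {"u0", "u1", "u3", "u4"})
print(done)  # prints ['add', 'store']
```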


Knowing a little bit about a processor's microarchitecture, including a general knowledge of the pipeline, can be very useful in performance analysis. This is particularly true of Intel® Microarchitecture Codename Sandy Bridge because, for the first time on x86 processors, there are performance events available that neatly map to various pipeline scenarios and form a cohesive methodology. Using these events you can characterize how well your application utilized the pipeline while it ran, and then tune based on this information. Intel's software performance analysis product, VTune™ Amplifier XE, incorporates this methodology to make it easy to identify pipeline utilization and areas for improvement on Sandy Bridge. For more information on the methodology and the Sandy Bridge pipeline, check out this video! Or download the tuning guide that walks you through using VTune Amplifier XE to identify and tune software performance issues on Sandy Bridge.
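As a rough illustration of what "characterizing pipeline utilization" can look like, the Python sketch below splits pipeline slots into categories. It rests on several assumptions that are not spelled out in this post: that the pipeline offers 4 slots per cycle, that slots are divided four ways (front-end bound, bad speculation, retiring, back-end bound), and that back-end bound is whatever remains after the other three. The function and parameter names are made up, and the inputs are plain numbers rather than real hardware event names; the tuning guide is the place to find the actual events and formulas.

```python
def slot_breakdown(cycles, undelivered_slots, retired_slots, wasted_slots,
                   width=4):
    """Return the fraction of pipeline slots in each category.

    cycles            -- unhalted core cycles in the measurement
    undelivered_slots -- slots where the front-end supplied no uop
    retired_slots     -- slots whose uops went on to retire
    wasted_slots      -- slots spent on uops that were later thrown away
    width             -- slots per cycle (assumed to be 4 here)
    """
    total = cycles * width
    if total <= 0:
        raise ValueError("need a positive number of slots")
    front_end = undelivered_slots / total
    retiring = retired_slots / total
    bad_speculation = wasted_slots / total
    # Assumption: whatever is left over was stalled by the back-end.
    back_end = 1.0 - (front_end + retiring + bad_speculation)
    return {
        "front_end_bound": front_end,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "back_end_bound": back_end,
    }


# Made-up counts for a 1000-cycle measurement.
breakdown = slot_breakdown(cycles=1000, undelivered_slots=400,
                           retired_slots=2000, wasted_slots=200)
for category, fraction in breakdown.items():
    print(f"{category}: {fraction:.0%}")
```

A large back-end bound fraction in a breakdown like this would point at the queuing structures and execution units described above, rather than at instruction fetch and decode.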
