Last week I posted a blog explaining the front-end of the pipeline on Intel® Microarchitecture Codename Sandy Bridge. Today's blog completes the discussion of the pipeline by explaining the back-end, and then covers why it's helpful to know this stuff in general.
The back-end of the pipeline is responsible for executing the micro-operations the front-end generates. In order to make the best use of its resources, the back-end uses its own bookkeeping system to keep track of each micro-operation, the pieces of data it requires, and its execution status. It then executes the micro-operations out of program order: a micro-operation can run as soon as all of its data is ready and an execution resource is available for it. The bookkeeping and scheduling of micro-operations is fairly complex, and requires many dedicated queuing structures. When some of these queuing structures are full, the back-end cannot accept new micro-operations from the front-end - a situation referred to as "back-end bound pipeline slots" in the Sandy Bridge tuning methodology we recommend (see more on this below).
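To make the scheduling idea concrete, here is a minimal toy sketch in Python. It is not a model of the real Sandy Bridge hardware: the `Uop` class, the `simulate` function, the unit names, and the latencies are all made up for illustration, and execution units here are treated as busy for a uop's full latency. It only shows the two behaviors described above: a uop runs as soon as its data and an execution unit are ready, and a full queue stops the back-end from accepting more uops.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    unit: str            # kind of execution unit needed (hypothetical names)
    sources: tuple = ()  # names of uops whose results this one needs
    latency: int = 1     # cycles the uop occupies its execution unit


def simulate(program, units, queue_size):
    """Toy back-end. Returns (completion_order, stalled_cycles)."""
    pending = list(program)  # uops the front-end still has to hand over
    waiting = []             # the back-end's bookkeeping queue
    in_flight = []           # (finish_cycle, uop) pairs currently executing
    done, order = set(), []
    stalled = 0
    cycle = 0
    while pending or waiting or in_flight:
        # 1. Complete uops whose latency has elapsed.
        for finish, uop in list(in_flight):
            if finish <= cycle:
                in_flight.remove((finish, uop))
                done.add(uop.name)
                order.append(uop.name)
        # 2. Accept uops from the front-end while the queue has room.
        while pending and len(waiting) < queue_size:
            waiting.append(pending.pop(0))
        if pending:
            stalled += 1  # queue full: the front-end has uops we cannot take
        # 3. Start every waiting uop whose data and execution unit are ready.
        busy = {}
        for _, uop in in_flight:
            busy[uop.unit] = busy.get(uop.unit, 0) + 1
        for uop in list(waiting):
            data_ready = all(src in done for src in uop.sources)
            unit_free = busy.get(uop.unit, 0) < units.get(uop.unit, 0)
            if data_ready and unit_free:
                waiting.remove(uop)
                in_flight.append((cycle + uop.latency, uop))
                busy[uop.unit] = busy.get(uop.unit, 0) + 1
        cycle += 1
    return order, stalled


# A slow load followed by a dependent add, then two independent ALU uops.
program = [
    Uop("load_a", "load", latency=4),
    Uop("add_b", "alu", sources=("load_a",)),
    Uop("mul_c", "alu"),
    Uop("add_d", "alu", sources=("mul_c",)),
]
units = {"load": 1, "alu": 1}

roomy_order, roomy_stalls = simulate(program, units, queue_size=4)
tiny_order, tiny_stalls = simulate(program, units, queue_size=1)
```

With a queue large enough to hold all four uops, `mul_c` and `add_d` finish while `load_a` is still waiting on memory. With a one-entry queue, `add_b` sits in the only slot waiting for its data, nothing behind it can enter, and the toy model counts those cycles as stalls.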
The execution resources that the back-end is keeping track of are called execution units. These are the pieces of logic on the processor that perform specific functions, such as adding, dividing, logical shifting, loading from memory, etc. Each microarchitecture might have a slightly different layout and number of execution units available. Once micro-operations (uops) have finished using the execution units and have all data loaded or stored, they are "retired" - meaning that they have finished their time in the pipeline. These uops are never explicitly converted back into instructions. An instruction is considered "retired" when all of its uops have retired, but this is really an abstraction of what happens - the back-end of the pipeline is dealing with uops only.
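The relationship between uop retirement and instruction retirement can be sketched as simple bookkeeping. This is only an illustration of the abstraction, not of how the hardware tracks it; the function name, the instruction labels, and the uop counts below are hypothetical.

```python
def retired_instructions(uops_per_instruction, retired_uops):
    """Return instructions in the order they become fully retired.

    uops_per_instruction: {instruction label: number of uops it was split into}
    retired_uops: sequence of instruction labels, one entry per retired uop
    """
    remaining = dict(uops_per_instruction)
    finished = []
    for label in retired_uops:
        remaining[label] -= 1
        if remaining[label] == 0:
            # The last uop of this instruction has retired, so the
            # instruction as a whole is now considered retired.
            finished.append(label)
    return finished


# Hypothetical split: one instruction that became two uops, one that became one.
uop_counts = {"insn_0": 2, "insn_1": 1}
retirement_stream = ["insn_0", "insn_0", "insn_1"]
result = retired_instructions(uop_counts, retirement_stream)
```

Here `insn_0` only counts as retired after its second uop retires, which is the sense in which instruction retirement is derived from uop retirement.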
Knowing a little bit about a processor's microarchitecture, including a general knowledge of the pipeline, can be very useful in performance analysis. This is particularly true of Intel® Microarchitecture Codename Sandy Bridge because, for the first time on x86 processors, there are performance events available that neatly map to various pipeline scenarios and form a cohesive methodology. Using these events you can characterize how well your application utilized the pipeline while it ran, and then tune based on this information. Intel's software performance analysis product, VTune™ Amplifier XE, incorporates this methodology to make it easy to identify pipeline utilization and areas for improvement on Sandy Bridge. For more information on the methodology and the Sandy Bridge pipeline, check out this video, or download the tuning guide that walks you through using VTune Amplifier XE to identify and tune software performance issues on Sandy Bridge.
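As a rough sketch of how such events can be turned into a pipeline-slot breakdown, the Python below sorts every slot into one of four categories, including the "back-end bound" one mentioned earlier. The event names and formulas are written from my recollection of Intel's published top-down material for Sandy Bridge and should be checked against the tuning guide before you rely on them. The counter values are invented purely for illustration, and the function name is hypothetical.

```python
def pipeline_slot_breakdown(counts, width=4):
    """Split pipeline slots into four categories (fractions summing to 1).

    counts: raw event counts keyed by event name (names assumed, see above).
    width:  uops the pipeline can accept per cycle (assumed to be 4 here).
    """
    slots = width * counts["CPU_CLK_UNHALTED.THREAD"]
    retired = counts["UOPS_RETIRED.RETIRE_SLOTS"]

    # Slots where the front-end did not deliver a uop the back-end could take.
    frontend_bound = counts["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    # Slots that did useful work: the uop eventually retired.
    retiring = retired / slots
    # Slots wasted on uops that never retired, plus recovery after mispredicts.
    bad_speculation = (
        counts["UOPS_ISSUED.ANY"] - retired
        + width * counts["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    # Whatever is left: the back-end could not accept new uops.
    backend_bound = 1.0 - (frontend_bound + retiring + bad_speculation)

    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


# Made-up counts, not measurements from any real run.
sample_counts = {
    "CPU_CLK_UNHALTED.THREAD": 1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 400_000,
    "UOPS_ISSUED.ANY": 1_800_000,
    "UOPS_RETIRED.RETIRE_SLOTS": 1_600_000,
    "INT_MISC.RECOVERY_CYCLES": 50_000,
}
breakdown = pipeline_slot_breakdown(sample_counts)
```

For these invented numbers, roughly 40% of slots come out as back-end bound and 40% as retiring. In practice VTune Amplifier XE collects the events and computes this kind of breakdown for you; the sketch is only meant to show what the categories mean.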