As I'm sure you know, modern processors employ a technique called pipelining to increase instruction throughput. In a pipeline, dedicated pieces of hardware on the processor each perform one of the functions needed to process an instruction, and they work on different instructions at the same time. For example, while one part of the pipeline is executing instruction A, another part will be fetching instruction B, and another part might be committing (writing results to memory) instruction C. This allows the processor to work on multiple instructions at once, and helps smooth out waits for data and other long-running operations.
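To make the overlap concrete, here is a minimal sketch of an idealized pipeline. It assumes four stages with made-up names, one cycle per stage, and no stalls, so it illustrates the idea only and does not model any real processor. The function names are hypothetical.

```python
# Toy model of an idealized pipeline: every stage takes exactly one cycle
# and nothing ever stalls. The stage names are illustrative assumptions.
STAGES = ["fetch", "decode", "execute", "commit"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return {cycle: [(instruction, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(stages):
            # Instruction i occupies stage s during cycle i + s.
            schedule.setdefault(i + s, []).append((instr, stage))
    return schedule


def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions when the stages overlap."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + n_instructions - 1


def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions when each runs start to finish alone."""
    return n_stages * n_instructions


if __name__ == "__main__":
    schedule = pipeline_schedule(["A", "B", "C"])
    for cycle in sorted(schedule):
        print(cycle, schedule[cycle])
    print("pipelined:", cycles_pipelined(3), "unpipelined:", cycles_unpipelined(3))
```

In this toy model, cycle 2 has instruction A in execute, B in decode, and C in fetch all at once, which is the kind of overlap described above.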
The pipeline isn't really a distinct physical structure on the chip, but it is encoded into the way the chip processes instructions. Specialized hardware and queues are set up all along the pipeline to keep things moving in the right direction. The number of stages and implementation specifics of a pipeline may change from one CPU model to another. The pipeline for Intel® Microarchitecture Codename Nehalem is different from the pipeline for Intel® Microarchitecture Codename Sandy Bridge, for example. (In fact, the layout of the pipeline - the number of stages, the specific layout of each stage, etc. - is one of the main features we are referring to with each new named microarchitecture.)
Despite its abstract and changing nature, we have developed a few general terms we use to describe parts of a pipeline. Being familiar with pipelining and its terminology can be helpful for performance tuning, so I will use this blog and the next one to give you a short high-level overview. Most pipelines, including the Sandy Bridge pipeline, are thought of as having two main parts - the front-end and the back-end. In this blog I will describe the front-end, and then after the US Thanksgiving holiday I will post my next blog on the back-end.
The Front-End of the Pipeline
The initial part of the pipeline is usually responsible for providing a stream of work for the back-end to operate on. For Intel x86-based processors specifically (including processors based on Sandy Bridge), the front-end is "in-order" while most of the back-end is not. This means that, at the beginning of the pipeline, instructions are processed in the same order as they are found in the program being run. Also on Intel x86 processors, the front-end works on instructions (as written in assembly language), whereas the back-end works on micro-operations. Intel x86 processors use a CISC architecture, so within the pipeline, assembly instructions are broken down into smaller pieces, which we call micro-operations, or uops. It is the front-end's job to do this breakdown, which is called "decoding" the instructions.
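As a rough illustration of in-order decoding, the sketch below maps a few assembly instructions to uops using a lookup table. The table contents and uop names are hypothetical and simplified; they are not the actual decodings used by Sandy Bridge or any other processor.

```python
# Hypothetical decode table: assembly instruction -> list of micro-operations.
# The breakdowns are simplified for illustration and are NOT real decodings.
DECODE_TABLE = {
    "mov eax, ebx": ["move"],                    # register-to-register: one uop
    "add eax, [rsi]": ["load", "add"],           # memory source: load, then add
    "add [rdi], eax": ["load", "add", "store"],  # memory destination: read-modify-write
}


def decode(program):
    """Decode instructions strictly in program order into a flat uop stream."""
    uops = []
    for instr in program:
        for uop in DECODE_TABLE[instr]:
            # Tag each uop with the instruction it came from.
            uops.append((instr, uop))
    return uops


if __name__ == "__main__":
    program = ["mov eax, ebx", "add eax, [rsi]", "add [rdi], eax"]
    for instr, uop in decode(program):
        print(f"{instr:16} -> {uop}")
```

The point of the sketch is only that the uop stream preserves the original instruction order, and that a single instruction may expand into more than one uop.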
So for x86-based processors, the front-end does two main things - fetch instructions (from where program binaries are stored in memory or the caching system), and decode them into micro-operations. As part of the fetching process, the front-end must also predict the targets of branch instructions (if-type statements) when they are encountered, so that it knows where to grab the next instruction from. All sorts of specialized logic and hardware work together to perform these functions - a branch predictor, a specialized micro-operation cache, particular decoders for both simple and complex instructions, and more. All these bits of hardware contribute toward the front-end's main goal of supplying work - in the form of micro-operations - to the back-end. The Sandy Bridge front-end is capable of delivering 4 uops per cycle (or processor clock-tick) to the back-end.
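To get a feel for what a 4-uop-per-cycle delivery limit means, here is a small sketch that groups an in-order uop stream into per-cycle bundles. It assumes the front-end always has uops ready (no fetch stalls or branch mispredictions), so it gives a best-case lower bound rather than a model of real behavior. The function names are hypothetical.

```python
import math
from itertools import islice

UOPS_PER_CYCLE = 4  # peak front-end delivery rate quoted above for Sandy Bridge


def deliver(uops, width=UOPS_PER_CYCLE):
    """Yield one list per cycle, each holding at most `width` uops, in order."""
    stream = iter(uops)
    while True:
        group = list(islice(stream, width))
        if not group:
            return
        yield group


def min_delivery_cycles(n_uops, width=UOPS_PER_CYCLE):
    """Best-case number of cycles needed to deliver n_uops to the back-end."""
    return math.ceil(n_uops / width)


if __name__ == "__main__":
    uops = [f"uop{i}" for i in range(10)]
    for cycle, group in enumerate(deliver(uops)):
        print(cycle, group)
    print("best case:", min_delivery_cycles(len(uops)), "cycles")
```

With ten uops, the best case in this model is three cycles: two full groups of four and one group of two. Real code will often fall short of the peak rate.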
To read what happens in the second part of the pipeline, and why it's important, stay tuned for next week's blog!