Advanced computer concepts for the (not so) common Chef
In Pipeline and the Efficient Chef (Part 1), we showed how the basic pipeline is equivalent to what our Chef does when following one step in his recipe. To say it differently, the execution of one machine language instruction is equivalent to our Chef performing one step of a complicated recipe.
Notice that our Chef doesn’t just execute one step at a time. Being the experienced and sophisticated cook that he is, he simultaneously takes care of multiple steps. For example, while a sauce is simmering, he may also be dicing carrots and onions for the next step in the recipe.
The CPU does the same thing. See Figure PIPELINING. If an instruction is using the EX (execution) circuitry, it (probably) isn’t using the ID (instruction decode) circuitry. Similarly, if the CPU is decoding an instruction (ID), that instruction isn’t using the IF (instruction fetch) circuitry.
FIGURE PIPELINING The basic computer pipeline executes several instructions simultaneously.
This overlap is what we call pipelining, and it is how the modern CPU speeds up the execution of instructions. While one instruction is in its EX (execution) stage, the instruction ahead of it is in MEM, accessing memory, and the one ahead of that is in WB, writing back its result. Behind it, the next instruction is being decoded (ID), and the instruction after that is being fetched from memory (IF). See Figure PIPELINING.
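If the figure is not in front of you, a few lines of Python can draw a rough stand-in. This is only a sketch of an idealized five-stage pipeline: it assumes every stage takes exactly one cycle and nothing ever stalls, and the names `STAGES`, `pipeline_chart` and `print_chart` are made up for this illustration.

```python
# Idealized five-stage pipeline: one cycle per stage, no stalls (an assumption).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction listing the stage it occupies in each cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters IF at cycle i
            if 0 <= stage_index < len(STAGES):
                row.append(STAGES[stage_index])
            else:
                row.append("--")  # not yet fetched, or already finished
        chart.append(row)
    return chart


def print_chart(chart):
    """Print the chart with cycles across the top and instructions down the side."""
    print("      " + " ".join(f"t={c:<3}" for c in range(len(chart[0]))))
    for i, row in enumerate(chart):
        print(f"I{i:<4} " + " ".join(f"{stage:<5}" for stage in row))


if __name__ == "__main__":
    print_chart(pipeline_chart(5))
```

Read down any column of the printed chart to see which instruction is using which piece of circuitry in that cycle. With five instructions, the column for t=4 is the first one in which all five stages are busy at once.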
So what does this buy you? Let’s look at the phases of our basic instruction. It has 5 phases: IF, ID, EX, MEM (memory access) and WB (write back). Each of these phases typically takes 1 cycle of our computer’s clock, so each instruction takes a total of 5 cycles to execute. But because each phase uses circuitry the other phases do not need, we can have 5 instructions in flight at the same time, each in a different phase, as shown in Figure PIPELINING. Each instruction still takes 5 cycles, but once the pipeline is full we complete 1 instruction per cycle. So it appears that each instruction takes only 1 cycle to execute (vs. the 5 cycles it actually takes).
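The arithmetic is easy to check for yourself. The sketch below assumes an ideal pipeline (one cycle per phase, no stalls of any kind), which real processors only approximate, and the function names are invented for this example.

```python
def unpipelined_cycles(num_instructions, stages=5):
    """Cycles needed when each instruction must finish before the next one starts."""
    return num_instructions * stages


def pipelined_cycles(num_instructions, stages=5):
    """Cycles needed in an ideal pipeline: fill it once, then finish one per cycle."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)


def speedup(num_instructions, stages=5):
    """How many times faster the ideal pipeline is than the unpipelined machine."""
    return unpipelined_cycles(num_instructions, stages) / pipelined_cycles(num_instructions, stages)


if __name__ == "__main__":
    for n in (1, 5, 1000):
        print(n, unpipelined_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

For a single instruction there is no gain at all, since it still needs all 5 cycles. For 5 instructions the count drops from 25 cycles to 9, and for long runs of instructions the speedup creeps toward 5, the number of phases.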
(That got a little confusing, even to me. Let's use another analogy. Say you own a detailing shop. Detailing requires five stages: (1) wash the car's exterior, (2) vacuum the interior, (3) shampoo the interior, (4) wax the exterior, and (5) clean the windows. If each stage takes 10 minutes, then it takes a total of 50 minutes to detail one car. But if you work on 5 cars at once, with each car at a different stage, you finish a car every 10 minutes, even though it still takes 50 minutes to detail each individual car. In techno geek language, we call this exploiting parallelism. This is exactly what CPU pipelining is doing.)
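To see the two numbers side by side (50 minutes per car, but a finished car every 10 minutes), here is a small sketch of the shop's schedule. It assumes a new car rolls in every 10 minutes and no stage ever runs long; `detailing_schedule` is a name made up for this example.

```python
def detailing_schedule(num_cars, stages=5, minutes_per_stage=10):
    """Return (start, finish) times in minutes for each car in an ideal shop."""
    schedule = []
    for car in range(num_cars):
        start = car * minutes_per_stage  # a new car enters stage 1 every 10 minutes
        finish = start + stages * minutes_per_stage  # each car still visits all 5 stages
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    for car, (start, finish) in enumerate(detailing_schedule(5), start=1):
        print(f"car {car}: starts at {start} min, done at {finish} min")
```

Every car spends finish minus start = 50 minutes in the shop, yet the finish times are only 10 minutes apart. The first number is what an individual customer waits; the second is what the shop owner cares about.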
Let’s go back to our kitchen and Chef. Figure PHYSICAL from the first blog is our pipeline. At the top are the stages of our pipeline, and at the bottom are the analogous operations performed by our Chef. Our Chef overlaps the different steps in the recipe. As he stir-fries some onions, garlic and mushrooms, he’s putting the sauce he just prepared in the refrigerator and dicing up some fresh pepperoni and tomatoes for the next step. Just as with our computer’s pipeline, our gourmet Chef is faster than a novice cook like me, not just because he can perform each step faster (and better), but because he has the experience to know how to perform several steps at once.
What does this mean in a typical kitchen? It means that whereas I, as a decent but only moderately experienced cook, would take 2 hours to prepare my pizza, our gourmet Chef would take only 40 minutes. The Chef is not just faster than I am at each step in the recipe; he is also able to “pipeline” the recipe, working on parts of several steps simultaneously.
In summary, pipelining in a CPU is performing several instructions simultaneously but offset so that each instruction is using a different part of the processor. This is much as our gourmet cook can perform several recipe steps simultaneously as long as those steps use different appliances (stove, oven, food processor, etc.).