In the blog introducing this series, we established some basic processor and threading terminology. In the last blog, we laid the foundation of our kitchen analogy. We noted that a program is equivalent to a recipe, and that the different architectural features of a modern processor, e.g., the pipeline, memory and microcode, function much like the components of our gourmet kitchen, e.g., the pantry, appliances and counter tops. Indeed, even our Chef is equivalent to certain components in our modern processor, such as the microcode execution unit.
In this blog, we look at the memory hierarchy. Modern computers have three types of memory: registers, cache and bulk addressable memory. Bulk addressable memory is also commonly referred to as RAM (Random Access Memory), DDR (Double Data Rate)++, main memory, or just plain “memory”. It’s called bulk because, well, there’s a lot of it compared to the other types, i.e., registers and cache.
‘Data’ in computer science terms are the variables, data structures, buffers, and so on that programs use to do what they do. In more concrete terms, data includes the internal representation of what you see on the computer screen, including your spreadsheets, the documents that you read, and even the programs themselves, such as your word processor and browser.
In our kitchen analogy, ‘data’ are the ingredients that the recipe calls for, intermediate products such as sauces, and even our final gourmet dish. For example, salt, pepper, sugar, beef cutlets, lettuce and chives are all ‘data’.
Figure SCALE. Relative sizes and access times to different types of memory.
From the standpoint of the programs we write, where we keep our data depends upon how frequently we use that data and how big it is. If we use it all the time, we keep it as close as we can to the CPU, i.e., in the registers if possible. The registers are few in number and small in size, but accessing them is very fast. See Figure SCALE and Table SPEED. If we use the data frequently (or all the time, but it is too large to hold in the registers), it is automatically placed into our cache memory. Data access to cache is quite a bit slower than to registers, but it is still significantly faster than to DDR. There are several levels of cache, generally two and sometimes three (called L1, L2 and L3). The program leaves data that it uses infrequently in DDR, i.e., main memory. Access is slow (see Figure SCALE), but the program doesn’t need that data all that often.
Registers:                  1 cycle (0.5 nsec)
L1 cache:                   4 cycles (2 nsec)
Last-level cache (L2/L3):   65 cycles (33 nsec)
DDR (main memory):          60 nsec (120 cycles)+
Table SPEED. Access times for the memory hierarchy.
Analogously, in our kitchen, how frequently and how quickly we need an ingredient determines where we place it. See Figure HIERARCHY.
Figure HIERARCHY. Kitchen analogy to the modern computer memory hierarchy.
You can think of the bank of registers as the ingredients you are actively using. A chef may be adding salt, sugar and diced carrots at a certain stage in a recipe, and will place them within reach of the stove. He simply has to reach out and use the ingredient. Similarly, a program may actively need your loan balance, interest rate and bank fees to do some forecasting, and will place that data in registers. Just as the number of registers is limited, there is not much space next to the stove, so the Chef needs to be very selective about what he places there.
If you use the ingredient often but not all the time, you leave it where it is accessible but not right next to you, e.g., somewhere on one of the counter tops. This is equivalent to a computer’s cache. Our chef needs to add thyme, a sauce he prepared earlier, and crumbled feta cheese at several points in his preparation of the entrée. He may have to take a few steps to an island countertop, but it is still within easy reach. In our computer example, our program may often need to update and reuse various bank and other financial data while preparing an investment portfolio. This data will be (automatically) placed in the processor cache. There is also a lot more space on the counter tops, just as there is in cache. (See Figure SCALE.)
Most ingredients a chef uses in the preparation of a meal aren’t needed except at certain steps in the recipe. For example, if the Chef is preparing the entrée, he does not need to have at hand the cream cheese, graham cracker crumbs and blueberry compote he’ll later need for the dessert. He’ll have them stored in the pantry or refrigerator until needed. The pantry is several steps away, perhaps even in an adjacent room, but then he doesn’t go there often. Similarly, our program places in bulk addressable memory, a.k.a. RAM or DDR, the sections of a spreadsheet that it doesn’t need until a later stage in the program. Like DDR, pantries are large and generally not full, with room for ingredients and other equipment that are not needed frequently. Of course, pantries can fill up, but there are almost always unneeded and expired items which need to be periodically collected and thrown in the garbage. (Ominous foreshadowing: In a later blog, I will cover something called “garbage collection”.)
Register / cooking surface
Kitchen: The Chef uses salt, sugar and finely ground fresh parmesan cheese continuously to season to taste.
Computer: The accounting program uses interest and exchange rates continuously to convert dollars to euros.
Cache / counter tops
Kitchen: Spices and sauces that are often used, but not all the time.
Computer: Spreadsheet records of transactions over the last and next few days.
DDR / pantry or refrigerator
Kitchen: Meats, earlier prepared dishes, etc., that are either rarely used or will be used in a later step.
Computer: The previous month of transactions that have been finished, and the next month that will be processed.
Table ANALOGY. Summary of how a computer’s memory hierarchy and data can be mapped to the ingredients, sauces and other prepared items in our gourmet kitchen.
“Automatically placed in the processor cache”
I’m sure you noticed that I used “automatically” along with cache memory a few times. This is because the processor pipeline (i.e., our chef) doesn’t explicitly decide what to put into the cache. That is decided for it by the cache management circuitry, which automatically guesses what data the processor needs frequently and places that data into cache. For example, it will notice that some lines in a bank rate table are used frequently and will move them into the cache for faster access. This guessing is imperfect: the manager will sometimes get it wrong, evicting a data item that is frequently used and replacing it with another item that seems to be used often but is not. This is also why organizing your data to work with a processor’s cache structure is often very important for your program’s performance.
In our kitchen example, think of this as the Chef having an assistant who automatically places the ingredients the Chef will need next within easy reach. The assistant is the cache management circuitry. Just like the cache manager, and to the great irritation of the Chef, the assistant isn’t perfect and sometimes makes mistakes.
+I am sure you noticed that instead of “cycles (seconds)”, memory is expressed in “seconds (cycles)”. There is a reason for this madness. Memory access time is generally given in nsec (nanoseconds) instead of cycles. This is because memory is external to the processor and controlled by different circuitry running on a different clock. Coordinating these two clocks so memory and processor can talk to each other requires timing to be specified in a well-defined and absolute external reference unit, e.g., nanoseconds.
++DDR is shorthand for the oh-so-much-more informative name, “Double Data Rate Synchronous Dynamic Random-Access Memory” or DDR SDRAM. It is called double data rate because it has nearly twice the data rate (how much data can be moved per second) as the previous technology, SDR (Single Data Rate) SDRAM.