There is little conceptual difference between storing data in a computer and storing things in a home.
Whether you are getting the guest room ready or running the computer application predicting tomorrow’s weather, you are trying to get something done before you need it done, at as low a cost as possible, given the circumstances.
You try to minimize the space required to store things, the time moving things, the idle time waiting for things to arrive, and the difficulties involved in using things once they arrive.
A typical modern computer can store a few hundred bytes of data in registers within each core of the processor. The data in the registers may be explicitly moved to or from more distant devices, or may be created by the core and lost when overwritten. The movement starts with reading or writing a register from/to an L1 (Level one) cache. Each core usually has a private L1 cache that can contain tens of thousands of bytes (32 KB is a typical size).
Other private and shared caches are usually located on the path between the L1 cache and main memory (although non-temporal loads and stores can bypass them). These caches range in size from the private 256-KB caches and many-MB shared caches in processors, to the many-GB caches stored in Multi-channel DRAM (MCDRAM) and High-Bandwidth Memory (HBM) memories or in dual inline-memory modules (DIMMs).
The MCDRAM and HBM memories of Intel® Xeon Phi™ processors can be used as caches for more distant DIMMs, and these caches contain on the order of 16 to 60 GB of data.
The main memory of a personal computer or server tends to be in the 4-GB to 1500-GB range.
Future 3D XPoint™ DIMMs may make it practical for main memory to hold terabytes – 6 TB (6000 GB) is predicted. 3D XPoint DIMMs will probably have lower bandwidth than double data rate (DDR) DIMMs, perhaps with their contents cached in MCDRAM, HBM memory, or DDR DIMMs to compensate. Such DDR DIMM caches could be about 10% of the capacity of the main memory, so a cache could be 600 GB in size – a far cry from the 4-KB main memory on the machines from the early 1970s.

The fundamental property registers, caches, and main memory share is volatility: data survives in them only while the computer has power.
Beyond the main memory are devices where data can survive when a computer loses power. These devices can usually hold more data than main memory. Hard drives, solid-state drives (SSDs), and removable media such as DVDs are all examples of such devices for persisting data.
If these devices were faster than main memory, you would use them as main memory. You do not, because the processor cannot access their data as quickly as it can access main memory – it is hindered by their higher latency and lower bandwidth.
3D XPoint technology, and any DIMMs built using it, may retain contents across power failures, so you can use them as both main memory and a persistent data store. 3D XPoint technology may also be used in Optane SSDs.
Latency is how long it takes from when you start a request for data until the data arrives.
It is hard to measure latency in many situations because both the compiler and the hardware reorder many operations, including requests to fetch data. They also reorder instructions to do other things while waiting for data to arrive. They may even predict what the fetched data is going to be and act on that prediction, so the arrival of the data simply makes the results of this prediction visible. Of course, if the prediction is wrong, all that work must be redone. Because of this, latency matters most when the compiler and hardware cannot find useful work to do while waiting.
Bandwidth is the rate at which the data arrives, however long that is after it is requested. The usual example involves sending a ship carrying 1 million DVDs across the ocean every day. The latency might be six days, but the bandwidth is tens of GB per second.
As for density: data moves between cores and memory devices in transfers of 4 bytes, 8 bytes, or more at a time, but not all of the transferred bytes may be used at the destination.
Assuming a large processor (about 16 cores), the following table summarizes approximate 2016 figures for the amount of data stored in, and moving through, each level of the system.
| Device | Size | Latency | Bandwidth |
| --- | --- | --- | --- |
| L1 cache | 32 KB | 1 nanosecond | 1 TB/second |
| L2 cache (sometimes shared by two cores) | 256 KB | 4 nanoseconds | 1 TB/second |
| L3 cache | 8 MB or more | 10x slower than L2 | >400 GB/second |
| MCDRAM | | 2x slower than L3 | 400 GB/second |
| Main memory on DDR DIMMs | 4 GB-1 TB | Similar to MCDRAM | 100 GB/second |
| Main memory on Intel Omni-Path Fabric | Limited only by cost | Depends on distance | Depends on distance and hardware |
| I/O devices on memory bus | 6 TB | 100x-1000x slower than memory | 25 GB/second |
| I/O devices on PCIe bus | Limited only by cost | From less than milliseconds to minutes | GB-TB/hour, depends on distance and hardware |
The previous article, Modern Memory Subsystem Benefits for Database Codes, Linear Algebra Codes, Big Data, and Enterprise Storage, aligned new memory subsystem hardware technologies with the needs of applications. This article provides a deeper understanding of hardware capabilities when used for usual variables and heap allocated data of an application – data that did not exist before the application started and evaporates when it ends. The next article, Data Persistence in a Nutshell, introduces using non-volatile memory to replace the use of files for keeping data from one execution of an application to the next.
Bevin Brett is a Principal Engineer at Intel Corporation, working on tools to help programmers and system users improve application performance. He was born and raised in New Zealand, where he earned a B.Sc (Hons) in Mathematics before moving to Australia and then New Hampshire, pursuing first an education and then a career in software engineering.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804