For many years computer applications organized their data between two tiers: memory and storage. The new generation of persistent memory from Intel, based on groundbreaking Intel® Optane™ technology, has introduced a third tier. Learn about the technology, how it is exposed to applications, and why there is so much excitement around enabling persistent memory.
Persistent memory technologies allow development of products with the attributes of both storage and memory. The products are persistent, like storage, meaning they hold their content across power cycles, and they are byte-addressable, like memory, meaning programs can access data structures in place.
What really makes Intel persistent memory technology stand out is that it’s fast enough for the processor to access directly without stopping to do the block I/O required for traditional storage.
The main reason for the excitement around persistent memory is its ability to provide improved performance over existing storage devices. Consider a modern NAND-based SSD, which plugs into the PCIe* bus and communicates using the NVM Express* protocol: the time it takes to read a block is over 80 µs. In the graphic below, notice that most of the time is spent accessing the media, as indicated by the blue area. The software stack is a small percentage of the overall access time; we could work on making the driver faster, and the difference would hardly be noticeable.
Figure 1. Application latency comparison
The Intel® Optane™ SSD also plugs into the PCIe bus but uses Intel® Optane™ storage technology, so the time spent accessing the media is significantly reduced, and the overhead of the software stack and the PCIe protocol becomes a significant portion of the overall latency. To get the most out of Intel Optane technology, it now makes sense to tackle the overhead of both the software and the interconnect. That's where persistent memory comes in.
Connecting the media to the memory bus lets the CPU access the data directly, without any driver or PCIe overhead. And because memory is accessed in 64-byte cache lines, the CPU reads only what it needs, instead of rounding every access up to a block size as storage does. Figure 1 shows how low the latency of a 64-byte read is in this case.
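To make the granularity difference concrete, here is a small C sketch that computes how many bytes must move to satisfy a read when every access is rounded out to whole units. The 64-byte cache line comes from the text above; the 4096-byte block size used in the example is an assumption (a typical storage block), and the function name bytes_transferred is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be transferred to satisfy a read of `len` bytes at byte
 * offset `off` when every access is rounded out to whole units of `unit`
 * bytes (64 for a cache line; an assumed 4096 for a storage block). */
uint64_t bytes_transferred(uint64_t off, uint64_t len, uint64_t unit)
{
    assert(unit > 0);
    if (len == 0)
        return 0;
    uint64_t first = off / unit;            /* index of first unit touched */
    uint64_t last = (off + len - 1) / unit; /* index of last unit touched */
    return (last - first + 1) * unit;
}
```

For example, an 8-byte read at offset 100 moves 64 bytes at cache-line granularity but 4096 bytes at 4 KiB block granularity.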
With persistent memory, applications have a new tier available for data placement: in addition to the memory and storage tiers, the persistent memory tier offers greater capacity than DRAM and significantly faster performance than storage. Applications can access persistent memory–resident data structures in place, as they do with traditional memory, eliminating the need to page blocks of data back and forth between memory and storage.
To get this low-latency direct access, we need a software architecture that allows applications to connect directly to ranges of persistent memory.
The storage stack is shown here at a very high level. The basic building blocks that make up the stack haven't changed much over decades of use. Applications use standard file APIs to open files on a file system, and the file system does block I/O as necessary through a driver or set of drivers. All accesses to storage happen in blocks, typically over an interconnect like PCIe.
Figure 2. The storage stack
From an operating system perspective, support for basic file APIs like open/close and read/write has existed for a few decades. Developers writing applications in higher-level languages may be programming with libraries that provide more convenient APIs, but those libraries eventually call these APIs internally.
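As an illustration of the traditional path, the following C sketch reads a range of a file with the POSIX open, pread, and close calls, assuming Linux; each request travels through the file system and block driver described above. The helper name read_at is hypothetical, and Windows code would use CreateFile and ReadFile instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `len` bytes from `path` starting at `offset` using the classic
 * open/pread/close sequence. Returns the number of bytes read (short at end
 * of file), or -1 on error with errno set. */
ssize_t read_at(const char *path, off_t offset, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);   /* permission checks happen here */
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                   /* reached end of file */
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```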
Both Windows* and Linux* support memory-mapped files, a feature which has been around for a long time but is not commonly used. For persistent memory, the APIs for memory mapping files are very useful; in fact, they are at the heart of the persistent memory programming model published by the Storage Networking Industry Association (SNIA).
Memory mapping a file is only allowed after the file is already opened, so the permission checks have already happened by the time an application calls MapViewOfFile on Windows or mmap on Linux.
Figure 3. Memory-mapped files
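A minimal Linux sketch of the open-then-map sequence: the file is opened read-only (where the permission check happens), mapped with mmap, and then scanned with ordinary loads rather than read calls. The function count_byte_mapped is hypothetical; on Windows the equivalent steps would go through CreateFileMapping and MapViewOfFile.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count occurrences of byte `c` in the file at `path` by mapping the file
 * and reading it with plain loads. Returns -1 on error. */
long count_byte_mapped(const char *path, unsigned char c)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);  /* permissions are checked at open time */
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* a zero-length mmap is invalid */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    /* PROT_READ matches the O_RDONLY open above. */
    const unsigned char *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < size; i++)
        if (p[i] == c)              /* ordinary loads from the mapped file */
            count++;

    munmap((void *)p, size);
    return count;
}
```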
Once those calls are made, the file appears in the address space of the application, allowing load/store access to the file contents. An important aspect of memory-mapped files is that changes made by store instructions are not guaranteed to be persistent until they are flushed to storage. On Windows, this is done using FlushFileBuffers; on Linux, it is either
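The following sketch, again assuming Linux, illustrates the durability rule: stores into a shared mapping are not guaranteed to be durable until they are flushed. It uses msync with MS_SYNC, a standard POSIX call for writing back a mapped range; the helper store_and_flush is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Overwrite `len` bytes at `offset` in an existing file through a shared
 * mapping, then write the dirty pages back with msync. The range must lie
 * inside the current file size. Returns 0 on success, -1 on error. */
int store_and_flush(const char *path, size_t offset, const void *src, size_t len)
{
    assert(path != NULL && src != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0 ||
        offset + len > (size_t)st.st_size) {
        close(fd);
        return -1;
    }

    size_t size = (size_t)st.st_size;
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p + offset, src, len);     /* plain stores into the file */
    /* Not durable yet: flush the whole (page-aligned) mapping. */
    int rc = msync(p, size, MS_SYNC);
    munmap(p, size);
    return rc == 0 ? 0 : -1;
}
```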
This is where the power of the memory-mapped file API really benefits persistent memory programming.
The persistent memory programming model allows byte-level access to non-volatile media plugged into the memory bus, shown here by the common industry term NVDIMM, which is short for non-volatile dual in-line memory module. You can see that once the mappings are set up, the application has direct access, provided by the MMU’s virtual-to-physical mappings. The ability to configure these direct mappings to persistent memory is a feature known as direct access (DAX). Support for this feature is what differentiates a normal file system from a persistent memory-aware file system. DAX is supported today by both Windows and Linux.
Figure 4. Persistent memory programming model
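On Linux, one way an application can request a DAX mapping is the MAP_SYNC flag, which must be combined with MAP_SHARED_VALIDATE; the kernel refuses it with EOPNOTSUPP on file systems that cannot provide direct access. The sketch below tries MAP_SYNC first and falls back to an ordinary shared mapping. The fallback flag values are copied from the Linux UAPI headers and are an assumption for older x86-64 toolchains; map_maybe_dax is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older headers may lack these flags (values assumed from Linux UAPI, x86-64). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Map `len` bytes of the open file `fd` read/write. Tries a synchronous DAX
 * mapping first; sets *is_dax to 1 if MAP_SYNC was granted, 0 if an
 * ordinary page-cache mapping was used. Returns NULL on failure. */
void *map_maybe_dax(int fd, size_t len, int *is_dax)
{
    assert(is_dax != NULL && len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *is_dax = 1;   /* user-space cache flushes can make stores durable */
        return p;
    }
    /* EOPNOTSUPP: file system is not DAX-capable.
     * EINVAL: kernel predates MAP_SHARED_VALIDATE. Anything else is fatal. */
    if (errno != EOPNOTSUPP && errno != EINVAL)
        return NULL;

    *is_dax = 0;       /* page-cache mapping: msync is still required */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

When *is_dax comes back 0, the data still goes through the page cache, so msync remains necessary before the stores can be considered persistent.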
PMDK is a collection of open source libraries and tools that are available today for both Linux and Windows. For more information, please visit pmem.io, the Persistent Memory Programming web site. PMDK eases the adoption of persistent memory programming with higher-level language support. Currently, C and C++ support is fully validated and delivered on Linux, and available as early access on Windows.
Figure 5. Persistent Memory Development Kit (PMDK) Library Block Diagram
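PMDK's libpmem exposes pmem_persist(addr, len), which makes stores to a DAX-mapped range durable by flushing CPU caches from user space instead of calling into the kernel. The sketch below is a simplified stand-in for that idea on x86-64 only, not the libpmem implementation: it writes back every 64-byte cache line in the range with CLFLUSH and then issues SFENCE. The name persist_range and the fixed line size are assumptions; on ordinary DRAM the flush affects only caching, not durability.

```c
#include <assert.h>
#include <emmintrin.h>   /* _mm_clflush (SSE2) */
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_sfence (SSE) */

#define CACHE_LINE 64    /* assumed cache-line size */

/* Write back every cache line overlapping [addr, addr + len), then fence.
 * Returns the number of cache lines flushed. */
size_t persist_range(const void *addr, size_t len)
{
    if (len == 0)
        return 0;                         /* nothing to flush */
    assert(addr != NULL);

    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t count = 0;
    for (; line < end; line += CACHE_LINE) {
        _mm_clflush((const void *)line);  /* write back and invalidate one line */
        count++;
    }
    /* Harmless after CLFLUSH; required if CLFLUSHOPT or CLWB were used. */
    _mm_sfence();
    return count;
}
```

Real libraries choose CLWB or CLFLUSHOPT when the CPU supports them, because those instructions avoid serializing each flush.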
Persistent memory is a game changer, and the programming model described here provides access to this emerging technology. PMDK libraries provide support for transactional operations that keep data consistent and durable. There are still a lot of unknowns, and there continues to be a lot of exciting work in this field.
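To give a feel for how a transactional update keeps data consistent, here is a minimal undo-log sketch in C: snapshot the old contents, persist the snapshot, mark the log valid, update in place, then clear the log. If a crash interrupts the sequence, a recovery pass restores the snapshot. This only illustrates the general technique under x86-64 assumptions; the struct layout and the names tx_set and tx_recover are hypothetical and do not reflect libpmemobj's actual API or log format.

```c
#include <assert.h>
#include <emmintrin.h>   /* _mm_clflush */
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_sfence */

#define CACHE_LINE 64    /* assumed cache-line size */

/* Flush the cache lines covering [addr, addr + len) and fence (x86-64 only). */
static void persist(const void *addr, size_t len)
{
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    for (; line < (uintptr_t)addr + len; line += CACHE_LINE)
        _mm_clflush((const void *)line);
    _mm_sfence();
}

/* Hypothetical persistent layout: two fields that must change together,
 * plus a one-entry undo log. */
struct pair { uint64_t x, y; };

struct pm_root {
    struct pair data;      /* the protected data */
    struct pair undo_old;  /* snapshot taken before an update */
    uint64_t undo_valid;   /* 1 while an update is in flight */
};

/* Set both fields so that recovery after a crash yields either the old
 * pair or the new pair, never a mix. */
void tx_set(struct pm_root *r, uint64_t x, uint64_t y)
{
    assert(r != NULL);
    r->undo_old = r->data;                          /* 1. snapshot */
    persist(&r->undo_old, sizeof r->undo_old);
    r->undo_valid = 1;                              /* 2. arm the log */
    persist(&r->undo_valid, sizeof r->undo_valid);
    r->data.x = x;                                  /* 3. update in place */
    r->data.y = y;
    persist(&r->data, sizeof r->data);
    r->undo_valid = 0;                              /* 4. commit */
    persist(&r->undo_valid, sizeof r->undo_valid);
}

/* Run once at startup: roll back an update that never committed. */
void tx_recover(struct pm_root *r)
{
    assert(r != NULL);
    if (r->undo_valid) {
        r->data = r->undo_old;
        persist(&r->data, sizeof r->data);
        r->undo_valid = 0;
        persist(&r->undo_valid, sizeof r->undo_valid);
    }
}
```

The ordering is what matters: each persist completes before the next step's stores, so a crash at any point leaves either the old pair or the new pair once tx_recover has run.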
To learn more, check out the Persistent Memory Programming video series.