New non-uniform memory access (NUMA) technologies are spreading across the processors that populate the modern computing world, whether those processors are in individual servers running small applications or in massive dedicated MPI clusters:
- On-package memory – Multi-Channel DRAM (MCDRAM) and High-Bandwidth Memory (HBM) – that provides higher memory bandwidth than dual in-line memory modules (DIMMs)
- Non-volatile DIMMs (NVDIMMs) that provide faster storage than solid-state drives (SSDs) and hard disk drives (HDDs)
- Intel® Omni-Path Fabric (Intel® OP Fabric), the next generation of memory fabric that enables load and store instructions to access data anywhere, whether that data is in a nearby cache or in the memory of a server on another continent
These new technologies intensify a problem already found on older servers and on modern tablets, laptops, and desktops: the time to execute load and store instructions varies wildly depending on where the data currently resides. This NUMA behavior produces huge performance differences between apparently similar versions of an application, where one version works well with the memory subsystem and another does not.
Programmers who do not understand how to use the memory subsystem effectively on older or simpler few-core processors often write code that suffers a 10x or worse slowdown. On many-core and multi-core processors, it is easy to write code that suffers a 100x or worse slowdown.
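As a rough illustration of how two "apparently similar" versions can behave differently, the sketch below sums the same matrix twice: once walking memory sequentially and once jumping a full row ahead on every access. It is not taken from this article series. The function names, the 2048 x 2048 size, and the flat row-major layout are all assumptions chosen for illustration. The measured gap depends heavily on the machine, and interpreter overhead in Python hides much of the difference that a compiled language would expose.

```python
import time
from array import array


def make_matrix(rows, cols):
    """Build a flat, row-major matrix of doubles.

    Element (r, c) lives at index r * cols + c (hypothetical layout).
    """
    return array("d", range(rows * cols))


def sum_row_major(data, rows, cols):
    """Visit elements in the order they sit in memory (stride of 1)."""
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += data[r * cols + c]
    return total


def sum_col_major(data, rows, cols):
    """Visit the same elements, but jump `cols` elements on every access."""
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += data[r * cols + c]
    return total


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed size: 2048 * 2048 doubles is 32 MiB, larger than most caches.
    rows = cols = 2048
    matrix = make_matrix(rows, cols)

    row_sum, row_time = timed(sum_row_major, matrix, rows, cols)
    col_sum, col_time = timed(sum_col_major, matrix, rows, cols)

    # Both traversals do identical arithmetic; only the access order differs.
    print(f"row-major:    {row_time:.3f} s")
    print(f"column-major: {col_time:.3f} s")
    print(f"same result:  {row_sum == col_sum}")
```

Both functions execute the same number of loads and additions, so any timing difference comes from the order in which memory is touched. Treat the timings as indicative only; a single run on a busy system can be noisy.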
This article mentions three important changes happening in modern computers. The next article, MCDRAM and HBM, provides an overview of the first of these changes. Other articles include:
- How Memory Is Accessed, which explains how data is loaded and stored (the underlying cause of the challenge described in this article)
- An introductory series, starting with Performance Improvement Opportunities with NUMA Hardware, which covers the basics of efficiently using these new technologies
About the Author
Bevin Brett is a Principal Engineer at Intel Corporation, working on tools to help programmers and system users improve application performance. He spent many happy hours in his high school library reading about the computers of the 1940s and 1950s, and their unusual memory technologies. He is not surprised to see ongoing innovation in this area.