Optimizing Software Applications for NUMA: Part 2 (of 7)

Modern Processors

Modern multiprocessor systems combine the two basic shared memory architectures, UMA and NUMA, as the following diagram illustrates.

In this hierarchical scheme, processors are grouped by physical location into multi-core CPU packages, or “nodes”. Processors within a node share access to that node's memory modules, as in the UMA shared memory architecture. They can also access memory attached to a remote node over a shared interconnect, but with lower performance, as in the NUMA shared memory architecture.
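
To make this two-level hierarchy concrete, here is a minimal Python sketch that models it. It assumes a hypothetical two-socket machine with four logical CPUs per node. The `Topology` class, the `TWO_SOCKET` instance, and the distance values are illustrative assumptions, not part of any real API. The distances follow the ACPI SLIT convention, in which 10 denotes local access and larger values denote relatively slower access. The remote value of 21 is made up for the example, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Toy model of a multi-socket system whose CPUs are grouped into nodes.

    cpu_node[i] is the node (CPU package) that logical CPU i sits on.
    distance[a][b] is the relative cost for a CPU on node a to access memory
    homed on node b, in the ACPI SLIT style where 10 means local.
    """
    cpu_node: tuple
    distance: tuple

    def is_local(self, cpu: int, mem_node: int) -> bool:
        # Local access: the memory module belongs to the CPU's own node,
        # which all CPUs on that node share UMA-style.
        return self.cpu_node[cpu] == mem_node

    def access_cost(self, cpu: int, mem_node: int) -> int:
        # Remote access crosses the shared interconnect, so it costs more.
        return self.distance[self.cpu_node[cpu]][mem_node]

    def peers(self, cpu: int) -> list:
        # All CPUs that share the same local memory as `cpu`.
        node = self.cpu_node[cpu]
        return [c for c, n in enumerate(self.cpu_node) if n == node]


# Hypothetical two-socket machine with 4 logical CPUs per socket.
# The remote distance of 21 is illustrative only.
TWO_SOCKET = Topology(
    cpu_node=(0, 0, 0, 0, 1, 1, 1, 1),
    distance=((10, 21), (21, 10)),
)

if __name__ == "__main__":
    for cpu in (0, 5):
        for node in (0, 1):
            kind = "local" if TWO_SOCKET.is_local(cpu, node) else "remote"
            print(f"CPU {cpu} -> node {node} memory: {kind}, "
                  f"relative cost {TWO_SOCKET.access_cost(cpu, node)}")
```

On a real Linux system, the corresponding table is exposed per node in `/sys/devices/system/node/node*/distance` and summarized by `numactl --hardware`, so a model like this could be populated from the actual machine rather than hard-coded.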

Server platforms such as Intel® Xeon systems using Intel® Core i7 processors are an example of this hierarchical memory architecture, so the rest of this discussion centers on them. These platforms use a fast interconnect technology, Intel® QuickPath Interconnect (QPI), to mitigate (but not eliminate) the lower performance of remote memory access.