Optimizing Software Applications for NUMA: Part 1 (of 7)

1. The Basics of NUMA

NUMA, or Non-Uniform Memory Access, is a shared memory architecture in which the placement of main memory modules relative to the processors in a multiprocessor system determines memory access time. Perhaps the best way to understand NUMA is to compare it with its cousin UMA, or Uniform Memory Access.

In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect), as seen in the following diagram:

[Diagram: several processors connected through a single shared bus to the main memory modules]

UMA gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Note that access time is also independent of data location within memory. That is, access time remains the same regardless of which shared memory module contains the data to be retrieved.

In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly, with a distinct performance advantage. At the same time, it can also access any memory module belonging to another processor using a shared bus (or some other type of interconnect), as seen in the diagram below:

[Diagram: each processor paired with its own local memory module, with all processors linked by a shared interconnect]

What gives NUMA its name is that memory access time varies with the location of the data to be accessed. If data resides in local memory, access is fast. If data resides in remote memory, access is slower. The advantage of the NUMA architecture as a hierarchical shared memory scheme is its potential to improve average case access time through the introduction of fast, local memory.


