Optimizing Software Applications for NUMA: Part 1 (of 7)
By David Ott (Intel) on April 28, 2011 at 8:20 am
1. The Basics of NUMA
NUMA, or Non-Uniform Memory Access, is a shared memory architecture defined by how main memory modules are placed with respect to processors in a multiprocessor system. Perhaps the best way to understand NUMA is to compare it with its cousin UMA, or Uniform Memory Access.
In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect) as seen in the following diagram:
UMA gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Note that access time is also independent of data location within memory. That is, access time remains the same regardless of which shared memory module contains the data to be retrieved.
In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly and with a distinct performance advantage. At the same time, it can also access any memory module belonging to another processor over a shared bus (or some other type of interconnect) as seen in the diagram below:
What gives NUMA its name is that memory access time varies with the location of the data to be accessed. If data resides in local memory, access is fast. If data resides in remote memory, access is slower. The advantage of the NUMA architecture as a hierarchical shared memory scheme is its potential to improve average-case access time through the introduction of fast, local memory.
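To make the local versus remote distinction concrete, here is a minimal sketch of a toy latency model in Python. It is not taken from the article: the function names and the latency figures (100 ns local, 150 ns remote) are assumptions chosen purely for illustration, and real values depend on the hardware. The sketch shows how the average-case access time falls as a larger share of accesses is served from local memory.

```python
# Toy model of NUMA access time. All numbers are hypothetical and
# chosen only for illustration; they do not describe any real system.
LOCAL_NS = 100.0   # assumed latency when data is in the processor's own memory
REMOTE_NS = 150.0  # assumed latency when data is in another processor's memory


def access_time_ns(cpu_node, memory_node):
    """Modeled time for a processor on cpu_node to access memory on memory_node."""
    return LOCAL_NS if cpu_node == memory_node else REMOTE_NS


def average_access_time_ns(local_fraction):
    """Expected access time when local_fraction of all accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


# Print the modeled average for a few local-access fractions.
for fraction in (0.0, 0.5, 0.9, 1.0):
    print(f"{fraction:.0%} local -> {average_access_time_ns(fraction):.1f} ns average")
```

Under these assumed numbers, the average moves from the remote latency toward the local latency as the local fraction grows, which is the sense in which NUMA has the potential to improve average-case access time.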
Categories: Performance and Optimization
Tags: Memory, NUMA, optimization, performance
Comments (1)
jimdempseyatthecove
I am glad to see a discussion on NUMA systems.
Could we discuss this off-line?
My email address is in my profile (bottom of Bio).
Jim Dempsey