A Brief Survey of NUMA (Non-Uniform Memory Architecture) Literature

This document lists articles on NUMA (Non-Uniform Memory Architecture) that the author considers particularly useful. It is divided into categories corresponding to the type of article referenced. An article that could fit more than one category is placed in the one the author considers most relevant. These articles were obtained from the Internet and, though every attempt was made to identify useful and informative material, Intel does not provide any guarantees as to the veracity of the material. Readers are expected to use their own experience and knowledge to challenge and confirm the material in these references.

Where beneficial, comments (indented and in italics) on the usefulness and content of an article are included.











Lameter, Christoph. (August 2013). NUMA (Non-Uniform Memory Access): An Overview, ACM Queue, Vol. 11, no. 7. Retrieved on September 1st, 2015 from http://queue.acm.org/detail.cfm?id=2513149.

Comment: Linux focused with a moderate list of references.

Panourgias, Iakovos. (September 9th, 2011). NUMA effects on multicore, multi socket systems, MSc Thesis, University of Edinburgh. Retrieved on September 1st, 2015 from http://static.ph.ed.ac.uk/dissertations/hpc-msc/2010-2011/IakovosPanourgias.pdf.

Comment: HPC benchmark focused; discussed from a programming perspective (versus an OS administration perspective); comprehensive.

Non-uniform memory access, Wikipedia. Retrieved September 1st, 2015 from https://en.wikipedia.org/wiki/Non-uniform_memory_access.

Comment: Good set of references.

Manchanda, Nakul, and Karan Anand. (May 5th, 2010). "Non-Uniform Memory Access (NUMA)", Class thesis. New York University. Retrieved on September 1st, 2015 from http://cs.nyu.edu/~lerner/spring10/projects/NUMA.pdf.

Sharma, Yatendra. (February 10th, 2014). NUMA (Non-Uniform Memory Access): An Overview, Blog. Retrieved on September 1st, 2015 from http://yattutime.blogspot.com/2014/02/numa-non-uniform-memory-access-overview.html.


Müller, Daniel. (December 9th, 2013). Memory and Thread Management on NUMA Systems, Diploma Thesis, Technische Universität Dresden. Retrieved on September 1st, 2015 from http://os.inf.tu-dresden.de/papers_ps/danielmueller-diplom.pdf.

Comment: Comprehensive and more technical.

Denneman, Frank. (February 27th, 2015). Memory Deep Dive: NUMA and Data Locality, Blog. Retrieved on September 1st, 2015 from http://frankdenneman.nl/2015/02/27/memory-deep-dive-numa-data-locality.

Comment: Part of a larger series on memory systems.


Bolosky, William J., Robert P. Fitzgerald, Michael L. Scott. (1989). Simple But Effective Techniques for NUMA Memory Management, ACM SIGOPS Oper. Syst. Rev., Vol. 23, No. 5, pp. 19-31. Retrieved on September 1st, 2015, from http://www.cs.berkeley.edu/~prabal/resources/osprelim/BFS89.pdf.

Comment: Seminal paper.


Linux Operating System. (August 8th, 2012). NUMA(7) Manpage. Retrieved on September 1st, 2015 from http://man7.org/linux/man-pages/man7/numa.7.html.

Drepper, Ulrich. (October 17th, 2007). Memory part 4: NUMA support, LWN.net. Retrieved on September 1st, 2015 from http://lwn.net/Articles/254445.

Comment: LWN is Linux focused.

Sourceforge. (November 20th, 2002). Linux Support for NUMA Hardware. Retrieved on September 1st, 2015 from http://lse.sourceforge.net/numa.

Microsoft Corporation. NUMA Support (Windows), Windows Dev Center. Retrieved on September 1st, 2015 from https://msdn.microsoft.com/en-us/library/windows/desktop/aa363804(v=vs.85).aspx.

Comment: Programming focused with API support.


McCurdy, Collin, and Jeffrey Vetter. (March 2010). Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms, ISPASS-2010: 2010 IEEE International Symposium on Performance Analysis of Systems and Software, March 28-30, 2010, White Plains, NY.

Levinthal, David. Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon® 5500 processors, v1.0, Intel Developer Zone. Retrieved on September 1st, 2015 from https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf.

Comment: Use of VTune to look at NUMA.

Lachaize, Renaud, Baptiste Lepers, and Vivien Quéma. (June 2012). MemProf: a Memory Profiler for NUMA Multicore Systems, 2012 USENIX Annual Technical Conference, June 13-15, 2012, Boston, MA.

Zickus, Don. (May 31st, 2013). Dive deeper in NUMA systems, Red Hat Developer Blog. Retrieved on September 1st, 2015 from http://developerblog.redhat.com/2013/05/31/dive-deeper-in-numa-systems.

Intel Corporation (March 1st, 2010). Detecting Memory Bandwidth Saturation in Threaded Applications, Intel Developer Zone. Retrieved on September 1st, 2015 from https://software.intel.com/en-us/articles/detecting-memory-bandwidth-saturation-in-threaded-applications/.


Ott, David. (November 2nd, 2011). Optimizing Applications for NUMA, Intel Developer Zone. Retrieved September 1st, 2015 from https://software.intel.com/en-us/articles/optimizing-applications-for-numa.

Comment: There is a considerably older version of this article (2004) that is still accessible.

Hently, David. (June 2012). Multicore Memory Caching Issues – NUMA. Series from Channel Cscsch, Centro Svizzero di Calcolo Scientifico. Presented at the PRACE Summer School 21-23 June 2012 - Summer School on Code Optimisation for Multi-Core and Intel MIC Architectures at the Swiss National Supercomputing Centre in Lugano, Switzerland. Video retrieved on September 1st, 2015 from https://www.youtube.com/watch?v=_cmViSD6Quw&index=17&list=PLAUXS_xuCc_rjvp-lJliGFtBPWpKNAY-y.

Mario, Joe, and Don Zickus. (August 2013). NUMA - Verifying it's not hurting your application performance, Red Hat Developer Exchange, August 27, 2013, Boston, MA, USA. Retrieved September 1st, 2015 from http://developerblog.redhat.com/2013/08/27/numa-hurt-app-perf/.


Leis, Viktor, Peter Boncz, Alfons Kemper, and Thomas Neumann. (June 2014). Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age, SIGMOD’14, June 22–27, 2014, Snowbird, UT, USA. Retrieved September 1st, 2015 from http://www-db.in.tum.de/~leis/papers/morsels.pdf.

Li, Yinan, Ippokratis Pandis, Rene Mueller, Vijayshankar Raman, and Guy Lohman. (January 2013). NUMA-aware algorithms: the case of data shuffling, 6th Biennial Conference on Innovative Data Systems Research (CIDR’13), January 6-9, 2013, Asilomar, California, USA. Retrieved September 1st, 2015 from http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper121.pdf.






Taylor Kidd is an engineer and frequent contributor to the Intel Developer Zone. He currently works on the Intel® Xeon Phi™ Scale Engineering Team, producing developer-facing content and answering a variety of developer questions. Taylor has worked in a variety of fields, including HPC, embedded systems, research, and teaching.


