Optimizing Software Applications for NUMA: Part 6 (of 7)

3.3 Data Placement Using Explicit Memory Allocation Directives

Another approach to data placement in NUMA-based systems is to use system APIs that explicitly configure the location of memory page allocations. An example of such an API is the libnuma library for Linux.[1]

Optimizing Software Applications for NUMA: Part 5 (of 7)

3.2. Data Placement Using Implicit Memory Allocation Policies

In the simple case, many operating systems transparently provide support for NUMA-friendly data placement. When a single-threaded application allocates memory, the operating system assigns memory pages to the physical memory of the requesting thread's node (CPU package), thus ensuring that the memory is local to the thread and that access performance is optimal.

Optimizing Software Applications for NUMA: Part 1 (of 7)

1. The Basics of NUMA

NUMA, or Non-Uniform Memory Access, is a shared memory architecture that describes the placement of main memory modules with respect to processors in a multiprocessor system. Perhaps the best way to understand NUMA is to compare it with its cousin UMA, or Uniform Memory Access.

In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect) as seen in the following diagram:

Threading Fortran applications for parallel performance on multi-core systems

Advice and background information are given on typical issues that may arise when threading an application with the Intel Fortran Compiler and other software tools, whether using OpenMP, automatic parallelization, or threaded libraries.