Optimizing Software Applications for NUMA: Part 6 (of 7)

3.3 Data Placement Using Explicit Memory Allocation Directives

Another approach to data placement in NUMA-based systems is to make use of system APIs that explicitly configure the location of memory page allocations. One example of such an API is the libnuma library for Linux.[1]

Using such an API, a programmer may be able to associate virtual memory address ranges with particular nodes, or simply indicate the desired node within the memory allocation system call itself. With this capability, an application programmer can ensure the placement of a particular data set regardless of which thread allocates it or which thread accesses it first. This may be useful, for example, in schemes where complex applications make use of a memory management thread acting on behalf of worker threads. It may also prove useful for applications that create many short-lived threads, each of which has predictable data requirements. Pre-fetching schemes are another area that could benefit considerably from such control.
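As a rough illustration of associating an address range with a node, the sketch below maps an anonymous region and then asks the Linux kernel to bind it to one node with the mbind system call. It is only a sketch under stated assumptions: the helper name alloc_on_node and its bound out-parameter are hypothetical, the raw system call is used so the example builds without the libnuma headers, and the single-word node mask assumes node numbers below the bit width of an unsigned long. In libnuma itself, numa_alloc_onnode() offers a comparable higher-level call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <linux/mempolicy.h>   /* MPOL_BIND */
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes whose pages should be placed on
 * `node` when they are first touched. Returns NULL if the mapping fails.
 * *bound is set to 1 if the kernel accepted the policy, 0 otherwise; in the
 * latter case the region simply keeps the default placement policy. */
void *alloc_on_node(size_t len, int node, int *bound)
{
    unsigned long nodemask;

    assert(len > 0 && bound != NULL);
    /* Assumption: one unsigned long is wide enough for the node mask. */
    assert(node >= 0 && node < (int)(8 * sizeof nodemask));

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *bound = 0;
        return NULL;
    }

    nodemask = 1UL << node;
    /* maxnode is passed as "bits in the mask + 1", as libnuma does,
     * because the kernel decrements the value before using it. */
    long rc = syscall(SYS_mbind, p, (unsigned long)len,
                      (unsigned long)MPOL_BIND, &nodemask,
                      (unsigned long)(8 * sizeof nodemask + 1), 0UL);
    *bound = (rc == 0);
    return p;
}
```

A memory management thread could call alloc_on_node(size, worker_node, &bound) on behalf of a worker and hand over the pointer; because the policy is attached to the address range, placement no longer depends on which thread touches the pages first. Reporting a failed binding through bound, rather than aborting, lets the caller fall back to default system behavior, for instance where the call is not permitted.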

The downside of this scheme, of course, is the management burden placed on the application in handling memory allocations and data placement. Misplaced data may cause performance that is significantly worse than default system behavior. Explicit memory management also presupposes fine-grained control over processor affinity throughout application use.

Another capability available to the application programmer through NUMA-based memory management APIs is memory page migration. In general, migration of memory pages from one node to another is an expensive operation and something to be avoided. Not only is there the cost of migrating the data, but all associated memory references must be discovered and modified to observe the new mapping. As the remapping is taking place, pages must be removed from operating system page lists and detached from normal swapping mechanisms.

Having said this, given an application that is both long-lived and memory-intensive, migrating memory pages to re-establish a NUMA-friendly configuration may be worth the price.[3] Consider, for example, a long-lived application in which various threads have terminated and new threads have been created on another node. The data is no longer local to the threads that need it, and sub-optimal remote accesses now dominate. Application-specific knowledge of a thread’s lifetime and data needs can be used to determine whether an explicit migration is in order.
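An explicit migration of this kind might look like the following sketch, which asks the Linux kernel to move every page of a buffer to a target node through the move_pages system call. The helper name migrate_range is hypothetical, the buffer is assumed to be page-aligned, and the raw system call is used so that the example builds without the libnuma headers (libnuma exposes the same operation as numa_move_pages()).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <linux/mempolicy.h>   /* MPOL_MF_MOVE */
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: request that all pages in [buf, buf + len) be moved
 * to `target_node`. Returns the number of pages the kernel reports on the
 * target node afterwards, or -1 if the request could not be made. */
long migrate_range(void *buf, size_t len, int target_node)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && (uintptr_t)buf % (uintptr_t)page == 0);

    size_t count = (len + (size_t)page - 1) / (size_t)page;
    void **pages = malloc(count * sizeof *pages);
    int *nodes = malloc(count * sizeof *nodes);
    int *status = malloc(count * sizeof *status);
    long on_target = -1;

    if (pages && nodes && status) {
        for (size_t i = 0; i < count; i++) {
            pages[i] = (char *)buf + i * (size_t)page;
            nodes[i] = target_node;
            status[i] = -1;
        }
        /* pid 0 means the calling process; MPOL_MF_MOVE moves only pages
         * that are used exclusively by this process. */
        long rc = syscall(SYS_move_pages, 0L, (unsigned long)count,
                          pages, nodes, status, (unsigned long)MPOL_MF_MOVE);
        if (rc >= 0) {
            on_target = 0;
            /* Each status entry holds a node number or a negative errno. */
            for (size_t i = 0; i < count; i++)
                if (status[i] == target_node)
                    on_target++;
        }
    }
    free(pages);
    free(nodes);
    free(status);
    return on_target;
}
```

Pages that have never been touched have no physical location yet and are reported with a negative status, so the return value can be lower than the page count even when nothing went wrong. Given the cost described above, a call such as migrate_range(data, size, new_node) is best reserved for data that the relocated threads will keep using for a long time.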

Finally, the API may provide functions for obtaining page residency or for examining memory access behavior under the current configuration. Such tools may provide the means to implement a monitoring scheme that makes explicit migration adjustments when the share of node-local memory accesses falls below a defined threshold.
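A monitoring scheme along these lines could be sketched as follows. The example uses page residency as a stand-in for access locality, which is an assumption: it counts how many pages of a buffer currently live on the node where the consuming thread runs, not how often they are accessed. The helper names query_residency, count_on_node, and should_migrate are hypothetical, and the threshold is an application-chosen value. On Linux, calling move_pages with no target node list only reports where each page resides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill status[i] with the node holding page i of `buf` (or a negative
 * errno for that page). Returns 0 on success, -1 if the query failed. */
int query_residency(void *buf, size_t count, int *status)
{
    long page = sysconf(_SC_PAGESIZE);
    void **pages = malloc(count * sizeof *pages);
    if (pages == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        pages[i] = (char *)buf + i * (size_t)page;

    /* A NULL node list turns move_pages into a pure residency query. */
    long rc = syscall(SYS_move_pages, 0L, (unsigned long)count,
                      pages, (void *)NULL, status, 0UL);
    free(pages);
    return rc == 0 ? 0 : -1;
}

/* Count the pages reported as resident on `node`. */
size_t count_on_node(const int *status, size_t count, int node)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        if (status[i] == node)
            n++;
    return n;
}

/* Returns 1 when the local share of pages is below `threshold` (0.0 to 1.0). */
int should_migrate(size_t local, size_t total, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    if (total == 0)
        return 0;
    return (double)local / (double)total < threshold;
}
```

A monitor thread might run the query periodically and trigger an explicit migration only when should_migrate() returns 1. Keeping the decision separate from the query makes it straightforward to substitute a different locality measure, such as hardware counter data, if one is available.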

[1] Drepper, Ulrich. “What Every Programmer Should Know About Memory”. November 2007.
[3] Lameter, Christoph. “Local and Remote Memory: Memory in a Linux/NUMA System”. June 2006.
