How NUMA Affects Your Workloads: Intel® VTune™ Amplifier


Many modern multi-socket systems are based on non-uniform memory access (NUMA), where access latency and bandwidth depend on where the physical memory sits relative to the core that uses it. The art of placing memory objects in a NUMA system lies in finding access patterns that can drive placement heuristics. New memory hierarchy elements, such as the high-bandwidth, on-package MCDRAM in Knights Landing, add another NUMA factor that a performance-minded engineer needs to account for. Understanding how memory object placement affects the memory subsystem is key to extracting the best performance from your platform. We will demonstrate how to use Intel® VTune™ Amplifier to analyze memory objects (dynamic, global, and stack), understand the effect of your data placement choices on a per-object basis, and get the most out of your system.
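One kind of data placement choice mentioned above can be made concrete with a short sketch. It is not taken from the slides; it illustrates the common "first touch" technique, assuming Linux, glibc, and POSIX threads. Under Linux's default memory policy, a page is physically allocated on the NUMA node of the CPU that first writes to it (when that node has free memory), so initializing a large array in parallel, with each pinned thread touching the slice it will later work on, tends to keep those accesses local. The names `alloc_first_touch`, `slice_t`, and `MAX_WORKERS`, and the choice of pinning worker *t* to CPU *t*, are hypothetical; on a real system the CPU-to-node mapping should come from the platform topology.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_WORKERS 64 /* hypothetical cap so bookkeeping fits in stack arrays */

/* Hypothetical descriptor: one worker owns one contiguous slice. */
typedef struct {
    double *data;
    size_t begin, end; /* half-open range [begin, end) */
    int cpu;           /* CPU to pin to; -1 means "do not pin" */
} slice_t;

/* Pin the calling thread to one CPU. Errors (e.g. a CPU outside the allowed
   set) are ignored: placement is a performance hint, not a correctness need. */
static void pin_to_cpu(int cpu) {
    if (cpu < 0)
        return;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    (void)pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* First touch: the first write faults each page in, and Linux's default
   policy backs it with memory on the writing CPU's node when possible. */
static void *first_touch(void *arg) {
    slice_t *s = arg;
    pin_to_cpu(s->cpu);
    for (size_t i = s->begin; i < s->end; i++)
        s->data[i] = (double)i; /* placeholder initial value */
    return NULL;
}

/* Allocate n doubles and initialize them with nthreads workers, worker t
   pinned to CPU t. Returns NULL on invalid arguments or allocation failure. */
double *alloc_first_touch(size_t n, int nthreads) {
    if (nthreads < 1 || nthreads > MAX_WORKERS || n > SIZE_MAX / sizeof(double))
        return NULL;
    double *data = malloc(n * sizeof *data);
    if (data == NULL)
        return NULL;

    pthread_t tid[MAX_WORKERS];
    slice_t slices[MAX_WORKERS];
    int started[MAX_WORKERS];
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t b = (size_t)t * chunk, e = b + chunk;
        if (b > n) b = n;
        if (e > n) e = n;
        slices[t] = (slice_t){data, b, e, t};
        started[t] = pthread_create(&tid[t], NULL, first_touch, &slices[t]) == 0;
        if (!started[t]) {
            /* Fallback: touch the slice from the calling thread, unpinned. */
            slices[t].cpu = -1;
            first_touch(&slices[t]);
        }
    }
    for (int t = 0; t < nthreads; t++)
        if (started[t])
            pthread_join(tid[t], NULL);
    return data;
}
```

A caller would then run its compute threads with the same slice-to-CPU assignment. First touch only affects pages that have not been touched yet, so it matters mainly for large allocations backed by fresh pages; whether it actually reduces remote memory traffic for a given object is the kind of question a per-object memory analysis is meant to answer.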

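Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors as for Intel microprocessors. These optimizations include the SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.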

Notice revision #20110804