In today’s complex data center and enterprise deployments, conditions may arise where memory bandwidth is constrained under heavy system load, for instance when consolidating many virtual machines onto a server system. In other cases, the performance or responsiveness of certain applications might depend on having a given amount of memory bandwidth available to perform at an acceptable level, and the presence of other consolidated applications may lead to performance interference effects.
Intel® Resource Director Technology (Intel® RDT) comprises a set of technologies that provide monitoring insight into how shared system resources such as the last-level cache and memory bandwidth are used, along with controls over the last-level cache. Previously available Intel RDT features did not, however, provide control over memory bandwidth.
The Intel® Xeon® Scalable processors introduce Memory Bandwidth Allocation (MBA), which provides new levels of control over how memory bandwidth is distributed across running applications. MBA enables improved prioritization and bandwidth management, and is a valuable tool to help control noisy neighbors in the data center.
This article describes the fundamental architectural principles of MBA, including software enabling and usage considerations. It includes a variety of resources to consult for further information, including sample data from the latest platforms.
The MBA feature provides approximate and indirect per-core control over memory bandwidth. A new programmable bandwidth controller has been introduced between each core and the shared high-speed interconnect which connects the cores in Intel® Xeon® processors. As shown in Figure 1, this enables bandwidth downstream of shared resources such as memory bandwidth to be controlled.
Figure 1. MBA high-level overview showing programmable request rate controller between the cores and high-speed interconnect
MBA is complementary to existing Intel RDT features such as Cache Allocation Technology (CAT), also shown in Figure 1. For instance, CAT may be used to control the last-level cache, while MBA may be used to control memory bandwidth.
The MBA feature extends the shared resource control infrastructure introduced with CAT, described in a series of articles beginning with Introduction to Cache Allocation Technology.
The CAT architecture defines a per-software-thread tag called a Class of Service (CLOS or COS), which enables running threads, applications or VMs to be mapped to a particular bandwidth setting in the underlying hardware, as shown in Figure 2. The concept of a CLOS and the per-thread IA32_PQR_ASSOC MSR which maintains this association are described at length in a prior article describing CAT usage models and software interfaces.
Figure 2. Classes of Service (CLOS) enable flexible control over threads, apps, VMs, or containers, given software support.
The presence of the MBA feature on a specific processor can be confirmed through CPUID-based enumeration, as described in the Intel® Software Developer's Manual (SDM). Once the feature is enumerated as present, details such as the number of supported classes of service and MBA feature specifics such as the supported throttling modes can be enumerated.
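As a sketch of this enumeration step, the helper below decodes the MBA sub-leaf of CPUID leaf 0x10 (sub-leaf 3), assuming the field layout documented in the SDM; the raw register values are hypothetical and would in practice come from executing CPUID (for example via a kernel driver or a CPUID library):

```python
def decode_mba_cpuid(eax: int, ecx: int, edx: int) -> dict:
    """Decode CPUID.(EAX=0x10, ECX=3) register values for MBA.

    Field layout per the Intel SDM (assumed here):
      EAX[11:0]  - maximum MBA throttling value, minus-one notation
      ECX[2]     - set if the delay values have a linear response
      EDX[15:0]  - highest CLOS number supported
    """
    return {
        "max_throttle": (eax & 0xFFF) + 1,   # e.g. 0x59 -> up to 90% throttling
        "linear": bool(ecx & (1 << 2)),
        "num_clos": (edx & 0xFFFF) + 1,      # highest CLOS + 1 classes of service
    }

# Example register values (illustrative only):
info = decode_mba_cpuid(eax=0x59, ecx=0x4, edx=0x7)
# -> {"max_throttle": 90, "linear": True, "num_clos": 8}
```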
In typical usages, an enabled operating system (OS), hypervisor, or virtual machine monitor (VMM) maintains the association of threads to CLOS. Typically, when a software thread is scheduled onto a given logical processor, the IA32_PQR_ASSOC MSR is updated to reflect the CLOS of the thread (see the CAT example in Usage Models for Cache Allocation Technology on the Intel® Xeon® processor E5 v4 family), which allows hardware to select the correct memory bandwidth limit to apply (Figure 3).
Figure 3. The current class of service (CLOS) for a thread can be specified using the IA32_PQR_ASSOC MSR, which is defined per hardware thread.
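As a minimal sketch of what such a context-switch hook computes, the value written to IA32_PQR_ASSOC could be composed as below, assuming the layout documented in the SDM (RMID for monitoring in bits 9:0, CLOS for allocation in bits 63:32); the actual MSR write would be performed by privileged code (e.g. via wrmsr):

```python
IA32_PQR_ASSOC = 0xC8F  # per-logical-processor MSR

def pqr_assoc_value(clos: int, rmid: int = 0) -> int:
    """Compose the IA32_PQR_ASSOC value: RMID in bits [9:0] selects the
    monitoring ID (CMT/MBM), CLOS in bits [63:32] selects the allocation
    class (CAT/MBA)."""
    assert 0 <= rmid < (1 << 10), "RMID field is 10 bits"
    assert 0 <= clos < (1 << 32), "CLOS field is 32 bits"
    return (clos << 32) | rmid

# On a context switch, the OS would write this value to MSR 0xC8F on the
# logical processor the thread is scheduled onto:
value = pqr_assoc_value(clos=2, rmid=5)
```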
MBA bandwidth limits are specified per CLOS as a throttling value in the range from zero to a platform-specific maximum (available via CPUID), typically up to 90% throttling in 10% steps. These steps are approximate: each represents a value calibrated against a known bandwidth-intense set of applications. The bandwidth actually delivered at each calibration point may vary across system configurations, processor generations, and memory configurations, so MBA throttling delay values should be regarded as a hint from software to hardware about how much throttling to apply.
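Software requesting a limit therefore needs to work in terms of the supported steps. The illustrative helper below (names and defaults are assumptions, not an Intel API) snaps a requested throttling percentage to the nearest step and clamps it to the enumerated maximum:

```python
def nearest_supported_delay(requested: int, max_throttle: int = 90,
                            step: int = 10) -> int:
    """Snap a requested throttling percentage to the nearest calibrated
    step (typically 10%) and clamp to the platform maximum (typically 90%).
    Hardware treats the programmed value as a hint, so the exact resulting
    bandwidth is not guaranteed."""
    snapped = round(requested / step) * step
    return max(0, min(snapped, max_throttle))

nearest_supported_delay(47)   # -> 50
nearest_supported_delay(95)   # -> 90 (clamped to the platform maximum)
```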
As with CAT, when the system boots, all threads are associated with CLOS[0], which is configured with no throttling applied, meaning a delay value of zero. Software may update the requested throttling values at any time, for instance, to increase throttling for low-priority virtual machines (VMs) as more system memory bandwidth load is detected.
It should be noted that because MBA throttles accesses to the last-level cache (LLC), care should be taken not to throttle applications that are LLC-intense but not memory-intense, as this may reduce system efficiency or throughput. The Memory Bandwidth Monitoring (MBM) feature of Intel RDT enables advanced memory bandwidth monitoring per thread, app, VM, or container (with corresponding software support), which allows bandwidth-intense applications to be identified for subsequent control. Such bandwidth-intense applications are often called noisy neighbors in the data center environment.
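As a simple sketch of this identification step (the task names, readings, and threshold below are hypothetical), per-task bandwidth figures derived from MBM counters can be filtered to find candidates for MBA throttling:

```python
def find_noisy_neighbors(bandwidth_mb_s: dict, threshold_mb_s: float) -> list:
    """Given per-task memory bandwidth readings (e.g. derived from MBM
    counters), return the tasks exceeding a threshold, sorted by name,
    as candidates for MBA throttling."""
    return sorted(t for t, bw in bandwidth_mb_s.items() if bw > threshold_mb_s)

# Hypothetical MBM-derived readings, in MB/s per task:
readings = {"web-frontend": 800.0, "analytics-batch": 14500.0, "logger": 120.0}
find_noisy_neighbors(readings, threshold_mb_s=5000.0)  # -> ["analytics-batch"]
```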
MBA provides per-core controls over bandwidth, and the specific delay value to be applied is resolved through the equation shown in Figure 4.
Figure 4. MBA delay value resolution examples
Because MBA is a per-core feature, the delay value applied to the threads running on a core takes both threads into account when Intel® Hyper-Threading Technology (Intel® HT Technology) is enabled. In such cases, the applied delay is the maximum of the delay values specified for the two running threads on the core. As in the case of Core 1 in Figure 4, if one thread is mapped to CLOS[1] with delay value 60, and the other thread on the core is mapped to a CLOS with delay value 50, the applied delay value will be 60, which provides the best control over noisy neighbor threads. If high- and low-priority threads are frequently co-scheduled, operating system tuning may be valuable to place high- and low-priority threads so that the throttling behavior matches the priorities specified.
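This max-of-siblings resolution can be sketched as follows (the delay table mirrors the illustrative values discussed above for Figure 4):

```python
def effective_core_delay(clos_delay: dict, thread_clos: tuple) -> int:
    """MBA is enforced per physical core: with Intel HT Technology enabled,
    the delay applied is the maximum of the delays of the CLOS of the two
    threads running on the core."""
    return max(clos_delay[c] for c in thread_clos)

# Per-CLOS delay table (illustrative values):
delays = {0: 0, 1: 60, 2: 50}
effective_core_delay(delays, (1, 2))  # -> 60: the stricter limit wins
```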
Memory Bandwidth Allocation support is provided in various OS and VMM software, including Linux*, KVM*, and Xen*. Under Linux, kernel version 4.18 or newer is recommended, as it implements an advanced software controller which uses MBM inputs and MBA controls to provide an interface through which administrators may specify a bandwidth capping value. Further details on this software controller are available in the Linux kernel documentation under Memory bandwidth allocation software controller (mba_sc).
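Under the Linux resctrl interface, MBA limits are expressed as an "MB:" line in a control group's schemata file, with one <domain>=<bandwidth%> entry per L3 cache domain (100 meaning unthrottled). The sketch below builds and applies such a line; it assumes the resctrl filesystem is mounted (e.g. mount -t resctrl resctrl /sys/fs/resctrl) and requires root, and the group name is hypothetical:

```python
from pathlib import Path

def mba_schemata_line(domain_bw: dict) -> str:
    """Build a resctrl 'MB:' schemata line: one <domain>=<bandwidth%>
    entry per L3 cache domain (100 = unthrottled, granularity typically 10)."""
    return "MB:" + ";".join(f"{d}={bw}" for d, bw in sorted(domain_bw.items()))

def apply_mba_limit(group: str, domain_bw: dict,
                    resctrl: str = "/sys/fs/resctrl") -> None:
    """Create (if needed) a resctrl control group and write its MBA limit.
    Tasks are then assigned by writing their PIDs to the group's tasks file."""
    gdir = Path(resctrl) / group
    gdir.mkdir(exist_ok=True)
    (gdir / "schemata").write_text(mba_schemata_line(domain_bw) + "\n")

# Limit a (hypothetical) low-priority group to 50% on both L3 domains:
mba_schemata_line({0: 50, 1: 50})  # -> "MB:0=50;1=50"
```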
Many of the other Intel RDT features such as Cache Monitoring Technology (CMT) and CAT may be combined in similar closed-loop controllers to provide more advanced and dynamic functionality. Advanced orchestration software, for instance, may be able to combine multiple monitoring inputs and resource controls to improve prioritization capabilities, better meet service level objectives (SLOs) and improve workload job placement.
Another example of Intel RDT software support is the rdtset utility, available on GitHub*. This utility can launch commands with a user-specified CPU affinity mask and Intel RDT allocation settings such as a CAT bitmask or an MBA delay value. It leverages the Linux* OS support for dynamically managing classes of service per thread or application and the associated bandwidth controls.
Figure 5. MBA may be used in concert with other Intel RDT features such as CAT, and with software tools such as Intel® Performance Counter Monitor (Intel® PCM) and VTune™
Similarly, the Intel RDT utility, available on GitHub, provides support for all of the Intel RDT features available on a given processor, along with verbose feature enumeration capabilities. Where OS support is present, such as Resource Control Groups (resctrl; see User Interface for Resource Allocation in Intel Resource Director Technology and the GitHub resctrl page), the utility provides a convenient interface atop the resctrl file system. Where OS support is not present, the utility can use the Model-Specific Register (MSR) interfaces to provide direct per-core control over Intel RDT monitoring and allocation technologies, including MBA.
Multiple software tools may be combined to provide both monitoring input and control, as shown in Figure 5. For instance, the resctrl interface, rdtset, or the Intel RDT utility can be used to configure bandwidth limits. Software such as Intel® Performance Counter Monitor (Intel® PCM) may be used to retrieve CMT and MBM monitoring information, or tools such as VTune™ may be used to monitor overall memory bandwidth at the memory controller.
This article provides a brief overview of MBA, but many other resources are available including technical articles, whitepapers, example MBA data and software enabling. These resources are organized for Intel RDT at the Useful-links page on GitHub. This page is updated as new content becomes available so checking back periodically is recommended.
The page is hosted within the same repository as the Intel-CMT-CAT utility, which provides valuable sample code and a way to use Intel RDT on non-enabled OSes, or on OSes where the kernel cannot be modified, such as in legacy deployments.
The introduction of the MBA feature adds to Intel RDT another advanced control over a shared resource, in this case memory bandwidth, through a programmable controller located between each core and memory.
MBA enables orchestration software or enabled OSes and VMMs to take charge of how memory bandwidth is allocated, including adjusting the fraction of memory bandwidth available to applications even dynamically at runtime as system conditions change.
Through complementary features in the Intel RDT feature set such as MBM, platform telemetry and insight can be gained about how memory bandwidth is being consumed at runtime, allowing adjustments using MBA, or advanced software controllers to be constructed.
MBA provides another tool to help control noisy neighbors in complex data center environments, ensuring that performance goals can be maintained as system conditions change.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804