Intel's Cache Monitoring Technology (CMT) feature was introduced with the Intel® Xeon® E5-2600 v3 product family in 2014.
CMT is part of a larger set of technologies called Intel® Resource Director Technology (RDT). More information on the Intel RDT feature set can be found here, and an animation illustrating the key principles behind Intel RDT is posted here.
This feature enables fine-grained tracking of L3 cache occupancy, allowing detailed profiling of threads, applications, or VMs.
Previous blog posts referenced below provide an overview of various aspects of the feature:
- Product page: https://software.intel.com/en-us/articles/intel-xeon-e5-2600-v3-product-family
- Part 1: Introduction to CMT: https://software.intel.com/en-us/blogs/2014/06/18/benefit-of-cache-monitoring
- Part 2: Discussion of RMIDs and CMT Software Interfaces: https://software.intel.com/en-us/blogs/2014/12/11/intel-s-cache-monitoring-technology-software-visible-interfaces
- Part 3: Use Models and Example Data: https://software.intel.com/en-us/blogs/2014/12/11/intels-cache-monitoring-technology-use-models-and-data
This blog, the fourth in the series, discusses details of available Operating System (OS) support, and software packages which can be used to test the feature.
Key ingredients discussed in this installment include the Linux* operating system, the perf profiling suite, and a software package available from Intel for POSIX operating systems that monitors the L3 cache usage of applications (or pinned VMs) on a per-app/VM basis by pinning them to cores.
Standalone vs. Scheduler-Based Monitoring
Using the CMT capabilities is straightforward from a code development perspective, since model-specific registers (MSRs) provide the interface to set up and query the capability. All modern operating systems provide application programming interfaces (APIs) or tools that enable users with the appropriate privilege to read and write the MSRs. Linux* provides the msr-tools package, which includes the rdmsr and wrmsr commands, and Microsoft Windows* provides a similar interface. There are two high-level approaches to cache monitoring:
- Standalone Cache Monitoring looks at last level cache usage from a core or logical thread (referred to as a CPU hereafter) perspective, regardless of what task is executing. An RMID is statically assigned to a CPU, and the occupancy is periodically read back. If the platform has been statically configured and applications have been pinned to resources, this method yields appropriate results. If system administrators simply want to know whether the platform is suitably balanced and no applications are misbehaving, this approach is quite reasonable.
- Scheduler-based Cache Monitoring involves the operating system scheduler. This is the preferred method in many cases: RMIDs assigned statically, as in the Standalone method above, do not follow a process or thread as it migrates across cores, so occupancy cannot be reported accurately on a per-application basis. Tracking an application's occupancy accurately requires scheduler changes. Software assigns an RMID to a process, and the scheduler associates the core with that RMID whenever the application of interest is scheduled to execute on a CPU. When the application is de-scheduled or migrated to a different core or thread, the scheduler updates the RMID assignment so that occupancy is tracked only while the application is executing. Systems software is also responsible for any RMID remapping required across processor sockets: since RMIDs are defined to be local to each socket, if an application with a given RMID is moved to another processor, the OS or VMM must find an available RMID on the destination socket to continue tracking the migrated application (if monitoring is still required).
To enable standalone and scheduler-based monitoring, several software development initiatives are in progress; these are described in the following sections.
Scheduler-Based Monitoring – Linux* Operating System Support Overview
Scheduler-based Cache Monitoring ensures that the application of interest is tracked with the appropriate core and RMID association. Under Linux*, this is achieved by integrating CMT into perf and its kernel support, which is tightly bound to the Linux* scheduler.
On supported platforms (where both the processor and the OS support CMT), perf is used to specify which process or thread is to be monitored and assigns it an RMID. All threads not being monitored are assigned a default RMID, used to capture the occupancy of everything not specifically monitored. Once perf configures the system for monitoring, context switches involving the monitored threads result in a callback into the perf_events subsystem. When the CMT callback from the scheduler occurs (during the ‘context_switch’ kernel function), the perf_events subsystem selects the RMID associated with the thread being scheduled and assigns it to the CPU; this is either the RMID for explicit monitoring or the default RMID if the scheduled thread has not been configured for monitoring. From this point until the next context switch, memory read requests and their subsequent cache loads from this logical processor are attributed to the RMID just set up.
When a process or thread tracked for cache occupancy terminates, or its sched_out function call occurs, the perf CMT callback selects a new RMID; in this case the default RMID, so that cache loads are not counted towards any explicitly monitored thread. After a monitored process terminates, its RMID is returned to a pool of unused RMIDs and recycled for new monitoring requests. Mainstream support for these capabilities is trending to Linux* kernel version 3.19.
CMT (Cache Monitoring Technology) perf implementation: the Linux* perf application provides an interface to kernel-based performance counters, and an extension has been developed to support the Cache Monitoring Technology feature. This allows users to monitor last level cache occupancy on a per-process or per-thread basis. The name of the new event is intel_cqm/llc_occupancy/, and it returns the occupancy in bytes. The patches to perf and the Linux* kernel are available upstream.
The perf driver module checks for CMT hardware availability using the CPUID instruction. If CMT is detected, a number of function calls are registered with perf. Below are some of the registered events and their functionality:
- Event handled – perf monitoring initialized on a PID or TID
- CMT callback – allocates and sets a unique RMID for the PID/TID
- Event handled – perf monitoring started on a PID or TID (after event_init)
- CMT callback – starts the monitoring capability
- Event handled – schedule-in of the monitored PID/TID
- CMT callback – sets the monitoring RMID on the scheduled core
- Event handled – schedule-out of the monitored PID/TID
- CMT callback – resets the scheduled-out core to the default RMID
- Event handled – read of the monitoring counters for the PID/TID
- CMT callback – reads the CMT occupancy value from the MSR using the RMID associated with the PID/TID
- Event handled – end of perf monitoring on a PID or TID
- CMT callback – resets the monitoring capability and frees all allocated RMIDs
To ensure that the occupancy associated with a CPU is accurate, the perf kernel component associates the RMID with the specific application thread only while it is running on that CPU. As explained in the previous section, when the Linux scheduler swaps the process out, the RMID is no longer associated with the core. In addition to RMID tracking, perf also supports process and thread inheritance (any child process inherits the RMID of its parent).
Basic operation of perf with CMT takes the following form, where <pid> is the ID of the process to be monitored:
$ perf stat -e intel_cqm/llc_occupancy/ -p <pid> sleep 10
User Space CMT APIs
The motivation for proposing a small set of user space CMT APIs is to make it easier to use and integrate CMT into applications. Developers could then retrieve cache occupancy information from within their applications through a handful of calls, and a unified access API would allow better management of shared platform resources such as RMIDs and MSR access.
Below are the proposed functions, which would wrap around the perf_event system calls and help track cache occupancy for tasks/PIDs.
- pqos_register_cmt(taskid, cpu): registers the pid/tid (and CPU) to be tracked for CMT. Internally, perf takes care of RMID assignment and RMID recycling through its scheduler integration.
- pqos_get_cmt_occupancy(taskid, cpu): reports the last level cache occupancy for a registered task.
- pqos_unregister_cmt(taskid, cpu): unregisters a task and releases any RMIDs that were tracking its last level cache occupancy.
Research is ongoing to provide a user space library that would allow developers or system administrators to take advantage of CMT without needing to consider how many RMIDs are available, or other RMID management tasks, while tracking applications.
The proposed design and placement for the API implementation is depicted in this diagram:
Virtual Machine Monitor Support (KVM & Xen)
Since KVM is a type-2 hypervisor, it inherits the scheduler enhancements discussed in the previous section. Administrators or developers can use perf to track the last level cache occupancy of a virtual machine; the process or thread IDs of the virtual machines can be retrieved from the operating system through top or the QEMU monitor.
Since Xen is a type-1 hypervisor, scheduler enhancements were needed to track last level cache occupancy, and Xen 4.5 will be the first version to include CMT support. The hypervisor implementation associates an RMID with each domain (DomU, or guest VM): domains specified for monitoring receive their own RMID, while the rest share the default RMID used to collect all non-monitored occupancy data. As the hypervisor schedules each domain onto a CPU and performs the context switch, it also writes the RMID to the CPU-specific MSR, associating that CPU with the RMID and its domain. As long as the domain continues to run on the CPU, the last level cache (LLC) occupancy resulting from the domain's memory reads on that CPU is tracked via the RMID specified for the domain. When the next domain is scheduled on this CPU and the current monitored domain is switched out, its RMID is replaced on the CPU so no further association exists.
Xen’s xl command-line tool includes a few additions to support CMT. These allow users to attach monitoring to a domain, detach monitoring, and show the LLC occupancy information. The command-line additions have the following form:
$ xl psr-cmt-attach domid
$ xl psr-cmt-detach domid
$ xl psr-cmt-show cache_occupancy
In the above example commands, domid is the id number of the domain (guest VM) of interest.
Multi-OS Support via the Standalone Cache Monitoring Technology Library
This standalone library (available soon at: https://01.org/packet-processing/cache-monitoring-allocation-technology) enables developers to monitor last level cache occupancy on a per-CPU basis without the need for OS enabling support (via the Standalone Cache Monitoring technique discussed earlier). When the library/application initially comes up, it checks for Cache Monitoring support. Once initialization is complete, the monitoring functionality provides a “top”-like interface listing the last level cache occupancy on a per-CPU basis. The library implements a number of APIs that let developers take advantage of CMT without setting up the MSRs that configure RMID assignment or retrieve the last level cache occupancy data. Developers may also use the library from within a virtual machine; however, either a paravirtualization (PV) technique or MSR bitmaps may be required to gain access to the CMT model-specific registers, and in general using the library from the host OS is preferred.
Other Operating Systems and VMM
Additional OSes and VMMs will be enabled over time. Check the documentation or feature list for your preferred OS/VMM to determine if CMT is supported on a particular version.
If your preferred OS/VMM doesn’t yet support CMT, its customer support organization may be able to track the feature request and provide an estimate of when support will be ready.
Several mainstream OSes and VMMs now include support for Intel's Cache Monitoring Technology (CMT), and for OSes without built-in support, a software library will be available via 01.org to enable experimentation, prototyping of resource management heuristics, and deployment of the features.
* Other names and brands may be claimed as the property of others