Developer Guide

  • 2021.2
  • 06/11/2021
  • Public

About the Sample

The cache allocation sample has two components: a real-time workload and an internal noisy neighbor. The noisy neighbor is “internal” because it runs on the same processor core as the workload and is part of the same application.
The sample provides command-line options. You can:
  • Run the workload with the internal noisy neighbor
  • Run the workload with an external noisy neighbor of your choice

Real-Time Workload

The workload is designed to simulate a real-time task. The workload allocates a buffer, performs a task, and measures the buffer access latency through the Measurement Library.
The workload allocates a buffer in L2 cache, L3 cache, or DRAM, depending on the latency requirement you provide through the command-line option. This requirement has the same meaning as the latency parameter of the cache allocation library.
The workload’s task is a random pointer-chase.
The workload is designed to run on Core 3, which matches the core isolated by the isolcpus kernel boot parameter in the Yocto Project*-Based Image.
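A random pointer-chase can be sketched as follows. This is a minimal illustration, not the sample's actual source: the names build_random_chain and chase are hypothetical, the buffer is a plain array of indices, and Sattolo's algorithm is one assumed way to link every slot into a single random cycle. Each load depends on the previous one, so the access pattern is hard for hardware prefetchers to predict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fill next[] so that following next[i] visits every
 * slot exactly once before returning to the start (Sattolo's algorithm
 * produces a single random cycle over all n slots). */
static void build_random_chain(size_t *next, size_t n, unsigned seed)
{
    assert(next != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j is in [0, i-1], never i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Hypothetical helper: perform `steps` dependent loads and return the
 * index reached. Returning it keeps the loop from being optimized away. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t idx = start;
    while (steps--)
        idx = next[idx];
    return idx;
}
```

A caller would build the chain once over the allocated buffer, then time calls to chase to observe the buffer access latency.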

Internal Noisy Neighbor

The internal noisy neighbor competes with the workload for memory access. The internal noisy neighbor is a linear pointer-chase.
The internal noisy neighbor also runs on Core 3.
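For contrast, a linear pointer-chase links each element to the one that follows it in memory. The sketch below is an illustration under stated assumptions, not the sample's code: the names node, link_linear, and walk are hypothetical, and the 64-byte node size is an assumed cache line size. Walking a chain like this over a buffer larger than the cache touches one new line per step, which is what displaces previously cached data.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64   /* assumed cache line size in bytes */

/* One node per (assumed) cache line, so each step touches a new line. */
struct node {
    struct node *next;
    char pad[LINE_SIZE - sizeof(struct node *)];
};

/* Hypothetical helper: link node i to node i+1, wrapping at the end. */
static void link_linear(struct node *nodes, size_t n)
{
    assert(nodes != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        nodes[i].next = &nodes[(i + 1) % n];
}

/* Hypothetical helper: follow `steps` links and return the node reached. */
static struct node *walk(struct node *start, size_t steps)
{
    struct node *p = start;
    while (steps--)
        p = p->next;
    return p;
}
```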

Implementation

When the sample runs the workload and internal noisy neighbor together, the two run sequentially, not concurrently. The sample runs the workload for a period of time, then runs the noisy neighbor for a period of time to evict the workload’s cached data.
The main purpose of this implementation is to provide the most visible demonstration of the performance effect of the cache allocation library. This approach simulates a “phased” workload, where the time-critical phase has a deadline and the less critical phase is considered a noisy neighbor. In this case, the less critical phase evicts the cached data of the more critical phase. This arrangement may be rare in typical real-time use cases, but it highlights the direct effect of the cache allocation library.
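The phased, sequential structure can be sketched as a loop that alternates the two phases on the same core. Everything named here is hypothetical (run_phased, phase_fn, the trace struct, and the demo phases), and the timing uses the C11 timespec_get call as a stand-in; the sample itself measures latency through the Measurement Library, whose API is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*phase_fn)(void *ctx);

/* Hypothetical trace, used only to make the phase order visible. */
struct trace {
    char events[32];
    size_t len;
};

static void record(struct trace *t, char c)
{
    if (t->len < sizeof t->events)
        t->events[t->len++] = c;
}

/* Stand-ins for the real phases (pointer-chases in the sample). */
static void demo_workload(void *ctx) { record(ctx, 'W'); }
static void demo_neighbor(void *ctx) { record(ctx, 'N'); }

static uint64_t now_ns(void)
{
    struct timespec ts;
    /* C11 wall clock; a real-time application would likely prefer a
     * monotonic clock or a dedicated measurement facility. */
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run the time-critical phase, then the less critical phase, repeatedly.
 * Only the time-critical phase is timed; returns its worst observed
 * duration in nanoseconds. */
static uint64_t run_phased(unsigned iterations, phase_fn workload,
                           phase_fn neighbor, void *ctx)
{
    assert(workload != NULL && neighbor != NULL);
    uint64_t max_ns = 0;
    for (unsigned i = 0; i < iterations; i++) {
        uint64_t t0 = now_ns();
        workload(ctx);               /* time-critical phase */
        uint64_t t1 = now_ns();
        if (t1 > t0 && t1 - t0 > max_ns)
            max_ns = t1 - t0;
        neighbor(ctx);               /* evicts the workload's cached data */
    }
    return max_ns;
}
```

Because the noisy neighbor phase runs between every two workload phases, each workload phase starts with its data potentially evicted, unless that data is locked in cache.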
In more common cases, the real-time workload runs on an isolated core, and its data is evicted from cache by uncontrolled activity on another core that shares the same cache. Cache Allocation Technology (CAT) can be used to mitigate this scenario. The cache allocation library provides an additional capability: it locks only the critical data, so the real-time workload’s critical data is not evicted by its own less critical data. While CAT may be enough in some scenarios, the cache allocation library is recommended for more explicit control of buffer access latency.
To address the more common cases, the sample also provides an option to run the workload with an external noisy neighbor of your choice. In this case, the workload and noisy neighbor run simultaneously.
The following diagram illustrates the behavior of the internal noisy neighbor vs. an external noisy neighbor:
To understand the effects of each type of noisy neighbor, you will need to know whether the cache being locked is private (each core has its own cache) or shared (multiple cores share the cache). See Cache Architecture.
If the L2 cache is private, you can expect an improvement in maximum latency when running the workload and internal noisy neighbor.
If the L2 cache is shared, you can expect an improvement when running the workload and internal noisy neighbor, or when running the workload and an external noisy neighbor.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.