Developer Guide

  • 2021.2
  • 06/11/2021
  • Public

Cache Configurator

Before using the data streams optimizer and cache allocation tools on the same system, see the instructions in Compatibility between Data Streams Optimizer and Cache Allocation to avoid possible technical and performance issues.
The cache configurator (tcc_cache_configurator) is a command-line tool that enables you to discover and manage cache resources. Low-level system resources, such as cache, memory, or CPU cores, are traditionally managed by operating systems or hypervisors, and control over the OS or hypervisor is typically reserved for system administrators. Therefore, the decision to partition cache or reserve a portion of the cache for software SRAM is the responsibility of the system administrator, or someone who has platform-level visibility and the responsibility to divide up physical resources. Application developers are the consumers of software SRAM buffers after they have been provisioned on the system.
The tool is intended for system integrators or administrators who need to meet one or more of the following requirements:
  • Provide low-latency buffer access (via software SRAM buffers) to real-time applications running on the system (via the cache allocation library API).
  • Provide mechanisms to improve the worst-case execution time (WCET).
  • Minimize the impact the GPU has on real-time applications running on the CPU cores.
  • Partition the shared cache resources among the various components that use the cache (such as CPU, GPU, or I/O), referred to in this guide as caching agents.
The tool simplifies techniques that address these requirements, namely software SRAM buffer management and cache partitioning. By using the tool’s interface, you can accomplish these complex tasks without the need to directly configure the low-level details of the cache architecture. You can:
  • Select from a variety of preset cache partitioning schemes. The presets provide varying levels of cache isolation and software SRAM and cover the most typical scenarios, so one of the available presets is likely to suit your use case.
  • Create a custom partitioning scheme. If you need a custom or more flexible setup, the tool offers an interactive interface to guide you through the process of adding or deleting software SRAM, as well as dividing the remaining cache among caching agents.

What Is a Cache Partitioning Scheme?

When a caching agent makes a request to allocate a new cache line into the cache, a victim cache line must be identified and evicted and, if it holds modified data, written back to memory before the new cache line is deposited. If an application incurs too many cache misses as a result of activity from other caching agents, the application sees reduced performance. This is known as the noisy neighbor effect.
Creating partitions in the cache to isolate certain agents from others can help to minimize the noisy neighbor effect. By default, most caching agents are configured to use the entire cache, effectively sharing the cache among all caching agents without any partitions. This yields maximum peak performance for all of the caching agents. In many real-time designs, however, the GPU is considered a noisy neighbor, and full GPU performance is often not required. Limiting how much cache the GPU can use reduces its impact as a noisy neighbor.
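To make the noisy neighbor effect concrete, the Python sketch below models one set of a hypothetical 8-way set-associative cache, where each way holds one cache line and replacement is true least-recently-used (LRU). It assumes a simplified model of way-based partitioning in which an agent's way mask limits only where that agent may allocate new lines, while lookups can hit in any way. A real-time agent reuses a 4-line working set while a noisy agent streams lines it never reuses. The CacheSet class and real_time_misses function are illustrative only and are not part of the cache configurator or any Intel library; real hardware replacement policies are more complex than true LRU.

```python
class CacheSet:
    """One set of a set-associative cache with true-LRU replacement (simplified model)."""

    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.clock = 0
        self.tags = [None] * num_ways       # line held by each way (None means empty)
        self.last_use = [-1] * num_ways     # time of last access (-1 means never used)

    def access(self, tag, way_mask):
        """Access one line; return True on a hit. way_mask limits where a miss may allocate."""
        self.clock += 1
        # A lookup can hit in any way; the mask restricts only allocation.
        for way, held in enumerate(self.tags):
            if held == tag:
                self.last_use[way] = self.clock
                return True
        # Miss: evict the least recently used line among the ways this agent may fill.
        # Empty ways have last_use -1, so they are filled first.
        allowed = [w for w in range(self.num_ways) if (way_mask >> w) & 1]
        victim = min(allowed, key=lambda w: self.last_use[w])
        self.tags[victim] = tag
        self.last_use[victim] = self.clock
        return False


def real_time_misses(rt_mask, noisy_mask, num_ways=8, rounds=100):
    """Count misses of a real-time agent that shares one cache set with a streaming agent."""
    cache = CacheSet(num_ways)
    misses = 0
    next_noisy_line = 0
    for _ in range(rounds):
        for line in range(4):       # real-time working set: 4 lines reused every round
            if not cache.access(("rt", line), rt_mask):
                misses += 1
        for _ in range(8):          # noisy neighbor: 8 never-reused lines per round
            cache.access(("noisy", next_noisy_line), noisy_mask)
            next_noisy_line += 1
    return misses


if __name__ == "__main__":
    print("shared cache:     ", real_time_misses(0xFF, 0xFF), "misses")
    print("partitioned cache:", real_time_misses(0x0F, 0xF0), "misses")
```

With the whole set shared (both masks 0xFF), the streaming agent evicts the real-time lines every round, so all 400 real-time accesses miss. With ways 0-3 given to the real-time agent and ways 4-7 to the noisy agent, only the 4 initial cold misses remain.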
A cache partitioning scheme controls which caching agents can allocate into the cache and, more specifically, where in the cache they can allocate. A cache way is the smallest portion of cache that you can reserve via the tool. The size of one cache way varies depending on processor and cache level. The cache configurator tool enables you to control the allocation of individual cache ways by selecting presets or by allocating them yourself.
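Because the cache way is the unit of reservation, sizing a reservation means rounding up to whole ways. The Python sketch below assumes a hypothetical 12 MiB, 12-way cache (so one way is 1 MiB); substitute the values for your processor and cache level, and note that the usable buffer size on a real platform may differ. It also shows the bitmask convention commonly used for cache allocation, where bit n set means way n is included. The helper names are illustrative only.

```python
def way_size_bytes(cache_bytes, num_ways):
    """Size of one cache way, assuming all ways are the same size."""
    if cache_bytes % num_ways:
        raise ValueError("cache size is not a whole multiple of the way count")
    return cache_bytes // num_ways


def ways_needed(request_bytes, way_bytes):
    """Whole ways required to hold request_bytes (ceiling division)."""
    return -(-request_bytes // way_bytes)


def ways_mask(first_way, count):
    """Bitmask with count contiguous ways set, starting at first_way."""
    return ((1 << count) - 1) << first_way


MiB = 1024 * 1024
way = way_size_bytes(12 * MiB, 12)        # hypothetical cache: 1 MiB per way
n = ways_needed(int(1.5 * MiB), way)      # a 1.5 MiB reservation rounds up to 2 ways
print(f"way = {way // 1024} KiB, ways = {n}, mask = {ways_mask(0, n):#05x}")
# prints: way = 1024 KiB, ways = 2, mask = 0x003
```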

Developer Workflow

If you have completed the steps in the Get Started Guide, you have already applied a preset that has dedicated cache for two real-time workloads and enough software SRAM to run the cache allocation sample.
Before trying different presets, adding software SRAM, or customizing a cache partitioning scheme, Intel recommends the following process:
  1. First, determine how much of the cache should be reserved for software SRAM regions. Once cache space is reserved for software SRAM, it is no longer available to the rest of the system and is only accessible via the Cache Allocation Library.
  2. Determine how to partition the remaining cache between CPU cores, GPU, and I/O. Considerations:
    • Sharing cache between multiple caching agents (CPU cores, GPU, and I/O) generally leads to increased jitter under loaded conditions.
    • Isolating cores, GPU, and I/O from one another reduces the noisy neighbor effect.
    • If App1 and App2 are affinitized to Core 1 and Core 2, respectively, consider using Classes of Service to differentiate the cache space available to each core. Intel supports multiple Classes of Service, which can give Core 1 a cache region that is separate from, and does not overlap with, the region available to Core 2. If App1 is a real-time application, dedicated cache space may be desirable to minimize the impact App2 has on App1’s performance.
    • Starting with 11th Gen Intel® Core™ processors on Intel® Core™-based products, real-time I/O traffic (designated via Traffic Class 1) can allocate directly into the L3 cache. If the I/O traffic is time-sensitive, the CPU can access the data faster when it resides in the cache rather than in DRAM. Consider allocating a small portion of the cache for I/O traffic.
    • If the integrated GPU is going to be used, consider minimizing the portion of the L3 cache available to the GPU. By default, Intel enables maximum GPU performance by providing access to the entire L3 cache. For real-time designs, maximum GPU performance is often not needed, and a smaller portion of the L3 cache can be used. Carefully selecting the cache available to the GPU, and ensuring it does not overlap with regions dedicated to real-time applications, will reduce the GPU’s noisy neighbor effect.
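Once each caching agent's allocation is written as a cache capacity bitmask, the considerations above can be checked mechanically. The Python sketch below uses a hypothetical 12-way cache in which ways 0-1 hold software SRAM, a real-time core and real-time I/O each get dedicated ways, and the GPU shares the remaining ways with a best-effort core. The agent names, masks, and the check_scheme helper are illustrative assumptions, not output of the cache configurator. The sketch also assumes the common Intel cache allocation rule that the set bits in a capacity bitmask must be contiguous; confirm this for your processor.

```python
def is_contiguous(mask):
    """True if mask is non-zero and its set bits form one contiguous run."""
    if mask <= 0:
        return False
    lowest = mask & -mask                 # isolate the lowest set bit
    return ((mask + lowest) & mask) == 0  # adding it clears a contiguous run entirely


def check_scheme(masks, isolated, sram_mask, num_ways):
    """Return a list of problems found in a partitioning scheme (empty if none).

    masks:     agent name -> capacity bitmask
    isolated:  agents (for example real-time cores or real-time I/O) that must not share ways
    sram_mask: ways reserved for software SRAM, which no agent may allocate into
    """
    problems = []
    all_ways = (1 << num_ways) - 1
    for name, mask in masks.items():
        if mask & ~all_ways:
            problems.append(f"{name}: uses ways beyond way {num_ways - 1}")
        if not is_contiguous(mask):
            problems.append(f"{name}: mask is empty or not contiguous")
        if mask & sram_mask:
            problems.append(f"{name}: overlaps software SRAM ways")
    for name in isolated:
        for other, mask in masks.items():
            if other != name and masks[name] & mask:
                problems.append(f"{name}: shares ways with {other}")
    return problems


scheme = {
    "core1 (real-time)": 0x00C,    # ways 2-3
    "io (TC1 traffic)": 0x010,     # way 4
    "gpu": 0x060,                  # ways 5-6
    "core2 (best effort)": 0xFE0,  # ways 5-11, shared with the GPU
}
isolated = ["core1 (real-time)", "io (TC1 traffic)"]
print(check_scheme(scheme, isolated, sram_mask=0x003, num_ways=12))  # prints []

# Giving the GPU the entire cache (its default) breaks isolation and overlaps software SRAM.
print(check_scheme({**scheme, "gpu": 0xFFF}, isolated, sram_mask=0x003, num_ways=12))
```

The second call reports three problems: the GPU overlaps the software SRAM ways, and both the real-time core and the real-time I/O share ways with the GPU. Sharing between the GPU and the best-effort core is allowed here because neither is listed as isolated.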
Intel expects the tool to be used during the development phase to achieve an optimal cache partitioning scheme, as determined by the system integrator with feedback from application developers. If cache partitioning requirements change after a system has been deployed to production, you can specify a new cache partitioning scheme, including software SRAM buffers, simply by rerunning the tool on the target system. Because software SRAM and cache partitioning requirements are communicated through firmware, a production system that implements security measures to lock the BIOS region may require additional steps before the updated configuration can be applied; those steps are outside the scope of the tool.

Cache Configurator and Software SRAM Setting

The cache configurator tool and the Software SRAM setting in BIOS are decoupled. This means it is possible to program a cache partitioning scheme that reserves cache space for software SRAM while the BIOS setting is set to Disabled. When this happens, the cache space reserved for software SRAM is not enabled in any of the cache capacity bitmasks, so it goes unused. You can reclaim this cache space at runtime by modifying the capacity bitmasks to enable the cache ways previously reserved for software SRAM, or you can rerun the cache configurator tool and apply a new partitioning scheme that does not include software SRAM buffers.
For details, see Software SRAM Setting.
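Reclaiming the unused software SRAM ways amounts to setting those ways' bits in a capacity bitmask. The Python sketch below computes the new mask and rejects a result whose set bits are not contiguous, assuming the common Intel cache allocation rule that they must be. It assumes a hypothetical 12-way cache with software SRAM in ways 0-1, where bit n set means way n is enabled. The helper names are illustrative, and how the new mask is actually written to the processor (on Linux, possibly through the resctrl file system) is outside this sketch.

```python
def is_contiguous(mask):
    """True if mask is non-zero and its set bits form one contiguous run."""
    if mask <= 0:
        return False
    lowest = mask & -mask                 # isolate the lowest set bit
    return ((mask + lowest) & mask) == 0


def reclaim_sram_ways(mask, sram_mask):
    """Return mask extended with the ways previously reserved for software SRAM."""
    new_mask = mask | sram_mask
    if not is_contiguous(new_mask):
        raise ValueError(f"{new_mask:#x} is not contiguous; pick a mask adjacent to the SRAM ways")
    return new_mask


SRAM_WAYS = 0x003                                # hypothetical: software SRAM in ways 0-1
print(hex(reclaim_sram_ways(0x00C, SRAM_WAYS)))  # ways 2-3 grow to ways 0-3: prints 0xf
try:
    reclaim_sram_ways(0xFE0, SRAM_WAYS)          # ways 5-11 are not adjacent to ways 0-1
except ValueError as err:
    print("rejected:", err)
```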

Cache Configurator and Cache Allocation Library

Intel does not place any limits on how developers choose to use software SRAM buffers once they are created. Application developers can use the Cache Allocation Library to programmatically place user-space data into a software SRAM buffer. To accomplish this, application developers specify:
  • Size of the buffer required
  • Worst-case access latency for a single element in the buffer
For the cache allocation library to allocate from cache, a software SRAM buffer that can satisfy the size and latency requirements of the request must already exist. System integrators and administrators need to determine the location and size of the software SRAM buffers required, based on where the applications that use the cache allocation library are intended to run and how much memory they may need. This can be a balancing act and may require multiple iterations between application developers and system integrators or administrators.
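The matching requirement can be pictured with the Python sketch below. It is not the Cache Allocation Library's implementation or API: the SramRegion type, the find_region helper, its selection policy, and all size and latency figures are hypothetical. The location where the application runs is modeled as a core number, which is also an assumption. A request can only be served by a region that is reachable from that core, has enough free space, and meets the worst-case latency bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SramRegion:
    """Hypothetical description of one provisioned software SRAM region."""
    name: str
    free_bytes: int
    worst_case_latency_ns: int
    cores: frozenset


def find_region(regions, size, max_latency_ns, core):
    """Pick a region that can satisfy the request, or None if no region can.

    Assumed policy: among qualifying regions, prefer the slowest one, keeping
    the fastest regions free for requests with tighter latency bounds.
    """
    candidates = [
        r for r in regions
        if core in r.cores and r.free_bytes >= size and r.worst_case_latency_ns <= max_latency_ns
    ]
    return max(candidates, key=lambda r: r.worst_case_latency_ns, default=None)


KiB = 1024
regions = [  # made-up figures for illustration only
    SramRegion("L2 region, core 1", 128 * KiB, 20, frozenset({1})),
    SramRegion("L3 region, cores 1-3", 1024 * KiB, 80, frozenset({1, 2, 3})),
]
print(find_region(regions, 64 * KiB, 100, core=1).name)  # prints L3 region, cores 1-3
print(find_region(regions, 64 * KiB, 30, core=1).name)   # prints L2 region, core 1
print(find_region(regions, 64 * KiB, 30, core=2))        # prints None: no fast region for core 2
```

When no region qualifies, the fix is on the provisioning side: the system integrator or administrator adds or resizes a software SRAM region with the cache configurator, which is the iteration between developers and integrators described in this section.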

Dependencies

The tool depends on the following underlying software components. They are available as part of Intel’s best known configuration or as otherwise noted.
  • Real-time configuration driver at the OS level.
  • Real-time configuration manager (RTCM) with the cache reservation library (CRL), or a hypervisor that supports CRL.
  • Real-time configuration data (RTCD) and real-time configuration table (RTCT) at the BIOS level.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.