Figure 1. Steps to optimize for real-time improvements.
Figure 2. Illustration of how the board support package, Intel TCC Mode in the BIOS, and Intel TCC Tools work together as a process.
Allocate Cache for Real-Time Applications
- Uses a command-line tool to discover and manage cache memory resources, and to add, modify, or delete buffers at different levels of the cache and memory hierarchy
- Divides the remaining cache resources among other components (such as the CPU, GPU, or I/O) without requiring you to learn the low-level details of the cache architecture
Example tool output:
    The following buffer will be created:
    BUFFER 1 LATENCY(ns): 100 CACHE LEVEL: 2 CPU CORE: 3 BUFFER SIZE(bytes): 262144
Allocate Buffers Effectively Across Platforms
Cache Allocation Library
The APIs in this library allocate buffers that meet a specified latency requirement.
To use the library, you need to know the required access latency and the size of the dataset your application processes, as well as the most latency-sensitive hot spots in your application's code.
- malloc replacement for reliably low latency
- Targets cache misses and other sources of memory access latency
- Simple, familiar API signature
- Abstracts the complexity of the cache architecture
- Achieves the same latency on supported Intel® processors without code changes
To create a buffer, specify its size and the maximum tolerable access latency:
/* The example parameters specify a 64-byte buffer and a 20-nanosecond latency requirement. */
void *mem = tcc_buffer_malloc(64, 20);
Instrument Your Code to Analyze Performance
Use this lightweight library to instrument user-space applications and collect latency measurements.
- Measures worst-case execution time (WCET) and other latency statistics in processor clock cycles and time units
- Keeps runtime overhead low and measurement precision high
- Each measurement adds no more than 610 ns of overhead†
- Accurately measures intervals as short as 60 ns*
- Tracks deadline violations
- Stores latency values in a shared-memory ring buffer for processing by an external application
- Uses the Instrumentation and Tracing Technology API (ITT API) to support task visualization and system-wide analysis in tools such as Intel® VTune™ Profiler, a low-level application performance analysis tool
- Provides samples to help you get started, demonstrating measurement data analysis methods such as latency histograms and deadline monitoring
†Based on specific configurations and workloads.
Call this function at the beginning of the code block you want to analyze:
/* Get the start time of the measured code block from the processor time stamp counter (TSC).
   domain points to an __itt_domain; name points to an __itt_string_handle that identifies the measurement. */
__itt_task_begin(domain, __itt_null, __itt_null, name);
Call this function at the end of the code block:
/* Get the end time from the TSC and subtract the start time to derive the latency of one iteration. */
__itt_task_end(domain);