Intel® Time Coordinated Computing Tools (Intel® TCC Tools)

Improve Performance of Latency-Sensitive Applications


Time As Performance

Intel® processors are multipurpose and serve a wide range of use cases, including cloud data analysis, gaming PCs, traditional office laptops, and edge devices. Intel® Time Coordinated Computing (Intel® TCC) is a new set of features that augments the compute performance of Intel® processors to address the stringent temporal requirements of real-time applications. Intel TCC reduces jitter and improves performance for latency-sensitive applications. It helps maximize efficiency by consolidating time-critical and non-time-critical applications onto a single board.

While Intel TCC features reside in the processor, their full potential is unlocked when the whole solution stack is optimized top to bottom. Intel offers a reference real-time software stack that abstracts these hardware features to accelerate hardware configuration ("tuning") and application development. This solution stack consists of:

  • Processors optimized for real-time applications:
    • Intel Atom® x6000E series processors
    • 11th Generation Intel® Core™ processors
  • System software stack:
    • Board support package (BSP): a Yocto Project* Linux distribution with the PREEMPT_RT patch and other real-time optimizations
    • UEFI reference BIOS with Intel® TCC Mode
  • Intel® TCC Tools
Figure 1. Steps to optimize for real-time improvements.
Figure 2. Illustration of how the board support package, Intel TCC Mode in the BIOS, and Intel TCC Tools work together as a process.

Accelerate and Automate System Tuning

Data Streams Optimizer

  • Automates real-time platform configuration tuning through a command-line tool.
  • Addresses workload-specific latency requirements between the CPU, memory, and PCIe endpoints by optimizing power and performance settings.
  • Focuses on tuning the I/O and processor fabric to speed up data transfers between two processor subsystems. The tool identifies the control points between these entities that can be tuned to meet the requirements, and instructs the BIOS to write specific values to the corresponding registers. This enables real-time tuning without changing application code.

To use the tool, you need to know how data flows through the compute module (that is, which paths or streams it takes), the size of the payload exchanged between endpoints, and the maximum tolerable latency for each exchange.
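
The three inputs above can be captured in a small record. The following C sketch is illustrative only: `struct stream_requirement` and `min_throughput_mb_per_s()` are hypothetical names, not the Data Streams Optimizer's input format. It derives one quick sanity check from the payload and latency budget: the minimum throughput a single transfer needs to finish on time. Because it ignores fixed per-transfer overheads, the result is only a lower bound.

```c
#include <stdint.h>

/* Hypothetical description of one data stream; the field names are
 * illustrative and are NOT the Data Streams Optimizer's input format. */
struct stream_requirement {
    const char *producer;     /* e.g. "PCIe endpoint" */
    const char *consumer;     /* e.g. "CPU core 3" */
    uint32_t payload_bytes;   /* bytes moved per transfer */
    uint32_t max_latency_ns;  /* maximum tolerable latency per transfer */
};

/* Minimum throughput in MB/s (10^6 bytes per second) that one transfer needs
 * to complete within its latency budget. 1 byte/ns = 1000 MB/s.
 * Ignores fixed per-transfer overheads, so this is a lower bound only.
 * Returns 0.0 for an invalid (zero) latency budget. */
static double min_throughput_mb_per_s(const struct stream_requirement *r)
{
    if (r->max_latency_ns == 0)
        return 0.0;
    return (double)r->payload_bytes * 1000.0 / (double)r->max_latency_ns;
}
```

For example, a 64-byte payload with a 1,000 ns budget needs at least 64 MB/s.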

Learn More

Allocate Cache for Real-Time Applications

Cache Configurator

  • Provides a command-line tool to discover and manage cache memory resources, and to add, modify, or delete buffers at various levels of the cache and memory hierarchy.
  • Divides the remaining cache resources among components such as the CPU, GPU, and I/O, without requiring you to learn the low-level details of the cache architecture.

Learn More

Example Output

The following buffer will be created:

BUFFER 1
LATENCY(ns): 100
CACHE LEVEL: 2
CPU CORE: 3
BUFFER SIZE(bytes): 262144
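
One common hardware mechanism for dividing cache capacity is the Cache Allocation Technology (CAT) of Intel® Resource Director Technology, which describes each partition as a contiguous bitmask of cache ways. The following minimal C sketch assumes exclusive, non-overlapping partitions and a cache of at most 32 ways; `partition_ways()` is a hypothetical helper and does not reproduce how the Cache Configurator computes its layout.

```c
#include <stdint.h>

/* Hypothetical helper: give each component a contiguous run of cache ways,
 * packed from the lowest way upward, and store the capacity bitmasks in
 * masks[]. Partitions are exclusive (non-overlapping) by construction.
 * Returns 0 on success, or -1 if a request is zero ways (a mask must be
 * non-empty), the requests exceed total_ways, or total_ways > 32. */
static int partition_ways(const unsigned ways[], int n,
                          unsigned total_ways, uint32_t masks[])
{
    unsigned next = 0; /* lowest way not yet assigned */
    if (total_ways > 32)
        return -1;
    for (int i = 0; i < n; i++) {
        if (ways[i] == 0 || next + ways[i] > total_ways)
            return -1;
        /* ways[i] consecutive set bits, shifted to start at bit 'next' */
        uint32_t run = (ways[i] == 32) ? 0xFFFFFFFFu : ((1u << ways[i]) - 1u);
        masks[i] = run << next;
        next += ways[i];
    }
    return 0;
}
```

With 12 ways and requests of 4, 2, and 6 ways, the masks are 0x00F, 0x030, and 0xFC0.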

Allocate Buffers Effectively Across Platforms

Cache Allocation Library

APIs contained in this library create buffers that meet specified latency requirements.

To use the library, you need to know the latency requirement and size of the dataset your application processes, as well as the hot spots in your application's code that are most latency-sensitive.

Benefits include:

  • malloc() replacement for reliably low latency
  • Targets cache misses and other sources of memory access latency
  • Simple, familiar API signature
  • Abstracts the complexity of the cache architecture
  • Achieves the same latency on supported Intel® processors without code changes

Learn More

Example Function

To create a buffer, specify its size and maximum tolerable latency for access:

/* The example parameters specify a 64-byte buffer and 20-nanosecond latency requirement. */

void *mem = tcc_buffer_malloc(64, 20);
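
Conceptually, an allocator like this must map each (size, latency) request onto a memory tier that can meet it. The following C sketch illustrates that selection step under stated assumptions: the tier names, latencies, and capacities are placeholders, and `select_tier()` is a hypothetical helper, not part of the Cache Allocation Library. It picks the slowest tier that still meets the requirement so that faster tiers remain available for tighter requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory tier, listed fastest first. Latency and capacity
 * values used with it are placeholders, not measurements of any processor. */
struct mem_tier {
    const char *name;
    uint32_t latency_ns;  /* typical access latency of this tier */
    size_t free_bytes;    /* capacity still available for buffers */
};

/* Return the index of the slowest tier that meets max_latency_ns and has
 * room for size bytes, keeping faster tiers free for tighter requests.
 * Returns -1 if no tier qualifies. */
static int select_tier(const struct mem_tier tiers[], int n,
                       size_t size, uint32_t max_latency_ns)
{
    for (int i = n - 1; i >= 0; i--) {
        if (tiers[i].latency_ns <= max_latency_ns && tiers[i].free_bytes >= size)
            return i;
    }
    return -1;
}
```

With the placeholder table `{ {"L2", 5, 262144}, {"L3", 20, 1048576}, {"DRAM", 100, SIZE_MAX} }`, a 64-byte, 20 ns request selects index 1 (L3), and a 4 ns request returns -1.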

Check System Readiness for Real-Time Workloads

Real-Time Readiness Checker

Use this diagnostic tool to check real-time BIOS and operating system configuration readiness.

  • Verifies whether the system has a supported processor, BIOS, and operating system
  • Checks for features that may affect real-time performance, such as Intel® Turbo Boost Technology, Enhanced Intel SpeedStep® Technology, and processor power-saving states
  • Reports CPU and GPU frequencies
  • Operates at the UEFI BIOS or operating system level
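
On a Linux system, a few of these settings can be inspected through sysfs. The following C sketch assumes the `intel_pstate` driver's `/sys/devices/system/cpu/intel_pstate/no_turbo` file and the cpufreq `scaling_governor` file (for example, `/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`). The pass criteria (turbo disabled, `performance` governor) are illustrative assumptions, not the Real-Time Readiness Checker's rules. The paths are parameters so the functions can also be pointed at test files.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs-style text file into buf, without the
 * trailing newline. Returns 0 on success, -1 if the file cannot be read. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Interpret an intel_pstate "no_turbo" file ("1" means turbo is disabled).
 * Returns 1 (disabled, assumed preferable for deterministic timing),
 * 0 (enabled), or -1 (file missing or unexpected contents). */
static int turbo_disabled(const char *no_turbo_path)
{
    char buf[16];
    if (read_first_line(no_turbo_path, buf, sizeof buf) != 0)
        return -1;
    if (strcmp(buf, "1") == 0)
        return 1;
    if (strcmp(buf, "0") == 0)
        return 0;
    return -1;
}

/* Returns 1 if the cpufreq governor file reports "performance",
 * 0 for any other governor, or -1 if the file cannot be read. */
static int governor_is_performance(const char *governor_path)
{
    char buf[32];
    if (read_first_line(governor_path, buf, sizeof buf) != 0)
        return -1;
    return strcmp(buf, "performance") == 0;
}
```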

Learn More

Instrument Your Code to Analyze Performance

Measurement Library

Use this lightweight library to instrument user-space applications and collect latency measurements.

  • Measures worst-case execution time (WCET) and other latency statistics in processor clock cycles and time units
  • Provides minimal runtime overhead and high measurement precision:
    • Each measurement adds no more than 610 ns of overhead
    • Accurately measures intervals starting from 60 ns*
  • Tracks deadline violations
  • Stores latency values in a shared-memory ring buffer for processing by an external application
  • Uses the Instrumentation and Tracing Technology API (ITT API) to support task visualization and system-wide analysis in tools such as Intel® VTune™ Profiler, a tool for low-level application performance analysis
  • Includes samples to help you get started, which demonstrate measurement data analysis methods such as latency histograms and deadline monitoring

* Based on specific configurations and workloads.
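
A ring buffer lets the measured code record samples without ever blocking: when it is full, the oldest value is overwritten. The following minimal C sketch shows only that indexing scheme; the capacity and all names are hypothetical, and it is not the Measurement Library's layout. It is also not safe for concurrent access: sharing it with another process would additionally require placing it in shared memory (for example, with `shm_open()` and `mmap()`) and publishing `head` with atomic operations.

```c
#include <stdint.h>

#define RING_CAPACITY 8u /* hypothetical capacity */

/* Single-producer ring buffer of latency samples in nanoseconds.
 * Not thread-safe or process-safe as written (see the note above). */
struct latency_ring {
    uint64_t head;                   /* total number of samples ever written */
    uint64_t samples[RING_CAPACITY];
};

/* Append a sample; once the buffer is full, the oldest sample is overwritten. */
static void ring_push(struct latency_ring *r, uint64_t latency_ns)
{
    r->samples[r->head % RING_CAPACITY] = latency_ns;
    r->head++;
}

/* Number of samples currently retained. */
static uint64_t ring_count(const struct latency_ring *r)
{
    return r->head < RING_CAPACITY ? r->head : RING_CAPACITY;
}

/* i-th oldest retained sample; valid for 0 <= i < ring_count(r). */
static uint64_t ring_get(const struct latency_ring *r, uint64_t i)
{
    uint64_t oldest = r->head - ring_count(r);
    return r->samples[(oldest + i) % RING_CAPACITY];
}
```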

Learn More

Example Functions

Use this function at the beginning of the code block you want to analyze:

/* Get the start time of the measured code block from the processor time stamp counter (TSC). name is a pointer to an __itt_string_handle that identifies the measurement. */

__itt_task_begin(domain, __itt_null, __itt_null, name);

Use this function at the end of the code block:

/* Get the end time from the TSC and calculate the difference between the start and end times to derive the latency of one iteration. */

__itt_task_end(domain);
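
The begin/end pair above brackets one interval per iteration. The following C sketch shows the kind of bookkeeping such intervals feed: minimum, maximum (an observed worst case that only approximates the true WCET), running sum, and deadline violations. It is a hypothetical illustration, not the Measurement Library's implementation, and it reads `CLOCK_MONOTONIC` through `clock_gettime()` rather than the TSC to stay portable.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime and CLOCK_MONOTONIC */
#include <stdint.h>
#include <time.h>

/* Running latency statistics for one instrumented code block (hypothetical). */
struct latency_stats {
    uint64_t count, min_ns, max_ns, sum_ns;
    uint64_t deadline_ns; /* 0 disables the deadline check */
    uint64_t violations;  /* intervals longer than deadline_ns */
};

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Record one begin/end interval. max_ns is the observed worst case,
 * which only approximates the true WCET. */
static void stats_record(struct latency_stats *s, uint64_t start_ns, uint64_t end_ns)
{
    uint64_t d = end_ns - start_ns;
    if (s->count == 0 || d < s->min_ns)
        s->min_ns = d;
    if (d > s->max_ns)
        s->max_ns = d;
    s->sum_ns += d;
    s->count++;
    if (s->deadline_ns != 0 && d > s->deadline_ns)
        s->violations++;
}
```

A typical call pattern is `uint64_t t0 = now_ns(); /* measured code */ stats_record(&s, t0, now_ns());`, with the average computed afterward as `s.sum_ns / s.count`.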

Enable Time Synchronization in Network, I/O, and Compute

Time-Aware GPIO and Ethernet Timestamps Samples

The time-aware GPIO (TGPIO) sample applications explain the basics of hardware-assisted time synchronization on GPIO pins and its advantages over conventional software-controlled GPIO.

The Ethernet timestamps sample application shows the accuracy of hardware-assisted cross-timestamping between the system and network controller clocks, which allows the application to extend precise time synchronization to other devices on the network beyond the compute node.
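
Without hardware support, software usually estimates the offset between two clocks by reading the device clock between two system-clock reads and assuming that the device read happened at the midpoint; the error is bounded by half of that window. Hardware-assisted cross-timestamping captures both clocks at effectively the same instant, which removes this window (on Linux, the `PTP_SYS_OFFSET_PRECISE` ioctl exposes such device-provided cross-timestamps where supported). The following C sketch shows only the software baseline arithmetic; the structure and function names are hypothetical.

```c
#include <stdint.h>

/* One software "sandwich" sample (hypothetical layout): the system clock is
 * read before and after a single read of the network controller's clock.
 * All values are in nanoseconds. */
struct xts_sample {
    int64_t sys_before_ns;
    int64_t dev_ns;
    int64_t sys_after_ns;
};

/* Estimated device-minus-system offset, assuming the device clock was read
 * at the midpoint of the system-clock window. */
static int64_t xts_offset_ns(const struct xts_sample *s)
{
    int64_t mid = s->sys_before_ns + (s->sys_after_ns - s->sys_before_ns) / 2;
    return s->dev_ns - mid;
}

/* Worst-case error of that estimate: half the width of the window. */
static int64_t xts_uncertainty_ns(const struct xts_sample *s)
{
    return (s->sys_after_ns - s->sys_before_ns) / 2;
}

/* Index of the sample with the narrowest window (tightest error bound);
 * n must be at least 1. */
static int xts_best(const struct xts_sample s[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (xts_uncertainty_ns(&s[i]) < xts_uncertainty_ns(&s[best]))
            best = i;
    return best;
}
```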

Learn More

Example Output

Comparison of output period jitter for software-controlled GPIO versus time-aware GPIO (TGPIO). Software GPIO data is shown in blue and TGPIO data in orange; software GPIO exhibits higher jitter than TGPIO.
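
Period jitter in such a comparison can be quantified from the timestamps of successive output edges. The following C sketch computes peak-to-peak period jitter (longest minus shortest period), which is one common metric; the figure may use a different one, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak period jitter from rising-edge timestamps in nanoseconds
 * (hypothetical helper). Timestamps must be in ascending order.
 * Returns 0 if fewer than three edges (fewer than two periods) are given. */
static uint64_t period_jitter_pp_ns(const uint64_t edges_ns[], size_t n)
{
    if (n < 3)
        return 0;
    uint64_t min_p = UINT64_MAX, max_p = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t p = edges_ns[i] - edges_ns[i - 1]; /* one output period */
        if (p < min_p)
            min_p = p;
        if (p > max_p)
            max_p = p;
    }
    return max_p - min_p;
}
```

Edges at 0, 1000, 2003, 2998, and 4000 ns give periods of 1000, 1003, 995, and 1002 ns, so the peak-to-peak jitter is 8 ns.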