Developer Guide

  • 2021.2
  • 06/11/2021
  • Public

About Intel® Time Coordinated Computing Tools (Intel® TCC Tools)

Intel® processors are multi-purpose and serve a wide range of use cases, from data analysis in the cloud, to gaming PCs and traditional office laptops, to devices at the network edge. On select SKUs of Intel Atom® x6000E Series processors (code name: Elkhart Lake) and 11th Gen Intel® Core™ processors (code name: Tiger Lake UP3), Intel offers a set of features called Intel® Time Coordinated Computing (Intel® TCC) that augments the compute performance of its processors with the ability to address the stringent temporal requirements of real-time applications. The ongoing industry transformation drives demand for converged solutions that satisfy real-time requirements while staying power efficient and leaving sufficient performance for other concurrent tasks. Intel® TCC delivers performance improvements for latency-sensitive applications when they run alongside non-time-constrained applications on the same system.
While Intel® TCC features reside in the processor, their full potential is unlocked when the whole solution stack is optimized top-to-bottom. Intel offers a reference real-time software stack that abstracts these hardware features to accelerate hardware configuration and application development.
The stack consists of:
  • Real-time hardware: See the list of supported processors.
  • System software:
    • Firmware: Addresses processor latency (the duration of time between two events) via the Intel® TCC Mode setting, which disables power management and enables Intel® TCC features, SMI reductions, and other optimizations.
    • Board support package (BSP): Addresses OS latency via a Yocto Project*-based distribution of Linux* OS with a real-time kernel and optimized drivers.
  • Intel® TCC Tools: Addresses the need for further latency reduction or balancing real-time performance, power, and general compute, via C language application programming interfaces (APIs) and tools.
    In this release, Intel® TCC Tools does not support Slim Bootloader.
This stack is also referred to as the best known configuration (BKC).
Intel® TCC Tools offers limited support for Microsoft Windows* OS on the target system, for the data streams optimizer only. For details, see Examples of Scripts for Windows* OS.
Additionally, Intel® TCC Tools offers host-side tools for development. Requirements for host systems are specified in the Getting Started Guide.

Why Intel® TCC Tools?

System software optimizations can satisfy many real-time use cases
System software has a high impact on real-time performance. The BSP and UEFI reference BIOS contain optimizations, such as Linux* kernel build configuration settings and boot parameters, power management settings, and Intel® TCC Mode, that help drastically decrease execution latencies and reduce jitter, at times by several orders of magnitude. System software optimizations, when used with target processors, are sufficient to satisfy real-time use cases with a broad range of cycle times for the most common real-time applications.
Software optimizations in the BSP and Intel® TCC Mode in BIOS may be all you need to meet your real-time requirements. To learn more about system software tuning, the BSP, and Intel® TCC Mode in BIOS, see the Real-Time Tuning Guide for your target processor.
Intel® TCC Tools address use case-specific optimizations
For some use cases, meeting real-time requirements is not enough. Some must also be mindful of system power consumption or of the compute capacity that remains for data processing or a graphical UI. Optimizing only system software may not be sufficient in these cases. For example, enabling Intel® TCC Mode has a strong impact on system power consumption, which might not be acceptable.
Optimizations to address the unique demands of some real-time applications or further decrease cycle times require more granular use of Intel® TCC features and special tuning techniques, such as advanced cache management and I/O fabric tuning.
Such granular tuning, if unassisted, requires specialized knowledge of processor architecture, including the intricacies of a particular microarchitectural generation. That knowledge is hard to obtain and unnecessary for the majority of developers. Intel® TCC Tools facilitates these techniques by offering C APIs and tools.
Furthermore, tuning is usually preceded by debugging and bottleneck identification, both of which involve measurement. Given the nature of real-time applications and their latency sensitivities, conventional measurement tools may not be precise enough and may induce overhead that distorts measurement results. Latency measurement offered by Intel® TCC Tools is free of these limitations and offers high precision while being minimally intrusive.

Fixing Common Sources of Latency and Jitter

The following table offers a quick reference to Intel® software that addresses common sources of latency and jitter for real-time applications:
Problem: Intel® software that addresses the problem
  • Interrupts: Yocto Project*-based BSP with real-time optimizations, along with the Real-Time Tuning Guide
  • Power management:
    • Out-of-the-box tuning: UEFI reference BIOS
    • Advanced tuning: Intel® TCC Tools data streams optimizer tool, by improving data movement between processor subsystems, see Data Streams Optimizer
  • Cache misses: Intel® TCC Tools cache allocation library, by reserving processor cache for routines with data access bottlenecks, see Cache Allocation

Why Linux?

The foundation of the BSP is a Yocto Project*-based Linux* distribution. Although Linux* OS did not originate as a real-time operating system (RTOS), ongoing optimizations have been improving its real-time characteristics. For example, the PREEMPT_RT patch, coupled with the availability of device support, a choice of network connectivity, filesystems, and user interface support, makes Linux* OS a good choice for many modern embedded real-time solutions running latency-sensitive and non-time-constrained applications side-by-side.

Use Case Example

Now let’s look at how the best known configuration (BKC), and specifically Intel® TCC Tools, can help your application meet its temporal requirements.
A simplistic real-time use case involves the following components:
  1. Sensor collects information and communicates data to a compute module.
  2. Compute module receives the data from the sensor and performs needed computation to determine the required action.
  3. Compute module transmits the action to be taken to an actuator.
This setup is shown in the following figure:
The compute module receives data from the sensor in certain “receive” data structures. It performs the needed calculations and then writes the output to “send” data structures. The sensor and actuator are connected to the compute module through various I/O interfaces, for example, PCIe* ports.
Real use cases may involve hundreds of sensors and actuators and multiple compute nodes combined into scalable automation and control systems. In such systems, individual nodes are interconnected via various network technologies, further assisted by time-coordination protocols and techniques that keep the whole setup precisely synchronized.
While most real-time use cases follow this generic flow, their particular requirements, such as the size of “receive” and “send” data structures and the maximum allowable time for moving data between processor subsystems or accessing it from memory buffers, may vary significantly. These requirements are a foundational starting point for using Intel® TCC Tools.
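The generic flow above can be sketched in C as a single sense-compute-actuate cycle with a deadline check. This is only an illustration under stated assumptions: the names `recv_data_t`, `send_data_t`, `compute`, and `run_cycle`, the proportional control law, and the deadline value are hypothetical and are not part of Intel® TCC Tools. Real sensor and actuator I/O is replaced by plain structures.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical "receive" data structure, filled from a sensor. */
typedef struct {
    int32_t position;   /* measured value */
    int32_t setpoint;   /* desired value */
} recv_data_t;

/* Hypothetical "send" data structure, handed to an actuator. */
typedef struct {
    int32_t command;    /* correction to apply */
} send_data_t;

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Compute step: a simple proportional controller (illustrative only). */
static void compute(const recv_data_t *in, send_data_t *out)
{
    out->command = (in->setpoint - in->position) / 2;
}

/*
 * Run one cycle and report whether the compute step finished within
 * deadline_ns. The measured duration is stored in *elapsed_ns.
 */
static bool run_cycle(const recv_data_t *in, send_data_t *out,
                      uint64_t deadline_ns, uint64_t *elapsed_ns)
{
    assert(in != NULL && out != NULL && elapsed_ns != NULL);
    uint64_t start = now_ns();
    compute(in, out);
    *elapsed_ns = now_ns() - start;
    return *elapsed_ns <= deadline_ns;
}
```

In a real application, the sizes of the two structures and the value passed as `deadline_ns` are the requirements this section describes as the starting point for tuning.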

Tuning Features

Intel® TCC Tools offers two primary capabilities that optimize the system for real-time applications:
  • Data streams optimizer
  • Cache allocation tools
Data Streams Optimizer
The data streams optimizer is a command-line tool that configures I/O and processor fabric settings to optimize the time required to transfer data between two processor subsystems acting as source and destination, based upon specified requirements.
Fabric refers to the interconnect technology that carries on-chip communications between the different functional components of the processor.
The effect of the data streams optimizer is most visible on applications that are susceptible to latency that exceeds the allowed threshold for data movement. An example of data movement is when data packets need to be transferred between an Ethernet card connected to a PCIe* port and system memory.
In the tuning process, the tool iteratively adjusts the values of multiple registers as it searches for the configuration customized to your application. These register values have direct and indirect impact on many important system characteristics besides real-time, such as power, thermal, and others. The data streams optimizer not only optimizes data movement, but also changes system power consumption or computational resources available for other tasks.
While controlled by software, the tuning process does not alter any of the software ingredients of the solution, but focuses solely on real-time hardware optimizations. For that reason, the data streams optimizer can be used equally with legacy (existing) applications, without changes to their source code, and with new ones.
Ideally, the developer should know how data flows through the system. If this is not fully known, the tool allows an iterative process to find the ideal tuning.
Cache Allocation
The cache configurator and cache allocation library help reduce “hotspots,” memory objects (such as arrays) in your real-time application that have a high number of cache misses.
The cache configurator is a command-line tool that reserves portions of cache for low-latency buffers. The reserved cache is called software SRAM. The cache configurator also partitions the cache to reduce cache eviction by various components, such as the GPU.
The cache allocation library contains a set of C APIs that help bound the time needed to access data from a memory buffer to a certain maximum tolerable latency. It does so by using the software SRAM buffers created by the cache configurator. The library is intended for an application that performs periodic computations on the same data set and is sensitive to access timings to this data set, which cannot be satisfied by standard memory allocation techniques like malloc.
To use the library, you need to know the size of the data set that your application processes and the maximum acceptable latency to access that data set, as well as the hotspots in your application’s code that are most latency sensitive. If you have this information, you may use the cache allocation library directly. Otherwise, consider starting with the measurement library, a support capability of Intel® TCC Tools.
The cache allocation library is critical for keeping cycle times within permissible limits and preventing deadline violations for demanding real-time applications.

Auxiliary Features

Intel® TCC Tools include auxiliary features to help you check the configuration of your real-time system, understand bottlenecks in your code, or learn about time synchronization techniques. These capabilities include:
  • Measurement library
  • Real-time readiness checker
  • Time synchronization sample applications:
    • Time-Aware GPIO sample applications
    • Ethernet timestamps sample application
    • Real-time communication demo
The measurement library contains a set of C APIs that help analyze different aspects of your application’s performance and identify bottlenecks, which can then be alleviated, for example, by the cache allocation library.
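To make the kind of data such instrumentation yields more concrete, here is a minimal sketch of a running latency accumulator that tracks minimum, maximum, average, jitter, and deadline violations. It does not use or reproduce the measurement library API. The type `lat_stats_t` and the `lat_stats_*` functions are hypothetical names, and jitter is assumed here to mean maximum minus minimum, which is one common definition. Samples would typically be differences of two `clock_gettime(CLOCK_MONOTONIC, ...)` readings taken around the code region of interest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running latency statistics for one measured code region (hypothetical type). */
typedef struct {
    uint64_t count;        /* number of samples recorded */
    uint64_t min_ns;       /* shortest observed latency */
    uint64_t max_ns;       /* longest observed latency */
    uint64_t sum_ns;       /* sum of all samples, for the average */
    uint64_t deadline_ns;  /* 0 disables deadline monitoring */
    uint64_t violations;   /* samples that exceeded deadline_ns */
} lat_stats_t;

static void lat_stats_init(lat_stats_t *s, uint64_t deadline_ns)
{
    assert(s != NULL);
    s->count = 0;
    s->min_ns = UINT64_MAX;
    s->max_ns = 0;
    s->sum_ns = 0;
    s->deadline_ns = deadline_ns;
    s->violations = 0;
}

/* Record one latency sample, in nanoseconds. */
static void lat_stats_add(lat_stats_t *s, uint64_t sample_ns)
{
    assert(s != NULL);
    s->count++;
    s->sum_ns += sample_ns;
    if (sample_ns < s->min_ns)
        s->min_ns = sample_ns;
    if (sample_ns > s->max_ns)
        s->max_ns = sample_ns;
    if (s->deadline_ns != 0 && sample_ns > s->deadline_ns)
        s->violations++;
}

/* Average latency, or 0 if no samples were recorded. */
static uint64_t lat_stats_avg(const lat_stats_t *s)
{
    return s->count ? s->sum_ns / s->count : 0;
}

/* Jitter, taken here as max minus min (an assumption of this sketch). */
static uint64_t lat_stats_jitter(const lat_stats_t *s)
{
    return s->count ? s->max_ns - s->min_ns : 0;
}
```

Keeping only running aggregates avoids printing or allocating inside the measured loop, which helps keep the instrumentation itself from adding latency.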
The real-time readiness checker is a command-line tool that checks the many attributes that may be affecting real-time performance, such as processor model, BIOS version, BIOS settings, and other dependencies. The tool can be used at the beginning of development to verify initial setup, and later in the product lifecycle as needed for quick checks and debugging.
The time-aware GPIO sample applications explain the basics of using hardware-assisted time synchronization on GPIO pins and its advantages over normal software-controlled GPIO. Time-aware GPIO is for use cases requiring precise time synchronization, such as clock alignment between two or more devices or with an external clock, generating control signals at certain times with high precision, or generating time-accurate pulse trains.
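For comparison, the software-controlled baseline mentioned above can be sketched with standard POSIX timing: compute the absolute time of each edge of a pulse train and sleep until it. This is not the time-aware GPIO interface and it does not touch any GPIO pin. The function names `timespec_add_ns`, `edge_time`, and `wait_until` are hypothetical, and the actual pin toggle is left to the caller.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_nanosleep */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Return t + ns, keeping tv_nsec normalized to [0, 1e9). */
static struct timespec timespec_add_ns(struct timespec t, uint64_t ns)
{
    t.tv_sec += (time_t)(ns / NSEC_PER_SEC);
    t.tv_nsec += (long)(ns % NSEC_PER_SEC);
    if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_nsec -= NSEC_PER_SEC;
        t.tv_sec += 1;
    }
    return t;
}

/*
 * Absolute time of the n-th edge of a pulse train that starts at `start`
 * and toggles every half_period_ns. Each edge is derived from `start`
 * rather than from the previous wake-up, so wake-up lateness does not
 * accumulate as drift.
 */
static struct timespec edge_time(struct timespec start,
                                 uint64_t half_period_ns, uint64_t n)
{
    assert(half_period_ns > 0);
    return timespec_add_ns(start, half_period_ns * n);
}

/* Sleep until the absolute CLOCK_MONOTONIC time t. Returns 0 on success. */
static int wait_until(struct timespec t)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
    } while (rc == EINTR);  /* retry if interrupted by a signal */
    return rc;
}
```

With this approach the edge still happens only after the thread has been woken and scheduled, so its timing depends on OS latency. That dependence is the limitation the hardware-assisted approach is meant to address.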
The Ethernet timestamps sample application shows the accuracy of hardware-assisted cross-timestamping between the system and network controller clocks, which allows the application to extend precise time synchronization to other devices on the network beyond the compute node.
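To show what cross-timestamping means, the sketch below pairs readings of two clocks purely in software: it reads clock A, then clock B, then clock A again, and associates the B reading with the midpoint of the two A readings. This is the software-only technique, not the hardware-assisted one the sample demonstrates. The names `cross_ts_t`, `cross_timestamp`, and `best_cross_timestamp` are hypothetical, and `CLOCK_MONOTONIC` and `CLOCK_REALTIME` stand in for the two clocks because a real PHC would need a clock id derived from its device node, which is omitted here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* One software cross-timestamp: a pair of readings from two clocks. */
typedef struct {
    int64_t a_mid_ns;   /* midpoint of the two reads of clock A */
    int64_t b_ns;       /* reading of clock B taken between them */
    int64_t window_ns;  /* time between the two A reads: pairing uncertainty */
} cross_ts_t;

/* Read a POSIX clock as nanoseconds. */
static int64_t read_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused warning when NDEBUG is defined */
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Bracket one read of clock b between two reads of clock a. */
static cross_ts_t cross_timestamp(clockid_t a, clockid_t b)
{
    cross_ts_t r;
    int64_t a_before = read_ns(a);
    r.b_ns = read_ns(b);
    int64_t a_after = read_ns(a);
    r.window_ns = a_after - a_before;
    r.a_mid_ns = a_before + r.window_ns / 2;
    return r;
}

/* Repeat the measurement and keep the sample with the narrowest window. */
static cross_ts_t best_cross_timestamp(clockid_t a, clockid_t b, int tries)
{
    assert(tries > 0);
    cross_ts_t best = cross_timestamp(a, b);
    for (int i = 1; i < tries; i++) {
        cross_ts_t r = cross_timestamp(a, b);
        if (r.window_ns < best.window_ns)
            best = r;
    }
    return best;
}
```

The true instant of the B reading lies somewhere inside the window, so `window_ns` bounds the error of the pairing. A preemption or interrupt between the reads widens the window, which is why software pairing is less accurate than a hardware-assisted capture of both clocks.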
The real-time communication demo is a set of example programs and scripts that demonstrate the benefit of combining the Cache Allocation Library with Time-Sensitive Networking (TSN) provided by Intel.

When Should I Use a Feature?

Use of these features will vary by use case. Here is one example of a workflow involving multiple tools:
  • Step 1: Set up your target system with the board support package (BSP), which provides a real-time kernel and optimized drivers. Run your real-time application along with other applications, per your expected use case, under worst-case conditions. Check whether deadlines and system requirements are met. If not, proceed to the next step.
  • Step 2: Enable Intel® TCC Mode in the firmware. Diagnose whether you have additional real-time needs and where your performance bottlenecks are. Then proceed with Intel® TCC Tools, which provides advanced-level tuning by using features in the processor and BIOS.
  • Step 3: Install Intel® TCC Tools. Use the real-time readiness checker to verify the configuration.
  • Step 4: Run your real-time application along with other applications again to recheck deadlines.
  • Step 5: Instrument your code with measurement library APIs. Use VTune™ Profiler or other profiling tools to find hotspots and bottlenecks.
  • Step 6a: If you find that data access latency exceeds requirements, use the cache configurator to create software SRAM buffers. Add cache allocation library APIs in your real-time application to use the software SRAM buffers to improve data access timings. Run your real-time application along with other applications again to recheck deadlines.
  • Step 6b: If you find that data transfer latency exceeds requirements, use the data streams optimizer. The data streams optimizer can also balance real-time performance with system power consumption or computational resources available for other tasks. Run your real-time application along with other applications again to recheck deadlines.
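The branch between Steps 6a and 6b can be summarized as a small decision helper. This is a hedged sketch only: `latency_report_t`, `recommend_tuning`, and the `TUNE_*` flags are hypothetical names, and the measured values and limits would come from your own instrumentation and requirements. The result is a bit mask because both conditions can hold at once, and the workflow above does not prescribe an order between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tuning steps suggested by the workflow (hypothetical flags). */
enum {
    TUNE_NONE                   = 0,
    TUNE_CACHE_ALLOCATION       = 1 << 0,  /* Step 6a */
    TUNE_DATA_STREAMS_OPTIMIZER = 1 << 1   /* Step 6b */
};

/* Worst-case measurements and their limits, in nanoseconds. */
typedef struct {
    uint64_t data_access_ns;          /* measured buffer access latency */
    uint64_t data_access_limit_ns;    /* maximum tolerable access latency */
    uint64_t data_transfer_ns;        /* measured data transfer latency */
    uint64_t data_transfer_limit_ns;  /* maximum tolerable transfer latency */
} latency_report_t;

/* Return a bit mask of the tuning steps that the measurements call for. */
static unsigned int recommend_tuning(const latency_report_t *r)
{
    assert(r != NULL);
    unsigned int steps = TUNE_NONE;
    if (r->data_access_ns > r->data_access_limit_ns)
        steps |= TUNE_CACHE_ALLOCATION;
    if (r->data_transfer_ns > r->data_transfer_limit_ns)
        steps |= TUNE_DATA_STREAMS_OPTIMIZER;
    return steps;
}
```

Whatever the helper suggests, the workflow still ends each step the same way: rerun the real-time application alongside the other applications and recheck deadlines.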

Code Changes Required?

See which features require changes to your real-time application:
Component: Code changes required to use the component?
  • Real-time readiness checker: No
  • Data streams optimizer: No
  • Cache allocation library: Yes, to call API functions that put latency-critical data into pre-allocated software SRAM
  • Measurement library: Yes, to instrument the application with measurement APIs

Feature Summary

The following tables show the full range of features.
Real-Time Configuration and Optimization
Feature
Description
Intel Atom® x6000E Series Processors
11th Gen Intel® Core™ Processors
Command-line tool that configures I/O and processor fabric to optimize the time required to transfer data between two processor subsystems. System power consumption and general compute characteristics are also changed by this tool.
Sample input files for the data streams optimizer, including a synthetic real-time application and validation script. The application measures Memory-Mapped I/O (MMIO) read latency performance. Use the files to learn how the tool works and see improvements in latency reduction. Copy and modify for your real-time application.
Tools that help bound the time needed to access data from a memory buffer based on your specified latency requirements.
  • Cache allocation library: C APIs that allocate low-latency buffers.
  • Cache configurator: Command-line tool that displays a visual representation of the cache on your system and allocates cache for use by the cache allocation library and other caching agents.
C application that demonstrates the basic flow of the cache allocation library. Offers command-line options and latency measurements to show the before-and-after benefit of the library. Copy and modify source code to add the APIs to your real-time application.
Measurement and Analysis
Feature
Description
Intel Atom® x6000E Series Processors
11th Gen Intel® Core™ Processors
Command-line tools for Linux* OS and UEFI BIOS that check system readiness for real-time applications by detecting and verifying the many attributes that may be affecting real-time performance such as processor model, BIOS version, BIOS settings, and other dependencies.
C APIs that help analyze different aspects of your application’s performance and identify bottlenecks which can then be alleviated (for instance, by the cache allocation library).
C application that demonstrates how to use the measurement library to instrument one part of an application, such as the entire real-time cycle. Run the sample to see library capabilities, such as collecting execution time (min, max, average, and jitter) and deadline monitoring. Copy and modify source code to add the APIs to your real-time application.
C application that demonstrates how to use the measurement library to instrument multiple parts of an application, such as the entire cycle and various parts of it to pinpoint latency sources more precisely. Demonstrates a lighter-weight application that does not print or analyze its own data. Intended to be run with the measurement analysis sample for data collection and visualization. Copy and modify source code to add the APIs to your real-time application.
Universal tool, written in Python* programming language, that collects and analyzes data from any application instrumented with the appropriate measurement library APIs. Performs two types of analysis: “post-process analysis” (after the application closes), such as creation of histograms, and “stream monitoring” (while the application is running). Source code is provided for possible adaptation.
C application that monitors measurements generated by the single measurement sample, which serves as a proxy real-time application. Demonstrates “stream monitoring” – the monitoring sample runs at the same time as the single measurement sample and prints latency measurements and deadline violations as they occur. Intended for those who want to integrate such monitoring with other parts of their environment, such as validation applications or scripts, where the C programming language is preferable as a common denominator that can be added everywhere. Copy and modify source code to tailor the monitoring sample to your real-time application.
Time Synchronization and Communication
Feature
Description
Intel Atom® x6000E Series Processors
11th Gen Intel® Core™ Processors
C applications that demonstrate how to start working with TGPIO. Run the info sample to get a report of the TGPIO capabilities of the target system. Run the input and output samples to compare the enhanced precision of hardware-assisted time synchronization vs. normal software-controlled GPIO. Copy and modify source code for your real-time application.
C application that demonstrates more advanced TGPIO input and output scenarios with a logic analyzer, such as input jitter, period jitter, and two-signal synchronization. Run the samples to compare the enhanced precision of hardware-assisted time synchronization vs. normal software-controlled GPIO. Copy and modify source code to add hardware-assisted time synchronization to your real-time application.
C application that demonstrates how to use TGPIO to achieve a desired frequency. Run the sample with a logic analyzer to see how phase error is kept close to zero. Copy and modify source code for your real-time application.
C application that demonstrates the accuracy of hardware-assisted cross-timestamping between the system clock (CLOCK_REALTIME) and a Precision Time Protocol Hardware Clock (PHC). Copy and modify source code for your real-time application.
Set of example programs and scripts that demonstrate the benefit of combining the Cache Allocation Library with Time-Sensitive Networking (TSN) provided by Intel. Copy and modify source code for your real-time application.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.