Developer Guide

  • 2021.2
  • 06/11/2021
  • Public

Measurement Library

The measurement library is a set of C APIs that enable you to analyze the behavior of real-time applications. You can use the library to instrument your code and gather various latency statistics in CPU clock cycles, nanoseconds, or microseconds. In addition, you can react to deadline violations and store latency values in a shared memory ring buffer to be processed by an external application.
The library is intended specifically for analysis of isochronous cyclic workloads. An isochronous cyclic workload is a code sequence in your real-time application that runs repeatedly (“cyclic workload”) and has a deadline that every cycle must meet (“isochronous”). Also, you can analyze parts of the workload that have their own deadlines, such as data input, processing, and output. For one-time measurements, the library is not recommended.
The following sections provide a brief overview of key concepts. More information is provided later in this guide.

Instrumenting Code

The library works in conjunction with the Instrumentation and Tracing Technology API (ITT API). The ITT APIs generate and control the collection of trace data while the application runs. You use the ITT APIs to instrument your code.
First, you will identify the tasks that you want to measure in your application. In the example below, cycle() represents one such task.
    /* Initialize the ITT domain */
    domain = __itt_domain_create("TCC");
    /* Initialize the ITT handlers to collect performance data */
    cycle_handler = __itt_string_handle_create(cycle_name);
    for (int i = 0; i < iterations; ++i) {
        /* Start cycle measurement */
        __itt_task_begin(domain, __itt_null, __itt_null, cycle_handler);
        /* Run cycle */
        cycle();
        /* End cycle measurement */
        __itt_task_end(domain);
    }
Then, you will add __itt_task_begin to mark the beginning of your task and __itt_task_end to mark the end of the task. The begin function uses the CPU’s timestamp counter (TSC) to collect the start time of the task. The end function uses the TSC to collect the end time of the task, and calculates the difference between the start and end timestamps to get the latency of the task.
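The begin/end arithmetic described above can be sketched in plain C. This is only an illustration, not the collector's implementation: the sketch_task type and the sketch_* functions are hypothetical names, and the C11 timespec_get() clock stands in for the TSC so that the example does not depend on a particular CPU.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical per-task state; the real collector keeps its own structures. */
typedef struct {
    uint64_t start_ns;   /* timestamp captured at task begin */
    uint64_t latency_ns; /* end minus start, filled in at task end */
} sketch_task;

/* Stand-in for reading the TSC: C11 calendar time in nanoseconds. */
uint64_t sketch_now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Begin: remember the start timestamp. */
void sketch_task_begin(sketch_task *t, uint64_t now_ns) {
    t->start_ns = now_ns;
}

/* End: latency is the difference between the end and start timestamps. */
uint64_t sketch_task_end(sketch_task *t, uint64_t now_ns) {
    t->latency_ns = now_ns - t->start_ns;
    return t->latency_ns;
}
```

In use, you would pass sketch_now_ns() to both functions, once before and once after the measured task. Passing the timestamps as arguments keeps the arithmetic easy to check in isolation.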
The begin and end functions together add runtime overhead in the range of tens to hundreds of nanoseconds, depending on the processor (see Overhead and Precision for details). To reduce the relative cost of the functions, you can run the measured sequence multiple times between one begin call and one end call. This can be useful, for example, when your data processing code contains an inner loop with multiple iterations of the measured sequence. In this case, the reported latency is the sum of the iterations of the measured sequence.
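The effect of wrapping several iterations in one begin/end pair can be estimated with simple arithmetic. The sketch below assumes a fixed overhead per begin/end pair and a constant duration per iteration; both function names are hypothetical and are not part of the library.

```c
/* Fraction of the total time that is instrumentation overhead when one
   begin/end pair wraps `iterations` runs of a sequence.
   Assumes a fixed overhead per pair and a constant sequence duration. */
double sketch_relative_overhead(double overhead_ns, double sequence_ns,
                                unsigned iterations) {
    double payload_ns = sequence_ns * (double)iterations;
    return overhead_ns / (overhead_ns + payload_ns);
}

/* Average latency of one run, derived from the summed latency.
   `iterations` must be greater than zero. */
double sketch_per_iteration_ns(double total_latency_ns, unsigned iterations) {
    return total_latency_ns / (double)iterations;
}
```

For example, with the 52 ns average overhead listed in Overhead and Precision and a hypothetical 200 ns sequence, the overhead is roughly 21% of the total for a single iteration and about 0.26% when 100 iterations share one begin/end pair.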
Each instrumented block of code is a measurement instance. For each measurement instance, a unique __itt_string_handle should be created and used. Your application can have multiple measurement instances, including nested instances. For example, if your application contains multiple consecutive stages, you can create a measurement instance for each stage and an additional measurement instance for the entire workload. This can help to isolate the biggest source of latency among the stages. See Instrument the Code for example code.

Collecting Latency Data

The ITT APIs are implemented in a static library, libittnotify.a, which forwards the calls to a shared library called a collector. A collector performs data collection and processing.
Your application can access the following collectors:
  • Measurement library collector: You can use this collector to access the measurement results from your instrumented application and store results in a shared memory buffer or in a file with a simple format. The collector implements certain ITT API and measurement library API calls to access collected data from the instrumented application.
  • VTune™ collector: VTune™ Profiler provides the VTune™ collector, which collects data for visualization in that tool. For example, you can analyze the relationship between tasks in your code relative to other CPU and GPU tasks.
You can use only one collector at a time.
The collector is selected and loaded at runtime based on the environment variable INTEL_LIBITTNOTIFY64.
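Because the collector is chosen through the environment, an application can check at startup whether a collector has been configured. The following is a minimal sketch: sketch_collector_path() is a hypothetical helper, and treating an empty value as "not set" is an assumption of this sketch, not documented behavior.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns the collector path named by INTEL_LIBITTNOTIFY64, or NULL when
   the variable is unset or empty (the empty case is this sketch's choice). */
const char *sketch_collector_path(void) {
    const char *path = getenv("INTEL_LIBITTNOTIFY64");
    return (path != NULL && path[0] != '\0') ? path : NULL;
}
```

A NULL result at startup suggests that no collector will be loaded, so an application could log a warning before it starts the measured workload.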
The measurement library uses a handle to a structure called tcc_measurement to store raw and processed latency data and the measurement state. The structure is created implicitly when the ITT APIs call measurement collector functions.
The measurement library uses environment variables for data collection control. Use of environment variables allows flexible configuration of each application separately, without requiring changes in application code. For example:
  • Each measurement structure contains a reference to a corresponding measurement buffer, which has zero size by default. When the buffer size is zero, only aggregated statistics are collected, without per-iteration data. When the buffer size is greater than zero, the collector library also stores per-iteration latency measurements in the buffer. Buffer size and other attributes can be configured through the environment variable TCC_MEASUREMENTS_BUFFERS.
  • You can also specify whether to use a shared memory ring buffer or a local buffer through the TCC_USE_SHARED_MEMORY environment variable. Shared memory allows other applications to access the measurements, which can be used for data monitoring, storage, and analysis.
For details, see Control Data Collection.
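The difference between a zero-size and a non-zero buffer can be illustrated with a small stand-in structure. This is a sketch only: sketch_measurement is hypothetical and does not reflect the layout of tcc_measurement, and overwriting the oldest value when the ring is full is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a measurement structure. */
typedef struct {
    uint64_t count, min, max, sum; /* aggregated statistics, always kept */
    uint64_t *buffer;              /* per-iteration values, may be NULL */
    size_t capacity;               /* 0 means "aggregated statistics only" */
    size_t next;                   /* next write position in the ring */
} sketch_measurement;

void sketch_init(sketch_measurement *m, uint64_t *storage, size_t capacity) {
    m->count = 0;
    m->min = UINT64_MAX;
    m->max = 0;
    m->sum = 0;
    m->buffer = storage;
    m->capacity = (storage != NULL) ? capacity : 0;
    m->next = 0;
}

void sketch_record(sketch_measurement *m, uint64_t latency) {
    /* Aggregates are updated regardless of the buffer size. */
    m->count++;
    m->sum += latency;
    if (latency < m->min) m->min = latency;
    if (latency > m->max) m->max = latency;
    /* Per-iteration data is stored only when a buffer was configured. */
    if (m->capacity > 0) {
        m->buffer[m->next] = latency;
        m->next = (m->next + 1) % m->capacity; /* wrap: oldest value is overwritten */
    }
}

double sketch_average(const sketch_measurement *m) {
    return m->count ? (double)m->sum / (double)m->count : 0.0;
}
```

With a capacity of zero the structure still reports minimum, maximum, and average, which mirrors the default behavior described above.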

Analyzing Latency Data from the Measurement Library Collector

You can access the measurement structure in your application by calling tcc_measurement_get() while the measurement library collector is loaded.
You can do the following analysis of latency data:
  • Analyze measurements in your application:
    • Measure and print the minimum, maximum, and average latencies of a workload
    • Set a deadline and run a custom callback function every time an iteration exceeds the deadline
    • Convert measurement results to CPU clock cycles, microseconds, or nanoseconds
  • Post-process measurement results:
    • Store the raw measurement results in a dump file for post-process analysis by a separate application
    • Print measurements to the console or in JSON format
    • Visualize the data, for example, build histograms
  • Stream measurements to a monitoring application:
    • Create a separate application to track measurements generated from the real-time application and perform actions on those measurements, such as print various statistics and react to deadline violations
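Two of the in-application analyses listed above, reacting to deadline violations and converting units, can be sketched as follows. The names are hypothetical and the code does not use the measurement library API. The counter frequency is passed in as a parameter because how to obtain it is outside the scope of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives the latency that missed the deadline. */
typedef void (*sketch_deadline_cb)(uint64_t latency_cycles, void *user);

/* Convert clock cycles to nanoseconds for a known counter frequency.
   The whole seconds and the remainder are converted separately so that the
   multiplication stays within 64 bits (assumes freq_hz below about 18 GHz). */
uint64_t sketch_cycles_to_ns(uint64_t cycles, uint64_t freq_hz) {
    uint64_t seconds = cycles / freq_hz;
    uint64_t rest = cycles % freq_hz;
    return seconds * 1000000000ull + rest * 1000000000ull / freq_hz;
}

/* Call the callback when a latency exceeds the deadline.
   Returns 1 on a deadline violation, 0 otherwise. */
int sketch_check_deadline(uint64_t latency_cycles, uint64_t deadline_cycles,
                          sketch_deadline_cb cb, void *user) {
    if (latency_cycles <= deadline_cycles) return 0;
    if (cb != NULL) cb(latency_cycles, user);
    return 1;
}

/* Example callback: count violations in a caller-provided counter. */
void sketch_count_miss(uint64_t latency_cycles, void *user) {
    (void)latency_cycles;
    ++*(unsigned *)user;
}
```

Keeping the callback short matters in a real-time loop, because it runs inside the cycle that has already missed its deadline.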

Analyzing Latency Data from the VTune™ Collector

When the VTune™ collector is enabled, you can visualize the latency data in VTune™ Profiler. Use of VTune™ Profiler is not required, but can offer rich data about your application. For example, you can see the sequence and duration of tasks in your application, along with CPU and GPU tasks, on a consolidated timeline.

Libraries

The following table shows which header files to use to access corresponding APIs.
Library Name | Description | Header File
Instrumentation and Tracing Technology API (ITT API) | A static library for instrumentation of code. ITT is supported by various software toolkits and VTune™ Profiler. | ittnotify.h
Measurement library collector | A dynamic library for runtime data collection. | -
Measurement library | Shared and static libraries for accessing and analyzing the results. | tcc/measurement.h; tcc/measurement_helpers.h

Example of Using Measurement Library

The following diagram demonstrates the flow for an example scenario that uses the ITT APIs, the measurement library collector, and the measurement library static library. The same workflow is described in more detail in Analyze Measurements in Your Workload.
Starting on the left side, the diagram shows that the real-time application is instrumented with ITT APIs and is linked against the ITT Notify static library (libittnotify.a). At runtime, the static library reads the environment variable INTEL_LIBITTNOTIFY64 and loads the measurement library collector (libtcc_collector.so), a dynamic library. The measurement library collector initializes the structures for data collection and stores the latency measurements there.
In addition, from the right side of the diagram, the real-time application uses measurement library functions to access the data structures. In this case, the application is linked against the measurement library (libtcc_static.a), a static library. The measurement library reads the environment and loads the measurement library collector (libtcc_collector.so). As a result, the application can access the data structures created in the measurement library collector. The libtcc.so shared library is linked by the measurement library collector and the real-time application (through libtcc_static.a) to handle internal function calls.
The measurement library has different options for data collection and analysis. The diagram above demonstrates use of the TCC_MEASUREMENTS_DUMP_FILE environment variable, which controls the printing of data to a dump file when the application ends.
The __itt_task_begin() and __itt_task_end() implementation in the measurement library collector is not thread-safe. You should not create and use measurement instances from multiple threads simultaneously when using the measurement library collector.
You can use the measurement library during development of your application and disable it for production deployment. Use the -DNO_TCC_MEASUREMENT -DINTEL_NO_ITTNOTIFY_API compilation options to compile your application without measurement library calls. Reasons for disabling the library for production deployment include eliminating measurement overhead and avoiding the security risks of using environment variables.
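The general mechanism behind such compile-time switches can be sketched with wrapper macros. Only the NO_TCC_MEASUREMENT name comes from this guide. The SKETCH_* macros and sketch_* functions are hypothetical stand-ins for the instrumentation calls, and the library headers may implement the switch differently.

```c
/* Counts how many instrumentation calls actually ran (for illustration). */
unsigned sketch_instrumented_calls = 0;

/* Stand-ins for the begin/end instrumentation calls. */
void sketch_begin(void) { ++sketch_instrumented_calls; }
void sketch_end(void)   { ++sketch_instrumented_calls; }

#ifdef NO_TCC_MEASUREMENT
/* Production build: the wrappers expand to nothing, so no calls remain. */
#define SKETCH_MEASURE_BEGIN() ((void)0)
#define SKETCH_MEASURE_END()   ((void)0)
#else
/* Development build: the wrappers forward to the instrumentation calls. */
#define SKETCH_MEASURE_BEGIN() sketch_begin()
#define SKETCH_MEASURE_END()   sketch_end()
#endif

/* The workload source stays the same in both builds. */
int sketch_run_cycle(int input) {
    SKETCH_MEASURE_BEGIN();
    int result = input * 2; /* the actual workload */
    SKETCH_MEASURE_END();
    return result;
}
```

Compiling this file with -DNO_TCC_MEASUREMENT removes both calls from sketch_run_cycle() without any change to the workload code.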

Overhead and Precision

The APIs have minimal runtime overhead and high measurement precision.
11th Gen Intel® Core™ processors:
  • Each measurement adds no more than 102 ns overhead (52 ns average)
  • Accurately measures intervals starting from 15 ns
Intel Atom® x6000E Series processors:
  • Each measurement adds no more than 608 ns overhead (220 ns average)
  • Accurately measures intervals starting from 60 ns
Results may vary. Testing conducted November 12, 2020. Configuration:
  • 11th Gen Intel® Core™ processor:
    • Hardware: QVD5 (B2)
    • BSP: TGL_external_ER57 + Intel® TCC dependencies layer
    • BIOS: TGLIFUI1.R00.3455.A02.2011240812
    • Intel® TCC Mode enabled.
    • No software SRAM buffers.
  • Intel Atom® x6000E Series processor:
    • Hardware: QV3J (B0 fuse rev.11) + 44698-201 customer reference board
    • BSP: EHL_external_Beta3 + Intel® TCC dependencies layer
    • BIOS: EHLSFWI1.R00.2463.A12.2012141439
    • Intel® TCC Mode enabled.
    • No software SRAM buffers.
Methodology:
  1. Set real-time settings:
    • Scheduler: 99 FIFO
    • CPU 3
    • Interrupts disabled
  2. Determining the minimum interval: Run start and stop measurements without anything else. Calculate minimum, average, and maximum.
  3. Determining the overhead: Measure start and stop. Calculate minimum, average and maximum.
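The back-to-back start/stop step of the methodology can be approximated as shown below. This sketch is not the test program used for the published numbers: it uses the C11 timespec_get() clock instead of the library's begin and end functions, it does not apply the real-time settings from step 1, and all names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Minimum, maximum, and average of the measured intervals. */
typedef struct {
    uint64_t min_ns;
    uint64_t max_ns;
    double avg_ns;
} sketch_stats;

/* Stand-in timestamp source: C11 calendar time in nanoseconds. */
uint64_t sketch_clock_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Take `runs` start/stop timestamp pairs with nothing in between. */
sketch_stats sketch_empty_interval_stats(unsigned runs) {
    sketch_stats s = { UINT64_MAX, 0, 0.0 };
    uint64_t sum = 0;
    for (unsigned i = 0; i < runs; ++i) {
        uint64_t start = sketch_clock_ns();
        uint64_t stop = sketch_clock_ns();
        /* Calendar time may step backwards; clamp such samples to zero. */
        uint64_t delta = (stop >= start) ? stop - start : 0;
        sum += delta;
        if (delta < s.min_ns) s.min_ns = delta;
        if (delta > s.max_ns) s.max_ns = delta;
    }
    if (runs > 0) {
        s.avg_ns = (double)sum / runs;
    } else {
        s.min_ns = 0; /* no samples: report zeros */
    }
    return s;
}
```

On a general-purpose system without real-time settings, expect the maximum to be far larger than the minimum, which is why the methodology pins the test to one CPU and raises its scheduling priority.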

Usage Model

To summarize, follow these steps to analyze your workload:
  1. Instrument your code using ITT APIs: See Instrument the Code. For examples of instrumenting the code, see Single Measurement Sample and Multiple Measurements Sample.
  2. Set up the collector using environment settings and run your instrumented application: See Control Data Collection.
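  3. Analyze the results: See Analyzing Latency Data from the Measurement Library Collector and Analyzing Latency Data from the VTune™ Collector.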

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.