User Guide

Contents

Anomaly Detection Analysis (preview)

Use Anomaly Detection to identify performance anomalies in frequently recurring intervals of code like loop iterations. Perform fine-grained analysis at the microsecond and nanosecond level.
Application performance can occasionally be hampered by the presence of performance anomalies. A performance anomaly is any short-lived, sporadic issue that causes unrecoverable consequences. These issues may not be statistically discernible but they create a poor user experience and can be very expensive to fix. When the performance of your application requires varying amounts of work for instances of the same task or when it displays variations in a single/few iterations of a loop, these are symptoms of anomalous behavior in your application.
Use Anomaly Detection analysis to identify performance anomalies in your application that are otherwise difficult to isolate. This analysis type uses Intel® Processor Trace (Intel® PT) technology to perform trace data collection and fine-grained time and event measurement. Intel® PT is an extension of Intel® Architecture that captures information about software execution using dedicated hardware. The hardware causes only minimal performance perturbation to the software being traced.
This is a
PREVIEW FEATURE
. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
The control flow trace feature in Intel® PT generates a variety of packets that, when combined with the binaries of a program by a post-processing tool, can be used to produce an exact execution trace. The packets record flow information such as instruction pointers (IP), indirect branch targets, and directions of conditional branches within contiguous code regions (basic blocks). For descriptions of key concepts in Intel® PT, see Chapter 35 of the Intel Software Developer's Manual (Volume 3C):System Programming Guide.
To detect software performance anomalies using
VTune
Profiler
, you use the Instrumentation and Tracing Technology (ITT) API to designate specific code regions of interest and then run Anomaly Detection analysis.

Common Performance Anomalies

These are typical examples of performance anomalies in a software application.
  • Financial transactions that take an unusually long time to process.
  • Glitches in the UI of a video game like slow or skipped video frames.
  • Packet losses in large applications that have SPDK/DPDK loops.
  • High frequency applications where processing speed is critical and some iterations run slower than others.
Run Anomaly Detection in one of these situations where observed application behavior deviates from expected behavior in some iterations.

Causes for Performance Anomaly

  • Change in control flow:
    Different instances of the same task require different amounts of work.
  • Uncommon observations:
    Expensive handling of errors or memory/storage reallocation.
  • Context switches:
    Synchronization or preemption.
  • Unexpected kernel activity:
    Interrupts or page faults.
  • Micro-architectural issues:
    Cache misses or incorrect branch predictions.
  • Frequency drops:
    Low CPU utilization, cooling issues, or the inclusion of Intel® Advanced Vector Extensions (Intel® AVX) instructions in the code.

The Anomaly Detection Analysis Workflow

When you observe anomalies in your application performance, use Anomaly Detection for a detailed investigation.
  1. Prepare your application for analysis.
  2. Define parameters that break your code into smaller regions of interest. Decide how long you want to simulate each region.
  3. Run Anomaly Detection.
  4. Review anomalies in detail:
    1. Load trace data for the processor for each anomaly in the Bottom-up view to examine code regions of interest.
    2. Open trace data to see frequency information for a specific region.
    3. Examine source and assembly views to see the number of loop iterations.

Configure and Run Analysis

Prepare your Application
Large applications can generate huge volumes of data through a profiling run. This in turn can cause significant delay in processing results. You may only want to focus on anomalies in a particular operation in your code. Mark this section by defining it as a Code Region of Interest. Use the ITT API for this purpose.
  1. Register the name of the code region you plan to profile:
    __itt_pt_region region = __itt_pt_region_create(“region”);
  2. Mark the target loop in your application with this name:
    for(…;…;…) { __itt_mark_pt_region_begin(region); <code processing your task> __itt_mark_pt_region_end(region); }
Run the Analysis
  1. On the Welcome screen, click
    Configure Analysis
    .
  2. In the Analysis Tree, select the
    Anomaly Detection
    analysis type in the
    Parallelism
    group.
  3. In the
    WHAT
    pane, specify your application and any relevant application parameters.
  4. In the
    HOW
    pane, specify parameters for the analysis.
    Parameter
    Description
    Range
    Recommended Value
    Maximum number of code regions for detailed analysis
    Specify number of code regions for your application.
    10-5000
    For faster loading of details, pick a value not more than1000.
    Maximum duration of analysis per code region
    Specify the duration of analysis time (ms) to be spent on each code region.
    0.001-1000
    Any value under 1000 ms.
    Configuration options for Anomaly Detection
  5. Click the Start button to run the analysis.
To run Anomaly Detection from the command line, use the Command Line button at the bottom.

View Data

Once the analysis is complete,
VTune
Profiler
displays results in the
Summary
window.
  • Elapsed Time
    indicates the total time spent on all code regions of interest.
  • Code Region of Interest Duration Histogram
    plots the number of instances of performance-critical tasks against specified duration (or latency). See specific code regions in the Fast and Slow regions to understand why the duration changed.
  • Collection and Platform Info
    displays relevant details about the system, data about the collection platform, and the resulting set size.
View Data on a Different System
The above procedure is useful when you process analysis results on the same system where you collected data. If you want to transfer the collected data onto a different system before you view it, run the
archive
command after data collection to copy essential binaries to the results folder. You must complete this step before transferring results to the new system to load collection details without problems.
To run the
archive
command:
  1. Collect results as described above.
  2. At the command line, type:
vtune.exe --archive -r r001ad
where
r001ad
is an example of an analysis result.
To view collected data on a different system, you must copy all binaries including system and compiler runtime binaries that are linked to your main binary and were accessed during the collection. The
archive
command is useful for this purpose since it is not easy to copy these binaries manually.

Next Steps

See the Anomaly Detection view for information on interpreting collected data in these ways:
  • Load trace details for each analysis in the
    Bottom-up
    window.
  • Look for unexpected kernel activity. See if applications entered certain kernels that should not have been activated during the analysis.
  • Use the source and assembly views to compare code regions of interest in fast and slow regions of the histogram.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804