User Guide

  • 2020 Update 2
  • 07/28/2020
  • Public Content

Introducing Application Performance Snapshot

Use Intel® VTune™ Application Performance Snapshot for a quick view into different aspects of compute-intensive applications' performance, such as MPI and OpenMP* usage, CPU utilization, memory access efficiency, vectorization, I/O, and memory footprint. Application Performance Snapshot displays key optimization areas and suggests specialized tools for tuning particular performance aspects, such as Intel® VTune™ Profiler and Intel® Advisor. The tool is designed for large MPI workloads and can help analyze scalability issues.
You can download Application Performance Snapshot for free from the Intel® Developer Zone. The tool is also available pre-installed as part of these products:
  • Intel® VTune™ Profiler (formerly Intel® VTune™ Amplifier)
  • Intel® Parallel Studio XE
  • Intel® oneAPI Base Toolkit
  • Intel® System Bring-up Toolkit
Starting with the 2018 Beta release, the updated Application Performance Snapshot for Linux* OS includes most of the functionality previously available in the MPI Performance Snapshot. MPI Performance Snapshot is no longer available as a separate tool.

What's New

This User's Guide documents Application Performance Snapshot for Linux* OS.
This is a change log for the current and previous product releases:

Application Performance Snapshot 2020 Update 1

  • Added metrics to explore GPU compute efficiency for Intel Graphics. The metric set includes GPU Time, GPU IPC, GPU Utilization, and OpenMP* offload efficiency metrics such as offload region overhead and data transfer cost. The application must be compiled with Intel® C/C++ Compiler (Beta) 2021.1 - Beta 05, which is available in several Intel® oneAPI Toolkits (Beta), such as the Intel® oneAPI HPC Toolkit (Beta).

Application Performance Snapshot 2020

  • Max and Bandwidth metrics to better estimate the efficiency of DRAM, MCDRAM, Intel® persistent memory and Intel® Omni-Path Architecture usage.
  • Easier diagnostics of MPI communication patterns with the rank-to-rank communication diagram of Application Performance Snapshot shown by message volume or communication time.
  • Full-featured OpenMPI application support.
  • Streamlined vectorization metrics.

Application Performance Snapshot 2019 Update 5

  • DRAM Bandwidth information in the Memory Stalls metric now includes Peak and Bound metrics. These metrics describe memory bandwidth use and are particularly helpful for applications that execute in phases with varying memory requirements.
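
To see why an average alone can mislead for an application that runs in phases, the sketch below computes an average, a peak, and a "bound" share from a list of bandwidth samples. The sample values, the 70% threshold, and the function name `summarize_bandwidth` are assumptions made for illustration; they are not the tool's actual definitions or output.

```python
def summarize_bandwidth(samples_gbps, platform_max_gbps, bound_threshold=0.7):
    """Return (average, peak, bound_fraction) for equally spaced samples.

    bound_fraction is the share of samples at or above
    bound_threshold * platform_max_gbps (an assumed definition).
    """
    if not samples_gbps:
        raise ValueError("need at least one sample")
    average = sum(samples_gbps) / len(samples_gbps)
    peak = max(samples_gbps)
    limit = bound_threshold * platform_max_gbps
    high = sum(1 for s in samples_gbps if s >= limit)
    return average, peak, high / len(samples_gbps)


# Hypothetical run: a long low-traffic phase followed by a short streaming phase.
samples = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 90.0, 95.0]
average, peak, bound = summarize_bandwidth(samples, platform_max_gbps=100.0)
print(f"average={average:.1f} GB/s peak={peak:.1f} GB/s bound={bound:.0%}")
```

In this made-up run the average looks modest, while the peak and the bound share show that one phase comes close to the assumed platform limit.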

Application Performance Snapshot 2019 Update 4

  • Ability to collect internal IDs of communicators provided by Intel MPI. This feature requires both Application Performance Snapshot and Intel MPI to be version 2019 Update 4 or newer.

Application Performance Snapshot 2019 Update 3

  • Ability to generate an HTML-based rank-to-rank communication diagram by message volume to better visualize MPI application communication patterns.
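
As a rough illustration of the data behind a rank-to-rank view by message volume, the sketch below sums bytes per (sender, receiver) pair into a matrix. The message list and the function name `volume_matrix` are hypothetical; this is not how the tool collects or stores its data.

```python
def volume_matrix(messages, num_ranks):
    """Sum message sizes into a num_ranks x num_ranks matrix.

    messages is an iterable of (source_rank, dest_rank, size_in_bytes).
    Row index is the sender, column index is the receiver.
    """
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix


# Hypothetical point-to-point traffic between three ranks.
messages = [(0, 1, 1024), (0, 1, 2048), (1, 2, 512), (2, 0, 4096)]
matrix = volume_matrix(messages, num_ranks=3)
for rank, row in enumerate(matrix):
    print(f"rank {rank} sends: {row}")
```

Large off-diagonal cells in such a matrix point to the rank pairs that exchange the most data.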

Application Performance Snapshot 2019 Update 2

  • Full-featured OpenMPI* support
  • Improved vectorization efficiency metrics
  • MPI Imbalance time is no longer calculated on the default stat level 1 to minimize collection overhead on that level
  • aps-report: added option to display statistics only for the selected set of MPI functions
  • MPI collector general optimizations

Application Performance Snapshot 2019 Update 1

  • MPI Imbalance collection extended with a mode that enables measuring pure application imbalance. This mode is applicable to MPI implementations that are binary compatible with MPICH. If required, you can switch off the imbalance collection to minimize collection overhead.
  • MPI tracing overhead improvements with a noticeable impact on cases with a large number of ranks.
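
For readers new to the term, the sketch below models imbalance in a simplified way: each rank reaches a synchronizing call at a different time, and the time it waits for the slowest rank counts as imbalance. This is a conceptual model under stated assumptions, not the tool's calculation; the arrival times and the function name `imbalance_times` are hypothetical.

```python
def imbalance_times(arrival_times):
    """Return, per rank, how long it waits for the last rank to arrive.

    arrival_times[i] is the time (in seconds) at which rank i reaches
    the synchronizing call.
    """
    latest = max(arrival_times)
    return [latest - t for t in arrival_times]


# Hypothetical arrival times for four ranks.
arrivals = [1.0, 1.5, 3.0, 2.0]
waits = imbalance_times(arrivals)
for rank, wait in enumerate(waits):
    print(f"rank {rank} waits {wait:.1f} s")
```

The rank that arrives last waits zero seconds, while every other rank spends its wait idle inside the MPI call.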

Application Performance Snapshot 2019

  • Intel® Omni-Path Architecture Interconnect Bandwidth and Packet rate metrics added to explore MPI communication bottlenecks.
  • Added an HTML-based rank-to-rank communication diagram to better visualize MPI application communication patterns.

Application Performance Snapshot 2018 Update 3 and 2019 Beta Update

  • The
    utility added the
    option, which allows the report to be generated in either text (*.txt) or comma-separated (*.csv) format. The CSV format can be useful for report processing automation or export to spreadsheet programs such as Microsoft Excel*.
  • The Rank-to-Rank data transfers report was enriched with an aggregated communication time column.
  • MPI trace file size was compacted with compression and minimal statistic level set by default. Some reports generated by the
    utility will be inapplicable with minimal statistic level. See Controlling Amount of Collected Data for more information.
  • Report generation time with the
    utility was significantly improved.
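
As an example of the report processing automation that a CSV report makes possible, the sketch below reads CSV text with Python's standard `csv` module and lists the most time-consuming entries. The column names `function` and `time_sec`, the sample rows, and the helper `top_functions` are hypothetical; check the header row of a real report before adapting it.

```python
import csv
import io

# Hypothetical report contents; a real report may use different columns.
SAMPLE_CSV = """function,time_sec
MPI_Allreduce,12.5
MPI_Waitall,7.25
MPI_Bcast,0.75
"""


def top_functions(csv_text, limit=2):
    """Return the `limit` rows with the largest time, as (name, seconds)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    parsed = [(row["function"], float(row["time_sec"])) for row in rows]
    return sorted(parsed, key=lambda item: item[1], reverse=True)[:limit]


top = top_functions(SAMPLE_CSV)
for name, seconds in top:
    print(f"{name}: {seconds} s")
```

To process a report file instead of the inline sample, read the file's text and pass it to `top_functions`.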

Application Performance Snapshot 2018 Update 2

Application Performance Snapshot 2018 Update 1

  • Removed restrictions for
    region numbers.

Application Performance Snapshot 2018

  • The tool is now invoked as
    rather than
  • Result directory change from

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804