Getting Started with Intel® VTune™ Amplifier 2018

Learn how to start analyzing system performance and explore the key features of Intel VTune Amplifier.

Intel® VTune™ Amplifier can be installed on Windows*, macOS*, and Linux* platforms and used to analyze local and remote target systems. Use this tool to analyze algorithm choices, find serial and parallel code bottlenecks, understand where and how your application can benefit from available hardware resources, and speed up execution. The Find Your Analysis guide, available from the VTune Amplifier Welcome page, is a good place to discover the best analysis type to run for your use case.

VTune Amplifier is available as a standalone product as well as part of the following suites:

Visit the VTune Amplifier training page for videos, webinars, and more to help you get started.

Note

Starting with the 2018 version of Intel VTune Amplifier, product help, tutorials, and Release Notes are available online only, from the Intel Software Documentation Library in the Intel Developer Zone (IDZ). You can also download an offline version of the product help from either IDZ or the Intel® Software Development Products Registration Center.

Select Your Host System to Get Started

Click the button for your host system to learn more about system-specific features for Windows*, Linux*, or macOS*.

  • Getting Started steps on Windows* host

  • Getting Started steps on Linux* host

  • Getting Started steps on macOS* host

Key Features

ALGORITHM ANALYSIS

MICROARCHITECTURE ANALYSIS

  • Run General Exploration analysis to triage hardware issues in your application. This type collects a complete list of events for analyzing a typical client application.

    See tutorials for Linux host - C++ sample code | Windows host - C++ sample code.

  • Use Memory Access analysis to identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and to attribute performance events to memory objects (data structures). This attribution is made possible by instrumenting memory allocations and deallocations and by resolving static and global variables from symbol information.

    See the tutorial for Linux Host - C sample code.

  • For systems with the Intel® Software Guard Extensions (Intel SGX) feature enabled, run SGX Hotspots analysis to identify performance-critical program units inside security enclaves. This analysis type uses the INST_RETIRED.PREC_DIST hardware event to emulate precise clockticks, which is mandatory for analysis on systems with Intel SGX enabled.

  • For Intel processors that support Intel® Transactional Synchronization Extensions (Intel TSX), run the TSX Exploration and TSX Hotspots analysis types to measure transactional success and analyze the causes of transactional aborts.

PLATFORM ANALYSIS

GPU Analysis

  • Run System Overview analysis to review the general behavior of a target Linux* or Android* system and correlate power and performance metrics with interrupt requests (IRQs).

  • Run CPU/GPU Concurrency analysis to identify code regions where your application is CPU or GPU bound.

  • Use GPU Hotspots analysis to identify GPU tasks with high GPU utilization and estimate the effectiveness of this utilization.

  • For GPU-bound applications running on Intel HD Graphics, collect GPU hardware events to estimate how effectively the Processor Graphics are used.

  • Collect data on Ftrace* events on Android and Linux targets and Atrace* events on Android targets.

  • Analyze hot Intel® Media SDK programs and OpenCL™ kernels running on a GPU. For OpenCL application analysis, use the Architecture Diagram to explore GPU hardware metrics per GPU architecture block.

  • Run Disk Input and Output analysis to monitor utilization of the disk subsystem, CPU and processor buses. This analysis type provides a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.

    See the tutorial for Linux Host - C++ sample code.

COMPUTE-INTENSIVE APPLICATIONS ANALYSIS

  • Run HPC Performance Characterization analysis to identify how effectively your high-performance computing application uses CPU, memory, and floating-point operation hardware resources. This analysis type provides additional scalability metrics for applications that use OpenMP or Intel MPI runtime libraries.

  • Run an Algorithm analysis type with the Analyze OpenMP regions option enabled to collect OpenMP or MPI data for applications using OpenMP or MPI runtime libraries. Note that HPC Performance Characterization analysis has the option enabled by default.

  • For OpenMP applications, analyze the collected performance data to identify inefficiencies in parallelization. Review the Potential Gain metric values per OpenMP region to understand the maximum time that could be saved if the OpenMP region were optimized to have no load imbalance, assuming no runtime overhead.

  • For hybrid OpenMP and MPI applications, explore OpenMP efficiency metrics by MPI processes lying on the critical path.

    See the tutorial for Linux Host - OpenMP and MPI hybrid sample code.
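To make the idea behind the Potential Gain metric concrete, here is a minimal arithmetic sketch in Python. It is a simplified model and not necessarily the exact formula VTune Amplifier uses: it assumes the region's elapsed time equals the busy time of its slowest thread, and that a perfectly balanced region would take the average busy time across threads. The function name potential_gain and the sample thread times are hypothetical.

```python
from statistics import mean


def potential_gain(thread_busy_seconds):
    """Estimate the time saved if one parallel region were perfectly balanced.

    Simplified model (an assumption, not VTune's exact definition):
      elapsed = busy time of the slowest thread
      ideal   = average busy time across threads (no imbalance, no overhead)
      gain    = elapsed - ideal
    """
    if not thread_busy_seconds:
        raise ValueError("need at least one per-thread time")
    elapsed = max(thread_busy_seconds)
    ideal = mean(thread_busy_seconds)
    return elapsed, ideal, elapsed - ideal


# Hypothetical busy times (seconds) of four threads in one OpenMP region.
region_times = [4.0, 2.0, 2.0, 4.0]
elapsed, ideal, gain = potential_gain(region_times)
print(f"elapsed={elapsed:.1f}s ideal={ideal:.1f}s potential gain={gain:.1f}s")
```

In this model, a region whose threads all finish at the same time has a gain of zero, so regions with the largest gain are the first candidates for rebalancing.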

SOURCE ANALYSIS

  • Double-click a hotspot function to drill down to the source code and analyze performance per source line or assembly instruction. By default, the hottest line is highlighted.

  • For help on an assembly instruction, right-click the instruction in the Assembly pane and select Instruction Reference from the context menu.

MANAGED CODE ANALYSIS

Configure target options for managed code analysis in the native, managed, or mixed mode:

  • Windows host only: Event-based sampling (EBS) analysis for Windows Store C/C++, C# and JavaScript applications running in the Attach or System-wide mode;

  • EBS or user-mode sampling and tracing analysis for Java* applications running in the Launch Application or Attach mode;

  • Basic Hotspots and Locks and Waits analysis for Python* applications running in the Launch Application and Attach to Process modes.

CUSTOM ANALYSIS

  • Create a copy of a current analysis type and modify the collection options to create your own analysis configurations.

  • Run your own custom collector from VTune Amplifier to get the aggregated performance data from both your custom collection and the VTune Amplifier analysis in the same result.

  • Import performance data collected by your own or a third-party collector into a VTune Amplifier result that was collected in parallel with your external collection. Use the Import from CSV button to integrate the external data into the result.

  • Collect data from a remote virtual machine by configuring KVM guest OS profiling, which makes use of the Linux Perf KVM feature. Select Analyze KVM guest OS from the Advanced options on your Linux host system.

For the detailed list of product features, see Intel VTune Amplifier Help.

Remote Collection Modes

You can collect data on your Linux, Windows, or Android system using any of the following modes:

  • (Linux and Android targets) Remote analysis via SSH/ADB communication, with the VTune Amplifier graphical or command-line interface (amplxe-cl) installed on the host and the VTune Amplifier target package installed on the remote target system. Recommended for resource-constrained embedded platforms (with insufficient disk space, memory, or CPU power).

    See the tutorial for Linux host - Android target | Windows host - Android target | Linux host - embedded Linux target system

  • (Android targets) Disconnected analysis via SSH/ADB communication with VTune Amplifier installed on the host and the VTune Amplifier target package installed on the remote Android system. The analysis is initiated from the host system, but data collection does not begin until the device is unplugged from the host system. The results are finalized after the device is reconnected to the host system.

  • (Linux and Windows targets) Native performance analysis with the VTune Amplifier graphical or command line interface installed on the target system. Analysis is started directly on the target system.

  • (Linux and Windows targets) Native hardware event-based sampling analysis with the VTune Amplifier's Sampling Enabling Product (SEP) installed on the target embedded system.
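As a rough illustration of the native and remote SSH modes listed above, the following Python sketch assembles an amplxe-cl command line without running it. The helper build_collect_command, the application paths, the result directory name, and the user@target-host address are all hypothetical. The options used (-collect, -result-dir, -target-system=ssh:...) reflect the command-line interface as understood here; check them against `amplxe-cl -help collect` for your installed version.

```python
import shlex


def build_collect_command(analysis, app_argv, result_dir=None, target=None):
    """Assemble (but do not run) an amplxe-cl collection command.

    analysis   -- analysis type name, e.g. "hotspots"
    app_argv   -- the application to profile and its arguments
    result_dir -- optional directory for the collected result
    target     -- optional "user@host" for remote collection over SSH
    """
    cmd = ["amplxe-cl", "-collect", analysis]
    if result_dir:
        cmd += ["-result-dir", result_dir]
    if target:
        # Assumed form of the remote-target option; verify for your version.
        cmd.append(f"-target-system=ssh:{target}")
    cmd.append("--")  # everything after this is the profiled application
    cmd += list(app_argv)
    return cmd


# Native analysis: started directly on the system that runs the application.
local = build_collect_command("hotspots", ["./my_app"], result_dir="r000hs")

# Remote analysis: started on the host, collected on the target over SSH.
remote = build_collect_command(
    "hotspots", ["/home/user/my_app"], target="user@target-host"
)

for cmd in (local, remote):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run unchanged once the options have been confirmed for your setup.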

Legal Information

Intel, the Intel logo, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission from Khronos.

© Intel Corporation

For more complete information about compiler optimizations, see our Optimization Notice.