User Guide


What's New in Intel® VTune™ Profiler

Intel® VTune™ Profiler

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • Algorithm:
    • Anomaly Detection Analysis Type for Performance Anomalies
      This release introduces the Anomaly Detection analysis type in the Algorithm group. Use this analysis to detect performance anomalies in frequently recurring code intervals including loop iterations. Anomaly Detection uses Intel® Processor Trace (Intel® PT) technology to perform detailed analysis at the microsecond level.
  • Parallelism:
    • Support for OpenMP Offload in HPC Analysis
      The HPC Performance Characterization analysis type now supports the offload of OpenMP regions. The support includes these additions to the analysis type:
      • The summary pane now includes a breakdown of OpenMP offload time by category, including Data Transfer. You can also see a table with the top five OpenMP target regions sorted by offload time.
      • The bottom-up pane now allows grouping by OpenMP Offload Region. With this grouping active, the grid displays several new columns. The timeline shows scale markers that indicate the span of OpenMP offload regions and OpenMP operations internal to those regions.
  • GPU Accelerators:
    • Windows Support for oneAPI Level Zero Specification for DPC++ Applications
      This release extends support for the oneAPI Level Zero API specification to Windows systems when you run GPU analyses (GPU Offload analysis and GPU Compute/Media Hotspots) on DPC++ applications. Previously, support for this specification existed on Linux systems only. For these DPC++ applications, Intel® VTune™ Profiler supports version 0.91.10 of the oneAPI Level Zero API.
    • Issue Markers in Memory Hierarchy Diagram
      The Memory Hierarchy Diagram of the GPU Compute/Media Hotspots analysis now displays the same markers to highlight metrics as the ones used to indicate performance or data issues in the Summary and Grid displays. This provides a consistent look and feel to the diagram and helps you correlate metrics between both displays.
  • Connection Types:
    • Remote Linux (SSH) Connection Type
      The Remote Linux (SSH) connection type has been improved to make automated target package deployment more transparent. Intel® VTune™ Profiler now checks for the presence of the target package on the remote system and, if the package is not found, offers to deploy it automatically with a single click.
  • Support for DPC++ Applications:
    • Demangling of Lambda Functions
      This release implements the demangling of DPC++ lambda function names, which are used as DPC++ kernel names.
  • Analysis Configuration:
    • Wrapper Script Option for Quick Profiling Environment Setup
      The wrapper script is a new feature that lets you automatically run a custom set of commands to prepare the profiling environment before analysis starts. For example, you can create a script that sets environment variables. Specify the script when you configure the analysis. The commands are executed on the target system before the analysis begins. You can also provide the wrapper script through the command-line interface.
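
As a minimal sketch of the wrapper script idea described above, the following Python script sets a few environment variables and then runs whatever command line it receives as arguments. The way the command is handed to the script, the variable names (`OMP_NUM_THREADS`, `MY_APP_CONFIG`), and the helper names are assumptions made for illustration; check the product documentation for the exact contract between VTune Profiler and the wrapper script.

```python
import os
import subprocess
import sys

# Hypothetical environment settings to apply before profiling starts.
PROFILING_ENV = {
    "OMP_NUM_THREADS": "4",
    "MY_APP_CONFIG": "/tmp/profile.cfg",  # hypothetical variable and path
}


def build_env(base, overrides):
    """Return a copy of `base` with `overrides` applied on top."""
    env = dict(base)
    env.update(overrides)
    return env


def run_wrapped(command, overrides=PROFILING_ENV):
    """Run `command` (a list of arguments) in the prepared environment.

    Returns the exit code of the command so the caller can propagate it.
    """
    env = build_env(os.environ, overrides)
    completed = subprocess.run(command, env=env)
    return completed.returncode


# Assumption: the command to run is passed as the script's arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_wrapped(sys.argv[1:]))
```

Keeping the environment preparation in `build_env` separate from the launch step makes it easy to check which variables the profiled process will see before any collection starts.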

Intel® VTune™ Profiler 2020 Update 2

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • Performance:
    • Performance Snapshot Analysis Type for Quick Summary
      This release introduces the Performance Snapshot analysis type. Start with this analysis and get a quick overview of issues that affect your application performance. Performance Snapshot provides recommendations for next steps to help you select other analyses for deeper profiling. It also characterizes the workload on the system.
  • Platform Analysis:
    • Input and Output Metrics for Individual Devices
      Platform I/O metrics can now be attributed to individual devices managed by Intel® VMD technology.
  • Input/Output Analysis:
    • Enhancement for Skylake and Cascade Lake servers
      I/O Analysis has been enhanced for servers based on Intel® processor microarchitectures codenamed Skylake and Cascade Lake by highlighting code that potentially performs MMIO reads.
  • Documentation:
    • PDF version of User Guide
      This release introduces a PDF version of the Intel® VTune™ Profiler User Guide. Click Download as PDF at the top of this page to use the PDF version.

Intel® VTune™ Profiler

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • GPU Accelerators:
    • Support for oneAPI Level Zero Specification for DPC++ Applications
      When you run the GPU Offload analysis and GPU Compute/Media Hotspots analysis on DPC++ applications, Intel VTune Profiler now provides some support for applications that use the oneAPI Level Zero API in the back end. For these DPC++ applications, Intel VTune Profiler supports version 0.91.10 of the oneAPI Level Zero API. The support is available on Linux systems only. See the GPU analysis types for a complete description of supported features.
    • Update to IP Architecture diagram
      The IP Architecture Diagram of the GPU Compute/Media Hotspots analysis has been renamed to Memory Hierarchy Diagram. The diagram has been redesigned to improve its look and feel and to make the metrics easier to understand.
  • Platform Analysis:
    • New metrics in Input and Output Analysis
      The Input and Output analysis type features new Inbound PCIe Read/Write L3 Hit/Miss Ratio metrics that show the utilization efficiency of Intel® Data Direct I/O (Intel® DDIO) hardware technology. There are also new metrics for Intel® Xeon® Scalable processors that allow a breakdown of data by PCIe device.
  • Connection:
    • New TCP/IP Communication Agent
      This release features a new TCP/IP communication agent as a connection type, intended for profiling embedded systems running real-time operating systems. Use this connection type to profile the kernel of an arbitrary real-time operating system and the applications running on it. This requires the development of a custom agent (Analysis Communication Agent). A reference solution based on Linux OS is available through the Analysis Communication Agent GitHub* repository. Detailed information on developing an agent for a specific real-time operating system is available in the ACA documentation.
  • Cloud and Containerization:
    • Display of Container Name
      This release extends container profiling capabilities to display the container name instead of its ID for ease of identification.
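
To make the L3 hit/miss ratio idea from the Input and Output analysis item above concrete, here is a small sketch that derives hit and miss ratios from raw counts of inbound PCIe requests. The function name and the sample counts are hypothetical, and the formula is the generic hit / (hit + miss) definition, not necessarily the exact expression VTune Profiler uses.

```python
def l3_hit_miss_ratio(hits, misses):
    """Return (hit_ratio, miss_ratio) for inbound PCIe requests.

    Both ratios are fractions in [0, 1]. When no requests were
    observed, (0.0, 0.0) is returned instead of dividing by zero.
    """
    total = hits + misses
    if total == 0:
        return 0.0, 0.0
    return hits / total, misses / total


# Hypothetical counts of inbound PCIe reads that hit or missed the L3 cache.
read_hit, read_miss = l3_hit_miss_ratio(hits=900, misses=100)
print(f"Inbound read L3 hit ratio: {read_hit:.0%}, miss ratio: {read_miss:.0%}")
```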

Intel® VTune™ Profiler

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • Platform Analysis:
    • New metrics in Hardware Tracing mode.
      The Hardware Tracing mode in the System Overview analysis has been extended to include these metrics:
      • OS Kernel Activity
      • OS Scheduling
      These metrics help identify anomalies caused by unexpected kernel activity or preemptions.

Intel® VTune™ Profiler 2021.1-beta05 and 2020 Update 1

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • GPU Accelerators:
    • SIMD utilization metrics at kernel level.
      The GPU Compute/Media Hotspots analysis in the Dynamic Instruction Count mode now includes SIMD utilization metrics at the kernel and instruction level. These metrics help identify instructions in the OpenCL kernel that utilize SIMD poorly.
    • GPU metrics in APS and HPC Analysis type.
      The GPU utilization analysis in Application Performance Snapshot (APS) and the HPC Performance Characterization analysis now includes these GPU computation metrics:
      • GPU Time
      • GPU IPC
      • GPU Utilization
      • Percentage of stalled and idle EUs
    • GPU metrics in Application Performance Snapshot (APS).
      The GPU Compute metric set of Application Performance Snapshot has been enhanced with OpenMP Offload Efficiency metrics, including offload region overhead. These metrics are available for binaries compiled with the Intel® C/C++ Compiler included in several Intel® oneAPI Toolkits (Beta) 2021.1-beta05 or newer.
    • Simplified dependency on Intel® Metrics Discovery API library.
      Collecting GPU hardware statistics on Linux* systems now has a simplified dependency on the Intel® Metrics Discovery API library. Intel® VTune™ Profiler now automatically selects the latest version of the library available at run time to satisfy the GPU analysis requirements. For older versions of the product, follow procedures to enable manual configuration.
  • Platform Analysis:
    • Improvements to CPU/FPGA Interaction analysis.
      The CPU/FPGA Interaction analysis type can now process data sources collected with AOCL Profiler (new mode) in addition to OpenCL Profiling API (legacy mode). You can now specify the application and its parameters directly using new configuration options added to the analysis type.
    • Module Entry Point grouping in Hardware Tracing mode.
      The Hardware Tracing mode in the System Overview analysis also contains a new Module Entry Point grouping. The grouping shifts the focus to the precise CPU time spent within system calls, interrupts, or a particular API of the runtime library.
    • New metrics for kernel mode switches.
      Two new metrics represent the number of kernel mode switches and their frequency (switches per second). The CPU Time metric is now divided into:
      • User time
      • Kernel time
      The new metrics make the analysis more kernel-aware.
  • Software support:
    • The Microarchitecture Exploration analysis type now supports Intel platforms code name Ice Lake.
    • Intel® VTune™ Profiler now supports version 13 and older versions of these platforms:
      • OpenJDK* (Hotspots and Hardware event-based analysis types) (Windows/Linux OS)
      • Oracle* Java Virtual Machine (Linux OS)
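
The kernel mode switch metrics described in the Platform Analysis items above can be illustrated with a little arithmetic. The sketch below uses hypothetical function names and sample values to compute the switch frequency (switches per second) and the split of CPU time into user and kernel shares. It illustrates the definitions only and is not VTune Profiler's implementation.

```python
def switch_frequency(switch_count, elapsed_seconds):
    """Return kernel mode switches per second over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return switch_count / elapsed_seconds


def cpu_time_split(user_seconds, kernel_seconds):
    """Return (user_share, kernel_share) of total CPU time as fractions."""
    total = user_seconds + kernel_seconds
    if total == 0:
        return 0.0, 0.0
    return user_seconds / total, kernel_seconds / total


# Hypothetical sample: 12,000 switches in 4 s; 3 s of user time, 1 s of kernel time.
freq = switch_frequency(12_000, 4.0)
user_share, kernel_share = cpu_time_split(3.0, 1.0)
print(f"{freq:.0f} switches/s, user {user_share:.0%}, kernel {kernel_share:.0%}")
```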

Intel® VTune™ Profiler

This version of Intel® VTune™ Profiler includes the following updates to the previous version:
  • GPU Accelerators:
    • Simplified system configuration requirements for GPU analysis. GPU utilization analysis is now available without rebuilding the Linux kernel as a prerequisite. For systems that do not support Ftrace* technology, GPU Utilization statistics are collected based on hardware events and are available only for the Render and GPGPU Engine. To collect detailed per-engine GPU Utilization statistics, either rebuild the kernel or configure and rebuild the i915 module to enable i915 Ftrace* event collection.
  • Platform analysis:
    • Hardware Tracing mode in the System Overview analysis optimized to include user/kernel metrics, Thread/Hardware grouping, and module entry points

Intel® VTune™ Profiler 2020 and Intel® VTune™ Profiler 2021.1-beta03

Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
These versions of VTune Profiler include the following updates to the previous versions of VTune Amplifier:
  • GPU Accelerators:
    • New GPU Offload analysis added to explore and correlate code execution across CPUs and GPUs, and identify a kernel of interest for GPU-bound applications to be explored further with GPU Compute/Media Hotspots analysis
    • GPU Compute/Media Hotspots analysis extended with GPU in-kernel analysis for OpenCL™ code and an option to filter by a kernel of interest
    • Command line report scope extended to support GPU analysis types. You can apply groupings to your collected data to focus on time-consuming computing tasks.
    • Dynamic instruction count collection available as part of the GPU Compute/Media Hotspots analysis improved to provide better accuracy for basic block Assembly analysis
    • Support for Intel® Processor Graphics Gen11
  • Platform Analysis:
    • System Overview analysis updated to serve as an entry point to platform analysis assessing your system (IO, accelerators and CPU) performance and providing guidance for further analysis steps
    • New Hardware Tracing mode in the System Overview analysis enabling application analysis at the microsecond level and helping you identify causes of latency issues
  • HPC Analysis:
    • Max and Bound Bandwidth metrics added to the Application Performance Snapshot to better estimate the efficiency of the DRAM, MCDRAM, Intel Persistent Memory and Intel® Omni-Path usage
  • Energy Analysis:
    • New Throttling analysis added to identify causes for system throttling, like exceeding safe thermal or power limits
    • Options for Energy analysis, based on the Intel SoC Watch data collector, extended to monitor processor package energy consumption over time and identify how it correlates with CPU throttling
    • Overview and Memory views extended with new metrics for analyzing Non-Uniform Memory Access (NUMA) behavior
    • User authentication and authorization added to enable access control to your data
    • New option added to choose or modify the location of Platform Profiler data files
  • Cloud and Containerization:
    • Containerization support extended with an option to install and run the VTune Profiler in a Docker* container and profile targets both inside and outside that container
    • Profiling support for applications running in Amazon Web Services* (AWS) EC2 Instances based on Intel microarchitecture code name Cascade Lake X
  • Fabric Profiler, a new performance tool, added to VTune Profiler in the Preview mode. Use the Fabric Profiler to identify detailed characteristics of the runtime behavior for an OpenSHMEM application.
  • Quality and usability improvements:
    • Symbol resolution for effective source-level analysis enabled for crossgen (Ahead-of-JIT compilation) functions on Linux* systems
    • Interactive Help Tour available from the Welcome page and guiding you through the product interface using a sample project
    • Third-party components updated to the most recent versions to include functional and security changes. Updating your product to the latest version is recommended.
  • New hardware/operating systems/IDEs support:
    • 10th Gen Intel® Core™ processors
    • Ubuntu* 19.10
    • Red Hat* Enterprise Linux* 8
    • Microsoft* Windows* 10, November 2019 Update
    For a full list of supported platforms, see the VTune Profiler Release Notes.
As part of Intel oneAPI Base Toolkit (Beta), VTune Profiler provides these features:
  • Support for Data Parallel C++ (DPC++) code profiling added across CPUs and multiple accelerator architectures, including GPUs and FPGAs
  • GPU Offload and GPU Compute/Media Hotspots types extended to support profiling DPC++ code and OpenMP* code offloaded to the GPU
  • CPU/FPGA Interaction analysis extended with FPGA device-side metrics, like Stalls, Global Bandwidth and Occupancy, and mapping FPGA kernel performance data to the source code
  • GPU Time and Utilization metrics added to the Application Performance Snapshot to help you triage your performance issues and identify whether your code is CPU or GPU bound

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804