User Guide

  • 2020
  • 05/20/2020

What's New in Intel® VTune™ Profiler

Intel® VTune™

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • Platform Analysis:
    • New metrics in Hardware Tracing mode.
      The Hardware Tracing mode in the System Overview analysis has been extended to include these metrics:
      • OS Kernel Activity
      • OS Scheduling
      These metrics help identify anomalies caused by unexpected kernel activity or preemptions.

Intel® VTune™ Profiler 2021.1-beta05 and 2020 Update 1

This version of Intel® VTune™ Profiler contains improvements and additions in these areas:
  • GPU Accelerators:
    • SIMD utilization metrics at kernel level.
      The GPU Compute/Media Hotspots analysis in the Dynamic Instruction Count mode now includes SIMD utilization metrics at the kernel and instruction level. These metrics help identify instructions in the OpenCL kernel that utilize SIMD poorly.
    • GPU metrics in APS and HPC Analysis type.
      The GPU utilization analysis in Application Performance Snapshot (APS) and the HPC Performance Characterization analysis now includes these GPU computation metrics:
      • GPU Time
      • GPU IPC
      • GPU Utilization
      • Percentage of stalled and idle EUs
    • GPU metrics in Application Performance Snapshot (APS).
      The GPU Compute metric set of Application Performance Snapshot has been enhanced with OpenMP Offload Efficiency metrics, including offload region overhead. These metrics are available for binaries compiled with the Intel® C/C++ Compiler included in several Intel® oneAPI Toolkits (Beta) 2021.1-beta05 or newer.
    • Simplified dependency on Intel® Metric Discovery API library.
      There is now a simplified dependency on the Intel® Metric Discovery API library to collect GPU hardware statistics on Linux* systems.
      Intel® VTune™ Profiler now automatically selects the latest version of the library available at run time to satisfy the GPU analysis requirements. For older versions of the product, follow the procedures to enable manual configuration.
  • Platform Analysis:
    • Improvements to CPU/FPGA Interaction analysis.
      The CPU/FPGA Interaction analysis type can now process data sources collected with AOCL Profiler (new mode) in addition to OpenCL Profiling API (legacy mode). You can now specify the application and its parameters directly using new configuration options added to the analysis type.
    • Module Entry Point grouping in Hardware Tracing mode.
      The Hardware Tracing mode in the System Overview analysis also contains a new Module Entry Point grouping. The grouping shifts the focus to the precise CPU time spent within system calls, interrupts, or a particular API of the runtime library.
    • New metrics for kernel mode switches.
      Two new metrics represent the number of kernel mode switches and their frequency (switches per second). The CPU Time metric is now divided into:
      • User time
      • Kernel time
      The new metrics make the analysis more kernel-aware.
  • Software support:
    • The Microarchitecture Exploration analysis type now supports Intel platforms code named Ice Lake.
    • Intel® VTune™ Profiler now supports version 13 and older versions of these platforms:
      • OpenJDK (Hotspots and Hardware event-based analysis types) (Windows/Linux OS)
      • Oracle Java Virtual Machine (Linux OS)
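
The kernel mode switch metrics listed under Platform Analysis above come down to simple arithmetic: a switch count divided by elapsed time, and a CPU Time value split into user and kernel parts. The following is a minimal Python sketch of that arithmetic, not VTune Profiler's implementation or API. The function name, the class name, and the input values are hypothetical, and treating CPU Time as the sum of User time and Kernel time is an assumption based on the description above.

```python
from dataclasses import dataclass


@dataclass
class KernelActivitySummary:
    """Hypothetical container for the derived values (not a VTune type)."""
    cpu_time_s: float           # assumed: user time + kernel time
    kernel_time_share: float    # fraction of CPU time spent in kernel mode
    switches_per_second: float  # kernel mode switch frequency


def summarize_kernel_activity(user_time_s, kernel_time_s,
                              kernel_mode_switches, elapsed_s):
    """Derive switch frequency and the user/kernel split from raw counters."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    cpu_time_s = user_time_s + kernel_time_s
    # Guard against an idle interval where no CPU time was recorded.
    kernel_time_share = kernel_time_s / cpu_time_s if cpu_time_s > 0 else 0.0
    switches_per_second = kernel_mode_switches / elapsed_s
    return KernelActivitySummary(cpu_time_s, kernel_time_share,
                                 switches_per_second)


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from a real collection.
    print(summarize_kernel_activity(3.0, 1.0, 2000, 2.0))
```

A high switch frequency combined with a large kernel time share is the kind of pattern these metrics are meant to surface.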

Intel® VTune™

This version of Intel® VTune™ Profiler includes the following updates to the previous version:
  • GPU Accelerators:
    • Simplified system configuration requirements for GPU analysis. GPU utilization analysis is now available without a prerequisite of rebuilding the Linux kernel. For systems that do not support Ftrace* technology, GPU Utilization statistics are collected based on hardware events and are available only for the Render and GPGPU Engine. To collect detailed per-engine GPU Utilization statistics, either rebuild the kernel or configure and rebuild the i915 module to enable i915 Ftrace* event collection.
  • Platform analysis:
    • Hardware Tracing mode in the System Overview analysis optimized to include user/kernel metrics, Thread/Hardware grouping, and module entry points
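
Before deciding whether to rebuild the kernel or the i915 module, it can help to see which i915 Ftrace events the running kernel already exposes. The sketch below lists the event directories under the standard Linux tracing mount points. It is an illustration only: the function name is hypothetical, the check is not part of VTune Profiler, and these release notes do not state which specific events the product requires, so the output needs to be compared against the product documentation.

```python
from pathlib import Path

# Standard tracefs locations on Linux; reading them usually requires root.
DEFAULT_TRACING_ROOTS = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def list_i915_trace_events(tracing_roots=DEFAULT_TRACING_ROOTS):
    """Return the sorted names of i915 Ftrace events the kernel exposes.

    Each event appears as a directory under <root>/events/i915/.
    An empty list means no readable i915 event directory was found.
    """
    for root in tracing_roots:
        events_dir = Path(root) / "events" / "i915"
        try:
            names = sorted(p.name for p in events_dir.iterdir() if p.is_dir())
        except (FileNotFoundError, NotADirectoryError, PermissionError):
            continue  # try the next mount point
        return names
    return []


if __name__ == "__main__":
    events = list_i915_trace_events()
    if events:
        print("i915 Ftrace events exposed by this kernel:")
        for name in events:
            print("  " + name)
    else:
        print("No readable i915 Ftrace events found.")
```

An empty result can also mean the tracing file system is not mounted or the script lacks permission, so it does not prove that the kernel lacks Ftrace support.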

Intel® VTune™ Profiler 2020 and Intel® VTune™ Profiler 2021.1-beta03

Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
These versions of VTune Profiler include the following updates to the previous versions of VTune Amplifier:
  • GPU Accelerators:
    • New GPU Offload analysis added to explore and correlate code execution across CPUs and GPUs, and identify a kernel of interest for GPU-bound applications to be explored further with GPU Compute/Media Hotspots analysis
    • GPU Compute/Media Hotspots analysis extended with GPU in-kernel analysis for OpenCL™ code and an option to filter by a kernel of interest
    • Command line report scope extended to support GPU analysis types. You can apply the groupings to your collected data to focus on time-consuming computing tasks.
    • Dynamic instruction count collection available as part of the GPU Compute/Media Hotspots analysis improved to provide better accuracy for basic block Assembly analysis
    • Support for Intel® Processor Graphics Gen11
  • Platform Analysis:
    • System Overview analysis updated to serve as an entry point to platform analysis assessing your system (IO, accelerators and CPU) performance and providing guidance for further analysis steps
    • New Hardware Tracing mode in the System Overview analysis enabling application analysis at the microsecond level and helping you identify causes of latency issues
  • HPC Analysis:
    • Max and Bound Bandwidth metrics added to the Application Performance Snapshot to better estimate the efficiency of the DRAM, MCDRAM, Intel Persistent Memory and Intel® Omni-Path usage
  • Energy analysis:
    • New Throttling analysis added to identify causes for system throttling, like exceeding safe thermal or power limits
    • Options for Energy analysis, based on the Intel SoC Watch data collector, extended to monitor processor package energy consumption over time and identify how it correlates with CPU throttling
    • Overview and Memory views extended with new metrics for analyzing Non-Uniform Memory Access (NUMA) behavior
    • User authentication and authorization added to enable access control to your data
    • New option added to choose or modify the location of Platform Profiler data files
  • Cloud and containerization:
    • Containerization support extended with an option to install and run the VTune Profiler in a Docker* container and profile targets both inside and outside that container
    • Profiling support for applications running in Amazon Web Services* (AWS) EC2 Instances based on Intel microarchitecture code name Cascade Lake X
  • Fabric Profiler, a new performance tool, added to VTune Profiler in the Preview mode. Use the Fabric Profiler to identify detailed characteristics of the runtime behavior for an OpenSHMEM application.
  • Quality and usability improvements:
    • Symbol resolution for effective source-level analysis enabled for crossgen (Ahead-of-JIT compilation) functions on Linux* systems
    • Interactive Help Tour available from the Welcome page, guiding you through the product interface using a sample project
    • Third-party components updated to the most recent versions to include functional and security changes. Updating your product to the latest version is recommended.
  • New hardware/operating systems/IDEs support:
    • 10th Gen Intel® Core™ processors
    • Ubuntu* 19.10
    • Red Hat* Enterprise Linux* 8
    • Microsoft* Windows* 10, November 2019 Update
    For a full list of supported platforms, see the VTune Profiler Release Notes.
As part of Intel oneAPI Base Toolkit (Beta), VTune Profiler provides the following features:
  • Support for Data Parallel C++ (DPC++) code profiling added across CPUs and multiple accelerator architectures, including GPUs and FPGAs
  • GPU Offload and GPU Compute/Media Hotspots types extended to support profiling DPC++ code and OpenMP* code offloaded to the GPU
  • CPU/FPGA Interaction analysis extended with FPGA device-side metrics, like Stalls, Global Bandwidth and Occupancy, and mapping FPGA kernel performance data to the source code
  • GPU Time and Utilization metrics added to the Application Performance Snapshot to help you triage your performance issues and identify whether your code is CPU or GPU bound