What’s New

Intel® VTune™ Amplifier 2019 Beta

  • First-use experience improvements:

    • Basic Hotspots and Advanced Hotspots analyses extended with an option to show additional performance insights such as hardware usage efficiency and vector register utilization. Use this data to identify next steps for your performance analysis.

    • Simplified analysis configuration workflow that presents pre-selected target and collection options in a single view

  • Input and Output analysis on Linux* extended to profile the DPDK and SPDK I/O APIs. Use this data to correlate CPU activity with network data plane utilization, visualize PCIe bandwidth utilization per NIC, estimate UPI bandwidth on multi-socket systems, and identify bottlenecks.

  • Analysis on embedded platforms and accelerators:

    • New CPU/FPGA Interaction analysis (PREVIEW) to assess the balance between the CPU and FPGA on systems with a discrete Intel® Arria® 10 FPGA running OpenCL™ applications

    • New Graphics Rendering analysis (PREVIEW) to analyze CPU/GPU utilization of your code running on the Xen* virtualization platform installed on a remote embedded target

    • Support for the sampling command-line analysis on remote QNX* embedded systems via an Ethernet connection

    Note

    A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

  • HPC workload profiling improvements:

    • CPU Utilization metric refined to differentiate between utilization of logical and physical cores, which is particularly important for HPC applications running on Intel® Xeon® processors

  • Managed runtime analysis improvements:

    • Extended JIT profiling for server-side applications running on LLVM* or on the HHVM* PHP server to support event-based sampling analysis in Attach mode

    • Extended Java* code analysis with support for OpenJDK* 9 and Oracle* JDK 9

    • Enabled Advanced Hotspots analysis for .NET* Core applications running on Linux and Windows systems in the Launch Application mode

  • Microarchitecture analysis improvements:

  • Containerization support improvements:

    • Support for the algorithm analysis types (Basic Hotspots, Concurrency, and Locks and Waits) added for Docker* container targets

    • Profiling support for targets running in Singularity* containers

    • Profiling native and Java applications in Docker and LXC containers

  • Application Performance Snapshot improvements:

    • Added uncore-based metrics for DRAM/MCDRAM memory analysis, which helps identify whether your application is bandwidth bound

    • Added the ability to pause and resume collection with MPI_Pcontrol and the ITT API. A new -start-paused option excludes application execution from collection from launch until the first resume call.

    • Enabled selection of which data types are collected to reduce overhead. The choices include MPI tracing, OpenMP tracing, hardware counter-based collection, or a combination of the three.

    • Exposed the CPU Utilization metric by physical cores on processors that support the required hardware events.

    • Significantly reduced MPI tracing overhead for applications with a large number of ranks.

    • Enriched MPI statistics generated by the aps-report utility: the report now shows information about the communicators used in the application and lets you group and filter collective operations by communicator.

    • Improved integration with Intel® Trace Analyzer and Collector by adding the ability to generate profiling configuration files with the aps-report option.

  • Quality and usability improvements:

    • Hardware event-based analysis supported for targets running in the Hyper-V* environment on Windows* 10 Fall Creators Update (Redstone 3)

    • Default finalization mode set to Fast to minimize post-processing overhead if the number of collected samples exceeds the threshold

    • The Data of Interest metric type, previously used for hotspot navigation in the Source view, replaced with explicit metric selection: select a metric in the grid and apply the Use for Hotspot Navigation context menu command
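The CPU Utilization refinement listed above separates logical (hardware thread) utilization from physical core utilization. The Python sketch below shows why the distinction matters at a single instant. It assumes, purely for illustration, that a physical core counts as utilized when at least one of its hardware threads is busy; this may not match VTune Amplifier's exact metric definition. The `utilization` helper and the 4-core, 2-thread topology are hypothetical.

```python
from typing import Dict, Set, Tuple


def utilization(busy: Set[int], core_of: Dict[int, int]) -> Tuple[float, float]:
    """Return (logical, physical) utilization for a single instant.

    busy    -- IDs of logical CPUs that are executing work
    core_of -- maps each logical CPU ID to its physical core ID
    Assumption: a physical core is utilized if any of its threads is busy.
    """
    logical = len(busy) / len(core_of)
    busy_cores = {core_of[cpu] for cpu in busy}
    physical = len(busy_cores) / len(set(core_of.values()))
    return logical, physical


# Hypothetical topology: 4 physical cores with 2 hardware threads each,
# where logical CPUs 2k and 2k+1 share physical core k.
core_of = {cpu: cpu // 2 for cpu in range(8)}

# One thread per physical core: half the logical CPUs, every physical core.
print(utilization({0, 2, 4, 6}, core_of))  # (0.5, 1.0)

# Two threads on each of two cores: same logical figure, half the cores.
print(utilization({0, 1, 2, 3}, core_of))  # (0.5, 0.5)
```

With one thread pinned per physical core, a common HPC layout, the logical figure reads 50% even though every core is busy; a physical-core view exposes that difference.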

Intel® VTune™ Amplifier 2018 Update 2

  • Analysis on embedded platforms and accelerators:

    • New CPU/FPGA Interaction analysis (PREVIEW) to assess the balance between the CPU and FPGA on systems with a discrete Intel® Arria® 10 FPGA running OpenCL™ applications

    • New Graphics Rendering analysis (PREVIEW) to analyze CPU/GPU utilization of your code running on the Xen* virtualization platform installed on a remote embedded target

    • Support for the sampling command-line analysis on remote QNX* embedded systems via an Ethernet connection

    Note

    A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

  • HPC workload profiling improvements:

    • CPU Utilization metric refined to differentiate between utilization of logical and physical cores, which is particularly important for HPC applications running on Intel® Xeon® processors

  • Managed runtime analysis improvements:

    • Extended JIT profiling for server-side applications running on LLVM* or on the HHVM* PHP server to support event-based sampling analysis in Attach mode

    • Extended Java* code analysis with support for OpenJDK* 9 and Oracle* JDK 9

    • Enabled Advanced Hotspots analysis for .NET* Core applications running on Linux and Windows systems in the Launch Application mode

  • Application Performance Snapshot improvements:

    • Added the ability to pause and resume collection with MPI_Pcontrol and the ITT API. A new -start-paused option excludes application execution from collection from launch until the first resume call.

    • Enabled selection of which data types are collected to reduce overhead. The choices include MPI tracing, OpenMP tracing, hardware counter-based collection, or a combination of the three.

    • Exposed the CPU Utilization metric by physical cores on processors that support the required hardware events.

    • Significantly reduced MPI tracing overhead for applications with a large number of ranks.

    • Enriched MPI statistics generated by the aps-report utility: the report now shows information about the communicators used in the application and lets you group and filter collective operations by communicator.

    • Improved integration with Intel® Trace Analyzer and Collector by adding the ability to generate profiling configuration files with the aps-report option.

  • Quality and usability improvements:

  • Support for new operating systems and IDEs including:

    • Fedora*

    • Ubuntu* 17.10

Intel® VTune™ Amplifier 2018 Update 1

  • HPC workload profiling improvements:

    • Application Performance Snapshot extended to use the VTune Amplifier sampling driver and the Perf* system-wide profiling capability, which reduces collection overhead and enables average DRAM and MCDRAM bandwidth measurement

    • Application Performance Snapshot's MPI tracing extended to cover applications using MPI_Abort

  • GPU analysis improvements:

  • Quality and usability improvements:

    • New amplxe-self-checker.sh script introduced to validate VTune Amplifier deployment on Linux*. The script launches several representative collections on sample applications to check how well your system meets the VTune Amplifier requirements, and shows the diagnostics.

    • Improved accuracy of the Perf*-based driverless sampling collection on target systems running under the Xen* hypervisor, achieved by enabling the integrated Perf sampling interval

    • Better control over the size of event-based sampling (EBS) collection results through configuration of the CPU sampling interval. Increasing the sampling interval may be useful for long-running profiles or profiles that produce large results. The Duration time estimate option is deprecated.

    • Optimized support for performance profiling on embedded devices with the Yocto Project*, without requiring prior installation of Intel System Studio or a complete version of VTune Amplifier.
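The trade-off behind the configurable CPU sampling interval can be estimated with simple arithmetic. The Python sketch below is a back-of-the-envelope model, not VTune Amplifier's own calculation: it assumes roughly one sample per sampling interval on each fully busy logical CPU, so the sample count (and, approximately, the result size) grows with elapsed time and CPU count and shrinks as the interval grows. The `estimate_samples` helper and the 10-minute, 16-CPU scenario are hypothetical.

```python
def estimate_samples(elapsed_s: float, interval_ms: float, logical_cpus: int) -> int:
    """Approximate sample count for a run that keeps every logical CPU busy.

    Assumes one sample per interval per busy logical CPU; idle time, extra
    hardware events, and stack collection are ignored.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    samples_per_cpu = elapsed_s * 1000.0 / interval_ms
    return round(samples_per_cpu * logical_cpus)


if __name__ == "__main__":
    # Hypothetical 10-minute run on a system with 16 logical CPUs.
    for interval_ms in (1, 5, 10):
        count = estimate_samples(600, interval_ms, 16)
        print(f"{interval_ms:>2} ms interval -> ~{count:,} samples")
```

Under this model, moving from a 1 ms to a 10 ms interval cuts the sample count tenfold, which is why a larger interval helps keep long-running profiles manageable.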

Intel® VTune™ Amplifier 2018

The new VTune Amplifier 2018 product combines features previously provided by Intel VTune Amplifier XE and Intel VTune Amplifier for Systems, and introduces the following new capabilities for both host-based and embedded remote target analysis:

  • Application Performance Snapshot, which provides a quick look at your application performance and helps you understand where your application will benefit from tuning.

    • Performance metrics include MPI and OpenMP* parallelism, memory access, FPU utilization, and I/O efficiency with recommendations on further in-depth analysis.

    • New MPI metrics that help identify the top 5 MPI functions by average consumed time and show resident and virtual memory footprints per MPI rank and per compute node.

    • Support for multiple platforms, including Intel® Xeon® processors code named Skylake.

  • Performance analysis for targets (native and Java* services and daemons) continuously running in LXC*, Docker* and Mesos* containers via the Attach profiling mode and the Advanced Hotspots analysis

  • HPC workload profiling improvements:

    • Enhanced MPI metrics for HPC Performance Characterization analysis that expose scalability bottlenecks for hybrid applications

    • Summary view extended to show top 5 OpenMP* hotspots (functions and loops) executing serially in the master thread outside any parallel regions

    • Improved insight into parallelism inefficiencies for applications using Intel Threading Building Blocks (Intel TBB) with extended classification of high Overhead and Spin time

    • Increased detail and structure for vector efficiency metrics based on FLOP counters in the FPU Utilization section

    • New MPI Imbalance metric based on MPI Busy Wait time and parallel efficiency for the most awaited rank in the CPU Utilization section

    • New section showing the hottest loops and functions that contain arithmetic operations, which lets you identify which loops/functions with FPU usage took the most CPU Time

    • Optimized command-line analysis flow for the hpc-performance analysis, with a summary report that shows metrics for CPU, Memory, and FPU performance aspects, including performance issue descriptions for metrics that exceed predefined thresholds. To hide issue descriptions in the summary report, use the new show-issues report knob.

  • Microarchitecture analysis improvements:

    • Full-scale driverless Memory Access analysis that now provides Average Latency statistics

    • Support for locator hardware event metrics for the General Exploration analysis results in the Source/Assembly view that enable you to filter the data by a metric of interest and identify performance-critical code lines/instructions

    • Summary view of the General Exploration analysis extended to explicitly display the measure for the hardware metrics: Clockticks vs. Pipeline Slots

    • Detailed presentation of bandwidth bottlenecks with the Memory Access summary command line report that now includes new metrics on bandwidth utilization, such as the platform maximum bandwidth, maximum bandwidth observed during analysis, average bandwidth utilization, and % of Elapsed Time with high bandwidth utilization

    • More accurate DRAM Bandwidth Bound metric based on uncore events used to display memory usage statistics for the Memory Access and HPC Performance Characterization analyses

  • Managed runtime analysis improvements:

    • New Memory Consumption analysis for native and Python* Linux targets that monitors RAM consumption over time and helps identify memory objects allocated and released within the analysis run

    • Support for mixed Python* and native code in the Locks and Waits analysis, including call stack collection

  • GPU analysis improvements:

    • GPU Hotspots analysis extended to detect the hottest computing tasks bound by GPU L3 bandwidth or DRAM bandwidth

    • New GPU In-kernel Profiling that helps analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms

    • GPU Hotspots Summary view extended to provide the Packet Queue Depth and Packet Duration histograms for the analysis of DMA packet execution

    • New Full Compute event group added to the list of predefined GPU hardware event groups collected for Intel® HD Graphics and Intel Iris® Graphics. This group combines metrics from the Overview and Compute Basic presets and lets you see all detected GPU stall/idle issues in the same view.

  • Support for performance analysis of a guest Linux* operating system via Kernel-based Virtual Machine (KVM) from a Linux host system with the KVM Guest OS option

  • Profile-guided optimization (PGO) report generated with the amplxe-pgo-report.sh utility for the Intel® C++ Compiler (Linux* only), GCC*, and Clang* compilers to improve code optimization

  • Usability improvements:

    • New user-friendly GUI design for the Timeline pane, analysis and target configuration windows

    • Support for hotspot navigation and filtering of stack sampling analysis data by the Total type of values in the Source/Assembly view

    • Automated installation of the VTune Amplifier collectors on a remote Linux target system. This feature is helpful if you profile a target on a shared resource without VTune Amplifier installed or on an embedded platform where targets may be reset frequently.

  • Documentation improvements:

    • VTune Amplifier product help, tutorials, and Release Notes are available online only from the Intel Software Documentation Library in the Intel Developer Zone (IDZ). You can also download an offline version of the product help either from IDZ or from the Intel Software Development Products Registration Center.

    • New Find Your Analysis guide that helps you pick a starting point for analysis based on your use case. The guide is available both online (from the product help) and offline (from the product Welcome page).

    • New performance analysis cookbook that contains recipes for identifying and solving the most common performance problems with VTune Amplifier analysis types

  • Support for new Intel processors including:

    • Intel® Xeon Phi™ processors (code named Knights Landing and Knights Mill)

    • Intel® Xeon® Processor Scalable family

    • Intel® Atom™ processors code named Apollo Lake and Denverton

    • Intel processors code named Kaby Lake

    A full list of supported processors is available from the Release Notes.

  • Support for new operating systems and IDEs including:

    • Ubuntu* 17.04

    • Fedora* 26

    • Debian* 9.0

    • Microsoft Windows* 10 Creators Update (RS2)

    • Microsoft Visual Studio* 2017

    • Cross-OS analysis support extended to all license types. Download installation packages for additional operating systems from registrationcenter.intel.com.

    A full list of supported operating systems is available from the Release Notes.

Optimization Notice: 
For more complete information about compiler optimizations, see our Optimization Notice.