• 2018
  • 04/03/2018

Intel® VTune™ Amplifier 2018 Update 2
  • Analysis on embedded platforms and accelerators:
    • New CPU/FPGA Interaction analysis (PREVIEW) to assess the balance between the CPU and FPGA on systems with a discrete Intel® Arria® 10 FPGA running OpenCL™ applications
    • New Graphics Rendering analysis (PREVIEW) for CPU/GPU utilization of your code running on the Xen* virtualization platform installed on a remote embedded target
    • Support for the sampling command-line analysis on remote QNX* embedded systems via an Ethernet connection
    Note
    A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
  • HPC workload profiling improvements:
    • CPU Utilization metric refined to differentiate the utilization on logical vs. physical cores, which is particularly important for HPC applications running on Intel® Xeon® processor family processors
  • Managed runtime analysis improvements:
    • Extended JIT profiling for server-side applications running on the LLVM* or HHVM* PHP servers to support the event-based sampling analysis in the attach mode
    • Extended Java* code analysis with support for OpenJDK* 9 and Oracle* JDK 9
    • Enabled Advanced Hotspots analysis for .NET* Core applications running on Linux and Windows systems in the Launch Application mode
  • Application Performance Snapshot improvements:
    • Added the ability to pause/resume collection with MPI_Pcontrol and the itt API. The -start-paused option was added to exclude application execution from collection from the start of the run until collection is first resumed.
    • Enabled selection of which data types are collected to reduce overhead. The choices include MPI tracing, OpenMP tracing, hardware counter based collection, or a combination of the three.
    • Exposed the CPU Utilization metric by physical cores on processors that support proper hardware events.
    • Significantly reduced MPI tracing overhead when there are a large number of ranks.
    • Enriched MPI statistics generated by the aps-report utility by showing information about communicators used in the application and by adding the ability to group and filter collective operations by communicator.
    • Improved integration with Intel® Trace Analyzer and Collector by adding the ability to generate profiling configuration files with the aps-report option.
  • Quality and usability improvements:
  • Support for new operating systems and IDEs including:
    • Fedora*
    • Ubuntu* 17.10
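
The difference between CPU utilization on logical and physical cores noted above can be illustrated with a small calculation. The following is a minimal sketch, not VTune Amplifier's actual formula: it assumes a hypothetical machine with two physical cores and two hardware threads per core, and it assumes a physical core counts as busy whenever at least one of its hardware threads is busy. The function name, the sample format, and the core mapping are all illustrative.

```python
# Illustrative only: this is NOT how VTune Amplifier computes its metric.
# Assumption: a physical core is "busy" in a time slice if at least one
# of its logical CPUs (hardware threads) is busy in that slice.

def utilization(samples, core_of):
    """Return (logical, physical) utilization as fractions in [0, 1].

    samples: list of sets, each holding the busy logical CPU ids
             observed in one equal-length time slice.
    core_of: dict mapping logical CPU id -> physical core id.
    """
    n_logical = len(core_of)
    n_physical = len(set(core_of.values()))
    busy_logical = sum(len(busy) for busy in samples)
    busy_physical = sum(len({core_of[cpu] for cpu in busy}) for busy in samples)
    logical = busy_logical / (len(samples) * n_logical)
    physical = busy_physical / (len(samples) * n_physical)
    return logical, physical


# Hypothetical topology: logical CPUs 0 and 1 share core 0, 2 and 3 share core 1.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
# One thread per physical core is busy in every time slice.
samples = [{0, 2}, {0, 2}, {0, 2}, {0, 2}]

logical, physical = utilization(samples, core_of)
print(f"logical: {logical:.0%}, physical: {physical:.0%}")
```

Under these assumptions, an application that runs one thread per physical core looks half idle when measured against logical cores but fully loaded when measured against physical cores, which is why the distinction matters for HPC workloads.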
Intel® VTune™ Amplifier 2018 Update 1
  • HPC workload profiling improvements:
    • Application Performance Snapshot extended to use the VTune Amplifier sampling driver and Perf* system-wide profiling capability for reducing collection overhead and enabling Average DRAM and MCDRAM bandwidth measurement
    • Application Performance Snapshot's MPI tracing extended to cover applications using MPI_Abort
  • GPU analysis improvements:
  • Quality and usability improvements:
    • New amplxe-self-checker.sh script introduced to validate VTune Amplifier deployment on Linux*. The script launches several representative collections on sample applications to check how your system matches the VTune Amplifier requirements, and shows the diagnostics.
    • Improved accuracy of the Perf*-based driverless sampling collection running on the target system under the Xen Hypervisor by enabling the use of the integrated Perf sampling interval
    • Better management of the EBS collection result size via configuration of the CPU sampling interval. Increasing the sampling interval may be useful for profiles with long durations or profiles that create large results. The Duration time estimate option is deprecated.
    • Optimized support for performance profiling on embedded devices with the Yocto Project* without prerequisite installation of the Intel System Studio or a complete version of the VTune Amplifier.
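
The effect of the CPU sampling interval on EBS result size, mentioned above, follows from simple arithmetic. The sketch below is a rough upper-bound estimate and not a VTune Amplifier tool: it assumes every logical CPU is busy for the whole run and produces one sample per interval. The function name and the example numbers are hypothetical.

```python
# Rough upper bound on the number of samples in a result.
# Assumption: each logical CPU is busy for the entire run and yields
# exactly one sample per sampling interval.

def estimated_samples(duration_s, interval_ms, logical_cpus):
    """Estimate how many samples a collection produces."""
    samples_per_cpu = (duration_s * 1000) // interval_ms
    return samples_per_cpu * logical_cpus


# Hypothetical 10-minute run on 8 logical CPUs at three sampling intervals.
for interval_ms in (1, 10, 100):
    count = estimated_samples(600, interval_ms, 8)
    print(f"{interval_ms:>3} ms interval: about {count:,} samples")
```

In this model the sample count scales inversely with the interval, so a tenfold larger interval gives a roughly tenfold smaller result, at the cost of coarser statistics for short-running functions.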
Intel® VTune™ Amplifier 2018
The new VTune Amplifier 2018 product combines features originally provided by the Intel VTune Amplifier XE and Intel VTune Amplifier for Systems and also introduces the following new options targeted at both host-based and embedded remote target analysis:
  • Application Performance Snapshot, which provides a quick look at your application performance and helps you understand where your application will benefit from tuning.
    • Performance metrics include MPI and OpenMP* parallelism, memory access, FPU utilization, and I/O efficiency with recommendations on further in-depth analysis.
    • New MPI metrics that help identify top 5 MPI functions by average consumed time and that show resident and virtual memory footprints per MPI rank and per compute node.
    • Support for multiple platforms, including Intel® Xeon® processors code named Skylake.
  • Performance analysis for targets (native and Java* services and daemons) continuously running in LXC*, Docker* and Mesos* containers via the Attach profiling mode and the Advanced Hotspots analysis
  • HPC workload profiling improvements:
    • Enhanced MPI metrics for HPC Performance Characterization analysis that expose scalability bottlenecks for hybrid applications
    • Summary view extended to show top 5 OpenMP* hotspots (functions and loops) executing serially in the master thread outside any parallel regions
    • Improved insight into parallelism inefficiencies for applications using Intel Threading Building Blocks (Intel TBB) with extended classification of high Overhead and Spin time
    • Increased detail and structure for vector efficiency metrics based on FLOP counters in the FPU Utilization section
    • New MPI Imbalance metric based on MPI Busy Wait time and parallel efficiency for the most awaited rank in the CPU Utilization section
    • New section presenting the data on the hottest loops and functions with arithmetic operations, which enables you to identify which loops/functions with FPU Usage took the most CPU Time
    • Optimized command line analysis flow for the hpc-performance analysis with the summary report that shows metrics for CPU, Memory and FPU performance aspects, including performance issue descriptions for metrics that exceed the predefined threshold. To hide issue descriptions in the summary report, use the new report-knob show-issues option.
  • Microarchitecture analysis improvements:
    • Full-scale driverless Memory Access analysis that now provides Average Latency statistics
    • Support for locator hardware event metrics for the General Exploration analysis results in the Source/Assembly view that enable you to filter the data by a metric of interest and identify performance-critical code lines/instructions
    • Summary view of the General Exploration analysis extended to explicitly display the measure for the hardware metrics: Clockticks vs. Pipeline Slots
    • Detailed presentation of bandwidth bottlenecks with the Memory Access summary command line report that now includes new metrics on bandwidth utilization, such as the platform maximum bandwidth, maximum bandwidth observed during analysis, average bandwidth utilization, and % of Elapsed Time with high bandwidth utilization
    • More accurate DRAM Bandwidth Bound metric based on uncore events used to display memory usage statistics for the Memory Access and HPC Performance Characterization analyses
  • Managed runtime analysis improvements:
    • New Memory Consumption analysis for native and Python* Linux targets that monitors RAM consumption over time and helps identify memory objects allocated and released within the analysis run
    • Support for the mixed Python* and native code in the Locks and Waits analysis including call stack collection
  • GPU analysis improvements:
    • GPU Hotspots analysis extended to detect hottest computing tasks bound by GPU L3 bandwidth or DRAM bandwidth
    • New GPU In-kernel Profiling that helps analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms
    • GPU Hotspots Summary view extended to provide the Packet Queue Depth and Packet Duration histograms for the analysis of DMA packet execution
    • New Full Compute event group added to the list of predefined GPU hardware event groups collected for Intel® HD Graphics and Intel Iris® Graphics. This group combines metrics from the Overview and Compute Basic presets and lets you see all detected GPU stalled/idle issues in the same view.
  • Support for performance analysis of a guest Linux* operating system via Kernel-based Virtual Machine (KVM) from a Linux host system with the KVM Guest OS option
  • Profiling Guided Optimization report generated with the amplxe-pgo-report.sh utility for the Intel® C++ compiler (Linux* only), GCC* and Clang* compilers to improve code optimization
  • Usability improvements:
    • New user-friendly GUI design for the Timeline pane, analysis and target configuration windows
    • Support for hotspot navigation and filtering of stack sampling analysis data by the Total type of values in the Source/Assembly view
    • Automated installation of the VTune Amplifier collectors on a remote Linux target system. This feature is helpful if you profile a target on a shared resource without VTune Amplifier installed or on an embedded platform where targets may be reset frequently.
  • Documentation improvements:
    • VTune Amplifier product help, tutorials, and Release Notes are available online only from the Intel Software Documentation Library in the Intel Developer Zone (IDZ). You can also download an offline version of the product help either from IDZ or from the Intel Software Development Products Registration Center.
    • New Find Your Analysis guide that helps pick your starting point for analysis based on your use case. The guide is available both online - from the product help - and offline - from the product Welcome page.
    • New performance analysis cookbook that contains recipes for identifying and solving the most common performance problems with the help of VTune Amplifier's analysis types
  • Support for new Intel processors including:
    • Intel® Xeon Phi™ processors (code name Knights Landing and Knights Mill)
    • Intel® Xeon® Processor Scalable family
    • Intel® Atom™ processors codenamed Apollo Lake and Denverton
    • Intel processors codenamed Kaby Lake
    A full list of supported processors is available from the Release Notes.
  • Support for new operating systems and IDEs including:
    • Ubuntu* 17.04
    • Fedora* 26
    • Debian* 9.0
    • Microsoft Windows* 10 Creators Update (RS2)
    • Microsoft Visual Studio* 2017
    • Support for cross-OS analysis extended to all license types. Download installation packages for additional operating systems from registrationcenter.intel.com.
    A full list of supported operating systems is available from the Release Notes.
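
The bandwidth utilization metrics listed for the Memory Access summary report (platform maximum bandwidth, maximum observed bandwidth, average bandwidth utilization, and % of Elapsed Time with high bandwidth utilization) can be related to each other with a short calculation. The following is a minimal sketch under stated assumptions, not the report's actual implementation: it assumes bandwidth samples taken over equal-length time slices, and it treats utilization as "high" at or above 70% of the platform maximum. That threshold, the function name, and the sample values are hypothetical.

```python
# Illustrative only: the threshold for "high" utilization and the sample
# data are assumptions, not values taken from VTune Amplifier.

def bandwidth_summary(samples_gbps, platform_max_gbps, high_fraction=0.7):
    """Summarize bandwidth samples taken over equal-length time slices."""
    average = sum(samples_gbps) / len(samples_gbps)
    high_threshold = high_fraction * platform_max_gbps
    high_slices = sum(1 for s in samples_gbps if s >= high_threshold)
    return {
        "platform_max_gbps": platform_max_gbps,
        "max_observed_gbps": max(samples_gbps),
        "average_utilization": average / platform_max_gbps,
        "high_utilization_time": high_slices / len(samples_gbps),
    }


# Hypothetical DRAM bandwidth samples (GB/s) on a platform that peaks at 50 GB/s.
summary = bandwidth_summary([10, 20, 40, 50], platform_max_gbps=50)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A workload with a moderate average utilization can still spend a large share of its elapsed time near the platform maximum, which is the case the last metric is meant to expose.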

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804