What's New in Intel® VTune™ Amplifier

Intel® VTune™ Amplifier 2019 Update 4

  • GPU analysis improvements:

    • The Inline Mode filter bar option added to the GPU In-kernel Profiling viewpoint to display GPU-side call stacks with OpenCL™ inline functions and correctly attribute GPU Cycles statistics per function. By default, the Inline Mode is switched off.

    • New Instruction count profiling mode added to GPU In-kernel Profiling to analyze GPU instruction frequency by instruction type. This mode helps you compare the performance of the same OpenCL kernel on different hardware or explore the instruction count for different implementations of the same algorithm on the same hardware.

    • Source/Assembly analysis available for OpenCL programs created with IL (intermediate language), if the intermediate SPIR-V binary was built with the -gline-tables-only -s <cl_source_file_name> options

  • Hardware event-based collection improvements:

    • Default driverless mode for hardware event-based collections with stacks, such as Hotspots and Threading, on Linux*. Benefit from this solution if you need to run an event-based sampling analysis but do not have administrative permissions to use Intel sampling drivers. You can still switch to the driver-based collection by setting the Stack size option to the unlimited (0) value or by using the Enable driverless collection option in the Custom analysis.

    • The Precise column added to the Summary of the Hardware Events viewpoint to clearly identify precise events. Using precise events in your configurations provides more accurate Assembly analysis with no event skids.

  • Quality and usability improvements:

    • Quick and easy access to VTune Amplifier basic commands and training material via a Welcome page now added to the Microsoft* Visual Studio* IDE interface in addition to the standalone product GUI

    • Overlay help with quick tips for Bottom-up and Configure Analysis windows highlighting important interface elements to efficiently manage collection and analysis data

    • Enhanced interface for the VTune Amplifier experience within Visual Studio IDE with the project and analysis results available from the Intel VTune Amplifier Results tab instead of the Solution Explorer

  • Support for new IDEs:

    • Microsoft Visual Studio* 2019

Intel® VTune™ Amplifier 2019 Update 3

  • Support for Intel® Optane™ DC persistent memory and the latest Intel microarchitecture code named Cascade Lake. This includes new hardware event support and enhanced memory analysis to design and optimize for the new persistent memory technology. See the Frequent DRAM Accesses Cookbook recipe for a usage example.

  • Enhanced PCIe device metrics for I/O traffic in the Input and Output analysis that help you understand the interactions between cores and Network Interface Cards (NICs).

  • MPI analysis improvements:

    • Easier management of data collection for MPI applications using the standard MPI_Pcontrol API. Collect only the data you need with a few quick changes and no dependency on the ITT API.

    • Easier MPI communication pattern diagnosis with Application Performance Snapshot's rank-to-rank communication diagram by message volume

  • Quality and usability improvements:

    • Friendlier welcome page provides fast access to technical content and project controls.

    • Improved importing process for traces and result files. It is now possible to import whole result directories to a project and use project search directories for symbol and source/assembly resolution.

    • Simplified installation and licensing (serial numbers and license files are no longer required for this product).
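
The rank-to-rank communication diagram mentioned above groups MPI traffic by message volume between each pair of ranks. The sketch below illustrates that kind of aggregation only. It is not how Application Performance Snapshot builds its diagram. The function name rank_to_rank_volume and the sample message list are hypothetical and exist only for this example.

```python
def rank_to_rank_volume(messages, num_ranks):
    """Sum message sizes into a num_ranks x num_ranks matrix.

    matrix[src][dst] holds the total bytes sent from rank src to rank dst.
    """
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix


# Hypothetical trace: (source rank, destination rank, message size in bytes)
messages = [
    (0, 1, 4096),
    (1, 0, 4096),
    (0, 1, 1024),
    (2, 3, 65536),
    (3, 2, 512),
]

volume = rank_to_rank_volume(messages, num_ranks=4)
for src, row in enumerate(volume):
    print(f"rank {src}: {row}")
```

A matrix like this makes unbalanced pairs easy to spot: in the made-up data, rank 2 sends far more to rank 3 than any other pair exchanges.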

Intel® VTune™ Amplifier 2019 Update 2

  • Microarchitecture analysis improvements:

    • Configuration for the Microarchitecture Exploration analysis optimized to give you control over the collected hardware metrics and the overall data collection overhead. By default, the analysis provides you with a full set of top-level hardware metrics and their sub-metrics that show how your code uses hardware resources. With a new configuration option, you can choose to narrow down the scope and collect sub-metrics only for the selected top-level metrics.

  • System Analyzer tool for monitoring real-time metrics on a target system added to the VTune Amplifier as a PREVIEW feature. See the VTune Amplifier Performance Analysis Cookbook recipe for more details.

  • HPC workload profiling improvements:

  • Support for managed Linux and Windows targets with tiered compilation for .NET* Core 3.0 Preview 1 and .NET Core 2.2

  • Quality and usability improvements:

    • Improved support for standalone command-line results imported into a VTune Amplifier GUI project. Search directories specified in the command line configuration are preserved and applied for proper module resolution in the graphical viewpoints.

Note

A PREVIEW FEATURE may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

Intel® VTune™ Amplifier 2019 Update 1 Patch 1

  • Functional and security updates

Intel® VTune™ Amplifier 2019 Update 1

  • Threading analysis extended with the lower overhead hardware event-based sampling mode. This mode helps analyze the impact of thread preemption and context switching. On Windows*, this analysis configuration requires the sampling driver. On Linux*, the analysis is available both with the sampling driver and with the Linux Perf* collector for kernels 4.4 and higher.

  • Quality and usability improvements:

    • The summary command-line report for the Hotspots analysis enriched with metrics and a Top 5 Hotspots table, which is also available from the GUI Summary view.

    • A sample matrix project added to the Project Navigator to help you get started with the product, review a sample pre-collected Hotspots result, and test other analysis types and source view options. A pre-built version of the matrix sample application and associated source files are installed with VTune Amplifier.

    • Support for Linux Perf* collection extended with VTune Amplifier metrics, with a further option to import the Perf trace to the VTune Amplifier GUI and benefit from predefined viewpoints. This solution could be useful for performance analysis in data centers.

Intel® VTune™ Amplifier 2019

  • New Hotspots analysis, combining former Basic Hotspots and Advanced Hotspots analysis configurations, that provides a quick understanding of the application performance hotspots and insights that suggest further analysis steps. By default, the Hotspots analysis uses the user-mode sampling collection mode, but you can enable the lower overhead hardware event-based sampling mode, which requires the sampling driver to be installed.

  • New Threading analysis combining and replacing former Concurrency and Locks and Waits analysis types

  • New Intel VTune Amplifier Platform Profiler tool that provides low-overhead, system-wide analysis and insights into overall system configuration performance and behavior. Use the tool to:

    • Identify bottlenecks by monitoring over- or under-utilized subsystems and buses (CPU, storage, memory, PCIe, and network interfaces) and platform-level imbalances

    • Understand a system topology using diagrams annotated with performance data

    • Capture average-case and transient behaviors for data-center applications

  • Microarchitecture analysis improvements:

    • Microarchitecture Exploration (formerly known as General Exploration) analysis configuration split to provide either a lightweight summary analysis or full detailed analysis with all levels of PMU metrics

    • Microarchitecture Exploration analysis view extended with the hardware metric representation that helps easily identify bottlenecks in the hardware usage and benefit from quick insights

  • HPC workload profiling improvements:

    • CPU Utilization metric refined to differentiate the utilization on logical vs. physical cores, which is particularly important for HPC applications running on Intel® Xeon® processors

    • Intel® Omni-Path Architecture Interconnect Bandwidth and Packet rate metrics added to HPC Performance Characterization analysis to identify performance bottlenecks caused by interconnect limits

    • HPC Performance Characterization analysis enriched with a thread affinity report that helps analyze CPU utilization or memory access issues of multithreaded and hybrid MPI and OpenMP* applications

  • GPU Compute/Media Hotspots analysis (formerly known as GPU Hotspots) on Linux updated to use Intel Metric Discovery API library for GPU metric collection, which involves support for kernel 4.14 and higher

  • Input and Output analysis on Linux* extended to profile DPDK and SPDK IO API. Use this data to correlate CPU activity with the network data plane utilization, visualize PCIe bandwidth utilization per NIC, estimate UPI bandwidth on multi-socket systems, and identify bottlenecks.

  • Containerization support improvements:

  • Managed runtime analysis improvements:

    • Extended JIT profiling for server-side applications running on the LLVM* or HHVM* PHP servers to support the event-based sampling analysis in the attach mode

    • Extended Java* code analysis with support for OpenJDK* 9 and Oracle* Java SE Development Kit 9

    • Improved source code analysis for .NET* Core applications running on Linux and Windows systems

  • Analysis on embedded platforms and accelerators:

    • New CPU/FPGA Interaction analysis (PREVIEW) to assess the balance between the CPU and FPGA on systems with a discrete Intel® Arria® 10 FPGA running OpenCL™ applications

    • New GPU Rendering analysis (PREVIEW) for CPU/GPU utilization of your code running on the Xen* virtualization platform installed on a remote embedded target

    • Support for the sampling command-line analysis on remote QNX* embedded systems via an Ethernet connection

  • KVM guest OS profiling extended to profile both KVM kernel and user space from the host system, which is helpful for a full-scale performance analysis of host and guest systems

  • Application Performance Snapshot improvements:

    • Added uncore-based metrics for DRAM/MCDRAM memory analysis, which helps identify whether your application is bandwidth bound

    • Added the ability to pause/resume collection with MPI_Pcontrol and the ITT API. The -start-paused option was added to exclude application execution from collection from the start until the first collection resume call.

    • Enabled selection of which data types are collected to reduce overhead. The choices include MPI tracing, OpenMP tracing, hardware counter based collection, or a combination of the three.

    • Exposed the CPU Utilization metric by physical cores on processors that support proper hardware events.

    • Significantly reduced MPI tracing overhead when there are a large number of ranks.

    • Enriched MPI statistics generated by the aps-report utility with information about the communicators used in the application and with the ability to group and filter collective operations by communicator.

    • Improved integration with Intel® Trace Analyzer and Collector by adding the ability to generate profiling configuration files with the aps-report option.

    • Intel® Omni-Path Architecture Interconnect Bandwidth and Packet rate metrics added to explore MPI communication bottlenecks

    • Added an HTML-based rank-to-rank communication diagram to better visualize MPI application communication patterns

  • Quality and usability improvements:

    • Optimized product graphical interface with a simplified analysis configuration workflow providing you with pre-selected target and collection options available in the same view

    • Hardware event-based analysis supported for targets running in the Hyper-V* environment on Windows* 10 Fall Creators Update (RedStone3)

    • Default finalization mode set to Fast to minimize post-processing overhead if the number of collected samples exceeds the threshold

    • The Data of Interest type of metric used for the hotspot navigation in the Source view replaced with explicit metric selection: select a metric in the grid and apply the Use for Hotspot Navigation context menu command

    • CPU Frequency metric provided for the event-based analysis types (using the sampling driver) is improved to display more reliable data based on the P-State collection. The CPU Frequency metric is not provided for the user-mode sampling and tracing analyses and for analyses using the Perf* collector.

    • The list of supported output formats for the command-line reports extended with the XML and HTML options

  • Support for new operating systems:

    • SUSE* Linux* Enterprise Server (SLES) 15

    • Red Hat* Enterprise Linux* 7.5

    • Ubuntu* 18.04

    • Fedora* 28

    • Microsoft Windows* 10 RS4
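
To illustrate why this release distinguishes CPU utilization on logical vs. physical cores, the sketch below works through the arithmetic under a deliberately simplified model. It is not the formula VTune Amplifier uses. The function cpu_utilization, its parameters, and the assumption that busy threads spread across physical cores before sharing one are all hypothetical.

```python
def cpu_utilization(busy_threads, physical_cores, threads_per_core=2):
    """Return (logical, physical) utilization as fractions between 0 and 1.

    Simplified assumption: each busy thread occupies its own physical core
    until every physical core has at least one busy thread.
    """
    logical_cpus = physical_cores * threads_per_core
    logical = min(busy_threads, logical_cpus) / logical_cpus
    physical = min(busy_threads, physical_cores) / physical_cores
    return logical, physical


# 8 busy threads on a system with 8 physical cores and 2 hardware threads each
logical, physical = cpu_utilization(busy_threads=8, physical_cores=8)
print(f"logical: {logical:.0%}, physical: {physical:.0%}")
```

Under this model, an application with one busy thread per physical core looks half idle when measured against logical cores, even though every physical core is already in use.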

For more complete information about compiler optimizations, see our Optimization Notice.