User Guide

Hotspots Analysis for CPU Usage Issues

Use the Hotspots analysis to understand an application flow and identify sections of code that get a lot of execution time (hotspots). This is a starting point for your algorithm analysis.
Hotspots Analysis Summary
Hotspots analysis has two sampling-based collection modes:
  • User-mode sampling, which incurs higher overhead but does not require sampling drivers for collection. Starting with Intel® VTune™ Amplifier 2019, this mode replaced the former Basic Hotspots analysis.
  • Hardware event-based sampling, which provides minimal collection overhead but requires the sampling drivers or Perf* to be installed. Starting with VTune Amplifier 2019, this mode replaced the former Advanced Hotspots analysis.
Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.

How It Works: User-Mode Sampling

VTune Profiler uses low-overhead (about 5%) user-mode sampling and tracing collection that gets you the information you need without significantly slowing down application execution. The data collector profiles your application using the OS timer: it periodically interrupts the process, collects samples of all active instruction addresses at a sampling interval of 10 ms, and captures a call sequence (stack) for each sample.
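The following Python sketch mimics this collection scheme as an analogy only: it is not how VTune Profiler is implemented, and it samples Python frames rather than native instruction addresses. It assumes a Unix system, where `signal.setitimer` with `ITIMER_PROF` delivers `SIGPROF` after every 10 ms of process CPU time, and the handler records the interrupted call stack. The names `start_sampling`, `stop_sampling`, `busy_leaf`, and `workload` are hypothetical.

```python
import collections
import signal
import time

# Analogy only: samples Python frames, not native instruction pointers (Unix only).
SAMPLING_INTERVAL_S = 0.010  # 10 ms, the interval used by user-mode sampling

samples = []  # one entry per timer tick: the call stack, outermost frame first


def _on_timer(signum, frame):
    """SIGPROF handler: walk the interrupted frame chain and store the stack."""
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back
    samples.append(tuple(reversed(stack)))


def start_sampling():
    signal.signal(signal.SIGPROF, _on_timer)
    # ITIMER_PROF counts CPU time (user + system) consumed by this process.
    signal.setitimer(signal.ITIMER_PROF, SAMPLING_INTERVAL_S, SAMPLING_INTERVAL_S)


def stop_sampling():
    # Disarm the timer; no further SIGPROF signals are generated.
    signal.setitimer(signal.ITIMER_PROF, 0)


def busy_leaf(n):
    """Hypothetical CPU-bound leaf function (the expected hotspot)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload(cpu_seconds):
    """Hypothetical driver that keeps the CPU busy for `cpu_seconds`."""
    end = time.process_time() + cpu_seconds
    while time.process_time() < end:
        busy_leaf(10_000)


if __name__ == "__main__":
    start_sampling()
    workload(0.5)
    stop_sampling()
    leaf_counts = collections.Counter(stack[-1] for stack in samples)
    print(f"{len(samples)} samples, hottest leaf frames: {leaf_counts.most_common(3)}")
```

Because the timer counts CPU time, an idle or blocked process collects no samples, which is why sample counts translate into CPU time per function.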
VTune Profiler stores the sampled instruction pointer (IP) along with a call sequence in data collection files, and then analyzes and displays this data in a result tab. Statistically collected IP samples with call sequences enable VTune Profiler to display a top-down tree (call tree). Use this data to understand the control flow for statistically important code sections.
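To show how sampled stacks can become a top-down tree, here is a minimal sketch that assumes each sample is a tuple of function names ordered from the outermost caller to the executing function, and that each sample stands for one 10 ms interval. Each sample adds time to every node along its path (Total) and to the last node only (Self). `Node` and `build_top_down` are illustrative names, not part of any VTune API.

```python
INTERVAL_MS = 10.0  # assumption: each sample represents one 10 ms interval


class Node:
    def __init__(self, name):
        self.name = name
        self.self_ms = 0.0   # time with this function as the executing frame
        self.total_ms = 0.0  # time in this function or in anything it called
        self.children = {}

    def child(self, name):
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]


def build_top_down(stacks, interval_ms=INTERVAL_MS):
    """Merge sampled stacks (outermost frame first) into a call tree."""
    root = Node("[root]")
    for stack in stacks:
        node = root
        node.total_ms += interval_ms
        for name in stack:
            node = node.child(name)
            node.total_ms += interval_ms
        node.self_ms += interval_ms  # the last frame was the one executing
    return root


def dump(node, depth=0):
    """Print children hottest-first, indented by call depth."""
    for child in sorted(node.children.values(), key=lambda n: -n.total_ms):
        print(f"{'  ' * depth}{child.name}: total={child.total_ms:.0f} ms "
              f"self={child.self_ms:.0f} ms")
        dump(child, depth + 1)


if __name__ == "__main__":
    # Hypothetical samples: 6 in dot, 2 in solve itself, 2 in load.
    stacks = ([("main", "solve", "dot")] * 6 + [("main", "solve")] * 2
              + [("main", "load")] * 2)
    dump(build_top_down(stacks))
    # main: total=100 ms self=0 ms
    #   solve: total=80 ms self=20 ms
    #     dot: total=60 ms self=60 ms
    #   load: total=20 ms self=20 ms
```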
In user-mode sampling, the collector does not gather system-wide performance data but focuses on your application only. To analyze system performance, use the hardware event-based sampling mode.
VTune Profiler displays a list of functions in your application ordered by the amount of time spent in each function. It also captures the call stacks for each of these functions so you can see how the hot functions are called.
A large number of samples collected at a specific process, thread, or module can imply high processor utilization and potential performance bottlenecks. Some hotspots can be removed, while other hotspots are fundamental to the application functionality and cannot be removed.
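A hotspot list ordered by time can be sketched the same way. This hypothetical illustration again assumes sampled stacks ordered outermost-first and 10 ms per sample: the executing function (the last frame) is charged the interval, and the rest of the stack records how that function was reached. The same counting works for any grouping key, such as a thread or module, if the samples carry that information.

```python
from collections import Counter, defaultdict

INTERVAL_MS = 10.0  # assumption: each sample represents one 10 ms interval


def hotspots(stacks, interval_ms=INTERVAL_MS):
    """Return [(function, self_ms, {caller_path: ms})], hottest function first."""
    self_ms = Counter()
    callers = defaultdict(Counter)
    for stack in stacks:
        leaf = stack[-1]  # the function executing when the sample was taken
        self_ms[leaf] += interval_ms
        # Caller chain read from the leaf outward, e.g. "solve <- main".
        path = " <- ".join(reversed(stack[:-1])) or "[no caller]"
        callers[leaf][path] += interval_ms
    return [(fn, ms, dict(callers[fn])) for fn, ms in self_ms.most_common()]


if __name__ == "__main__":
    # Hypothetical samples: dot is reached from two different call paths.
    stacks = ([("main", "solve", "dot")] * 6 + [("main", "solve")] * 2
              + [("main", "load", "dot")] * 2)
    for fn, ms, paths in hotspots(stacks):
        print(fn, ms, paths)
```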

How It Works: Hardware Event-Based Sampling

The hardware event-based sampling mode uses hardware event-based sampling collection and analyzes all processes running on your system during the collection, providing CPU time data on whole-system performance.
VTune Profiler creates a list of functions in your application ordered by the amount of time spent in each function. By default, the Hotspots analysis in the hardware event-based sampling mode does not capture function call stacks while collecting hotspots. You can still analyze stacks for your application modules by explicitly selecting the Collect stacks option.
  • If you cannot run the hardware event-based sampling with stacks, disable the Collect stacks option and run the collection. To correlate the obtained hardware event-based sampling data with stacks, run a separate Hotspots analysis in the User-Mode Sampling mode.
  • On 32-bit Linux* systems, VTune Profiler uses a driverless Perf*-based collection for the hardware event-based sampling mode.

Configure and Run Analysis

To configure and run the Hotspots analysis:
Prerequisites: Create a project.
  1. Click the Configure Analysis button (standalone GUI or Visual Studio IDE) on the VTune Profiler welcome screen.
  2. In the HOW pane, select the Hotspots analysis from the Analysis Tree.
  3. Configure the following options:
    User-Mode Sampling mode: Select to enable the user-mode sampling and tracing collection for hotspots and call stack analysis (formerly known as Basic Hotspots). This collection mode uses a fixed sampling interval of 10 ms. To change the interval, click the Copy button and create a custom analysis configuration.
    Hardware Event-Based Sampling mode: Select to enable hardware event-based sampling collection for hotspots analysis (formerly known as Advanced Hotspots). You can configure the following options for this collection mode:
    • CPU sampling interval, ms: Specify the interval (in milliseconds) between CPU samples. Possible values for the hardware event-based sampling mode are 0.01-1000; 1 ms is used by default.
    • Collect stacks: Enable advanced collection of call stacks and thread context switches.
    When changing collection options, watch the Overhead diagram on the right. It changes dynamically to reflect the collection overhead incurred by the selected options.
    Show additional performance insights check box: Get additional performance insights, such as vectorization, and learn next steps. This option collects additional CPU events, which may enable the multiplexing mode. The option is enabled by default.
    Details button: Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify or enable additional settings for the analysis, create a custom configuration by copying an existing predefined configuration; VTune Profiler creates an editable copy of this analysis type configuration.
  4. Click the Start button to run the analysis.
To generate the command line for this configuration, click the Command Line... button at the bottom.
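The option rules above can be captured in a small helper that validates a configuration and assembles a command line. This is a sketch under assumptions: the executable name `vtune` (older VTune Amplifier releases used `amplxe-cl`) and the knob names `sampling-mode`, `sampling-interval`, `enable-stack-collection`, and `enable-characterization-insights` should be checked against what the Command Line... button generates for your version, and the helper name `hotspots_command` is hypothetical. As a rule of thumb, a 10 ms interval yields about 100 samples per second of CPU time, while the 1 ms default of the hardware event-based mode yields about 1000.

```python
import shlex

USER_MODE_INTERVAL_MS = 10.0           # fixed in the predefined user-mode configuration
HW_INTERVAL_RANGE_MS = (0.01, 1000.0)  # allowed "CPU sampling interval, ms" values
HW_DEFAULT_INTERVAL_MS = 1.0           # default for hardware event-based sampling


def hotspots_command(app_args, mode="sw", interval_ms=None, collect_stacks=False,
                     insights=True, executable="vtune"):
    """Validate a Hotspots configuration and build a command line (hypothetical helper).

    ASSUMPTION: the executable and knob names must be checked against the
    output of the Command Line... button for your VTune version.
    """
    cmd = [executable, "-collect", "hotspots"]
    if mode == "sw":
        # User-mode sampling always collects stacks; only the interval is checked.
        if interval_ms is not None and interval_ms != USER_MODE_INTERVAL_MS:
            raise ValueError("user-mode sampling uses a fixed 10 ms interval; "
                             "copy the configuration to change it")
        cmd += ["-knob", "sampling-mode=sw"]
    elif mode == "hw":
        interval = HW_DEFAULT_INTERVAL_MS if interval_ms is None else float(interval_ms)
        low, high = HW_INTERVAL_RANGE_MS
        if not low <= interval <= high:
            raise ValueError(f"CPU sampling interval must be within {low}-{high} ms")
        cmd += ["-knob", "sampling-mode=hw",
                "-knob", f"sampling-interval={interval:g}",
                "-knob", f"enable-stack-collection={str(collect_stacks).lower()}"]
    else:
        raise ValueError("mode must be 'sw' or 'hw'")
    if not insights:
        # Corresponds to clearing "Show additional performance insights".
        cmd += ["-knob", "enable-characterization-insights=false"]
    return cmd + ["--", *app_args]


if __name__ == "__main__":
    print(shlex.join(hotspots_command(["./myapp"], mode="hw", collect_stacks=True)))
```

Out-of-range values raise `ValueError` instead of being clamped, so an invalid configuration fails before any collection starts.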

View Data

When the data is collected, VTune Profiler opens it in the Hotspots by CPU Utilization viewpoint, which provides the following views for analysis:
  • Summary window displays statistics on the overall application execution to analyze CPU time and processor utilization.
  • Bottom-up window displays hotspot functions in the bottom-up tree, CPU time and CPU utilization per function.
  • Top-down Tree window displays hotspot functions in the call tree, performance metrics for a function only (Self value) and for a function and its children together (Total value).
  • Caller/Callee window displays parent and child functions of the selected focus function.
  • Platform window provides details on CPU and GPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
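The Caller/Callee relationship can be approximated from the same kind of sampled stacks. In this hypothetical sketch (stacks ordered outermost-first, 10 ms per sample, `caller_callee` is an illustrative name), every sample that contains the focus function adds to its Total time, is attributed to the frame just above it as a caller and to the frame just below it as a callee, and counts as Self time when the focus function is the executing frame. For recursive calls only the outermost occurrence is used, which is a simplification.

```python
from collections import Counter

INTERVAL_MS = 10.0  # assumption: each sample represents one 10 ms interval


def caller_callee(stacks, focus, interval_ms=INTERVAL_MS):
    """Split the time of `focus` into callers and callees (stacks outermost-first)."""
    callers, callees = Counter(), Counter()
    self_ms = total_ms = 0.0
    for stack in stacks:
        if focus not in stack:
            continue
        total_ms += interval_ms
        i = stack.index(focus)  # outermost occurrence (simplifies recursion)
        callers[stack[i - 1] if i > 0 else "[root]"] += interval_ms
        if i + 1 < len(stack):
            callees[stack[i + 1]] += interval_ms
        if stack[-1] == focus:
            self_ms += interval_ms  # focus was the executing frame
    return {"self_ms": self_ms, "total_ms": total_ms,
            "callers": dict(callers), "callees": dict(callees)}


if __name__ == "__main__":
    stacks = ([("main", "solve", "dot")] * 6 + [("main", "solve")] * 2
              + [("main", "load", "dot")] * 2)
    print(caller_callee(stacks, "dot"))
    print(caller_callee(stacks, "solve"))
```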

What's Next

  1. Identify the most time-consuming function in the grid and double-click it for source analysis.
  2. Analyze the source of the critical function starting with the highlighted hottest code line and moving further with the Hotspot Navigation options.
  3. Modify your code to remove bottlenecks and improve the performance of your application.
  4. Re-run the analysis and verify your optimization with the comparison mode.
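Verifying an optimization boils down to comparing per-function CPU time between two results. The sketch below is not the VTune comparison mode itself; it assumes the times have already been extracted into dictionaries mapping a function name to milliseconds (the numbers are made up for illustration), and `compare` and `overall_speedup` are hypothetical names. Rows are sorted so that the largest savings come first, and a function present in only one result is treated as 0 ms in the other.

```python
def compare(before_ms, after_ms):
    """Return (function, before, after, delta) rows, biggest time savings first."""
    rows = []
    for fn in sorted(before_ms.keys() | after_ms.keys()):
        before = before_ms.get(fn, 0.0)
        after = after_ms.get(fn, 0.0)
        rows.append((fn, before, after, after - before))
    rows.sort(key=lambda row: row[3])  # most negative delta first
    return rows


def overall_speedup(before_ms, after_ms):
    """Ratio of total CPU time before and after the change."""
    return sum(before_ms.values()) / sum(after_ms.values())


if __name__ == "__main__":
    before = {"dot": 600.0, "solve": 200.0, "load": 200.0}  # hypothetical data
    after = {"dot": 150.0, "solve": 200.0, "load": 210.0}
    for fn, b, a, d in compare(before, after):
        print(f"{fn:8} {b:7.0f} {a:7.0f} {d:+7.0f}")
    print(f"overall speedup: {overall_speedup(before, after):.2f}x")  # 1.79x
```

Checking the rows as well as the total guards against an optimization in one function hiding a regression in another, as with `load` in the example data.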
For further steps, explore the Insights section in the Summary window. This section reports your target's performance against metrics collected in addition to the standard hotspots metrics. If any performance issues are detected, VTune Profiler flags the corresponding metric value and suggests potential next steps to fix the problem.
Information provided by the Hotspots analysis is important for tuning serial applications and is also useful for tuning the serial sections of parallel applications. The Hotspots analysis data helps you understand what your application is doing and identify the code that is critical to tune. For parallel applications running on multi-core systems, you may need additional analyses: Threading or HPC Performance Characterization.
