Hotspots Analysis for CPU Usage Issues

Use the Hotspots analysis to understand the application flow and identify the sections of code that take the most execution time (hotspots). This is a starting point for your algorithm analysis.

The Hotspots analysis with Intel® VTune™ Amplifier provides two sampling-based collection modes:

  • User-mode sampling, which incurs higher overhead but does not require sampling drivers for collection. Starting with VTune Amplifier 2019, this mode replaces the former Basic Hotspots analysis.

  • Hardware event-based sampling, which provides minimum collection overhead but needs sampling drivers or Perf* to be installed. Starting with VTune Amplifier 2019, this mode replaces the former Advanced Hotspots analysis.

How It Works: User-Mode Sampling

VTune Amplifier uses a low-overhead (about 5%) user-mode sampling and tracing collection that gets you the information you need without significantly slowing down application execution. The data collector profiles your application using the OS timer: it interrupts the process, collects samples of all active instruction addresses at a sampling interval of 10 ms, and captures a call sequence (stack) for each sample. VTune Amplifier stores each sampled instruction pointer (IP) along with its call sequence in data collection files, and then analyzes and displays this data in a result tab. Statistically collected IP samples with call sequences enable VTune Amplifier to display a top-down tree (call tree). Use this data to understand the control flow for statistically important code sections.
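To make the statistical idea concrete, here is a minimal sketch (not VTune Amplifier's actual implementation) of how IP samples with call stacks can be turned into an estimate of CPU time per function. It assumes each sample represents one sampling interval of 10 ms attributed to the function that was executing when the timer fired. The function names and the sample data are hypothetical.

```python
from collections import Counter

SAMPLING_INTERVAL_S = 0.010  # 10 ms, the user-mode sampling interval


def estimate_cpu_time(samples, interval_s=SAMPLING_INTERVAL_S):
    """Estimate CPU time per function from stack samples.

    Each sample is a call stack ordered from the outermost caller to the
    function that was executing when the sample was taken (the sampled IP).
    Every sample is assumed to account for one sampling interval.
    """
    hits = Counter(stack[-1] for stack in samples if stack)
    return {func: count * interval_s for func, count in hits.items()}


# Hypothetical samples, for illustration only.
samples = [
    ("main", "parse", "tokenize"),
    ("main", "parse", "tokenize"),
    ("main", "parse"),
    ("main", "render"),
]

cpu_time = estimate_cpu_time(samples)

# Print functions ordered by estimated time, hottest first.
for func, seconds in sorted(cpu_time.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{func:10s} {seconds:.3f} s")
```

Because the estimate is statistical, functions that run for much less than one sampling interval may receive few or no samples. Longer runs give more representative results.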

In the user-mode sampling mode, the collector does not gather system-wide performance data but focuses on your application only. To analyze system performance, use the hardware event-based sampling mode.

VTune Amplifier displays a list of functions in your application ordered by the amount of time spent in each function. It also captures the call stacks for each of these functions so you can see how the hot functions are called.

A large number of samples collected for a specific process, thread, or module can indicate high processor utilization and a potential performance bottleneck. Some hotspots can be removed, while others are fundamental to the application functionality and cannot be removed.

How It Works: Hardware Event-Based Sampling

The hardware event-based sampling mode uses the hardware event-based sampling collection and analyzes all the processes running on your system during collection, providing CPU time data for the whole system. VTune Amplifier creates a list of functions in your application ordered by the amount of time spent in each function. By default, the Hotspots analysis in this mode does not capture function call stacks as the hotspots are collected. You can still analyze stacks for your application modules by explicitly selecting the Collect stacks option.

Note

On 32-bit Linux* systems, the VTune Amplifier uses a driverless Perf*-based collection for the hardware event-based sampling mode.

Configure and Run Analysis

To configure and run the Hotspots analysis:

Prerequisites: Create a project.

  1. Click the Configure Analysis button on the Intel® VTune™ Amplifier toolbar (available in both the standalone GUI and the Visual Studio IDE).

    The Configure Analysis window opens.

  2. From the HOW pane, click the Browse button and select the Hotspots analysis.

  3. Configure the following options:

    User-Mode Sampling mode

    Select to enable the user-mode sampling and tracing collection for hotspots and call stack analysis (formerly known as Basic Hotspots). This collection mode uses a fixed sampling interval of 10ms. If you need to change the interval, click the Copy button and create a custom analysis configuration.

    Hardware Event-Based Sampling mode

    Select to enable hardware event-based sampling collection for hotspots analysis (formerly known as Advanced Hotspots).

    You can configure the following options for this collection mode:

    • CPU sampling interval, ms to specify an interval (in milliseconds) between CPU samples. Possible values for the hardware event-based sampling mode are 0.01-1000. 1 ms is used by default.

    • Collect stacks to enable advanced collection of call stacks and thread context switches.

    Note

    When changing collection options, pay attention to the Overhead diagram on the right. It dynamically changes to reflect the collection overhead incurred by the selected options.

    Show additional performance insights check box

    Get additional performance insights, such as vectorization, and learn next steps. This option collects additional CPU events, which may enable the multiplexing mode.

    The option is enabled by default.

    Details button

    Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Amplifier creates an editable copy of this analysis type configuration.

  4. Click the Start button to run the analysis.

Note

You may generate the command line for this configuration using the Command Line... button at the bottom.
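As an illustration of what such a command line can look like, the following sketch assembles one from the options described above. It is a hedged example, not the authoritative syntax: it assumes the `amplxe-cl` command-line tool and the knob names `sampling-mode`, `sampling-interval`, and `enable-stack-collection`. Confirm the exact names for your installation with the Command Line... button or the tool's built-in help. The helper function `hotspots_command` and the target `./my_app` are hypothetical.

```python
import shlex


def hotspots_command(app, app_args=(), mode="sw",
                     interval_ms=None, collect_stacks=False):
    """Build an argument list for a Hotspots collection.

    mode: "sw" for user-mode sampling, "hw" for hardware event-based sampling.
    interval_ms and collect_stacks apply to the "hw" mode only, mirroring
    the GUI options described above.
    Knob names are assumptions; verify them against your VTune version.
    """
    if mode not in ("sw", "hw"):
        raise ValueError("mode must be 'sw' or 'hw'")
    if mode == "sw" and (interval_ms is not None or collect_stacks):
        raise ValueError("interval and stack options apply to 'hw' mode only")

    cmd = ["amplxe-cl", "-collect", "hotspots",
           "-knob", f"sampling-mode={mode}"]
    if interval_ms is not None:
        # Documented range for hardware event-based sampling: 0.01-1000 ms.
        if not 0.01 <= interval_ms <= 1000:
            raise ValueError("interval_ms must be between 0.01 and 1000")
        cmd += ["-knob", f"sampling-interval={interval_ms}"]
    if collect_stacks:
        cmd += ["-knob", "enable-stack-collection=true"]
    # Everything after "--" is the target application and its arguments.
    cmd += ["--", app, *app_args]
    return cmd


command = hotspots_command("./my_app", mode="hw",
                           interval_ms=1, collect_stacks=True)
print(shlex.join(command))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues. `shlex.join` is used only to display the command.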

View Data

When the data is collected, VTune Amplifier opens it in the Hotspots by CPU Utilization viewpoint providing the following views for analysis:

  • Summary window displays statistics on the overall application execution to analyze CPU time and processor utilization.

  • Bottom-up window displays hotspot functions in the bottom-up tree, CPU time and CPU utilization per function.

  • Top-down Tree window displays hotspot functions in the call tree, performance metrics for a function only (Self value) and for a function and its children together (Total value).

  • Caller/Callee window displays parent and child functions of the selected focus function.

  • Platform window provides details on CPU and GPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
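The difference between the Self and Total values can be sketched with a small calculation over stack samples. This is a simplified illustration under stated assumptions, not the tool's algorithm: each sample is assumed to represent 10 ms, values are aggregated per function rather than per call path, and the sample data is hypothetical.

```python
from collections import defaultdict


def self_and_total(samples, interval_s=0.010):
    """Compute Self and Total time per function from stack samples.

    Self:  time in samples where the function itself was executing
           (it is the last frame of the stack).
    Total: time in samples where the function appears anywhere on the
           stack, that is, the function plus everything it called.
    """
    self_time = defaultdict(float)
    total_time = defaultdict(float)
    for stack in samples:
        if not stack:
            continue
        self_time[stack[-1]] += interval_s
        # set() counts a recursive function once per sample.
        for func in set(stack):
            total_time[func] += interval_s
    return dict(self_time), dict(total_time)


# Hypothetical samples, outermost caller first.
samples = [
    ("main", "parse", "tokenize"),
    ("main", "parse", "tokenize"),
    ("main", "parse"),
    ("main", "render"),
]

self_time, total_time = self_and_total(samples)
for func in sorted(total_time, key=total_time.get, reverse=True):
    print(f"{func:10s} self={self_time.get(func, 0.0):.3f} s "
          f"total={total_time[func]:.3f} s")
```

In this data, `main` has no Self time but the largest Total time, which is typical for a function that only dispatches work to its children.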

What's Next

  1. Identify the most time-consuming function in the grid and double-click it for source analysis.

  2. Analyze the source of the critical function starting with the highlighted hottest code line and moving further with the Hotspot Navigation options.

  3. Modify your code to remove bottlenecks and improve the performance of your application.

  4. Re-run the analysis and verify your optimization with the comparison mode.
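The idea behind verifying an optimization by comparison can be sketched as a per-function difference between two sets of results. The following is only a conceptual illustration with hypothetical function names and CPU times. It does not read VTune Amplifier result files or reproduce the comparison mode itself.

```python
def compare_hotspots(before, after):
    """Return (function, delta_seconds) pairs, largest improvement first.

    A negative delta means the function took less CPU time after the
    change. Functions present in only one run are treated as 0.0 in the
    other.
    """
    functions = set(before) | set(after)
    deltas = {f: after.get(f, 0.0) - before.get(f, 0.0) for f in functions}
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical CPU time per function (seconds) for two runs.
before = {"tokenize": 4.2, "parse": 1.5, "render": 0.8}
after = {"tokenize": 1.1, "parse": 1.6, "render": 0.8, "cache_lookup": 0.3}

for func, delta in compare_hotspots(before, after):
    print(f"{func:14s} {delta:+.2f} s")
```

Looking at the differences for all functions, and not only the one you optimized, helps reveal cases where time moved to another function instead of disappearing.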

For further steps, explore the Insights section provided in the Summary window. This section contains information on your target performance against metrics collected in addition to standard hotspots metrics. If there are any performance issues detected, the VTune Amplifier flags such a metric value and provides an insight on potential next steps to fix the problem.

Information provided by the Hotspots analysis is important for tuning serial applications, and it is also useful for tuning the serial sections of parallel applications. The Hotspots analysis data helps you understand what your application is doing and identify the code that is critical to tune. For parallel applications running on multi-core systems, you may need additional analyses: Threading or HPC Performance Characterization.

For more complete information about compiler optimizations, see our Optimization Notice.