Hotspots Analysis for CPU Usage Issues
Use the Hotspots analysis to understand an application flow and identify sections of code that get a lot of execution time (hotspots). This is a starting point for your algorithm analysis.

Hotspots analysis has two sampling-based collection modes:
- User-mode sampling, which incurs higher overhead but does not require sampling drivers for collection. Starting with Intel® VTune™ Amplifier 2019, this mode replaced the former Basic Hotspots analysis.
- Hardware event-based sampling, which provides minimal collection overhead but needs sampling drivers or Perf* to be installed. Starting with VTune Amplifier 2019, this mode replaced the former Advanced Hotspots analysis.
Intel® VTune™ Profiler is the new name of Intel® VTune™ Amplifier.
How It Works: User-Mode Sampling
VTune Profiler uses low-overhead (about 5%) user-mode sampling and tracing collection that gets you the information you need without significantly slowing down application execution. The data collector profiles your application using the OS timer. It interrupts the process, collects samples of all active instruction addresses at a sampling interval of 10 ms, and captures a call sequence (stack) for each sample.
VTune Profiler stores each sampled instruction pointer (IP) along with its call sequence in data collection files, and then analyzes and displays this data in a result tab. These statistically collected IP samples with call sequences enable VTune Profiler to display a top-down tree (call tree). Use this data to understand the control flow for statistically important code sections.
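To illustrate how sampled stacks can become a call tree, here is a minimal Python sketch. It assumes each sample is already a tuple of function names ordered caller to callee. This is a simplification: the profiler records raw instruction addresses and resolves them to functions itself. The `Node`, `build_call_tree`, and `print_tree` names are hypothetical and are not VTune APIs. Time is estimated as sample count multiplied by the fixed 10 ms interval. "Total" counts samples in a function and everything it called, and "Self" counts only samples whose IP was in the function itself.

```python
SAMPLING_INTERVAL_MS = 10  # fixed user-mode sampling interval


class Node:
    """One function in the top-down call tree (illustrative, not a VTune type)."""

    def __init__(self, name):
        self.name = name
        self.children = {}      # callee name -> Node
        self.total_samples = 0  # samples in this function or anything it called
        self.self_samples = 0   # samples whose IP was in this function itself


def build_call_tree(stacks):
    """Merge sampled stacks, each ordered caller -> callee, into one tree."""
    root = Node("<root>")
    for stack in stacks:
        root.total_samples += 1
        node = root
        for frame in stack:
            node = node.children.setdefault(frame, Node(frame))
            node.total_samples += 1
        node.self_samples += 1  # the innermost frame holds the sampled IP
    return root


def print_tree(node, depth=0):
    """Print Total and Self time, converting sample counts to milliseconds."""
    for child in sorted(node.children.values(), key=lambda n: -n.total_samples):
        total_ms = child.total_samples * SAMPLING_INTERVAL_MS
        self_ms = child.self_samples * SAMPLING_INTERVAL_MS
        print(f"{'  ' * depth}{child.name}: total {total_ms} ms, self {self_ms} ms")
        print_tree(child, depth + 1)


# Hypothetical stacks captured at four timer interrupts.
samples = [
    ("main", "solve", "multiply"),
    ("main", "solve", "multiply"),
    ("main", "solve"),
    ("main", "load_input"),
]
tree = build_call_tree(samples)
print_tree(tree)
```

With more samples, the estimated times converge toward the real distribution. This is why the tree is meaningful only for statistically important code sections.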
In user-mode sampling, the collector does not gather system-wide performance data but focuses on your application only. To analyze system performance, use the hardware event-based sampling mode.
VTune Profiler displays a list of functions in your application ordered by the amount of time spent in each function. It also captures the call stacks for each of these functions so you can see how the hot functions are called.
A large number of samples collected in a specific process, thread, or module can indicate high processor utilization and potential performance bottlenecks. Some hotspots can be removed, while others are fundamental to the application functionality and cannot be removed.
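The sketch below shows how a list of hot functions ordered by time can be derived from raw samples, and how samples can be grouped per module. It is a minimal Python illustration that assumes each sample has already been resolved to a hypothetical `(module, function)` pair. The helper names and sample data are made up and do not reflect VTune's internal format.

```python
from collections import Counter

SAMPLING_INTERVAL_MS = 10  # fixed user-mode sampling interval

# Hypothetical samples: (module, function) containing the sampled IP.
samples = [
    ("app", "multiply"), ("app", "multiply"), ("app", "multiply"),
    ("app", "parse"),
    ("libc.so", "memcpy"), ("libc.so", "memcpy"),
]


def hotspots(samples, interval_ms=SAMPLING_INTERVAL_MS):
    """Return (function, estimated CPU time in ms) pairs, hottest first."""
    counts = Counter(func for _module, func in samples)
    return [(func, n * interval_ms) for func, n in counts.most_common()]


def time_by_module(samples, interval_ms=SAMPLING_INTERVAL_MS):
    """Estimated CPU time in ms per module."""
    counts = Counter(module for module, _func in samples)
    return {module: n * interval_ms for module, n in counts.items()}


for func, ms in hotspots(samples):
    print(f"{func:10s} {ms:5d} ms")
print(time_by_module(samples))
```

Counting by function name alone is a simplification. A real profiler also has to distinguish same-named functions in different modules.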
How It Works: Hardware Event-Based Sampling
The hardware event-based sampling mode is based on hardware event-based sampling collection. This collection analyzes all processes running on your system at the time of collection and provides CPU time data on whole-system performance.
VTune Profiler creates a list of functions in your application ordered by the amount of time spent in each function. By default, the Hotspots analysis in the hardware event-based sampling mode does not capture function call stacks while it collects hotspots. You can still analyze stacks for your application modules by explicitly selecting the Collect stacks option.
- If you cannot run hardware event-based sampling with stacks, disable the Collect stacks option and run the collection. To correlate the obtained hardware event-based sampling data with stacks, run a separate Hotspots analysis in the user-mode sampling mode.
- On 32-bit Linux* systems, VTune Profiler uses a driverless Perf*-based collection for the hardware event-based sampling mode.
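Because this mode samples every process on the system, the raw data also includes time spent outside your application. The minimal Python sketch below shows the two steps that implies: computing each process's share of the samples, then narrowing the data to your own process. It assumes each sample is tagged with a process name and a function name. The process names, function names, and helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical system-wide samples: (process, function) per hardware sample.
samples = [
    ("myapp", "multiply"), ("myapp", "multiply"), ("myapp", "parse"),
    ("browser", "render"),
    ("dbserver", "query"),
]


def cpu_share_by_process(samples):
    """Fraction of all collected samples that landed in each process."""
    counts = Counter(proc for proc, _func in samples)
    total = sum(counts.values())
    return {proc: n / total for proc, n in counts.items()}


def app_hotspots(samples, process):
    """(function, sample count) pairs for one process, hottest first."""
    return Counter(func for proc, func in samples if proc == process).most_common()


print(cpu_share_by_process(samples))
print(app_hotspots(samples, "myapp"))
```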
Configure and Run Analysis
To configure and run the Hotspots analysis:
Prerequisites: Create a project.
- Click the Configure Analysis button on the VTune Profiler welcome screen (standalone GUI or Visual Studio IDE).
- In the HOW pane, select the Hotspots analysis from the Analysis Tree.
- Configure the following options:
  User-Mode Sampling mode: Select to enable the user-mode sampling and tracing collection for hotspots and call stack analysis (formerly known as Basic Hotspots). This collection mode uses a fixed sampling interval of 10 ms. If you need to change the interval, click the Copy button and create a custom analysis configuration.
  Hardware Event-Based Sampling mode: Select to enable hardware event-based sampling collection for hotspots analysis (formerly known as Advanced Hotspots). You can configure the following options for this collection mode:
    - CPU sampling interval, ms: Specify an interval (in milliseconds) between CPU samples. Possible values for the hardware event-based sampling mode are 0.01-1000. 1 ms is used by default.
    - Collect stacks: Enable advanced collection of call stacks and thread context switches.
  When changing collection options, pay attention to the Overhead diagram on the right. It dynamically changes to reflect the collection overhead incurred by the selected options.
  Show additional performance insights check box: Get additional performance insights, such as vectorization, and learn next steps. This option collects additional CPU events, which may enable the multiplexing mode. The option is enabled by default.
  Details button: Expand or collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Profiler creates an editable copy of this analysis type configuration.
- Click the Start button to run the analysis.
To generate the command line for this configuration, click the Command Line... button at the bottom.
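Sampling is statistical, so the amount of data collected is roughly the CPU time divided by the sampling interval. The back-of-the-envelope Python sketch below compares the fixed 10 ms user-mode interval with the hardware-mode default of 1 ms and a finer 0.1 ms setting. It assumes one sample per interval of CPU time, summed over all logical CPUs, which is an approximation. The function and constant names are hypothetical, and the range check mirrors the 0.01-1000 ms limits listed above.

```python
HW_INTERVAL_RANGE_MS = (0.01, 1000.0)  # allowed range in the hardware mode
DEFAULT_HW_INTERVAL_MS = 1.0           # default hardware sampling interval
USER_MODE_INTERVAL_MS = 10.0           # fixed user-mode sampling interval


def expected_samples(cpu_time_s, interval_ms=DEFAULT_HW_INTERVAL_MS):
    """Approximate sample count for a given CPU time (all CPUs summed)."""
    low, high = HW_INTERVAL_RANGE_MS
    if not low <= interval_ms <= high:
        raise ValueError(f"interval must be within {low}-{high} ms, got {interval_ms}")
    return round(cpu_time_s * 1000.0 / interval_ms)


# Hypothetical workload: 4 threads busy for 2 s each, so 8 s of CPU time.
for interval in (USER_MODE_INTERVAL_MS, DEFAULT_HW_INTERVAL_MS, 0.1):
    print(f"{interval:>5} ms -> ~{expected_samples(8.0, interval)} samples")
```

A shorter interval gives finer-grained statistics for short-running functions, at the cost of more data and more collection overhead. The Overhead diagram reflects that trade-off.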
View Data
When the data is collected, VTune Profiler opens it in the Hotspots by CPU Utilization viewpoint, which provides the following views for analysis:
- Summary window displays statistics on the overall application execution to analyze CPU time and processor utilization.
- Bottom-up window displays hotspot functions in the bottom-up tree, CPU time and CPU utilization per function.
- Top-down Tree window displays hotspot functions in the call tree, performance metrics for a function only (Self value) and for a function and its children together (Total value).
- Caller/Callee window displays parent and child functions of the selected focus function.
- Platform window provides details on CPU and GPU utilization, frame rate, memory bandwidth, and user tasks (if corresponding metrics are collected).
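The Caller/Callee relationship can be derived from the same sampled stacks. The minimal Python sketch below counts, for a chosen focus function, which functions called it and which functions it called in the sampled data. It assumes stacks are tuples ordered caller to callee, and for simplicity it considers only the first occurrence of the focus function in each stack, so recursion is not handled. The helper name and stack data are hypothetical.

```python
from collections import Counter


def callers_and_callees(stacks, focus):
    """Count immediate callers and callees of `focus` across sampled stacks."""
    callers, callees = Counter(), Counter()
    for stack in stacks:
        if focus not in stack:
            continue
        i = stack.index(focus)  # first occurrence only; recursion ignored
        if i > 0:
            callers[stack[i - 1]] += 1
        if i + 1 < len(stack):
            callees[stack[i + 1]] += 1
    return callers, callees


# Hypothetical stacks, each ordered caller -> callee.
stacks = [
    ("main", "solve", "multiply"),
    ("main", "solve", "multiply"),
    ("main", "refine", "multiply"),
    ("main", "solve", "normalize"),
]
print(callers_and_callees(stacks, "multiply"))
print(callers_and_callees(stacks, "solve"))
```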
What's Next
- Identify the most time-consuming function in the grid and double-click it for source analysis.
- Analyze the source of the critical function starting with the highlighted hottest code line and moving further with the Hotspot Navigation options.
- Modify your code to remove bottlenecks and improve the performance of your application.
- Re-run the analysis and verify your optimization with the comparison mode.
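The comparison mode does this comparison for you in VTune Profiler. The Python sketch below only illustrates the underlying idea of a per-function before/after difference, so that you can see which changes actually reduced CPU time. The per-function CPU times (in ms) are invented for illustration, and the `compare` helper is hypothetical.

```python
def compare(baseline_ms, optimized_ms):
    """Per-function (name, before, after, delta) rows, biggest reduction first."""
    funcs = set(baseline_ms) | set(optimized_ms)
    rows = []
    for func in funcs:
        before = baseline_ms.get(func, 0)  # missing function counts as 0 ms
        after = optimized_ms.get(func, 0)
        rows.append((func, before, after, after - before))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical per-function CPU time before and after an optimization.
baseline = {"multiply": 4200, "parse": 300, "memcpy": 500}
optimized = {"multiply": 1900, "parse": 310, "memcpy": 480}

for func, before, after, delta in compare(baseline, optimized):
    print(f"{func:10s} {before:6d} -> {after:6d} ms ({delta:+d})")
```

Small differences, such as the +10 ms for `parse` here, can fall within normal run-to-run sampling noise. Focus on the large deltas.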
For further steps, explore the Insights section of the Summary window. This section reports your target's performance against metrics collected in addition to the standard hotspots metrics. If a performance issue is detected, VTune Profiler flags the metric value and provides an insight on potential next steps to fix the problem.
Information provided by the Hotspots analysis is important for tuning serial applications, and it is still useful for tuning the serial sections of parallel applications. The Hotspots analysis data helps you understand what your application is doing and identify the code that is critical to tune. For parallel applications running on multi-core systems, you may need additional analyses: Threading or HPC Performance Characterization.