Hotspots View

Identify program units that took the most CPU time (hotspots). This viewpoint is available for all analysis results.
To interpret the performance data provided in the Hotspots viewpoint, you may follow the steps below:

Define a Performance Baseline

Start by exploring the Summary window, which provides general information on your application execution. Note that the Elapsed time, which covers the application run from start to termination, differs from the application CPU time, which is the sum of the active processor time (waiting time excluded) across all the threads that run the application.
Use the Elapsed time value provided in the Summary window as a baseline for comparing versions before and after optimization. While you tune the application, the Elapsed time tends to decrease, whereas the CPU time may increase as you add more threads to the application.
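To illustrate the difference between the two metrics, here is a minimal sketch in Python. The `ThreadRecord` type and all numbers are hypothetical and are not produced by VTune Profiler; the sketch only restates the definitions above, where Elapsed time is the wall-clock span of the run and CPU time sums the active (non-waiting) time of every thread.

```python
from dataclasses import dataclass


@dataclass
class ThreadRecord:
    """Hypothetical per-thread timing record (seconds); not a VTune data format."""
    start: float    # wall-clock time when the thread started
    end: float      # wall-clock time when the thread finished
    waiting: float  # time the thread spent waiting instead of running


def elapsed_time(threads):
    """Wall-clock span from the earliest start to the latest end."""
    return max(t.end for t in threads) - min(t.start for t in threads)


def cpu_time(threads):
    """Sum of active (non-waiting) time over all threads."""
    return sum((t.end - t.start) - t.waiting for t in threads)


# Baseline: a single thread does all the work.
baseline = [ThreadRecord(start=0.0, end=10.0, waiting=1.0)]

# After tuning: four threads share the work, with some threading overhead.
optimized = [ThreadRecord(start=0.0, end=3.0, waiting=0.5) for _ in range(4)]

print(elapsed_time(baseline), cpu_time(baseline))    # 10.0 9.0
print(elapsed_time(optimized), cpu_time(optimized))  # 3.0 10.0
```

In this made-up case the Elapsed time drops from 10 s to 3 s while the CPU time rises from 9 s to 10 s, which is why the Elapsed time is the more useful baseline.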
If you ran the Hotspots analysis in the hardware event-based sampling mode, the Summary window also displays the Microarchitecture Usage metric, which helps you estimate the code efficiency on the current hardware platform.
If this metric value is flagged as critical, consider running the Microarchitecture Exploration analysis, which dives deeper into hardware metrics.

Identify the Hottest Function

Start with the Top Hotspots section in the Summary window to get a list of the most time-consuming functions. Click a hotspot function to explore its call flow and other related metrics in the Bottom-up view.
By default, the data in the Bottom-up view is sorted by CPU Time in descending order, so the most time-consuming functions come first. Focus on the functions with the largest CPU time. These are your candidates for optimization.
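The same ranking can be reproduced outside the GUI on any table of per-function CPU times. The sketch below assumes a hypothetical list of `(function name, CPU time)` rows; the names and values are invented for illustration and are not real profiler output.

```python
# Hypothetical (function name, CPU time in seconds) rows; not real VTune output.
rows = [
    ("parse_input", 0.8),
    ("multiply", 12.4),
    ("init_arrays", 1.9),
    ("write_output", 0.3),
]


def top_hotspots(rows, count=3):
    """Return the `count` rows with the largest CPU time, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


total = sum(cpu for _, cpu in rows)
for name, cpu in top_hotspots(rows):
    # Print each hotspot with its share of the overall CPU time.
    print(f"{name:<14}{cpu:>6.1f}s {100 * cpu / total:5.1f}%")
```

Showing the share next to the absolute value makes it easier to judge whether a function is worth optimizing at all.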
Expand the CPU Time column to get more details on how effectively the CPU time was used:
Figure: Hotspots by CPU Utilization Viewpoint: Bottom-up Pane
Focus your tuning efforts on the program units with the largest Poor value. This means that during the execution of these program units your application underutilized the CPU. The overall goal of optimization is to achieve the Ideal (green) or OK (orange) CPU utilization state and to reduce the Poor and Over CPU utilization values.
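As a rough sketch of this prioritization, the Python snippet below orders program units by the time they spent in the Poor state. The utilization state names come from the text above; the `breakdown` table, the function names, and all numbers are hypothetical, and the snippet does not model how VTune Profiler assigns time to each state.

```python
# Hypothetical per-function CPU time (seconds) split by utilization state.
# State names follow the text; the numbers are made up for illustration.
breakdown = {
    "multiply":    {"Poor": 9.1, "OK": 2.0, "Ideal": 1.3, "Over": 0.0},
    "init_arrays": {"Poor": 0.4, "OK": 0.5, "Ideal": 1.0, "Over": 0.0},
    "parse_input": {"Poor": 0.7, "OK": 0.1, "Ideal": 0.0, "Over": 0.0},
}


def rank_by_poor(breakdown):
    """Order program units by time spent in the Poor state, largest first."""
    return sorted(breakdown, key=lambda unit: breakdown[unit]["Poor"], reverse=True)


def poor_share(states):
    """Fraction of a unit's CPU time spent in the Poor state."""
    total = sum(states.values())
    return states["Poor"] / total if total else 0.0


for unit in rank_by_poor(breakdown):
    print(unit, breakdown[unit]["Poor"], f"{poor_share(breakdown[unit]):.0%}")
```

Ranking by the absolute Poor value, as the text recommends, keeps the focus on units that cost the most time; the share is shown only as supporting context.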

Identify Algorithm Issues

You can identify issues with the calling sequences in your application and improve performance by revising the way functions are called. The following methods to locate potential issues are available:
  • Top-down Tree pane: Analyze the Total and Self time data for callers and callees of the hotspot function to understand whether this time can be optimized.
  • Call Stack pane: Identify the highest contributing stack for the program unit(s) selected in the Bottom-up or Top-down Tree panes. Use the navigation buttons to see the different stacks that called the selected program unit(s). The contribution bar shows the contribution of the currently visible stack to the overall time spent by the selected program unit(s). You can also use the drop-down list in the Call Stack pane to view data for different types of stacks.
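The relationship between Total and Self time in the Top-down Tree can be sketched as follows, on the understanding that Self time covers only the function's own code while Total time also includes its callees. The `Node` type, the function names, and the timings are hypothetical and do not reflect any VTune data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical call-tree node; not a VTune data structure."""
    name: str
    self_time: float  # seconds spent in the function body itself
    callees: list = field(default_factory=list)


def total_time(node):
    """Total time = Self time plus the Total time of every callee."""
    return node.self_time + sum(total_time(c) for c in node.callees)


tree = Node("main", 0.5, [
    Node("load", 1.0),
    Node("solve", 0.25, [Node("multiply", 8.0)]),
])


def show(node, depth=0):
    """Print the tree top-down with Total and Self time per function."""
    print(f"{'  ' * depth}{node.name}: total={total_time(node)}, self={node.self_time}")
    for callee in node.callees:
        show(callee, depth + 1)


show(tree)
```

Here `solve` has a small Self time but a large Total time, so its cost comes from the callee `multiply`. In such a case the options are to make the callee faster or to call it less often.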
Stack data is available by default for the user-mode sampling mode. To have this data for the hardware event-based sampling mode, you need to enable the Collect stacks option in the Hotspots analysis configuration.
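Once stack data is collected, the idea behind the contribution bar in the Call Stack pane can be approximated as each distinct stack's share of the time spent in the selected function. The sketch below uses hypothetical samples and function names; it illustrates the arithmetic only and is not how VTune Profiler stores or aggregates stacks.

```python
from collections import defaultdict

# Hypothetical samples: (call stack ending in the selected function, CPU time in s).
samples = [
    (("main", "solve", "multiply"), 6.0),
    (("main", "precondition", "multiply"), 1.5),
    (("main", "solve", "multiply"), 0.5),
]


def stack_contributions(samples):
    """Sum time per distinct stack and return (stack, share) pairs, largest first."""
    per_stack = defaultdict(float)
    for stack, cpu in samples:
        per_stack[stack] += cpu
    total = sum(per_stack.values())
    ranked = sorted(per_stack.items(), key=lambda item: item[1], reverse=True)
    return [(stack, cpu / total) for stack, cpu in ranked]


for stack, share in stack_contributions(samples):
    # Print the stack from the selected function back to its callers.
    print(" <- ".join(reversed(stack)), f"{share:.1%}")
```

The stack with the largest share is the first place to check whether the calling sequence can be revised.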

Analyze Source

Double-click the hottest function to view its related source code in the Source/Assembly window. You can open the code editor directly from the Intel® VTune™ Profiler and edit your code (for example, minimizing the number of calls to the hotspot function).
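One common way to reduce the number of calls to a hotspot function is to hoist a call whose result does not change out of a loop. The following Python sketch shows the idea; `expensive_scale` is a hypothetical stand-in for a hotspot function, and the call counter exists only to make the effect visible.

```python
call_count = 0


def expensive_scale(factor):
    """Stand-in for a hotspot function; counts how often it is called."""
    global call_count
    call_count += 1
    return factor ** 0.5


def normalize_naive(values, factor):
    # Calls the hotspot once per element even though the result never changes.
    return [v / expensive_scale(factor) for v in values]


def normalize_hoisted(values, factor):
    # Calls the hotspot once and reuses the result inside the loop.
    scale = expensive_scale(factor)
    return [v / scale for v in values]


values = [1.0, 2.0, 3.0, 4.0]

call_count = 0
naive = normalize_naive(values, 4.0)
naive_calls = call_count

call_count = 0
hoisted = normalize_hoisted(values, 4.0)
hoisted_calls = call_count

print(naive_calls, hoisted_calls)  # 4 1
print(naive == hoisted)            # True
```

Both versions return the same result, but the hoisted version calls the hotspot once instead of once per element. After such a change, rerun the analysis and compare the Elapsed time against your baseline.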

What's Next

If you ran the analysis with the default Show additional performance insights option, the Summary view includes the Insights section, which provides additional metrics for your target, such as the efficiency of hardware usage and vectorization. This information helps you identify potential next steps for your performance analysis and understand where to focus your optimization efforts.
