Identify program units that took the most CPU time (hotspots). This viewpoint is available for all analysis results.
To interpret the performance data provided in the Hotspots viewpoint, you may follow the steps below:
Define a Performance Baseline
Start by exploring the Summary window, which provides general information on your application execution. Note that the Elapsed time, which covers the application run from start to termination, differs from the application CPU time, which is the sum of the active processor time (waiting time excluded) across all threads that run the application.
Use the Elapsed time value provided in the Summary window as a baseline for comparing versions before and after optimization. Note that as you tune the application, the Elapsed time tends to decrease, whereas the CPU time may increase as you add more threads to the application.
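The difference between Elapsed time and CPU time can be illustrated outside the tool with a minimal Python sketch. It is not part of VTune Amplifier; the helper measure() and the workload mostly_waiting() are hypothetical names introduced only for this illustration.

```python
import time


def measure(func, *args, **kwargs):
    """Return (result, elapsed_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()   # wall-clock time, includes waiting
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    elapsed_seconds = time.perf_counter() - wall_start
    return result, elapsed_seconds, cpu_seconds


def mostly_waiting():
    """Hypothetical workload: a long wait followed by a little computation."""
    time.sleep(0.2)                    # waiting adds to elapsed time only
    return sum(range(1000))


if __name__ == "__main__":
    _, elapsed, cpu = measure(mostly_waiting)
    print(f"elapsed: {elapsed:.3f} s, cpu: {cpu:.3f} s")
```

For a workload that mostly waits, the elapsed value is noticeably larger than the CPU value. For a multithreaded, compute-bound workload the relation can reverse, because CPU time is summed over all threads.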
If you ran the Hotspots analysis in the hardware event-based sampling mode, the Summary window also displays the Microarchitecture Usage metric, which helps you estimate how efficiently your code runs on the current hardware platform.
If this metric value is flagged as critical, consider running the Microarchitecture Exploration analysis that dives deeper into hardware metrics.
Identify the Hottest Function
Start with the Top Hotspots section in the Summary window to get a list of the most time-consuming functions. Click a hotspot function in this list to explore its call flow and other related metrics in the Bottom-up view.
By default, the data in the Bottom-up view is sorted by CPU Time in descending order, so the most time-consuming functions come first. Focus on the functions with the largest CPU time. These are your candidates for optimization.
Expand the CPU Time column to see in more detail how effectively the CPU time was used.
Focus your tuning efforts on the program units with the largest Poor value. A large Poor value means that while these program units executed, your application underutilized the CPU. The overall goal of optimization is to reach the Ideal (green) or OK (orange) CPU utilization state and to reduce the Poor and Over CPU utilization values.
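The ranking logic described above can be sketched in a few lines of Python. The rows below are made-up numbers, not VTune Amplifier output, and the function names and the dictionary layout are assumptions chosen only for illustration. The sketch shows that the function with the largest CPU time is not necessarily the one with the largest Poor value.

```python
# Utilization states named in the text above.
STATES = ("Poor", "OK", "Ideal", "Over")

# Hypothetical per-function CPU time in seconds, split by utilization state.
ROWS = [
    {"function": "parse_input", "Poor": 3.2, "OK": 0.4, "Ideal": 0.1, "Over": 0.0},
    {"function": "compute_forces", "Poor": 1.0, "OK": 2.0, "Ideal": 5.0, "Over": 0.0},
    {"function": "write_output", "Poor": 0.5, "OK": 0.2, "Ideal": 0.0, "Over": 0.3},
]


def cpu_time(row):
    """Total CPU time of a row: the sum over all utilization states."""
    return sum(row[state] for state in STATES)


def rank_by_cpu_time(rows):
    """Most time-consuming functions first."""
    return sorted(rows, key=cpu_time, reverse=True)


def rank_by_poor(rows):
    """Functions with the largest Poor CPU time first."""
    return sorted(rows, key=lambda row: row["Poor"], reverse=True)


if __name__ == "__main__":
    print("By CPU time:", [r["function"] for r in rank_by_cpu_time(ROWS)])
    print("By Poor:    ", [r["function"] for r in rank_by_poor(ROWS)])
```

In this made-up data set, compute_forces consumes the most CPU time but spends most of it in the Ideal state, while parse_input has the largest Poor value and is the more promising tuning target.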
Identify Algorithm Issues
You can identify issues with the calling sequences in your application and improve performance by revising the way functions are called. The following methods to locate potential issues are available:
Top-down Tree pane: Analyze the Total and Self time data for callers and callees of the hotspot function to understand whether this time can be optimized.
Call Stack pane: Identify the highest contributing stack for the program unit(s) selected in the Bottom-up or Top-down Tree panes. Use the navigation buttons to see the different stacks that called the selected program unit(s). The contribution bar shows the contribution of the currently visible stack to the overall time spent by the selected program unit(s). You can also use the drop-down list in the Call Stack pane to view data for different types of stacks.
Stack data is available by default in the user-mode sampling mode. To collect this data in the hardware event-based sampling mode, enable the Collect stacks option in the Hotspots analysis configuration.
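How Self time, Total time, and per-stack contribution relate to each other can be shown with a small Python sketch. It assumes a simplified model in which each sample is a call stack plus the CPU time attributed to its leaf frame. The sample data and the helper names are hypothetical and do not reflect how VTune Amplifier stores or computes its data.

```python
from collections import defaultdict

# Hypothetical samples: (call stack from root to leaf, CPU seconds
# attributed to the leaf frame). Made-up numbers for illustration only.
SAMPLES = [
    (("main", "render", "shade"), 4.0),
    (("main", "render", "blend"), 1.0),
    (("main", "update", "shade"), 2.0),
    (("main", "render"), 0.5),
    (("main",), 0.5),
]


def self_and_total(samples):
    """Return (self_time, total_time) dictionaries keyed by function name.

    Self time is time spent in the function itself; Total time also
    includes time spent in everything it calls.
    """
    self_time = defaultdict(float)
    total_time = defaultdict(float)
    for stack, seconds in samples:
        self_time[stack[-1]] += seconds
        for func in set(stack):        # set() avoids double counting recursion
            total_time[func] += seconds
    return dict(self_time), dict(total_time)


def stack_contributions(samples, func):
    """Share of func's self time contributed by each distinct calling stack,
    largest contribution first."""
    per_stack = defaultdict(float)
    for stack, seconds in samples:
        if stack[-1] == func:
            per_stack[stack] += seconds
    total = sum(per_stack.values())
    if total == 0:
        return []
    ranked = sorted(per_stack.items(), key=lambda item: item[1], reverse=True)
    return [(stack, seconds / total) for stack, seconds in ranked]


if __name__ == "__main__":
    self_time, total_time = self_and_total(SAMPLES)
    print("self: ", self_time)
    print("total:", total_time)
    for stack, share in stack_contributions(SAMPLES, "shade"):
        print(" -> ".join(stack), f"{share:.0%}")
```

A function with a large Total time but a small Self time, such as main in this data, is rarely worth tuning itself; the time is spent in its callees. The per-stack shares indicate which caller path to examine first.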
Double-click the hottest function to view its related source code in the Source/Assembly window. You can open the code editor directly from the Intel® VTune™ Amplifier and edit your code (for example, to minimize the number of calls to the hotspot function).
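One common way to reduce the number of calls to a hotspot function is to hoist a call with loop-invariant arguments out of the loop. The Python sketch below is a generic illustration under that assumption; expensive_scale() is a hypothetical stand-in for a hotspot function, and the counter exists only to make the number of calls visible.

```python
# Call counter, kept in a dictionary so it can be reset easily.
calls = {"count": 0}


def expensive_scale(factor):
    """Hypothetical hotspot: a costly computation that depends only on factor."""
    calls["count"] += 1
    return sum(i * factor for i in range(1000))


def scale_all_naive(values, factor):
    """Calls the hotspot once per element, although its result never changes."""
    return [value * expensive_scale(factor) for value in values]


def scale_all_hoisted(values, factor):
    """Calls the hotspot once and reuses the loop-invariant result."""
    scale = expensive_scale(factor)
    return [value * scale for value in values]


if __name__ == "__main__":
    data = [1, 2, 3]
    scale_all_naive(data, 2)
    print("naive calls:  ", calls["count"])
    calls["count"] = 0
    scale_all_hoisted(data, 2)
    print("hoisted calls:", calls["count"])
```

Both variants return the same result; only the number of calls to the hotspot differs. After such a change, rerun the analysis and compare the Elapsed time against your baseline.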
If you ran the analysis with the default Show additional performance insights option, the Summary view includes the Insights section, which provides additional metrics for your target, such as the efficiency of hardware usage and vectorization. This information helps you identify potential next steps for your performance analysis and understand where to focus your optimization efforts.