When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Hotspots by CPU Usage viewpoint where each window or pane is configured to display code regions that consumed a lot of CPU time. To interpret the data on the sample code performance, do the following:
Understand the basic performance metrics provided by the Basic Hotspots analysis.
The screenshots and execution time data provided in this tutorial were created on a system with 12 CPU cores. Your data may vary depending on the number and type of CPU cores on your system.
Understand the Basic Hotspots Metrics
Start analysis with the Summary window. To interpret the data, hover over the question mark icons to read the pop-up help and better understand what each performance metric means.
The CPU Usage Histogram represents the Elapsed time and usage level for the available logical processors.
The tachyon_find_hotspots application ran mostly on one logical CPU. If you hover over the highest bar, you see that the application spent 14.320 seconds using only one core, which the VTune Amplifier classifies as Poor utilization for a multicore system. To understand what prevented the application from using all available logical CPUs effectively, explore the Bottom-up pane.
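As a rough illustration of why this result counts as poor, the sketch below computes the share of available CPU capacity in use when only one logical CPU is busy. It assumes that the 12 CPU cores of the tutorial system correspond to 12 logical CPUs, which may not hold on systems with hyper-threading. The helper name `cpu_utilization` is hypothetical and is not part of the VTune Amplifier, and the thresholds the tool uses for its utilization classes are not modeled here.

```python
def cpu_utilization(busy_cpus: float, logical_cpus: int) -> float:
    """Return the fraction of available CPU capacity that is in use."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return busy_cpus / logical_cpus


# Assumption: 12 logical CPUs, matching the 12-core tutorial system.
# One busy logical CPU corresponds to the tallest histogram bar.
utilization = cpu_utilization(busy_cpus=1, logical_cpus=12)
print(f"{utilization:.1%}")  # prints 8.3%
```

With one of twelve logical CPUs busy, more than 90% of the available capacity sits idle.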
Analyze the Most Time-consuming Functions
Click the Bottom-up tab to explore the Bottom-up pane. By default, the data in the grid is grouped by Function. You may change the grouping level using the Grouping drop-down menu at the top of the grid.
Analyze the CPU Time column values. This column is marked with a yellow star as the Data of Interest column. It means that the VTune Amplifier uses this type of data for some calculations (for example, filtering, stack contribution, and others). Functions that took the most CPU time to execute are listed on top.
The initialize_2D_buffer function took the maximum time to execute, 8.031 seconds, and had the longest poor CPU utilization (red bars). This means that the processor cores were underutilized for most of the time spent executing this function.
To get the detailed CPU usage information per function, use the Expand button in the Bottom-up pane to expand the CPU Time column.
You may change the stack representation to a "tree" style by clicking the Change Stack Layout button.
Select the initialize_2D_buffer function in the grid and explore the data provided in the Call Stack pane on the right. The Call Stack pane displays full stack data for each hotspot function and enables you to navigate between function call stacks and understand the impact of each stack on the function CPU time. The stack functions in the Call Stack pane are represented in the following format:
Analyze Performance per Thread
If you change the grouping level in the Bottom-up pane from Function/Call Stack to Thread/Function/Call Stack, you see that the initialize_2D_buffer function belongs to the tachyon_find_ho thread. To get detailed information on the thread performance, explore the Timeline pane.
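Changing the grouping level can be thought of as aggregating the same CPU time data under a different key. The following is a minimal sketch of that idea, not a description of how the VTune Amplifier stores or processes results. Only the initialize_2D_buffer row (8.031 seconds, tachyon_find_ho thread) comes from this tutorial; the second row, the `samples` list, and the `group_cpu_time` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# (thread, function, CPU seconds)
# Row 1 uses values quoted in the tutorial; row 2 is invented for illustration.
samples = [
    ("tachyon_find_ho", "initialize_2D_buffer", 8.031),
    ("tachyon_find_ho", "hypothetical_function", 1.0),
]


def group_cpu_time(rows, key):
    """Sum CPU seconds per grouping key built from (thread, function)."""
    totals = defaultdict(float)
    for thread, function, seconds in rows:
        totals[key(thread, function)] += seconds
    return dict(totals)


# Function grouping: one entry per function.
by_function = group_cpu_time(samples, lambda thread, function: function)

# Thread/Function grouping: the same time, keyed by thread first.
by_thread_function = group_cpu_time(
    samples, lambda thread, function: (thread, function)
)

print(by_function)
print(by_thread_function)
```

The totals are unchanged between the two groupings; only the key that the time is attributed to differs, which is what reveals the thread that owns a hotspot function.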
The Threads area shows the distribution of CPU time utilization per thread. Hover over a bar to see the CPU time utilization, in percent, for this thread at each moment in time. Green zones show the time when threads are active.
The Timeline analysis identifies that the _start thread was actively utilizing CPU resources most of the time. The CPU Usage tooltip shows that CPU time values are about 100%, whereas the maximum CPU time value for a system with 12 CPU cores is 1200%. This means that the processor cores were underutilized for most of the time spent on executing the