When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Hotspots by CPU Usage viewpoint where each window or pane is configured to display code regions that consumed a lot of CPU time. To interpret the data on the sample code performance, do the following:
Understand the basic performance metrics provided by the Basic Hotspots analysis.
The screenshots and execution time data provided in this tutorial were created on a system with 8 CPU cores. Your data may vary depending on the number and type of CPU cores on your system.
Understand the Basic Hotspots Metrics
The CPU Usage Histogram represents the Elapsed time and usage level for the available logical processors.
The tachyon_find_hotspots application ran mostly on one logical CPU. If you hover over the highest bar, you see that it spent 38.15336 seconds using one core only, which the VTune Amplifier classifies as Poor utilization for a multicore system. To understand what prevented the application from using all available logical CPUs effectively, explore the Bottom-up pane.
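The arithmetic behind this reading can be sketched in a few lines of Python. This is only an illustration and not part of the VTune Amplifier: the function name `utilization_fraction` is hypothetical, and the sketch assumes that the 8 CPU cores of the tutorial system correspond to 8 logical CPUs.

```python
def utilization_fraction(cpus_used: float, logical_cpus: int) -> float:
    """Return the share of the available logical CPUs that were busy."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return cpus_used / logical_cpus


# Values taken from this tutorial: about one logical CPU in use.
# Assumption: the 8 cores of the tutorial system are 8 logical CPUs.
fraction = utilization_fraction(1, 8)
print(f"{fraction:.1%} of the available CPU capacity was used")
```

With one of eight logical CPUs busy, only one eighth of the available capacity is in use, which is consistent with the Poor classification shown in the histogram.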
Analyze the Most Time-consuming Functions
Click the Bottom-up tab to explore the Bottom-up pane. By default, the data in the grid is sorted by Function. You may change the grouping level using the Grouping drop-down menu at the top of the grid.
The initialize_2D_buffer function took the maximum time to execute, 22.746 seconds, and had the longest poor CPU utilization (red bars). This means that the processor cores were underutilized most of the time spent on executing this function.
Select the initialize_2D_buffer function in the grid and explore the data provided in the Call Stack pane on the right. The Call Stack pane displays full stack data for each hotspot function, and lets you navigate between function call stacks and understand the impact of each stack on the function CPU time. The stack functions in the Call Stack pane are represented in the following format:
Analyze Performance per Thread
If you change the grouping level in the Bottom-up pane from Function/Call Stack to Thread/Function/Call Stack, you see that the initialize_2D_buffer function belongs to the tachyon_find_ho thread. To get detailed information on the thread performance, explore the Timeline pane.
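As a rough illustration of what changing the grouping level means, the following Python sketch aggregates the same CPU time records first by function and then by thread and function. It is not how the VTune Amplifier stores or processes its data. Only the initialize_2D_buffer row (22.746 seconds on the tachyon_find_ho thread) comes from this tutorial; every name starting with `hypothetical_` and its timing value is made up for the example.

```python
from collections import defaultdict

# (thread, function, CPU time in seconds).
# Only the first row comes from the tutorial; the other rows are hypothetical.
SAMPLES = [
    ("tachyon_find_ho", "initialize_2D_buffer", 22.746),
    ("tachyon_find_ho", "hypothetical_render", 5.0),
    ("hypothetical_worker", "hypothetical_render", 1.5),
]


def group_by_function(samples):
    """Sum CPU time per function, ignoring which thread ran it."""
    totals = defaultdict(float)
    for _thread, function, seconds in samples:
        totals[function] += seconds
    return dict(totals)


def group_by_thread_function(samples):
    """Sum CPU time per function, kept separate for each thread."""
    totals = defaultdict(lambda: defaultdict(float))
    for thread, function, seconds in samples:
        totals[thread][function] += seconds
    return {thread: dict(funcs) for thread, funcs in totals.items()}


print(group_by_function(SAMPLES))
print(group_by_thread_function(SAMPLES))
```

Grouping by thread first does not change the totals. It only shows which thread accumulated the CPU time of each function.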
The Threads area shows the distribution of CPU time utilization per thread. Hover over a bar to see the CPU time utilization, in percent, for this thread at each moment in time. Green zones show the time when threads are active.
The Timeline analysis identifies that the _start thread was actively utilizing CPU resources most of the time. The CPU Usage tooltip shows that CPU time values are about 100%, whereas the maximum CPU time value for a system with 8 cores is 800%. This means that the processor cores were underutilized for most of the time spent on executing the