Interpret Results

When the application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Hardware Issues viewpoint. To interpret the collected data and understand where you should focus your tuning efforts for the specific hardware, do the following:

  1. Understand the event-based metrics

  2. Identify the hardware issues that affect the performance of your application


Understand the Event-based Metrics

Click the Summary tab to explore application-level performance data in the Summary window:

Hardware Issues Viewpoint: Summary Window

The Elapsed time metric shows the wall time from the beginning to the end of the collection. Treat this metric as your basic performance baseline against which you will compare subsequent runs of the application. The goal of your optimization is to reduce the value of this metric.

All other metrics in this section are hardware event ratios provided by Intel architects. Mouse over the icon to see the metric description and the formula used for the metric calculation. VTune Amplifier highlights metric values that exceed the threshold set for the corresponding metric. A value highlighted in pink signifies an application-level hardware issue. The text below a metric with a detected hardware issue describes the issue, its potential cause, and recommended next steps, and displays the threshold formula used for the calculation. Mouse over the truncated text to read the full description.

A quick look at the summary results reveals that the matrix application has the following issues:

  • CPI (Clockticks per Instructions Retired) Rate

  • LLC Miss
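As its name indicates, the CPI Rate is the ratio of two event counts: Clockticks divided by Instructions Retired. The following is a minimal sketch in C of that arithmetic, not VTune Amplifier code. The function names are hypothetical, and the threshold of 1 is taken from the "CPI Rate is high (>1)" description in the Identify the Hardware Issues section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CPI = Clockticks / Instructions Retired.
   Returns 0.0 when no instructions were counted, to avoid dividing by zero. */
double cpi_rate(uint64_t clockticks, uint64_t instructions_retired)
{
    if (instructions_retired == 0)
        return 0.0;
    return (double)clockticks / (double)instructions_retired;
}

/* Hypothetical helper: flags a CPI value above the >1 threshold
   quoted in this section. */
bool cpi_is_high(double cpi)
{
    return cpi > 1.0;
}
```

For example, 3000 clockticks spent retiring 2000 instructions give a CPI of 1.5, which this check would flag as high.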

Identify the Hardware Issues

Click the Bottom-up tab to open the Bottom-up window and see how each program unit performs against the event-based metrics. Each row represents a program unit and shows the percentage of CPU cycles used by this unit. Program units that take more than 5% of the CPU time are considered hotspots. This means that by resolving a hardware issue that, for example, took about 20% of the CPU cycles, you can obtain up to a 20% improvement for the hotspot.
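The 5% hotspot rule above can be expressed as a short calculation. This is a sketch only, assuming you have the Clockticks count for one program unit and for the whole application; the function names are hypothetical and are not part of VTune Amplifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: fraction of all CPU cycles spent in one program unit.
   Returns 0.0 when the total is zero. */
double cycle_share(uint64_t unit_clockticks, uint64_t total_clockticks)
{
    if (total_clockticks == 0)
        return 0.0;
    return (double)unit_clockticks / (double)total_clockticks;
}

/* Hypothetical helper: a unit counts as a hotspot when it takes
   more than 5% of the CPU time, as described above. */
bool is_hotspot(double share)
{
    return share > 0.05;
}
```

A unit with 250 of 1000 clockticks has a share of 0.25 and qualifies as a hotspot; a unit with 40 of 1000 clockticks (0.04) does not.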

By default, the VTune Amplifier sorts data in descending order by Clockticks and lists the hotspots at the top. The metric values for event ratios show up as numbers or bars. To change the data format, right-click a column and select Show Data As > format.

Hardware Issues Viewpoint: Bottom-up Window

You see that the multiply1 function is the most obvious hotspot in the matrix application. It has the highest event count (Clockticks and Instructions Retired events) and most of the hardware issues were also detected during execution of this function.


Mouse over a column header with an event-based metric name to see the metric description. Mouse over a highlighted cell to read the description of the hardware issue detected for the program unit.

For the multiply1 function, the VTune Amplifier highlights the same issues that were detected as the issues affecting the performance of the whole application:

  • CPI Rate is high (>1). Potential causes are memory stalls, instruction starvation, branch misprediction, or long-latency instructions. To determine the cause for your code, explore other metrics in the Bottom-up window.

  • The LLC Miss metric shows that about 70% (0.703) of CPU cycles were spent waiting for LLC load misses to be serviced. Possible optimizations are to reduce the data working set size, improve data access locality, block the data and consume it in chunks that fit in the LLC, or better exploit hardware prefetchers. Consider using software prefetchers, but beware that they can increase latency by interfering with normal loads and can increase pressure on the memory system.
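One of the optimizations named above, consuming data in chunks that fit in the cache, is commonly implemented as loop blocking (tiling). The sketch below illustrates the technique on a generic square matrix multiplication. It is not the code of the tutorial's matrix sample: the function name multiply_blocked, its parameters, and the row-major flat array layout are all assumptions made for illustration, and a suitable tile size depends on your cache sizes.

```c
#include <stddef.h>

/* Returns the smaller of two sizes. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Hypothetical example: c += a * b for n x n row-major matrices,
   computed tile by tile so that each tile is reused while it is
   still likely to be in cache. The caller zeroes c beforehand.
   block is the tile edge length; 0 means "use a single tile". */
void multiply_blocked(size_t n, size_t block,
                      const double *a, const double *b, double *c)
{
    if (n == 0)
        return;
    if (block == 0)
        block = n;

    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = min_size(ii + block, n);
        for (size_t kk = 0; kk < n; kk += block) {
            size_t k_end = min_size(kk + block, n);
            for (size_t jj = 0; jj < n; jj += block) {
                size_t j_end = min_size(jj + block, n);
                /* Multiply one tile of a by one tile of b. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks b and c row-wise (unit stride). */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The tile size is passed as a parameter so that you can rerun the analysis with different values and compare the LLC Miss metric between runs.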

Next Step

Analyze Code

For details on compiler optimization, see our Optimization Notice.