When the application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Hardware Issues viewpoint. To interpret the collected data and understand where you should focus your tuning efforts for the specific hardware, do the following:
For a detailed tuning methodology based on the events and metrics in General Exploration, visit https://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers.
The screenshots and execution time data provided in this tutorial are created on a system with 4 CPU cores, based on the Intel microarchitecture code name Haswell. Your data may vary depending on the number and type of CPU cores on your system.
The screenshots and execution time data provided in this tutorial are created for a sample code compiled with Intel® C++ Compiler. Your data may vary depending on the compiler you use.
Understand the Event-based Metrics
The Elapsed time metric shows the wall time from the beginning to the end of the collection. Treat this metric as your basic performance baseline against which you will compare subsequent runs of the application. The goal of your optimization is to reduce the value of this metric. All other metrics in this section are hardware event ratios provided by Intel architects. Mouse over the icon to see the metric description and formula used for the metric calculation. VTune Amplifier highlights metrics values that exceed the threshold set for the corresponding metric. Such a value highlighted in pink signifies an application-level hardware issue. The text below a metric with the detected hardware issue describes the issue, potential cause and recommendations on the next steps, and displays a threshold formula used for calculation. Mouse over the truncated text to read a full description.
Identify the Hardware Issues
Click the Bottom-up tab to open the Bottom-up window and see how each program unit performs against the event-based metrics. Each row represents a program unit and percentage of the CPU cycles used by this unit. Program units that take more than 5% of the CPU time are considered hotspots. This means that by resolving a hardware issue that, for example, took about 20% of the CPU cycles, you can obtain 20% optimization for the hotspot.
By default, the VTune Amplifier sorts data in the descending order by Clockticks and provides the hotspots at the top of the list. The metric values for event ratios show up as numbers and/or bars. To change the data format, right-click a column and select Show Data As > format.
You see that the
multiply1 function is the most obvious hotspot in the
matrix application. It has the highest event count (Clockticks and Instructions Retired events) and most of the hardware issues were also detected during execution of this function.
Mouse over a column header with an event-based metric name to see the metric description. Mouse over a highlighted cell to read the description of the hardware issue detected for the program unit.
CPI Rate is high (>1). Potential causes are memory stalls, instruction starvation, branch misprediction, or long-latency instruction. To define the cause for your code, explore other metrics in the Bottom-up window.
The Back-End Bound metric describes a portion of the pipeline where the out-of-order scheduler dispatches ready uOps into their respective execution units, and, once completed, these uOps get retired according to program order. Identify slots where no uOps are delivered due to a lack of required resources for accepting more uOps in the back-end of the pipeline. Stalls due to data-cache misses or stalls due to the overloaded divider unit are examples of back-end bound issues.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804