Tutorial

Interpret Microarchitecture Exploration Analysis Result Data

When the sample application exits, the Intel® VTune™ Profiler finalizes the results and opens the Microarchitecture Exploration viewpoint, which provides a high-level performance overview of the interaction between the application and the available hardware.
To interpret the data on the sample code performance, do the following:

Understand the Event-based Metrics

Start with the Summary pane for an overview of application performance.
Microarchitecture Exploration uPipe diagram showing large memory bound pipe leading to small retiring pipe
The µPipe diagram provides a graphical representation of CPU microarchitecture metrics showing inefficiencies in hardware usage. Treat the diagram as a pipe whose output flow equals the ratio Actual Instructions Retired / Possible Maximum Instructions Retired (pipe efficiency). The µPipe is based on CPU pipeline slots, which represent the hardware resources needed to process one micro-operation. Usually several pipeline slots are available on each cycle (the pipeline width). If a pipeline slot does not retire, it is considered a stall, and the µPipe diagram represents it as an obstacle that narrows the pipe.
See the Microarchitecture Pipe page of the online User Guide for a more detailed explanation of the µPipe.
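The pipe-efficiency ratio can be made concrete with a small calculation. The following is a minimal sketch, not VTune Profiler code: the function name `pipe_efficiency` and its parameters are hypothetical, and the pipeline width is a value the caller must supply for the target CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of pipeline slots that retired a micro-operation.
 * Total slots = clockticks * pipeline_width, where pipeline_width is the
 * number of slots available per cycle (an assumption supplied by the caller,
 * since it depends on the CPU). */
double pipe_efficiency(uint64_t retired_slots, uint64_t clockticks,
                       unsigned pipeline_width)
{
    assert(clockticks > 0 && pipeline_width > 0);
    double total_slots = (double)clockticks * (double)pipeline_width;
    return (double)retired_slots / total_slots;
}
```

With invented counts of 440 retired slots over 1000 clockticks on a 4-wide pipeline, `pipe_efficiency(440, 1000, 4)` evaluates to about 0.11, which is the same order as the retiring fraction reported for this sample.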
In this case, the Memory Bound metric is high, so only a small fraction (approximately 11%) of pipeline slots are being retired. Hover over each section for a description and its percentage of the total pipeline, or refer to the metrics on the left.
The hierarchy of event-based metrics in the Microarchitecture Exploration viewpoint depends on your hardware architecture. Each metric is an event ratio defined by Intel architects and has its own predefined threshold. VTune Profiler analyzes a ratio value for each aggregated program unit (for example, a function). When this value exceeds the threshold, it signals a potential performance problem.
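The idea of comparing an event ratio with a threshold can be sketched for the CPI Rate metric, which divides Clockticks by Instructions Retired. This is an illustration only: the helper names are hypothetical, and any threshold passed in is an assumption, because the actual thresholds are predefined per metric and per architecture inside the tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPI rate: clockticks spent per instruction retired. */
double cpi_rate(uint64_t clockticks, uint64_t instructions_retired)
{
    assert(instructions_retired > 0);
    return (double)clockticks / (double)instructions_retired;
}

/* A metric is flagged when its value is above the threshold.
 * The threshold is supplied by the caller; it is not a documented
 * VTune Profiler constant. */
bool exceeds_threshold(double metric_value, double threshold)
{
    return metric_value > threshold;
}
```

For a program unit with 3000 clockticks and 1000 instructions retired, `cpi_rate` returns 3.0, and `exceeds_threshold(3.0, 1.0)` would flag it under an assumed threshold of 1.0.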
Microarchitecture Exploration summary view with flagged metrics including CPI Rate and Back-end Bound
The Elapsed Time section shows metrics related to hardware event ratios for your hardware. Hover over the flagged metrics to get a description of each issue, its possible causes, and suggestions for resolving it. This result shows issues with both CPI Rate (Clockticks per Instructions Retired rate) and Back-End Bound. Both issues were identified as possible causes for slow execution by the original Hotspots analysis. In the expanded Back-End Bound section, there are issues with the application being Memory Bound, which matches the µPipe diagram. The Bottom-up pane can help identify the program units responsible for the memory issues.

Identify Hardware Usage Bottlenecks

Switch to the Bottom-up pane to see how each program unit performs against the event-based metrics. Each row represents a program unit and the percentage of CPU cycles used by this unit. Program units that take more than 5% of the CPU time are considered hotspots.
By default, VTune Profiler sorts data in descending order by CPU Time, so the hotspots appear at the top of the list. The metric values for event ratios show up as numbers, bars, or both.
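The ordering and the 5% rule described above can be expressed as a short sketch. The `ProgramUnit` type and `rank_hotspots` function are hypothetical names introduced for illustration; they do not correspond to any VTune Profiler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One row of a bottom-up style table (hypothetical layout). */
typedef struct {
    const char *name;
    double cpu_time; /* seconds */
} ProgramUnit;

/* qsort comparator: larger CPU time first. */
static int by_cpu_time_desc(const void *lhs, const void *rhs)
{
    double a = ((const ProgramUnit *)lhs)->cpu_time;
    double b = ((const ProgramUnit *)rhs)->cpu_time;
    return (a < b) - (a > b);
}

/* Sorts units by CPU time in descending order and returns how many of the
 * leading entries take more than share_threshold of the total CPU time
 * (pass 0.05 for the 5% rule). */
size_t rank_hotspots(ProgramUnit *units, size_t count, double share_threshold)
{
    double total = 0.0;
    for (size_t i = 0; i < count; ++i)
        total += units[i].cpu_time;
    assert(total > 0.0);

    qsort(units, count, sizeof units[0], by_cpu_time_desc);

    size_t hotspots = 0;
    while (hotspots < count &&
           units[hotspots].cpu_time / total > share_threshold)
        ++hotspots;
    return hotspots;
}
```

After the call, the array is ordered with the most expensive unit first, and the return value gives the number of rows at the top of the list that qualify as hotspots.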
Bottom-up tab showing grouping by Function/Call Stack with multiply1 function showing highest CPU time, clockticks, CPI rate, and Back-end bound values
As was identified when running the Hotspots analysis, the multiply1 function is the most obvious hotspot in the matrix application. It has the highest event count (Clockticks and Instructions Retired events), and most of the hardware issues were also detected during the execution of this function.
The Back-End Bound metric covers the portion of the pipeline where the out-of-order scheduler dispatches ready µOps into their respective execution units and, once completed, these µOps get retired according to program order. The metric identifies slots where no µOps are delivered because the back-end of the pipeline lacks the resources required to accept more µOps. Stalls due to data-cache misses and stalls due to an overloaded divider unit are examples of back-end bound issues.
Expand the Back-End Bound column to discover that the code is memory bound, with the largest percentage of stalls occurring on main memory (DRAM). Hover over the highlighted cells to learn more about optimization opportunities.
Grid view showing Back-End Bound and Memory Bound columns expanded with flagged issues in DRAM Bound column

Analyze Code

Double-click the multiply1 function to open the Source window and analyze the source code.
multiply.c source file with source code and key metrics shown
When you drill down from the grid to the source view, VTune Profiler automatically highlights the code line that has the highest event count. In the Source pane for the multiply1 function, you see that line 51 accounted for most of the Clockticks event samples during execution and was also highlighted as the top hotspot line in the Hotspots result. This code section multiplies matrices in a loop but accesses memory inefficiently. Expand the Back-End Bound column to learn more. Focus on this section and try to reduce the memory issues.
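To illustrate what inefficient memory access in a matrix-multiply loop can look like, the sketch below contrasts two loop orders. It assumes the hot loop is a naive i-j-k triple loop; the sample's actual source may differ, and the function names and the dimension `N` are hypothetical. Loop interchange is shown as one commonly used remedy, not necessarily the fix this tutorial applies.

```c
#include <assert.h>
#include <stddef.h>

#define N 64 /* hypothetical matrix dimension for illustration */

/* i-j-k order: the inner loop walks b down a column, so consecutive
 * iterations touch addresses N doubles apart (poor cache-line reuse).
 * Accumulates into c; c must not alias a or b. */
void multiply_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    assert(c != a && c != b);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            for (size_t k = 0; k < N; ++k)
                c[i][j] += a[i][k] * b[k][j];
}

/* i-k-j order (loop interchange): the inner loop walks b and c along a
 * row, so consecutive iterations touch adjacent addresses.
 * Accumulates into c; c must not alias a or b. */
void multiply_ikj(double a[N][N], double b[N][N], double c[N][N])
{
    assert(c != a && c != b);
    for (size_t i = 0; i < N; ++i)
        for (size_t k = 0; k < N; ++k)
            for (size_t j = 0; j < N; ++j)
                c[i][j] += a[i][k] * b[k][j];
}
```

Both functions add the same products to each element of `c` in the same order of `k`, so they compute the same result; only the memory access pattern of the innermost loop changes. Whether the interchange reduces the DRAM Bound metric for a given matrix size should be confirmed by re-running the analysis.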
If you are an advanced user looking for a different way to identify and diagnose memory issues in your application, try running the Memory Access analysis type. An example of how to determine which data structure induces inefficient memory access is available in the VTune Profiler Cookbook.
