Interpret Microarchitecture Exploration Analysis Result Data

When the sample application exits, Intel® VTune™ Profiler finalizes the results and opens the Microarchitecture Exploration viewpoint, which provides a high-level performance overview of the interaction between the application and the available hardware.
To interpret the data on the sample code performance, do the following:

Understand the Event-based Metrics

Start with the Summary pane for an overview of application performance.
Microarchitecture Exploration uPipe diagram showing large memory bound pipe leading to small retiring pipe
The µPipe diagram provides a graphical representation of CPU microarchitecture metrics, showing inefficiencies in hardware usage. Treat the diagram as a pipe with an output flow equal to the ratio Actual Instructions Retired / Possible Maximum Instructions Retired (pipe efficiency). The µPipe is based on CPU pipeline slots, which represent the hardware resources needed to process one micro-operation. Usually several pipeline slots are available on each cycle (the pipeline width). If a pipeline slot does not retire, this is considered a stall, and the µPipe diagram represents it as an obstacle that narrows the pipe.
See the Microarchitecture Pipe page of the online User Guide for a more detailed explanation of the µPipe.
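The pipe efficiency ratio described above can be sketched in C. This is a minimal illustration, not VTune Profiler's actual computation: the function name `pipe_efficiency`, its parameters, and the idea of passing the pipeline width as an argument are assumptions made for the example, and the real width depends on the CPU.

```c
#include <stdint.h>

/* Hypothetical helper: fraction of pipeline slots that retired a
 * micro-operation. slots_per_cycle is the pipeline width, which is
 * an assumption here and depends on the processor. */
double pipe_efficiency(uint64_t retired_slots, uint64_t cycles,
                       unsigned slots_per_cycle)
{
    uint64_t total_slots = cycles * (uint64_t)slots_per_cycle;

    if (total_slots == 0)
        return 0.0; /* no cycles observed, so nothing to report */

    return (double)retired_slots / (double)total_slots;
}
```

For example, with an assumed width of 4 slots per cycle, 100 retired slots over 100 cycles gives an efficiency of 0.25: three quarters of the slots were lost to stalls or other non-retiring work.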
In this case, the Memory Bound metric is high, so only a small fraction of pipeline slots are being retired. Hover over each section for a description and its percentage of the total pipeline, or refer to the metrics on the left.
The hierarchy of event-based metrics in the Microarchitecture Exploration viewpoint depends on your hardware architecture. Each metric is an event ratio defined by Intel architects and has its own predefined threshold.
VTune Profiler analyzes the ratio value for each aggregated program unit (for example, a function). When this value exceeds the threshold, it signals a potential performance problem.
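As a rough sketch of how an event ratio is compared against a threshold, the C fragment below computes a CPI rate from two event counts and checks it against a limit. The function names and any threshold value you pass in are hypothetical; the real thresholds are predefined by Intel for each metric and are not stated in this tutorial.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: clockticks per instruction retired. */
double cpi_rate(uint64_t clockticks, uint64_t instructions_retired)
{
    if (instructions_retired == 0)
        return 0.0; /* avoid division by zero when no samples exist */

    return (double)clockticks / (double)instructions_retired;
}

/* Hypothetical helper: a metric is flagged when it exceeds its threshold. */
bool exceeds_threshold(double value, double threshold)
{
    return value > threshold;
}
```

With an illustrative threshold of 1.0, a function that spent 200 clockticks retiring 100 instructions has a CPI rate of 2.0 and would be flagged, while one that spent 50 clockticks on the same work would not.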
Microarchitecture Exploration summary view with flagged metrics including CPI Rate and Back-end Bound
The Elapsed Time section shows metrics related to hardware event ratios for your hardware. Hover over the flagged metrics to get a description of the issues, possible causes, and suggestions for resolving them. This result shows issues with both CPI Rate (Clockticks per Instructions Retired rate) and Back-End Bound. Both issues were identified as possible causes of slow execution by the original Hotspots analysis. The expanded Back-End Bound section shows that the application is Memory Bound, which matches the µPipe diagram. The Bottom-up pane can help identify the program units responsible for the memory issues.

Identify Hardware Usage Bottlenecks

Switch to the Bottom-up pane to see how each program unit performs against the event-based metrics. Each row represents a program unit and the percentage of CPU cycles used by that unit. Program units that take more than 5% of the CPU time are considered hotspots.
By default, VTune Profiler sorts the data in descending order by CPU Time, so the hotspots appear at the top of the list. The metric values for event ratios are shown as numbers, bars, or both.
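The ranking described above (sort by CPU time in descending order, then treat units above 5% of the total as hotspots) can be sketched as follows. The `unit_t` type and `rank_hotspots` function are hypothetical names introduced for illustration; this is not how VTune Profiler is implemented.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one program unit (for example, a function). */
typedef struct {
    const char *name;
    double cpu_time; /* seconds */
} unit_t;

/* qsort comparator: larger CPU time first. */
static int by_cpu_time_desc(const void *a, const void *b)
{
    const unit_t *x = a;
    const unit_t *y = b;

    if (x->cpu_time < y->cpu_time) return 1;
    if (x->cpu_time > y->cpu_time) return -1;
    return 0;
}

/* Sorts units in place by CPU time (descending) and returns how many
 * of them exceed 5% of the total CPU time. After the call, those
 * hotspots occupy the first entries of the array. */
size_t rank_hotspots(unit_t *units, size_t n)
{
    double total = 0.0;
    size_t hot = 0;

    for (size_t i = 0; i < n; i++)
        total += units[i].cpu_time;

    qsort(units, n, sizeof units[0], by_cpu_time_desc);

    while (hot < n && units[hot].cpu_time > 0.05 * total)
        hot++;

    return hot;
}
```

Given made-up timings of 9.5 s for `multiply1` and 0.25 s each for two other functions, the call returns 1 and leaves `multiply1` in the first position.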
Bottom-up tab showing grouping by Function/Call Stack with multiply1 function showing highest CPU time, clockticks, CPI rate, and Back-end bound values
As the Hotspots analysis already identified, the multiply1 function is the most obvious hotspot in the sample application. It has the highest event count (Instructions Retired events), and most of the hardware issues were also detected during the execution of this function.
The Back-End Bound metric describes the portion of the pipeline where the out-of-order scheduler dispatches ready µOps into their respective execution units and, once completed, these µOps get retired according to program order. Identify slots where no µOps are delivered due to a lack of required resources for accepting more µOps in the back-end of the pipeline. Stalls due to data-cache misses and stalls due to an overloaded divider unit are examples of back-end bound issues.
Expand the Back-End Bound column to discover that the code is memory bound, with the largest percentage of stalls occurring on accesses to main memory (DRAM). Hover over the highlighted cells to learn more about optimization opportunities.
Grid view showing Back-End Bound and Memory Bound columns expanded with flagged issues in DRAM Bound column

Analyze Code

Double-click the multiply1 function to open the Source window and analyze the source code.
multiply.c source file with source code and key metrics shown
When you drill down from the grid to the source view, VTune Profiler automatically highlights the code line that has the highest event count. In the Source pane for the multiply1 function, you see that line 51 accounted for most of the Clockticks event samples during execution and was also highlighted as the top hotspot line in the Hotspots result. This code section multiplies matrices in a loop but accesses memory inefficiently. Expand the Back-End Bound column to learn more. Focus on this section and try to reduce the memory issues.
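To illustrate how a matrix multiplication loop can access memory inefficiently, here is a generic sketch. It is not the code from multiply.c, which is not reproduced in this tutorial; the function names, the flat row-major layout, and the use of loop interchange as a remedy are all assumptions chosen for illustration.

```c
#include <stddef.h>

/* Naive i-j-k order. The inner loop reads b[k*n + j] with a stride of
 * n elements, walking down a column of a row-major matrix. For large n
 * each read tends to land on a different cache line. */
void multiply_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Interchanged i-k-j order. The inner loop now walks b and c along a
 * row, so consecutive iterations touch adjacent memory. */
void multiply_interchanged(size_t n, const double *a, const double *b,
                           double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same product. Only the order in which memory is touched differs, which is the kind of change that typically affects a DRAM-bound loop. Whether it helps in a given case should be confirmed by re-running the analysis.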
If you are an advanced user looking for a different way to identify and diagnose memory issues in your application, try running the Memory Access analysis type. An example of how to define which data structure induces inefficient memory access is available from the

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.