Microarchitecture Exploration View

Explore the Intel® VTune™ Profiler Microarchitecture Exploration viewpoint for PMU analysis based on the top-down microarchitecture analysis method. This method uses key hardware metrics organized by execution categories so that you can easily identify which portion of the pipeline is responsible for the majority of execution time.
When the Microarchitecture Exploration analysis (formerly known as General Exploration) is complete, the VTune Profiler opens the Microarchitecture Exploration viewpoint. The hierarchy of event-based metrics in this viewpoint depends on your hardware architecture. For example, starting with the Intel microarchitecture code name Ivy Bridge, the VTune Profiler analyzes execution categories based on the Top-Down Microarchitecture Analysis Method:
Figure: Identifying Where Cycles Are Spent
The four leaf categories (Front-End Bound, Bad Speculation, Back-End Bound, and Retiring) serve as high-level performance metrics in the Microarchitecture Exploration viewpoint.
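To make the four top-level categories concrete, the sketch below splits pipeline slots into Front-End Bound, Bad Speculation, Retiring, and Back-End Bound fractions. It follows the generally published level-1 formulas of the top-down method. The function name, the parameter names, and the sample counts are illustrative assumptions, and the exact events and formulas that VTune Profiler uses depend on the hardware architecture.

```python
def top_down_level1(total_slots, undelivered_slots, issued_slots,
                    retired_slots, recovery_slots):
    """Return the four top-level categories as fractions of all pipeline slots.

    All parameters are hypothetical aggregated counter values,
    not VTune Profiler API names.
    """
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")

    # Slots in which the front end delivered nothing to the back end.
    front_end_bound = undelivered_slots / total_slots
    # Slots that did useful work and retired.
    retiring = retired_slots / total_slots
    # Slots issued but never retired, plus slots lost during recovery.
    bad_speculation = (issued_slots - retired_slots + recovery_slots) / total_slots
    # Whatever remains is attributed to the back end.
    back_end_bound = 1.0 - (front_end_bound + bad_speculation + retiring)

    return {
        "Front-End Bound": front_end_bound,
        "Bad Speculation": bad_speculation,
        "Retiring": retiring,
        "Back-End Bound": back_end_bound,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
breakdown = top_down_level1(
    total_slots=4000,
    undelivered_slots=600,
    issued_slots=1800,
    retired_slots=1600,
    recovery_slots=200,
)
for name, share in breakdown.items():
    print(f"{name}: {share:.1%}")
```

Because Back-End Bound is computed as the remainder, the four fractions always sum to 1, which matches the idea that every pipeline slot belongs to exactly one category.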
Each metric is an event ratio defined by Intel architects and has its own predefined threshold. VTune Profiler analyzes the ratio value for each aggregated program unit (for example, a function). When this value exceeds the threshold and the program unit accounts for more than 5% of the total collected CPU time, VTune Profiler treats it as a potential performance problem and highlights the value in pink.
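The highlighting rule described above can be sketched as a simple filter. This is a minimal illustration, not VTune Profiler code: the `UnitMetric` type, the function names other than `sphere_intersect`, the threshold of 0.2, and all sample values are assumptions made for the example. Only the 5% CPU time share comes from this guide.

```python
from dataclasses import dataclass

# From the guide: a unit must take more than 5% of the collected CPU time.
HOTSPOT_CPU_SHARE = 0.05


@dataclass
class UnitMetric:
    unit: str            # program unit, for example a function name
    cpu_time: float      # CPU time attributed to the unit, in seconds
    metric_value: float  # ratio value of one hardware metric for the unit


def flag_units(units, threshold):
    """Return names of units whose metric value would be highlighted."""
    total_cpu_time = sum(u.cpu_time for u in units)
    if total_cpu_time <= 0:
        return []
    return [
        u.unit
        for u in units
        if u.metric_value > threshold
        and u.cpu_time / total_cpu_time > HOTSPOT_CPU_SHARE
    ]


# Hypothetical data: tiny_helper exceeds the threshold but is not a hotspot.
units = [
    UnitMetric("sphere_intersect", cpu_time=5.2, metric_value=0.30),
    UnitMetric("grid_intersect", cpu_time=3.9, metric_value=0.10),
    UnitMetric("tiny_helper", cpu_time=0.2, metric_value=0.60),
]
flagged = flag_units(units, threshold=0.2)  # threshold value is hypothetical
print(flagged)
```

Both conditions must hold at once, so a unit with a poor ratio but a negligible share of CPU time is not flagged.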
To interpret the performance data provided during the hardware event-based sampling analysis, you may follow the steps below:

Learn Metrics and Define a Performance Baseline

In the Microarchitecture Exploration viewpoint, click the Summary tab to switch to the Summary window.
The first section displays summary statistics for the overall application execution per hardware-related metric, measured in Pipeline Slots or Clockticks. Metrics are organized by execution category in a list and are also represented as a µPipe diagram. To view a metric description, mouse over the help icon next to the metric:
Figure: Microarchitecture Exploration: Summary Window
In the example above, mousing over the L1 Bound metric displays the metric description in the tooltip.
A flagged metric value signals a performance issue for the whole application execution. Mouse over the flagged value to read the issue description.
You may use the performance issues identified by the VTune Profiler as a baseline for comparison of versions before and after optimization. Your primary performance indicator is the Elapsed time value.
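As a rough illustration of using Elapsed time as the primary indicator, the sketch below compares a baseline run with an optimized run. The function name and the two timings are hypothetical; in practice you would read the Elapsed time values from the Summary window of each result.

```python
def compare_elapsed(baseline_s, optimized_s):
    """Return (speed-up factor, percentage reduction in Elapsed time)."""
    if baseline_s <= 0 or optimized_s <= 0:
        raise ValueError("elapsed times must be positive")
    speedup = baseline_s / optimized_s
    reduction_pct = (baseline_s - optimized_s) / baseline_s * 100.0
    return speedup, reduction_pct


# Hypothetical Elapsed time values in seconds.
speedup, reduction_pct = compare_elapsed(baseline_s=12.0, optimized_s=9.0)
print(f"speed-up: {speedup:.2f}x, Elapsed time reduced by {reduction_pct:.1f}%")
```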
Grayed-out metric values indicate that the data collected for this metric is unreliable. This may happen, for example, if the number of samples collected for PMU events is too low. In this case, when you hover over such an unreliable metric value, the VTune Profiler displays a message explaining the problem.
You may either ignore this data or rerun the collection with an increased data collection time, sampling interval, or workload.
By default, the VTune Profiler collects Microarchitecture Exploration data in the Detailed mode. In this mode, all metric names in the Summary view are hyperlinks. Clicking such a hyperlink opens the Bottom-up window and sorts the data in the grid by the selected metric. The lightweight Summary collection mode is limited to the Summary view statistics.

Identify Hardware Issues

To view hardware issues per program unit, switch to the Bottom-up pane. Each row represents a program unit and the percentage of time used by this unit. Program units that take more than 5% of the CPU time are considered hotspots. By default, the VTune Profiler sorts the data in descending order by Clockticks and lists the hotspots at the top.
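The default ordering and the hotspot rule can be pictured with the short sketch below. It is an illustration under stated assumptions: the share of Clockticks is used here as a stand-in for the share of CPU time, and every function name except `sphere_intersect`, as well as all counts, is hypothetical.

```python
def rank_by_clockticks(clockticks_by_unit, hotspot_share=0.05):
    """Sort units by Clockticks (descending) and mark hotspots.

    Returns a list of (unit, clockticks, share, is_hotspot) tuples.
    """
    total = sum(clockticks_by_unit.values())
    if total <= 0:
        return []
    ordered = sorted(clockticks_by_unit.items(),
                     key=lambda item: item[1], reverse=True)
    return [
        (unit, ticks, ticks / total, ticks / total > hotspot_share)
        for unit, ticks in ordered
    ]


# Hypothetical per-function Clockticks.
samples = {
    "shade": 600,
    "sphere_intersect": 5200,
    "main": 300,
    "grid_intersect": 3900,
}
ranked = rank_by_clockticks(samples)
for unit, ticks, share, is_hotspot in ranked:
    marker = "hotspot" if is_hotspot else ""
    print(f"{unit:18s} {ticks:6d} {share:6.1%} {marker}")
```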
Most of the columns in the Bottom-up pane represent a hardware performance metric. VTune Profiler calculates a metric based on the formula provided by Intel architects. Mouse over the column header to read the metric description. By default, metric values are represented as numbers. You can change the representation mode with the Show Data As context menu option.
The right pane displays a context summary for the selected function. Analyze per-function hardware metrics and their visual representation on the µPipe diagram to estimate the contribution of this particular function to the overall performance.
Each metric has a threshold value. If the metric value exceeds the threshold and the program unit is a hotspot, the VTune Profiler highlights this value in pink as performance-critical. Mouse over each pink cell to read a description of the issue and recommended solution (if any).
Figure: Microarchitecture Exploration: Bottom-up Window
In the example above, created on the Intel microarchitecture code name Skylake, the VTune Profiler identified the sphere_intersect function as one of the biggest hotspots, consuming a large share of CPU time. VTune Profiler detected that the back-end portion of the pipeline caused the stalls. Within the back-end, the VTune Profiler identified the Memory Bound > L1 Bound issue as the dominant bottleneck: 14.6% of the Clockticks spent in this function were stalled on L1 data cache accesses. This means that if you focus on this function hotspot and optimize it, you can potentially gain ~15% speed-up for this function.
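The ~15% figure is an upper bound that assumes the stalls could be removed entirely. The arithmetic can be sketched as follows; the function and parameter names are illustrative, and `function_share` (the fraction of total run time spent in the function) is a hypothetical input for estimating the effect on the whole application.

```python
def estimate_gain(stall_fraction, function_share=1.0):
    """Best-case gain if the given stall fraction were fully eliminated.

    stall_fraction: share of the function's Clockticks lost to the stall.
    function_share: share of total time spent in the function
                    (1.0 means the estimate is for the function itself).
    Returns (fraction of time saved, resulting speed-up factor).
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    if not 0.0 < function_share <= 1.0:
        raise ValueError("function_share must be in (0, 1]")
    time_saved = stall_fraction * function_share
    speedup = 1.0 / (1.0 - time_saved)
    return time_saved, speedup


# 14.6% L1 Bound stalls from the example, applied to the function alone.
saved, speedup = estimate_gain(0.146)
print(f"time saved: {saved:.1%}, speed-up: {speedup:.3f}x")

# Hypothetical: the same function accounts for half of the total run time.
app_saved, app_speedup = estimate_gain(0.146, function_share=0.5)
print(f"application time saved: {app_saved:.1%}, speed-up: {app_speedup:.3f}x")
```

The gain for the whole application is smaller than the gain for the function, because only the function's share of the run time benefits.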
VTune Profiler is able to identify the most common types of pipeline bottlenecks. You may go deeper for more details. If the deeper levels of the metrics do not display any data, it means that the VTune Profiler cannot see a dominant bottleneck on the lower level.

Analyze Source

Once you have identified a critical function, double-click it to open the Source/Assembly window and analyze the source code.
The Source/Assembly window displays locator metrics that show which code contributed the most to the issue represented by the metric. For example, if the Back-End Bound metric equals 60% for your function, the source view for this function splits the 60% value across the function's source lines or instructions. This helps you identify the source line or instruction that contributes the most to the total 60% Back-End Bound metric.
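The way a locator metric splits a function-level value across source lines can be pictured with the sketch below. The line numbers and per-line contributions are hypothetical; they are chosen so that they add up to the 60% Back-End Bound value from the example.

```python
def biggest_contributor(line_contributions):
    """Return (line, contribution, function total) for a locator metric.

    line_contributions maps a source line number to its share, in
    percentage points, of the function-level metric value.
    """
    if not line_contributions:
        raise ValueError("no per-line data")
    total = sum(line_contributions.values())
    line, value = max(line_contributions.items(), key=lambda item: item[1])
    return line, value, total


# Hypothetical split of a 60% Back-End Bound value across four lines.
back_end_bound_by_line = {41: 5.0, 42: 38.0, 43: 12.0, 47: 5.0}
line, value, total = biggest_contributor(back_end_bound_by_line)
print(f"line {line} contributes {value:.1f} of {total:.1f} percentage points")
```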
Use the hotspots navigation toolbar buttons to navigate to the biggest hotspot for each locator metric and identify the code to optimize.

What's Next

  • You may view the collected data using the Hotspots viewpoint or run the Hotspots analysis type. Analyzing the source and assembly code for the hotspot function in the Hotspots viewpoint helps identify which instruction contributes most to the poor performance and how much CPU time the hotspot source line takes. Such a code analysis could be useful for the hotspots that do not show any issues in the sub-metrics but do show problems at the upper level of metrics (see the example above).
  • Run the comparison analysis to understand the performance gain you obtained after your optimization.
  • You may create your custom analysis configuration and monitor events you are interested in.
