Roofline Analysis


Roofline Analysis Purpose and Usage

Run a Roofline analysis to add a Roofline chart to the Survey Report. The chart helps you visualize actual performance against hardware-imposed performance ceilings and determine the main limiting factor (memory bandwidth or compute capacity), which gives you a roadmap of potential optimization steps.

Use the Roofline chart to answer the following questions:

  • What is the maximum achievable performance with your current hardware resources?

  • Does your application work optimally on current hardware resources?

  • If not, what are the best candidates for optimization?

  • Is memory bandwidth or compute capacity limiting performance for each optimization candidate?

The Intel Advisor basic roofline model, the Cache-Aware Roofline Model (CARM), offers self data capability. The Intel Advisor Roofline with Callstacks feature extends the basic model with total data capability:

  • Self data = Memory access, FLOPs, and duration related only to the loop/function itself, excluding data originating in other loops/functions called by it

  • Total data = Data from the loop/function itself and its inner loops/functions
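
The difference between the two views can be sketched in a few lines of Python. This is an illustration only: the Node class, the total_metrics helper, and all numbers are hypothetical, and nothing here reflects how the Intel Advisor stores or computes its data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A loop/function in a hypothetical call tree (illustrative only)."""
    name: str
    self_flops: float     # FLOPs executed in this loop/function body only
    self_bytes: float     # bytes accessed by this loop/function body only
    self_seconds: float   # time spent in this loop/function body only
    callees: List["Node"] = field(default_factory=list)


def total_metrics(node: Node) -> Tuple[float, float, float]:
    """Return (flops, bytes, seconds) for the node plus everything it calls."""
    flops, nbytes, seconds = node.self_flops, node.self_bytes, node.self_seconds
    for callee in node.callees:
        c_flops, c_bytes, c_seconds = total_metrics(callee)
        flops += c_flops
        nbytes += c_bytes
        seconds += c_seconds
    return flops, nbytes, seconds


# Made-up numbers: a small outer function that calls one heavy inner loop.
inner = Node("inner_loop", self_flops=8e9, self_bytes=4e9, self_seconds=2.0)
outer = Node("outer_func", self_flops=1e9, self_bytes=1e9, self_seconds=0.5,
             callees=[inner])

self_view = (outer.self_flops, outer.self_bytes, outer.self_seconds)
total_view = total_metrics(outer)
print("self :", self_view)
print("total:", total_view)
```

In this sketch, the self view of outer_func reflects only its own body, while the total view also absorbs the work done by inner_loop.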

The total-data capability in the Roofline with Callstacks feature can help you:

  • Investigate the source of loops/functions instead of just the loops/functions themselves.

  • Get a more accurate view of loops/functions that behave differently when called under different circumstances.

  • Uncover design inefficiencies higher up the call chain that could be the root cause of poor performance by smaller loops/functions.

Run a Roofline Analysis

In the Vectorization Workflow tab, click the Run analysis control under Run Roofline.

Outcome: The Intel Advisor executes the target application twice to:

  • Measure the hardware limitations of your machine and collect loop/function timings using the Survey analysis.

  • Collect FLOP and integer operations data, and memory traffic data, using the Trip Counts and FLOP analysis. This collection can take three to four times longer than the Survey analysis.

After both analyses are complete, the Intel Advisor adds a Roofline chart to the Survey Report.

To implement the Roofline with Callstacks feature:

  1. Run the Roofline analysis with the With Callstacks checkbox enabled. Upon completion, the Intel Advisor displays a Roofline chart.

  2. Enable the With Callstacks checkbox in the Roofline chart.

Roofline Chart Controls

There are several controls to help you show/hide the Roofline chart:
Intel Advisor: Roofline Chart & Survey Report

1

Click to toggle between Roofline chart view and Survey Report view.

2

Click to toggle to and from side-by-side Roofline chart and Survey Report view.

3

Drag to adjust the dimensions of the Roofline chart and Survey Report.

There are several controls to help you focus on the Roofline chart data most important to you, including the following.
Intel Advisor: Roofline controls

1

  • Select one or more loops/functions by tracing a rectangle with your mouse.

  • Zoom in and out by tracing a rectangle with your mouse. You can also zoom in and out using your mouse wheel.

  • Move the chart left, right, up, and down.

  • Undo or redo the previous zoom action.

  • Reset to the default zoom level.

  • Export the chart as a dynamic and interactive HTML or SVG file that does not require the Intel Advisor viewer for display. Use the arrow to toggle between the options.

2

  • Adjust rooflines to see practical performance limits if an application uses fewer threads than available cores.

  • Build roofs for single-threaded applications (or for multi-threaded applications configured to run single-threaded, such as MPI applications with one thread per rank). You can use Intel Advisor filters to control the loops displayed in the Roofline chart; however, the Roofline chart does not support the Threads filter.

3

  • Toggle the display among floating-point operations, integer operations, and mixed operations (floating-point and integer).

  • Enable the display of Roofline with Callstacks additions to the Roofline chart.

4

Display Roofline chart data from other Intel Advisor results or non-archived snapshots for comparison purposes.

Use the drop-down toolbar to:

  • Load a result/snapshot and display the corresponding filename in the Compared Results region.

  • Clear a selected result/snapshot and move the corresponding filename to the Ready for comparison region.

    Note: Click a filename in the Ready for comparison region to reload the result/snapshot.

  • Save the comparison itself to a file.

    Note: The arrowed lines showing the relationship among loops/functions do not reappear if you upload the comparison file.

Click a loop/function dot in the current result to show the relationship (arrowed lines) between it and the corresponding loop/function dots in loaded results/snapshots.

Intel Advisor: Roofline Comparison

5

  • Color Roofline chart zones to show if loops/functions are essentially:

    • Memory bound - If so, consider improving memory access patterns or using cache blocking.

    • Compute bound - If so, consider using a different instruction set architecture (ISA) or faster instructions, such as fused multiply-add (FMA) instructions.

    • Compute bound with memory roofs.

  • Adjust the default scale setting to show:

    • The optimal scale for each Roofline chart view

    • A scale that accommodates all Roofline chart views

  • Change the visibility and appearance of roofline representations (lines).

  • Change the appearance of loop/function weight representations (dots).

  • Manually fine-tune roof values to set hardware limits specific to your code.

6

Zoom in and out using numerical values.

7

Hover your mouse over an item to display metrics for it.

If you hover your mouse over a loop/function dot, the Roofline chart displays two blue projection dots with metrics that show potential performance if you optimize the loop/function to reach the next roofline and the maximum achievable roofline. (If the next roofline and maximum achievable roofline are the same, the Roofline chart displays only one blue projection dot.)
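
One way to think about the projection dots is sketched below. This is an assumption about how such projections could be derived from the roofs, not the documented Intel Advisor algorithm. The projections function, the roof names, and every number are hypothetical.

```python
from typing import Dict, List


def projections(ai: float, gflops: float,
                bandwidths_gb_s: Dict[str, float],
                compute_peaks_gflops: Dict[str, float]) -> List[float]:
    """Return projected GFLOPS for the next roof and the maximum achievable roof.

    Assumes a memory roof caps performance at ai * bandwidth and a compute
    roof caps it at a constant peak (illustrative model only).
    """
    memory_roofs = [ai * bw for bw in bandwidths_gb_s.values()]
    compute_roofs = list(compute_peaks_gflops.values())
    # Maximum achievable: the lower of the fastest memory roof and the top compute roof.
    top = min(max(memory_roofs), max(compute_roofs))
    # Candidate roofs lie above the dot but not above the maximum achievable level.
    candidates = [v for v in memory_roofs + compute_roofs if gflops < v <= top]
    if not candidates:
        return []
    nxt = min(candidates)
    # A single projection when the next roof is also the maximum achievable roof.
    return [nxt] if nxt == top else [nxt, top]


# Hypothetical roofs (GB/s and GFLOPS) for illustration.
bandwidths = {"L1": 400.0, "L2": 200.0, "DRAM": 50.0}
peaks = {"Scalar Add": 20.0, "Vector Add": 100.0}

two_dots = projections(0.5, 10.0, bandwidths, peaks)
one_dot = projections(0.5, 30.0, bandwidths, peaks)
print(two_dots, one_dot)
```

In this sketch, a loop at 10 GFLOPS gets two projections (the next roof and the maximum), while a loop at 30 GFLOPS gets one because the next roof above it is already the maximum.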

Click a loop/function dot to:

  • Outline it in black.

  • Display metrics for it.

  • If Roofline with Callstacks is enabled, display the corresponding, navigable, color-coded callstack.

  • Display corresponding data in other window tabs.

You can also click an item in the Callstack pane to flash the corresponding loop/function dot in the Roofline chart.

If Roofline with Callstacks is enabled, click the collapse control on a loop/function dot to collapse descendant dots into the parent dot, or click the expand control on a loop/function dot to show descendant dots and their relationship, via visual indicators, to the parent dot.

Right-click a loop/function dot or a blank area in the Roofline chart to perform more functions, such as:

  • Further simplify the Roofline chart by filtering out (temporarily hiding a dot), filtering in (temporarily hiding all other dots), and clearing filters (showing all originally displayed dots).

  • Copy data to the clipboard.

8

If Roofline with Callstacks is enabled, show/hide the Callstack pane.

9

Display the number and percentage of loops in each loop weight representation category.

Roofline Chart Data

The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:

  • Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) and/or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm

  • Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) and/or billions of integer operations per second (GINTOPS)
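
As a minimal sketch, the two coordinates of a dot can be derived from three measurements: operation count, bytes transferred, and elapsed time. The function names and the sample numbers below are hypothetical and are not taken from any Intel Advisor report.

```python
def arithmetic_intensity(operations: float, bytes_transferred: float) -> float:
    """Operations (FLOPs and/or INTOPs) per byte moved between CPU/VPU and memory."""
    return operations / bytes_transferred


def giga_ops_per_second(operations: float, seconds: float) -> float:
    """Billions of operations per second (GFLOPS or GINTOPS)."""
    return operations / seconds / 1e9


# Made-up measurements for one loop.
flops = 2.0e10          # floating-point operations executed
traffic_bytes = 8.0e10  # bytes transferred
elapsed = 4.0           # seconds

x = arithmetic_intensity(flops, traffic_bytes)  # x axis value, FLOP/byte
y = giga_ops_per_second(flops, elapsed)         # y axis value, GFLOPS
print(f"AI = {x} FLOP/byte, performance = {y} GFLOPS")
```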

In general:

  • The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time, so they are the best candidates for optimization. Small green dots take less time, so they may not be worth optimizing.

  • Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often, and instead is subject to the limitations of the lower-speed L2 cache it is hitting. So a dot representing a loop that misses L1 cache too often but hits L2 cache is positioned somewhere below the L2 Bandwidth roofline.

  • Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by the scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by the vectorized loop under these circumstances. So a dot representing a loop that is not vectorized is positioned somewhere below the Scalar Add Peak roofline.

  • A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.

  • The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
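
The relationship between the diagonal and horizontal roofs can be sketched with the standard roofline formula: at a given arithmetic intensity, the ceiling is the lower of the compute peak and the arithmetic intensity multiplied by memory bandwidth. The function names and hardware numbers below are hypothetical, and the sketch does not claim to reproduce how the Intel Advisor measures its roofs.

```python
def attainable_gflops(ai: float, bandwidth_gb_s: float, peak_gflops: float) -> float:
    """Ceiling at a given arithmetic intensity for one memory roof and one compute roof."""
    return min(peak_gflops, ai * bandwidth_gb_s)


def limiting_factor(ai: float, bandwidth_gb_s: float, peak_gflops: float) -> str:
    """Name the roof that caps performance at this arithmetic intensity."""
    if ai * bandwidth_gb_s < peak_gflops:
        return "memory bandwidth"
    return "compute capacity"


# Hypothetical machine: one bandwidth roof and one compute roof.
DRAM_GB_S = 50.0
VECTOR_PEAK_GFLOPS = 100.0

low_ai_ceiling = attainable_gflops(0.25, DRAM_GB_S, VECTOR_PEAK_GFLOPS)
high_ai_ceiling = attainable_gflops(4.0, DRAM_GB_S, VECTOR_PEAK_GFLOPS)

# Distance to the roof for a loop measured at 5 GFLOPS with AI = 0.25.
headroom = low_ai_ceiling / 5.0

print(low_ai_ceiling, limiting_factor(0.25, DRAM_GB_S, VECTOR_PEAK_GFLOPS))
print(high_ai_ceiling, limiting_factor(4.0, DRAM_GB_S, VECTOR_PEAK_GFLOPS))
print(f"headroom: {headroom}x")
```

In this sketch, the low-intensity loop sits under the diagonal (memory) roof and the high-intensity loop sits under the horizontal (compute) roof. The headroom ratio corresponds to the vertical distance between a dot and its roof.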

In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve or are too small to have significant impact on performance.
This is a visual model, not an actual screenshot, of the Roofline Chart

The following Roofline chart representation shows some of the added benefits of the Roofline with Callstacks feature, including:

  • A navigable, color-coded Callstack pane that shows the entire call chain for the selected loop/function, but excludes its callees

  • Visual indicators (caller and callee arrows) that show the relationship among loops and functions

  • The ability to simplify dot-heavy charts by collapsing several small loops into one overall representation

    Loops/functions with no self data are grayed out when expanded and in color when collapsed. Loops/functions with self data display at the coordinates, size, and color appropriate to the data when expanded, but have a gray halo of the size associated with their total time. When such loops/functions are collapsed, they change to the size and color appropriate to their total time and, if applicable, move to reflect the total performance and total arithmetic intensity.


Intel Advisor: Roofline with Callstacks
