Visualize actual performance against hardware-imposed performance ceilings (rooflines¹), such as memory bandwidth and compute capacity, to get a road map of potential optimization steps. This analysis highlights the loops with the most headroom for improvement, so you can focus on the areas that deliver the biggest performance payoff.

In this illustration, loops A and G are good optimization candidates. Loop B has room to improve but will have less impact. Loops E, C, D, and H are poor choices.

A Roofline chart visually depicts application performance (dots) relative to the hardware limitations (lines).
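The relationship between a dot and the lines on the chart can be sketched in a few lines of Python. This is a minimal illustration of the basic roofline formula (attainable performance is the lower of peak compute and arithmetic intensity times memory bandwidth), not the tool's implementation. The `Loop` class, the function names, and all machine and loop figures below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str
    gflops: float      # measured performance, GFLOP/s
    intensity: float   # arithmetic intensity, FLOP per byte moved


def roofline_ceiling(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def headroom(loop, peak_gflops, bandwidth_gbs):
    """How many times faster the loop could run before hitting its roof."""
    return roofline_ceiling(loop.intensity, peak_gflops, bandwidth_gbs) / loop.gflops


# Hypothetical machine and measurements (not taken from any real run).
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 20.0
loops = [
    Loop("A", gflops=2.0, intensity=0.5),
    Loop("B", gflops=30.0, intensity=4.0),
    Loop("E", gflops=90.0, intensity=10.0),
]

# Rank loops from most to least headroom.
ranked = sorted(
    loops,
    key=lambda l: headroom(l, PEAK_GFLOPS, BANDWIDTH_GBS),
    reverse=True,
)
for loop in ranked:
    print(f"{loop.name}: {headroom(loop, PEAK_GFLOPS, BANDWIDTH_GBS):.2f}x headroom")
```

Headroom alone does not capture impact: a fuller ranking would presumably also weight each loop by how much run time it consumes, which this sketch leaves out.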

Get loop-specific recommendations for performance improvement.

The Code Analytics tab provides a more detailed view for each loop.

Visually Track Optimization Progress

Review optimization strategies by comparing multiple analysis results on the same chart. See at a glance how each optimization affected the FLOPS of each loop.
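The same comparison can be expressed numerically. The sketch below assumes you have already recorded per-loop GFLOP/s figures from two analysis results; the dictionaries, loop names, and the `flops_speedup` helper are hypothetical and do not reflect any export format of the tool.

```python
def flops_speedup(before, after):
    """Per-loop ratio of GFLOP/s after an optimization to GFLOP/s before it."""
    return {
        name: after[name] / before[name]
        for name in before
        # Skip loops missing from the second result or with no measured FLOPS.
        if name in after and before[name] > 0
    }


# Hypothetical measurements from two runs of the same application.
baseline = {"A": 2.0, "B": 30.0, "G": 5.0}
optimized = {"A": 8.0, "B": 33.0, "G": 5.0}

for name, ratio in flops_speedup(baseline, optimized).items():
    print(f"loop {name}: {ratio:.2f}x")
```

A ratio near 1.0 suggests the change did not help that loop, which is the same signal as a dot that barely moves between results on the chart.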

Integrated Roofline (Technical Preview)

This experimental feature examines each loop at different cache levels and arithmetic intensities to pinpoint which cache level causes a performance bottleneck.

Quickly pinpoint memory hierarchy bottlenecks and identify the next optimization steps:

  • Determine which loops are limited by cache
  • Find inefficient access patterns
  • Locate loops that may benefit from vectorization or threading optimizations

Compared to the Cache-Aware Roofline analysis, this feature provides a much more detailed, yet slower, memory traffic analysis.
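One way to picture a per-level analysis is to compute a separate arithmetic intensity and ceiling for each level of the memory hierarchy and treat the lowest ceiling as the limiting level. The sketch below follows that general idea under stated assumptions. It is not the feature's actual algorithm, and the function name, level names, traffic counts, and bandwidths are all hypothetical.

```python
def limiting_level(flop_count, traffic_bytes, bandwidth_gbs, peak_gflops):
    """Return (level, ceilings): the memory level with the lowest ceiling,
    plus the attainable GFLOP/s computed for every level."""
    ceilings = {}
    for level, nbytes in traffic_bytes.items():
        intensity = flop_count / nbytes  # FLOP per byte at this level
        ceilings[level] = min(peak_gflops, intensity * bandwidth_gbs[level])
    level = min(ceilings, key=ceilings.get)
    return level, ceilings


# Hypothetical loop: 1 GFLOP of work and the bytes moved at each level.
traffic = {"L1": 8e9, "L2": 4e9, "L3": 2e9, "DRAM": 1e9}
# Hypothetical sustained bandwidth per level, in GB/s.
bandwidth = {"L1": 1000.0, "L2": 400.0, "L3": 200.0, "DRAM": 20.0}

level, ceilings = limiting_level(1e9, traffic, bandwidth, peak_gflops=100.0)
print(f"limiting level: {level}")
for name, value in ceilings.items():
    print(f"  {name}: {value:.1f} GFLOP/s attainable")
```

With these made-up figures only the DRAM ceiling falls below the compute peak, so reducing DRAM traffic (for example through better data reuse) would be the first thing to investigate.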

Learn About Additional Features

Vectorization Optimization

Enable more vector parallelism and improve its efficiency.

Thread Prototyping

Model, tune, and test multiple threading designs.

Build Heterogeneous Algorithms

Create and analyze data flow and dependency computation graphs.

Product and Performance Information

1

Roofline modeling was first proposed in 2009 by University of California at Berkeley researchers Samuel Williams, Andrew Waterman, and David Patterson in "Roofline: An Insightful Visual Performance Model for Multicore Architectures."

For the Cache-Aware Roofline Model (CARM), see Aleksandar Ilic, Frederico Pratas, and Leonel Sousa, "Cache-Aware Roofline Model: Upgrading the Loft," IEEE Computer Architecture Letters, vol. 13, no. 1 (January 2014), 21–24.

2

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804