This document describes the traditional scenario for using the Roofline feature of Intel® Advisor. The Roofline analysis can be run from the command line, the standalone Advisor GUI, or the integrated Microsoft* Visual Studio* plug-in GUI. Note that using Roofline in 2017 Update 1 requires some additional steps.
advixe-cl -collect=roofline -project-dir=MyResults -- MyExecutable
advixe-cl -collect=survey -project-dir=MyResults -- MyExecutable
advixe-cl -collect=tripcounts -flop -project-dir=MyResults -- MyExecutable
advixe-cl -collect=survey -project-dir=MyResults -- MyExecutable
advixe-cl -collect=tripcounts -flops-and-masks -project-dir=MyResults -- MyExecutable
Note for MPI Applications: Survey and Trip Counts must be run separately on MPI applications. See this article for more information.
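As a rough sketch of the MPI note, the two collections can be launched as separate jobs, each wrapped by the MPI launcher. The `mpirun` launcher, the rank count, the `DRY_RUN` switch, and the function names are assumptions made for illustration; the article referenced above remains the authority on the supported syntax. Substitute `-flop` for `-flops-and-masks` if your Advisor version expects the older option.

```shell
#!/bin/sh
# Sketch only: mpirun, RANKS and DRY_RUN are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}   # 1 = print each command instead of executing it
RANKS=${RANKS:-4}       # hypothetical number of MPI ranks

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_roofline_mpi() {
    # Step 1: Survey, as its own MPI job.
    run mpirun -n "$RANKS" advixe-cl -collect=survey -project-dir=MyResults -- MyExecutable
    # Step 2: Trip Counts with FLOPS, as a second, separate MPI job.
    run mpirun -n "$RANKS" advixe-cl -collect=tripcounts -flops-and-masks -project-dir=MyResults -- MyExecutable
}

collect_roofline_mpi
```

With `DRY_RUN=1` (the default in this sketch) the script only prints the two commands, which makes it easy to check them before setting `DRY_RUN=0` on a machine with Advisor and an MPI runtime installed.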
The Roofline chart is highly customizable. Detailed information about the various interface controls can be found in the Advisor User Guide.
On the chart you can see different rooflines available on your machine: memory/cache bounds and compute bounds. Those rooflines are obtained dynamically by running a small benchmark prior to running your application. Memory/cache rooflines define a performance ceiling if the data cannot fit into that particular cache. The compute rooflines show compute performance bounds if scalar, single/double precision vector, or FMA computations are used.
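To make the idea of a ceiling concrete: in the standard roofline model, the attainable performance of a loop is the smaller of the compute roof and the product of the loop's arithmetic intensity and the memory or cache bandwidth. The helper below is a minimal sketch of that arithmetic. The function name `roofline_bound` and all numbers are hypothetical and are not produced by Advisor.

```shell
#!/bin/sh
# roofline_bound AI BANDWIDTH PEAK
#   AI         arithmetic intensity of the loop, in FLOP/byte
#   BANDWIDTH  memory or cache roof, in GB/s
#   PEAK       compute roof, in GFLOPS
# Prints min(PEAK, AI * BANDWIDTH) in GFLOPS.
roofline_bound() {
    awk -v ai="$1" -v bw="$2" -v peak="$3" 'BEGIN {
        mem = ai * bw
        printf "%.1f\n", (mem < peak) ? mem : peak
    }'
}

# Hypothetical machine: 50 GB/s memory roof, 200 GFLOPS compute roof.
roofline_bound 0.25 50 200   # low intensity, limited by the memory roof: 12.5
roofline_bound 8 50 200      # high intensity, limited by the compute roof: 200.0
```

A loop plotted well below the value computed for its arithmetic intensity has a performance gap worth investigating.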
For every hot loop in your program, analyse the loop's position in the roofline plot. The hottest loops are displayed as large, red dots. Identify performance gaps and opportunities for each loop, and use the other information and recommendations provided by Advisor to improve the performance of your application. Selecting a loop on the roofline plot displays that loop's information in the tabs of the bottom pane, such as the Source tab, and also highlights the loop on the Survey Report page.
Hint: If you have nested loops in nested routines, changing the filtering mode to “Loops And Functions” can be helpful because only the self-time FLOPS metric is calculated. To analyse FLOPS data for outer loops, all nested loops and function calls should be carefully reviewed. For more information on this topic, refer to the Selftime-based FLOPS computing article.
If you have any questions or problems please contact the Advisor team by email at email@example.com.
Additional Instructions for 2017 Update 1: Before opening the GUI or collecting or viewing data in this version, you must set the environment variable ADVIXE_EXPERIMENTAL=roofline in order to activate the roofline feature, which was still experimental at this point.
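In a POSIX shell on Linux, setting the variable for the current session might look like the following sketch. The commented-out commands are only reminders of what to run next from the same shell.

```shell
# Intel Advisor 2017 Update 1 only: enable the experimental Roofline feature
# for this shell session before collecting or viewing data.
export ADVIXE_EXPERIMENTAL=roofline

# Then, from the same shell, open the GUI or run the collections, for example:
# advixe-gui &
# advixe-cl -collect=survey -project-dir=MyResults -- MyExecutable
```

On Windows, the equivalent in a command prompt is `set ADVIXE_EXPERIMENTAL=roofline`. The variable must be visible to the process that launches Advisor, so set it in the same shell or in the system environment.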
Additionally, if collecting data from the GUI, you must check the "Collect information about FLOPS, L1 memory traffic, and AVX-512 mask usage" checkbox in the Trip Counts tab of the Project Properties, because there is no FLOPS checkbox in the workflow interface.