This document describes the traditional scenario for using the Roofline feature of Intel® Advisor.
- Intel® Advisor can be executed as a standalone GUI tool, as an integrated Microsoft* Visual Studio plug-in on Windows*, or on the command line. If you plan to use the command line or standalone GUI, you should run advixe-vars.bat (on Windows) or source advixe-vars.sh (on Linux) to set up the environment variables.
Important! If you're using Intel® Advisor 2017 Update 1, make sure you've set the ADVIXE_EXPERIMENTAL=roofline environment variable before running collection commands or starting Microsoft* Visual Studio or the Advisor standalone GUI.
In Intel® Advisor 2017 Update 2 and later releases, this step is no longer necessary.
- Start Advisor. To run the standalone GUI use advixe-gui.exe or the advixe-gui command, then create an Advisor project.
- Configure your Project Properties. In Visual Studio, you can configure project settings by pressing the Project Properties toolbar button. In the standalone GUI, the Project Properties window opens automatically when a project is created.
In Intel® Advisor 2017 Update 1, you will need to make sure the checkbox “Collect information about FLOPS…” is checked on the “Survey Trip Count Analysis” project settings page.
In Intel® Advisor 2017 Update 2, it is sufficient to check the "FLOPS" checkbox under the "Find Trip Counts and FLOPS" section of the workflow.
- To open a vectorization workflow, click the corresponding toolbar button in the standalone GUI or in the Microsoft* Visual Studio integrated version.
Note for users of Intel® Advisor 2017 Update 1: If your vectorization workflow does not contain a "Collect roofline" button, the ADVIXE_EXPERIMENTAL=roofline environment variable was most likely not set before Advisor or Microsoft* Visual Studio was started.
- Collect the roofline data. There are two ways to do this:
- Press the "Collect roofline" button. Advisor will collect Roofline information for you. Your application will be executed twice: first to collect loop timings using the very lightweight "Survey" analysis, and then to collect FLOPS and calculate Arithmetic Intensity using the "Trip Counts and FLOPS" analysis. The FLOPS analysis usually takes 3-4 times as long to collect its data.
- You can also run those two analyses manually, either by running "Survey Target" and then "Find Trip Counts and FLOPS" from the vectorization workflow, or on the command line.
Note for users of Intel® Advisor 2017 Update 1: The ADVIXE_EXPERIMENTAL=roofline environment variable must be set before running the data collections. Otherwise, no roofline data will be present when opening the results, even if the environment variable is set before opening the GUI or Microsoft* Visual Studio.
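To make the two-pass collection more concrete, the sketch below shows how a loop timing from the first pass and FLOP and byte counts from the second pass could be combined into one point on the roofline chart. This is only an illustration of the arithmetic: the `LoopMeasurement` type, its field names, and all numbers are hypothetical and are not Advisor data structures or Advisor output.

```python
from dataclasses import dataclass


@dataclass
class LoopMeasurement:
    """Hypothetical per-loop numbers; not an Advisor data structure."""
    name: str
    self_time_s: float   # loop self time, as a Survey-style pass would report it
    flop_count: float    # floating-point operations counted in the second pass
    bytes_moved: float   # bytes read and written by the loop in the second pass


def arithmetic_intensity(m: LoopMeasurement) -> float:
    """FLOP per byte: the horizontal position of the loop on the chart."""
    return m.flop_count / m.bytes_moved


def gflops(m: LoopMeasurement) -> float:
    """Billions of FLOP per second: the vertical position of the loop."""
    return m.flop_count / m.self_time_s / 1e9


# Made-up values, chosen only to keep the arithmetic easy to follow.
loop = LoopMeasurement("hot_loop", self_time_s=2.0,
                       flop_count=8.0e9, bytes_moved=3.2e10)

print(f"{loop.name}: AI = {arithmetic_intensity(loop):.2f} FLOP/byte, "
      f"{gflops(loop):.2f} GFLOPS")
```

Because the time comes from one run and the counts from another, both runs should use the same input data for the two numbers to describe the same work.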
- In the "Survey and Roofline" tab you will see the Roofline information for your application.
- On the chart you can see different rooflines available on your machine: memory/cache bounds and compute bounds. Those rooflines are obtained dynamically by running a small benchmark prior to running your application. Memory/cache rooflines define a performance ceiling if the data cannot fit into that particular cache. The compute rooflines show compute performance bounds if scalar, single/double precision vector, or FMA computations are used.
- You can disable/enable rooflines in the toolbox in the upper-right corner of the chart; the icon is the small box with three lines, circled in green above. The display parameters for hot loops can also be tuned here.
- The hottest loops in the plot are displayed by default as large and red. As you can see in the screenshot above, there is a significant performance improvement opportunity for the bottom loop and very little opportunity for the upper one.
- In the bottom-right corner of the chart there is a workload histogram. It visualizes the fraction of execution time each loop takes.
- To see the survey results in table form, click the "Survey" sidebar, circled in red above. By dragging or clicking the white ribbon (circled above in blue) you can configure a view where the survey report and the roofline plot are displayed side by side.
- If your application is not threaded, you can use single-threaded rooflines by checking the checkbox at the top of the chart. Zooming controls are also available.
- Selecting a loop on the roofline plot displays that loop's information in the tabs of the bottom pane; for example, its source code appears in the source tab. You can also select a loop on the survey report page and then switch to the roofline page, where the same loop will be highlighted.
- The bottom pane contains several tabs with various types of information about the loops. Please refer to the Advisor documentation for details on each tab.
- If you have nested loops in nested routines, changing the filtering mode to “Loops And Functions” can be helpful because only the selftime FLOPS metric is calculated. To analyse FLOPS data for outer loops, all nested loops and function calls should be carefully reviewed. For more information on this topic refer to the Selftime-based FLOPS computing article.
- For every hot loop in your program, analyse the loop position in the roofline plot. Identify performance gaps and opportunities for each loop. Use other information and recommendations provided by Advisor to improve the performance of your application.
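The way a loop's position relates to the rooflines can be sketched with the basic roofline formula: attainable performance is the smaller of the compute peak and the arithmetic intensity multiplied by the memory bandwidth. The snippet below is a minimal illustration under that assumption. The roof values and loop coordinates are invented for the example; on a real system they would come from the benchmark and measurements that Advisor reports.

```python
def attainable_gflops(ai: float, bandwidth_gb_s: float,
                      compute_peak_gflops: float) -> float:
    """Roofline ceiling for a loop with arithmetic intensity `ai` (FLOP/byte)."""
    return min(compute_peak_gflops, ai * bandwidth_gb_s)


def limiting_roof(ai: float, bandwidth_gb_s: float,
                  compute_peak_gflops: float) -> str:
    """Report which roof caps the loop at this arithmetic intensity."""
    return "memory" if ai * bandwidth_gb_s < compute_peak_gflops else "compute"


def headroom(measured_gflops: float, ai: float, bandwidth_gb_s: float,
             compute_peak_gflops: float) -> float:
    """Ratio of the ceiling to the measured performance (1.0 means on the roof)."""
    return attainable_gflops(ai, bandwidth_gb_s, compute_peak_gflops) / measured_gflops


# Hypothetical roofs for one memory level and one compute level.
DRAM_GB_S = 20.0
VECTOR_FMA_PEAK_GFLOPS = 80.0

# Hypothetical loops: name -> (arithmetic intensity, measured GFLOPS).
loops = {"loop_a": (0.25, 4.0), "loop_b": (2.0, 4.0)}

for name, (ai, measured) in loops.items():
    roof = limiting_roof(ai, DRAM_GB_S, VECTOR_FMA_PEAK_GFLOPS)
    gap = headroom(measured, ai, DRAM_GB_S, VECTOR_FMA_PEAK_GFLOPS)
    print(f"{name}: limited by the {roof} roof, {gap:.2f}x below the ceiling")
```

In this made-up case both loops reach the same GFLOPS, but the loop with the higher arithmetic intensity sits much further below its ceiling, so it is the one with the larger optimization opportunity.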
If you have any questions or problems please contact the Advisor team by email at email@example.com.