User Guide


GPU Roofline Insights Perspective from Command Line

To plot a Roofline chart, Intel® Advisor runs two steps:
  1. Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
  2. Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling.
    Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH.
    Intel Advisor automatically determines data type in the collected operations using the
For convenience, Intel Advisor has a shortcut command line action, which you can use to run both Survey and Characterization analyses with a single command. This shortcut command is recommended to run the GPU Roofline Insights perspective.
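As a rough illustration of the weighted-sum idea from step 2, the sketch below multiplies per-group instruction counts by per-group weights and adds the results. The counts and weights are hypothetical values chosen only for the example, and `weighted_ops` is an illustrative helper name; the weights Intel Advisor actually applies are not stated in this section.

```shell
# Sketch: compute operations as a weighted sum over instruction groups.
# All counts and weights below are HYPOTHETICAL, not Intel Advisor's values.

# weighted_ops "<group>:<count>:<weight>" ...
# Prints the sum of count * weight over all arguments.
weighted_ops() {
    total=0
    for entry in "$@"; do
        rest=${entry#*:}       # drop "<group>:", leaving "<count>:<weight>"
        count=${rest%%:*}      # text before the remaining colon
        weight=${rest#*:}      # text after the remaining colon
        total=$(( $total + $count * $weight ))
    done
    printf '%s\n' "$total"
}

# Example: three groups with made-up counts and weights.
weighted_ops "BASIC_COMPUTE:1000:1" "FMA:500:2" "DIV:10:1"
```

The group label is carried only for readability; the helper ignores it when summing.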


Prerequisites

  1. Configure your system to analyze GPU kernels.
  2. Set Intel Advisor environment variables with an automated script to enable the command line interface (CLI).
In the commands below, the options in square brackets are recommended if you want to change what data is collected.

Plot a GPU Roofline Chart

Run the Roofline analysis for GPU using one of the following methods:
  • With the shortcut:
    advisor --collect=roofline --project-dir=<project-dir> --profile-gpu [--target-gpu=] [--gpu-sampling-interval=] --
  • With two separate commands:
    advisor --collect=survey --project-dir=<project-dir> --profile-gpu --
    advisor --collect=tripcounts --project-dir=<project-dir> --profile-gpu --flop [--target-gpu=] [--gpu-sampling-interval=] --
where:
  • --profile-gpu is an option to analyze GPU kernels. This option is required for each command.
  • --flop is an option to collect data about floating-point and integer operations. This option is required for the
  • --target-gpu is a target GPU adapter to collect profiling data from. The adapter configuration should follow a fixed format, for example 0:0:2.0. Only decimal numbers are accepted. Use this option if you have more than one GPU adapter on your system. The default is the latest GPU architecture version found on your system.
    To see a list of GPU adapters available on your system, run advisor --help collect and scroll down to the description of this option.
  • --gpu-sampling-interval= is an interval (in milliseconds) between GPU samples. By default, it is set to
The Roofline analysis collects data both for GPU kernels and CPU loops/functions in your application. For kernels running on GPU, Intel Advisor generates a Memory-Level Roofline by default.
If you want to collect advanced data for loops/functions running on CPU, use
Collect GPU Roofline data for a GPU adapter with the address 0:0:2.0:
advisor --collect=roofline --project-dir=./advi --profile-gpu --target-gpu=0:0:2.0 -- myApplication
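The same collection can also be expressed with the two separate commands. The script below is a minimal sketch of that flow: the project directory, adapter address, and application name are taken from the example above, while `DRY_RUN`, `run`, and `collect_gpu_roofline` are illustrative helpers that are not part of Intel Advisor. By default it only prints the commands so they can be reviewed first.

```shell
# Sketch: run Survey, then Characterization, both with GPU profiling.
# DRY_RUN, run() and collect_gpu_roofline() are ILLUSTRATIVE helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"     # only show the command
    else
        "$@"                   # actually execute the command
    fi
}

# collect_gpu_roofline <project-dir> <gpu-address> <application> [args...]
collect_gpu_roofline() {
    project_dir=$1
    gpu_address=$2
    shift 2
    # Step 1: Survey with GPU profiling (kernel timings and memory data).
    run advisor --collect=survey --project-dir="$project_dir" \
        --profile-gpu -- "$@" || return 1
    # Step 2: Characterization with GPU profiling and FLOP/INTOP data.
    run advisor --collect=tripcounts --project-dir="$project_dir" \
        --profile-gpu --flop --target-gpu="$gpu_address" -- "$@"
}

collect_gpu_roofline ./advi 0:0:2.0 myApplication
```

Setting `DRY_RUN=0` in the environment makes the helper execute the commands instead of printing them, which assumes `advisor` is already on your `PATH`.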

View the Results

Intel Advisor provides several ways to work with the GPU Roofline results.
View Results in GUI
When you run the Intel Advisor CLI, a project is created automatically in the directory specified with --project-dir. All the collected results and analysis configurations are stored in the project, which you can view in the Intel Advisor GUI.
To open the project in GUI, you can run the following command:
advisor-gui <project-dir>
If the report does not open, click Show Result on the Welcome pane.
You first see a Summary report that includes performance characteristics for code regions in your code. The left side of the report shows metrics for code regions that run on a GPU, and the right side shows metrics for code regions that run on a CPU. The report shows the following data:
  • Program metrics for all code regions executed on the GPU and loops/functions executed on the CPU, including total execution time, GPU usage effectiveness, and the number of executed operations.
  • Preview Roofline charts for CPU and GPU parts of your code. The charts plot an application's achieved performance and arithmetic intensity against the maximum achievable performance for top three dots and total dot, which combines all loops/functions (for CPU) and kernels (for GPU). By default, it shows Roofline for a dominating operations data type (INT or FLOAT). You can switch to a different data type using the
    This pane also reports the number of operations transferred per second, bandwidth for different memory levels, and an instruction mix histogram (for GPU only).
  • Top five hotspots on CPU and GPU sorted by elapsed time.
  • Performance characteristics of how well the application uses hardware resources.
  • Information about the analyses executed and platforms that the data was collected on.
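To make the relationship plotted on a Roofline chart concrete, the sketch below applies the standard Roofline model: arithmetic intensity is operations per byte, and the attainable performance is the smaller of the peak compute rate and arithmetic intensity multiplied by memory bandwidth. All input numbers and the `roofline` helper name are made up for illustration; they are not measurements or Intel Advisor output.

```shell
# Sketch of the standard Roofline model with HYPOTHETICAL inputs.
# roofline <operations> <bytes> <seconds> <peak GOPS> <bandwidth GB/s>
roofline() {
    awk -v ops="$1" -v bytes="$2" -v secs="$3" -v peak="$4" -v bw="$5" 'BEGIN {
        ai = ops / bytes                      # arithmetic intensity, ops per byte
        achieved = ops / secs / 1000000000    # achieved performance, GOPS
        roof = ai * bw                        # memory-bound ceiling, GOPS
        if (roof > peak) roof = peak          # cap at the compute roof
        printf "AI=%.2f achieved=%.1f roof=%.1f\n", ai, achieved, roof
    }'
}

# A made-up kernel: 4e9 operations, 2e9 bytes moved, 0.05 s elapsed,
# on hardware with a 1000 GOPS peak and 200 GB/s memory bandwidth.
roofline 4000000000 2000000000 0.05 1000 200
```

A dot whose achieved value sits far below its roof has headroom at that memory level; a dot close to the roof is limited by it.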
View an Interactive HTML Report
To generate an interactive HTML report for the GPU Roofline chart from CLI, run the following command:
advisor --report=roofline --project-dir=<project-dir> --gpu [--data-type=<type>]
  • --report-output= is a path and a name for an HTML file to save the report to. This option is required to generate an HTML report.
  • --gpu is an option to generate a Roofline chart for GPU kernels. This option is required.
  • --data-type=<type> is a type of data to show in the HTML report by default. Available types are (default) or . You cannot change the data type after the report is generated.
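Putting the options above together, a complete report command might be assembled as in the sketch below. The project directory `./advi` matches the earlier collection example, the output file name `./roofline.html` is a hypothetical choice, and `report_cmd` is an illustrative helper. It only prints the command, and it passes any data type through unchanged because the accepted values are not listed in this section.

```shell
# Sketch: build the command line for an interactive HTML GPU Roofline report.
# Paths are HYPOTHETICAL; the function prints the command and runs nothing.
# report_cmd <project-dir> <html-path> [data-type]
report_cmd() {
    project_dir=$1
    html_path=$2
    data_type=${3:-}          # optional; see the --data-type description above
    set -- advisor --report=roofline --project-dir="$project_dir" \
        --report-output="$html_path" --gpu
    if [ -n "$data_type" ]; then
        set -- "$@" --data-type="$data_type"
    fi
    printf '%s\n' "$*"
}

report_cmd ./advi ./roofline.html
```

Because the data type cannot be changed after the report is generated, one option is to generate a separate HTML file per data type you want to inspect.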
When you open the report, you see the GPU Roofline chart with the selected configuration. In this report, you can:
  • Expand the Performance Metrics Summary drop-down to view the summary performance characteristics for your application.
  • From the filter drop-down list on the chart, select the memory levels to show dots for.
  • Double-click a dot on the chart to expand it for other memory levels and see roof rulers.
  • Hover over a dot to see a detailed tooltip with performance metrics.
Figure: Interactive GPU Roofline HTML report
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
  • --cache-sources is an option to add application source code to the snapshot.
  • --cache-binaries is an option to add application binaries to the snapshot.
  • <snapshot-path> is a path and a name for the snapshot. For example, if you specify , a snapshot is saved in a directory as . You can skip this and save the snapshot to the current directory as
To open the result snapshot in the Intel Advisor GUI, you can run the following command:
You can visually compare the saved snapshot against the current active result or other snapshot results.
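As an illustration of the snapshot command described above, the sketch below assembles one concrete invocation. The project directory `./advi` follows the earlier collection example, the snapshot path `./snapshots/gpu_roofline` is a hypothetical name, and `snapshot_cmd` is an illustrative helper that prints the command without running it.

```shell
# Sketch: build the command line that saves a read-only snapshot.
# Paths are HYPOTHETICAL; the function prints the command and runs nothing.
# snapshot_cmd <project-dir> <snapshot-path> [--cache-sources] [--cache-binaries]
snapshot_cmd() {
    project_dir=$1
    snapshot_path=$2
    shift 2
    # Any remaining arguments are the optional --cache-* flags, passed as-is.
    set -- advisor --snapshot --project-dir="$project_dir" "$@" -- "$snapshot_path"
    printf '%s\n' "$*"
}

snapshot_cmd ./advi ./snapshots/gpu_roofline --cache-sources --cache-binaries
```

Including both cache options keeps the source and binaries with the snapshot, which is useful when the snapshot will be opened on a machine that does not have the application.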

Next Steps

Continue to identify performance bottlenecks on GPU. For details about the metrics reported, see Accelerator Metrics.
