GPU Roofline Insights Perspective from Command Line
The GPU Roofline Insights perspective runs two analyses:
- Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
- Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling.
Plot a GPU Roofline Chart
Run the collection in one of the following ways:
- With the shortcut --collect=roofline command:
  ```shell
  advisor --collect=roofline --project-dir=<project-dir> --profile-gpu [--target-gpu=<address>] [--gpu-sampling-interval=<double>] -- <target-application> [<target-options>]
  ```
- With two separate commands:
  ```shell
  advisor --collect=survey --project-dir=<project-dir> --profile-gpu -- <target-application> [<target-options>]
  advisor --collect=tripcounts --project-dir=<project-dir> --profile-gpu --flop [--target-gpu=<address>] [--gpu-sampling-interval=<double>] -- <target-application> [<target-options>]
  ```
- --profile-gpu is an option to analyze GPU kernels. This option is required for each command.
- --flop is an option to collect data about floating-point and integer operations. This option is required for the --collect=tripcounts step.
- --target-gpu is the target GPU adapter to collect profiling data from. Specify the adapter address in the format <domain>:<bus>:<device-number>.<function-number>. Only decimal numbers are accepted. Use this option if you have more than one GPU adapter on your machine. The default is the latest GPU architecture on your machine. To see a list of GPU adapters available on your system, run advisor --help collect and scroll down to the --target-gpu option description.
- --gpu-sampling-interval=<double> is the interval (in milliseconds) between GPU samples. By default, it is set to 1.
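For example, the following command collects GPU Roofline data for myApplication on the GPU adapter 0:0:2.0 and saves the results to the ./advi project directory:
```shell
advisor --collect=roofline --project-dir=./advi --profile-gpu --target-gpu=0:0:2.0 -- myApplication
```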
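The two-step collection above can be wrapped in a small script. The following is a minimal sketch, assuming bash; the function names valid_gpu_address, run_step, and roofline_collect and the DRY_RUN variable are hypothetical. It checks a --target-gpu address against the decimal <domain>:<bus>:<device-number>.<function-number> format before anything runs, then prints the Survey and Characterization commands and executes them only when DRY_RUN=0. It omits --gpu-sampling-interval for brevity.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper names) of the two-step GPU Roofline
# collection. Commands are only printed unless DRY_RUN=0.

# Succeeds if $1 looks like <domain>:<bus>:<device-number>.<function-number>
# written with decimal digits only, the format --target-gpu expects.
valid_gpu_address() {
  [[ $1 =~ ^[0-9]+:[0-9]+:[0-9]+\.[0-9]+$ ]]
}

# Prints a command line; runs it only when DRY_RUN is set to 0.
run_step() {
  echo "$*"
  if [[ ${DRY_RUN:-1} == 0 ]]; then
    "$@"
  fi
}

# Usage: roofline_collect <project-dir> <target-gpu|""> <app> [app-options...]
roofline_collect() {
  local project_dir=$1 target_gpu=$2
  shift 2
  local -a gpu_opts=()
  if [[ -n $target_gpu ]]; then
    if ! valid_gpu_address "$target_gpu"; then
      echo "invalid --target-gpu address: $target_gpu" >&2
      return 1
    fi
    gpu_opts+=("--target-gpu=$target_gpu")
  fi

  # Step 1: Survey analysis with GPU profiling (kernel timings, memory data).
  run_step advisor --collect=survey "--project-dir=$project_dir" \
    --profile-gpu -- "$@" || return
  # Step 2: Characterization (trip counts) with --flop for operation counts.
  run_step advisor --collect=tripcounts "--project-dir=$project_dir" \
    --profile-gpu --flop "${gpu_opts[@]}" -- "$@"
}
```

Keeping dry-run as the default lets you review the exact command lines before profiling; set DRY_RUN=0 to execute them, for example DRY_RUN=0 roofline_collect ./advi 0:0:2.0 ./myApplication.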
View the Results
The result summary includes:
- Program metrics for all code regions executed on the GPU and loops/functions executed on the CPU, including total execution time, GPU usage effectiveness, and the number of executed operations.
- Preview Roofline charts for the CPU and GPU parts of your code. The charts plot an application's achieved performance and arithmetic intensity against the maximum achievable performance for the top three dots and a total dot, which combines all loops/functions (for CPU) or all kernels (for GPU). By default, the chart shows the Roofline for the dominant operation data type (INT or FLOAT). You can switch to a different data type using the FLOAT/INT toggle. This pane also reports the number of operations per second, the bandwidth for different memory levels, and an instruction mix histogram (for GPU only).
- Top five hotspots on CPU and GPU sorted by elapsed time.
- Performance characteristics of how well the application uses hardware resources.
- Information about the analyses executed and platforms that the data was collected on.
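As background for reading these charts, here is the classic Roofline model in a minimal form; Advisor's exact roof computation may differ in detail. A dot's arithmetic intensity I is the number of operations W (FLOP or INTOP) divided by the bytes Q moved at a given memory level, and the roof above that dot is the lower of the peak compute throughput P_peak and I times that level's bandwidth B. The numbers in the worked line are hypothetical (P_peak = 1000 GOPS, B = 100 GB/s, I = 0.5 ops/byte):

$$
I = \frac{W}{Q}, \qquad
P_{\text{attainable}}(I) = \min\bigl(P_{\text{peak}},\; I \cdot B\bigr)
$$

$$
P_{\text{attainable}}(0.5) = \min(1000,\; 0.5 \cdot 100) = 50\ \text{GOPS}, \qquad
I^{*} = \frac{P_{\text{peak}}}{B} = \frac{1000}{100} = 10\ \text{ops/byte}
$$

A dot to the left of the ridge point I* is bounded by the memory roof, and a dot to the right is bounded by the compute roof, so the hypothetical kernel above is memory-bound.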
To generate an HTML report with a GPU Roofline chart, use the following options:
- --report-output=<path> is a path and a name for the HTML file to save the report to, for example, /home/roofline.html. This option is required to generate an HTML report.
- --gpu is an option to generate a Roofline chart for GPU kernels. This option is required.
- --data-type=<type> is the type of data to show in the HTML report by default. Available types are float (default) or int. You cannot change the data type after the report is generated.
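Because the data type is fixed once the HTML report is generated, one option is to produce one report per type. The following is a minimal sketch, assuming bash and assuming these options accompany the advisor --report=roofline action; the function name roofline_reports, the DRY_RUN variable, and the roofline_<type>.html naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one HTML report per data type, because the data type
# cannot be changed after a report is generated. Printed unless DRY_RUN=0.
# Assumes the report action is advisor --report=roofline.

# Usage: roofline_reports <project-dir> <output-dir>
roofline_reports() {
  local project_dir=$1 out_dir=$2 type
  local -a cmd
  for type in float int; do
    cmd=(advisor --report=roofline --gpu "--project-dir=$project_dir"
      "--report-output=$out_dir/roofline_$type.html" "--data-type=$type")
    echo "${cmd[*]}"
    # Execute only when explicitly requested.
    if [[ ${DRY_RUN:-1} == 0 ]]; then
      "${cmd[@]}" || return
    fi
  done
}
```

For example, roofline_reports ./advi /home prints two commands that write /home/roofline_float.html and /home/roofline_int.html.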
In the HTML report, you can:
- Expand the Performance Metrics Summary drop-down to view the summary performance characteristics for your application.
- Select the memory levels to show dots for using the filter drop-down list on the chart.
- Double-click a dot on the chart to expand it for other memory levels and see roof rulers.
- Hover over a dot to see a detailed tooltip with performance metrics.
To save the results to a snapshot, use the following options:
- --cache-sources is an option to add application source code to the snapshot.
- --cache-binaries is an option to add application binaries to the snapshot.
- <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, the snapshot is saved in the tmp directory as new_snapshot.advixeexpz. If you skip this option, the snapshot is saved to the current directory as snapshot.XXX.advixeexpz.
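The snapshot options can be assembled the same way. The following is a minimal sketch, assuming bash and assuming the snapshot action is advisor --snapshot with --pack producing the packed .advixeexpz archive; the function names snapshot_cmd and expected_snapshot_file are hypothetical. It only prints the command and computes where the file is expected to land, following the naming described in the last option above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the snapshot options. Assumes the snapshot
# action is advisor --snapshot with --pack for a packed .advixeexpz archive.

# Usage: snapshot_cmd <project-dir> [snapshot-path]
# Prints the command; without a path, the snapshot goes to the current
# directory as snapshot.XXX.advixeexpz.
snapshot_cmd() {
  local project_dir=$1 path=${2-}
  local -a cmd=(advisor --snapshot "--project-dir=$project_dir" --pack
    --cache-sources --cache-binaries)
  if [[ -n $path ]]; then
    cmd+=(-- "$path")
  fi
  echo "${cmd[*]}"
}

# Where the packed snapshot is expected to land for a given <snapshot-path>.
expected_snapshot_file() {
  printf '%s.advixeexpz\n' "$1"
}
```

For example, expected_snapshot_file /tmp/new_snapshot prints /tmp/new_snapshot.advixeexpz, which matches the /tmp/new_snapshot example above.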