User Guide


Configure GPU Analysis from Command Line

Use the -knob option to configure Intel® VTune™ Profiler to profile applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. GPU analysis monitors overall GPU activity (graphics, media, and compute), collects Intel® HD Graphics and Intel® Iris® Graphics hardware metrics, and then shows this data correlated with CPU processes and threads.
The following knobs are supported for GPU analysis:
Knob Name: enable-gpu-usage=true | false
Supported Analysis Types: runss, runsa
Description: Analyze frame rate and usage of Processor Graphics engines.

Knob Name: gpu-counters-mode=none | overview | global-local-accesses | compute-extended | full-compute | render-basic
Supported Analysis Types: gpu-hotspots, graphics-rendering, gpu-offload, runss, runsa
Description: Analyze performance data from Processor Graphics based on the GPU Metrics Reference.
  • overview
    - track general GPU memory accesses such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.
  • global-local-accesses
    - include metrics that distinguish between accesses to different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.
  • compute-extended
    - analyze GPU activity on the Intel processor code name Broadwell. This metrics set is disabled for other systems.
  • full-compute
    - collect both metrics with the option enabled to analyze all types of EU array stalled/idle issues in the same view.
  • render-basic
    (preview) - collect Pixel Shader, Vertex Shader, and Output Merger metrics.
This option is available only on supported platforms with the Intel Graphics Driver installed.

Knob Name: gpu-sampling-interval=<value in us>
Supported Analysis Types: gpu-hotspots, runss, runsa
Description: Set the interval between GPU samples, from 10 to 1000 microseconds. The default is 1000 us. An interval of less than 100 us is not recommended.

Knob Name: enable-gpu-runtimes=true | false
Supported Analysis Types: gpu-hotspots, runss, runsa
Description: Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU computing tasks, and analyze the performance per GPU hardware metrics.
OpenCL kernel analysis is currently supported for Windows and Linux target systems with Intel HD Graphics and Intel Iris Graphics. Intel® Media SDK program analysis is supported for Linux targets only and should be started with root privileges.
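The documented limits on gpu-sampling-interval (10 to 1000 microseconds, with values under 100 not recommended) can be checked before a collection starts. The following is a minimal sketch, assuming a POSIX shell; gpu_interval_knob is a hypothetical helper and not part of VTune, and the script only prints the resulting command line rather than running it.

```sh
#!/bin/sh
# Sketch: validate a GPU sampling interval before passing it to vtune.
# gpu_interval_knob is a hypothetical helper, not part of VTune.

gpu_interval_knob() {
    interval_us=$1
    # Reject anything that is not a plain non-negative integer.
    case $interval_us in
        ''|*[!0-9]*)
            echo "interval must be an integer number of microseconds" >&2
            return 1
            ;;
    esac
    # The documented range is 10 to 1000 microseconds.
    if [ "$interval_us" -lt 10 ] || [ "$interval_us" -gt 1000 ]; then
        echo "interval must be between 10 and 1000 us" >&2
        return 1
    fi
    # Intervals under 100 us are accepted but not recommended.
    if [ "$interval_us" -lt 100 ]; then
        echo "warning: intervals below 100 us are not recommended" >&2
    fi
    echo "-knob gpu-sampling-interval=$interval_us"
}

# Print the command line instead of running it, so the sketch
# works on a machine without vtune installed.
echo "vtune -collect gpu-hotspots $(gpu_interval_knob 500) -- BitonicSort"
```

The helper writes warnings and errors to stderr, so its stdout can be substituted directly into a command line.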
Example 1: Running Analysis for an Intel Media SDK Application
This example starts vtune as root and launches the GPU Compute/Media Hotspots analysis for an Intel Media SDK application running on Linux:
vtune -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort
To analyze a remote Linux target from a Windows system, the same example looks as follows:
vtune -target-system=ssh:user1@ -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort.exe
Example 2: Running Analysis with OpenCL Kernels Tracing
Perform the GPU Compute/Media Hotspots analysis or a custom analysis, enabling the gpu-counters-mode knob to analyze GPU usage of a processor graphics engine with the Overview counter set, which is available only on a supported platform with an Intel Graphics Driver installed. Enable tracing of OpenCL kernel execution with the enable-gpu-runtimes knob.
For example, to run the GPU Compute/Media Hotspots analysis, collect GPU hardware metrics, and trace OpenCL kernels on the BitonicSort application (-g is an option of the application), enter:
vtune -collect gpu-hotspots -knob gpu-counters-mode=overview -knob enable-gpu-runtimes=true -- BitonicSort -g
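When several knobs are combined, each one needs its own -knob prefix. The following is a minimal sketch of assembling such a command line, assuming a POSIX shell; build_gpu_cmd is a hypothetical helper, not a VTune feature, and it only prints the command. Called as shown, it prints the same command line as the example above.

```sh
#!/bin/sh
# Sketch: assemble a gpu-hotspots command line from a list of knob settings.
# build_gpu_cmd is a hypothetical helper; it prints the command and does not run it.

build_gpu_cmd() {
    app=$1          # application and its own options, as one string
    shift
    cmd="vtune -collect gpu-hotspots"
    # Each remaining argument is one name=value knob setting.
    for knob in "$@"; do
        cmd="$cmd -knob $knob"
    done
    echo "$cmd -- $app"
}

build_gpu_cmd "BitonicSort -g" gpu-counters-mode=overview enable-gpu-runtimes=true
```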

GPU Analysis on Android* System

You can enable GPU analysis for algorithm analysis types on Android systems with Intel HD Graphics and Intel Iris Graphics by using the following knobs:
  • enable-gpu-usage
    to analyze frame rate and usage of Intel HD Graphics and Intel Iris Graphics engines based on ftrace events
  • gpu-counters-mode
    to analyze performance data from Intel HD Graphics and Intel Iris Graphics based on the preset counter sets
  • gpu-sampling-interval
    to specify a data collection interval between GPU samples
This example runs the GPU Compute/Media Hotspots analysis and monitors GPU usage.
host>./vtune -collect gpu-hotspots -target-system=android -r quadrant_r001 -target-process -knob enable-gpu-usage=true -knob gpu-counters-mode=overview
