User Guide

  • 2021.2
  • 06/24/2021
  • Public Content

Identify GPU-CPU Issues with Graphics Trace Analyzer

Use Intel® GPA Trace Analyzer to analyze the efficiency of synchronization, parallelism, and command execution, and to evaluate workload performance across the CPU and GPU.
Trace Analyzer captures a trace, which is a record of activity on both CPU and GPU during application execution. With trace data, you can analyze CPU-based workloads together with GPU-based workloads within a unified time domain:
  • Correlate GPU and CPU activity and identify whether your application is GPU or CPU bound
  • Identify GPU and CPU application frame rate and how it depends on vertical synchronization
  • Explore the performance of your application over time per selected GPU and CPU metrics
  • Analyze API calls (draw calls, buffer locks, resource updates, presents)
  • Explore application events specified by a user through debug API events and Instrumentation and Tracing Technology API (ITT API) markers
  • Analyze GPU usage per DMA packet on a software queue
  • Explore GPU usage and analyze a software queue for GPU engines at each moment in time
  • Analyze synchronization in multi-context graphics applications (DirectX* 12, Vulkan*)

Capture and Open a Trace

Before you start the analysis, close all unrelated applications that use the GPU. You cannot identify performance issues accurately when several applications compete for GPU resources.
To capture trace data during the application run, do the following:
Configure Analysis Settings
  1. Launch Graphics Monitor.
  2. Click the
  3. In the
    tab, configure tracing options as needed.
  4. Optionally, configure other analysis settings:
Run Analysis
  1. In the Graphics Monitor Launcher screen, specify an application for analysis.
  2. Choose
    mode from the launch modes drop-down menu.
  3. Click the
    button to launch the application.
  4. Choose one of the following methods to capture the trace data to a file:
    • System Analyzer HUD (default). When the capture is complete, the HUD displays a message with the file name or possible errors, if any.
    • System Analyzer
      1. Return to the Graphics Monitor Launcher screen and click the Connect System Analyzer button next to your application. The button becomes available after you start the application.
      2. If needed, modify the list of metrics to monitor in the System Analyzer window.
      3. Click the Capture Trace button to capture a trace file. When the capture is complete, the System Analyzer displays a message with the file name or possible errors, if any.
By default, a trace lasts five seconds. You can change this duration in the Trace tab of Graphics Monitor options.
View Collected Data
To view the collected data:
  1. From the Graphics Monitor context menu, launch Graphics Trace Analyzer.
  2. In the Open Trace Capture window, select and open the captured trace file.

View Collected Data

A trace represents the graphics rendering process. Each graphics API call produces commands; the graphics driver puts them into command buffers, pushes the buffers into a CPU queue, and schedules the commands for execution on the GPU. The main data types that Trace Analyzer collects are the following:
  • GPU queue: shows how the GPU executes the commands that form the frame buffer you see on the screen.
  • Driver queue: shows how the graphics driver schedules graphics commands for execution on the GPU.
  • Graphics API calls: represent the activity of each thread (calls to the graphics API, synchronization function calls, events, tasks annotated by ITT API).
The GPU queue size indicates whether the GPU is busy or idle.
The driver queue shows how many graphics commands are submitted, and how many of them wait for the execution.
Without the right balance between CPU and GPU workloads, bottlenecks can appear. To identify them, analyze how long present calls stay in a queue before the GPU executes them. For example, if the driver queue is large and the GPU is busy the whole time, the application might be GPU-bound: the CPU supplies work faster than the GPU can render it.
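The relationship between submission rate, queue growth, and GPU idle time can be illustrated with a toy model. The sketch below is not part of Intel® GPA and does not read trace files; the function name `simulate_queue` and its parameters are hypothetical, and it assumes the CPU submits one command buffer at a fixed interval while the GPU executes buffers in order at a fixed cost.

```python
def simulate_queue(cpu_ms, gpu_ms, n_buffers):
    """Toy model (an assumption, not Intel GPA behavior): the CPU submits one
    command buffer every cpu_ms, and the GPU needs gpu_ms to execute each one,
    strictly in submission order."""
    waits = []         # time each buffer spends queued before the GPU starts it
    gpu_idle = 0.0     # total time the GPU spends waiting for work
    gpu_free_at = 0.0  # time at which the GPU finishes its current buffer
    for i in range(n_buffers):
        submitted = i * cpu_ms
        start = max(submitted, gpu_free_at)
        gpu_idle += start - gpu_free_at
        waits.append(start - submitted)
        gpu_free_at = start + gpu_ms
    # Second value: fraction of the simulated run during which the GPU was idle.
    return waits, gpu_idle / gpu_free_at


# CPU faster than GPU: queue waits keep growing and the GPU never idles.
gpu_bound_waits, gpu_bound_idle = simulate_queue(cpu_ms=5, gpu_ms=10, n_buffers=4)

# GPU faster than CPU: nothing waits in the queue and the GPU has gaps.
cpu_bound_waits, cpu_bound_idle = simulate_queue(cpu_ms=10, gpu_ms=5, n_buffers=4)
```

In the first run the wait time grows with every buffer, which corresponds to a driver queue that keeps accumulating work. In the second run the waits stay at zero and a large share of the timeline is GPU idle time.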
A trace also presents the following data on the timeline:
  • CPU cores: show how threads from different processes, including your profiled application, are executed.
  • Flip queue: reflects the relationship between the application present calls, present packages of GPU/CPU queues, and the Vertical Synchronization (VSync) event of the monitor.
  • CPU frames: show the range containing graphics commands between two successive frame buffer swap calls.
  • CPU and GPU metrics: show CPU and GPU performance for the selected metrics and metric set. CPU and GPU metrics help you compare CPU and GPU utilization and spot problematic areas.
  • Parallel Execution (DirectX 11): shows how the driver parallelizes execution of submitted render events.
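To make the CPU frames and flip queue descriptions concrete, here is a minimal sketch of the arithmetic behind them. It is an illustration only: the function names are hypothetical, the timestamps are made-up values rather than Trace Analyzer output, and the VSync model assumes a fixed refresh rate with no tearing and no adaptive sync.

```python
import math


def frame_times_ms(present_timestamps_ms):
    """Duration of each CPU frame: the gap between successive present calls."""
    return [b - a for a, b in zip(present_timestamps_ms, present_timestamps_ms[1:])]


def displayed_interval_ms(frame_ms, refresh_hz=60.0):
    """Simplified VSync model (an assumption): a flip waits for the next
    vertical blank, so the on-screen interval is the frame time rounded up
    to a whole number of refresh periods."""
    period = 1000.0 / refresh_hz
    return max(1, math.ceil(frame_ms / period)) * period


# Hypothetical present-call timestamps in milliseconds.
durations = frame_times_ms([0, 10, 30, 45])

# On a 60 Hz display, a 10 ms frame is shown for one refresh period,
# while a 20 ms frame misses a vertical blank and is shown for two.
fast = displayed_interval_ms(10)
slow = displayed_interval_ms(20)
```

Under this model, a frame that only slightly exceeds the refresh period costs a full extra period on screen, which is why frame rate can drop in steps when VSync is enabled.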

Identify Performance Issues

Identify a GPU-bound Application
You can identify a GPU-bound application by the following criteria:
  • GPU is busy the whole time and GPU queue has no visible gaps.
  • Driver queue continuously accumulates command buffers waiting for execution on the GPU. Driver queue size is large.
  • Average command buffer execution time exceeds the desired limit based on the expected FPS rate.
  • CPU metrics have visible gaps, while the GPU is at maximum utilization.
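The execution-time criterion can be checked with simple arithmetic: the time budget per frame is 1000 ms divided by the target FPS. The sketch below is illustrative only. The function names are hypothetical, and it assumes each input value is the total GPU execution time of the command buffers for one frame, read manually from a trace.

```python
def frame_budget_ms(target_fps):
    """Time available per frame at the target frame rate."""
    return 1000.0 / target_fps


def exceeds_budget(gpu_frame_times_ms, target_fps):
    """Return (over_budget, average_ms) for a non-empty list of per-frame
    GPU execution times in milliseconds."""
    average = sum(gpu_frame_times_ms) / len(gpu_frame_times_ms)
    return average > frame_budget_ms(target_fps), average


# Hypothetical measurements against a 60 FPS target (about 16.7 ms per frame).
over, avg = exceeds_budget([18, 20, 22], target_fps=60)
within, avg_ok = exceeds_budget([10, 12, 14], target_fps=60)
```

An average above the budget satisfies only one of the criteria. The other signals in the list, such as a gap-free GPU queue, still need to be confirmed on the timeline.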
If your application is GPU-bound:
  • Capture a stream or a frame of a problematic area and analyze rendering performance in depth using Graphics Frame Analyzer.
Identify a CPU-bound Application
You can identify a CPU-bound application by the following criteria:
  • Hardware queue has visible gaps, indicating that the GPU is not fully busy.
  • GPU engine metrics have visible gaps, while the CPU is at maximum utilization.
  • Driver queue size is long enough.
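"Visible gaps" can be quantified if you have the start and end times of GPU busy intervals. The following sketch assumes such intervals are available as a non-empty list of (start, end) pairs in milliseconds. The function names, the threshold value, and the sample data are hypothetical and are not produced by Trace Analyzer.

```python
def find_gaps(busy_intervals_ms, min_gap_ms=0.5):
    """Return (start, end) idle gaps of at least min_gap_ms between
    consecutive GPU busy intervals. Overlapping intervals are merged."""
    ordered = sorted(busy_intervals_ms)
    gaps = []
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start - covered_until >= min_gap_ms:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


def idle_fraction(busy_intervals_ms):
    """Share of the observed time span during which the GPU was idle."""
    ordered = sorted(busy_intervals_ms)
    span = max(end for _, end in ordered) - ordered[0][0]
    if span == 0:
        return 0.0
    idle = sum(end - start for start, end in find_gaps(ordered, min_gap_ms=0.0))
    return idle / span


# Hypothetical busy intervals: two large gaps and one negligible 0.2 ms gap.
intervals = [(0, 4), (10, 14), (14.2, 18), (25, 30)]
gaps = find_gaps(intervals)
idle = idle_fraction(intervals)
```

A high idle fraction together with saturated CPU metrics is consistent with the CPU-bound pattern described above. The 0.5 ms threshold is arbitrary and only filters out gaps too small to matter.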
If your application is CPU-bound:
  • Use Graphics Trace Analyzer tracks with events generated by Debug API, ITT API markup, or Event Tracing for Windows (ETW).
  • Use other CPU-side performance analysis tools offered by Intel. Use Intel® VTune Profiler to find hotspots and identify issues related to CPU utilization, or use Intel® Advisor for a deeper focus on threading and vectorization.
