User Guide

  • 2021.1
  • 03/24/2021
  • Public Content

Identify GPU-CPU Issues with Graphics Trace Analyzer

Analyze efficiency of synchronization, parallelism, and resource usage on CPU and GPU.
Use Graphics Trace Analyzer to capture a trace, which is a detailed record of activity on both CPU and GPU during application execution.
A trace presents the following data on the timeline:
  • Activity on CPU cores, as reported by the kernel. This includes all processes that were executed on the physical/logical cores during trace capture.
  • GPU context and context switches.
  • CPU frames presented on the timeline.
  • Application thread tracks that represent the activity of each thread over time and the function calls made from these threads. This includes calls to the graphics API, synchronization function calls, events, and tasks annotated with the ITT API.
  • GPU metric tracks for the selected metrics and metric set.
With this information, you can use Trace Analyzer to understand whether your application is CPU- or GPU-bound, identify bottlenecks and issues related to synchronization and command execution, and estimate CPU and GPU load.

Capture and Open a Trace

  1. Run Graphics Monitor on your system.
  2. In the Graphics Monitor window, specify an application for analysis and choose
    mode from the launch modes drop-down menu.
  3. Click
    to launch the application.
    The application starts running with the System Analyzer Heads-Up Display (HUD) overlay.
  4. Use
    to capture a trace. During the capture, all applied overrides are turned off.
    When the capture is complete, the HUD displays a message with the file name or possible errors, if any.
    By default, trace capture duration is set to five seconds. You can adjust trace duration in the
    tab of Graphics Monitor options.
To open a trace, use one of these methods:
  • Locate the trace in the Open File dialog box of the Graphics Monitor Configuration window.
  • Alternatively, launch Graphics Trace Analyzer and select your trace from the Trace Capture Thumbnails.

Perform Platform Analysis

If your application is CPU-bound, you can capture trace data during the application run to perform in-depth platform analysis with respect to the CPU and GPU activity distribution.
Intel® GPA collects real-time trace data during the application run and provides information on the code execution on the various CPU and GPU cores in your system, so that you can analyze some CPU-based workloads together with GPU-based workloads within a unified time domain.
With Intel® GPA, you can:
  • Explore GPU usage and analyze a software queue for GPU engines at each moment in time
  • Analyze GPU usage per DMA packet on a software queue
  • Analyze API calls (draw calls, buffer locks, resource updates, presents)
  • Correlate CPU and GPU activity and identify whether your application is GPU or CPU bound
  • Explore your application performance for user tasks created with the Intel® ITT API
  • Identify GPU and CPU application frame rate and how it depends on vertical synchronization
  • Explore the performance of your application over time per selected GPU metrics
The typical workflow is as follows:
Configure Platform Analysis
  1. Launch Graphics Monitor using your preferred method.
  2. Open the Graphics Monitor Launcher screen and select
    from the launch method drop-down menu.
  3. Click the
  4. In the
    tab, configure tracing options as needed.
  5. Optionally, configure other analysis settings, such as a set of default GPU/CPU metrics to monitor for your application.
  6. From the Graphics Monitor Launcher screen, navigate to your target application.
Run Platform Analysis
In the Graphics Monitor Launcher screen, click the Start button to launch the application and start tracing.
To capture the trace data to a file, choose one of the following methods:
  • System Analyzer HUD
    To capture trace data, press
  • System Analyzer
    To capture trace data, follow these steps:
    1. From the Graphics Monitor Launcher screen, click the Connect System Analyzer button next to your application.
    2. If needed, modify the list of metrics to monitor in the System Analyzer window.
    3. Click the Capture Trace button to capture a trace file.
      By default, the duration of a trace is five seconds. You can change that in the Trace tab of Graphics Monitor options.
View collected data
To view the collected data:
  1. From the Graphics Monitor context menu, launch Graphics Trace Analyzer.
  2. In the Open Trace Capture window, select and open the captured trace file.

Identify a GPU-bound Application

Graphics rendering is a process of submitting commands to a graphics driver. The driver batches the submitted commands into command buffers, pushes the buffers into the CPU queue, and schedules the commands for execution on the GPU. The size of the queue indicates whether the GPU is busy or starved. It also shows how many graphics commands have been submitted and how many of them are waiting for execution.
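The relationship between queue size and GPU starvation can be illustrated with a small sketch. It is not part of Graphics Trace Analyzer: the `Packet` record, its field names, and the sample timestamps are hypothetical, standing in for values you might read off the timeline by hand.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical record for one command packet read from the timeline."""
    submit_ms: float  # time the driver placed the packet in the queue
    start_ms: float   # time the GPU began executing the packet
    end_ms: float     # time the GPU finished executing the packet


def queue_depth(packets, t_ms):
    """Count packets that are submitted but not yet finished at t_ms."""
    return sum(1 for p in packets if p.submit_ms <= t_ms < p.end_ms)


def waiting_count(packets, t_ms):
    """Count packets that are submitted but not yet executing at t_ms."""
    return sum(1 for p in packets if p.submit_ms <= t_ms < p.start_ms)


# Made-up sample data: three packets executed back to back.
packets = [
    Packet(submit_ms=0.0, start_ms=0.0, end_ms=4.0),
    Packet(submit_ms=1.0, start_ms=4.0, end_ms=9.0),
    Packet(submit_ms=2.0, start_ms=9.0, end_ms=12.0),
]

depth_at_3 = queue_depth(packets, 3.0)      # 3 packets in the queue
waiting_at_3 = waiting_count(packets, 3.0)  # 2 of them still waiting
depth_at_13 = queue_depth(packets, 13.0)    # 0: the GPU has nothing to do
```

A depth of zero at a given moment corresponds to a starved GPU, while a consistently non-zero depth suggests that the GPU always has work pending.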
To determine whether your application is CPU- or GPU-bound, analyze the GPU engine metrics. If you see gaps in the GPU queue while the CPU is at maximum utilization, then the application is CPU-bound. Conversely, if there are idle zones in the CPU track while the GPU queue has no gaps and the GPU is continuously executing commands submitted by the application, then the application is GPU-bound.
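The comparison described above can be approximated numerically. The following is a minimal sketch under stated assumptions: the function names, the interval format, and both thresholds (`gap_threshold`, `cpu_busy_threshold`) are hypothetical choices for illustration, not values defined by Intel® GPA.

```python
def gpu_idle_fraction(busy_intervals, window_start, window_end):
    """Return the fraction of the window in which the GPU executed nothing.

    busy_intervals is a list of (start, end) tuples and may overlap.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have a positive length")
    # Clip every interval to the window and sort by start time.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in busy_intervals
        if e > window_start and s < window_end
    )
    busy = 0.0
    cursor = window_start
    for s, e in clipped:
        if e > cursor:
            # Count only the part not already covered by earlier intervals.
            busy += e - max(s, cursor)
            cursor = e
    return 1.0 - busy / window


def classify(gpu_idle, cpu_utilization,
             gap_threshold=0.1, cpu_busy_threshold=0.9):
    """Apply the heuristic from the text; both thresholds are assumptions."""
    if gpu_idle > gap_threshold and cpu_utilization >= cpu_busy_threshold:
        return "CPU-bound"
    if gpu_idle <= gap_threshold and cpu_utilization < cpu_busy_threshold:
        return "GPU-bound"
    return "inconclusive"


# Made-up sample: the GPU is idle between 4 ms and 6 ms of a 10 ms window.
idle = gpu_idle_fraction([(0.0, 4.0), (6.0, 10.0)], 0.0, 10.0)  # about 0.2
verdict = classify(idle, cpu_utilization=0.95)  # "CPU-bound"
```

The "inconclusive" result is deliberate: when both processors are busy or both are idle, the timeline needs a closer look before drawing a conclusion.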
You can also measure frame duration by selecting all command packets executed on the GPU within a single frame. You can then estimate the FPS at this point in time by dividing 1000 milliseconds (one second) by the frame duration in milliseconds.
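As a worked example of this arithmetic, the sketch below derives a frame duration from the first and last packet of a frame and converts it to FPS. The function names and the packet timestamps are hypothetical; in practice the duration would come from the selection in Trace Analyzer.

```python
def frame_duration_ms(packet_intervals):
    """Span from the first packet start to the last packet end, in ms."""
    start = min(s for s, _ in packet_intervals)
    end = max(e for _, e in packet_intervals)
    return end - start


def estimate_fps(duration_ms):
    """Divide 1000 ms (one second) by the frame duration in ms."""
    if duration_ms <= 0:
        raise ValueError("frame duration must be positive")
    return 1000.0 / duration_ms


# Made-up (start_ms, end_ms) pairs for the packets of one frame.
frame_packets = [(100.0, 104.0), (104.5, 110.0), (110.0, 116.0)]

duration = frame_duration_ms(frame_packets)  # 16.0 ms
fps = estimate_fps(duration)                 # 62.5 FPS
```

Because this is an estimate for a single frame, averaging over several consecutive frames gives a steadier figure.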

Analyze Synchronization

Using Trace Analyzer, you can identify synchronization issues that may appear in multi-context graphics applications (DirectX* 12, Vulkan*) with multi-threaded rendering. In addition to GPU-side synchronization, you can also analyze synchronization between CPU threads to address some CPU-side performance issues.
Examples of GPU-GPU and GPU-CPU synchronization types you can analyze using Trace Analyzer are:
  • Synchronization between context queues, when a signal from one queue resumes execution of another queue.
  • Synchronization between a context queue and a CPU thread, when either a signal from a context queue resumes a CPU thread or a signal from a CPU thread resumes execution of a context queue.
You can also analyze CPU-side synchronization between CPU threads. On Windows* OS, Trace Analyzer supports synchronization relations highlighting for the following Win32 API functions:
To visualize synchronization events between threads, locate one of these function calls on the trace and click on the bar. An arrow appears, pointing to related synchronization calls.
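To make the signal-and-resume relationship concrete, here is a minimal sketch of how a wait could be paired with the signal that resumed it. It does not describe how Trace Analyzer draws its arrows: the tuple layouts, the `sync_id` key, and the matching rule (the latest signal on the same object at or before the end of the wait) are all assumptions made for illustration.

```python
from collections import defaultdict


def match_waits_to_signals(signals, waits):
    """Pair each wait with the signal assumed to have resumed it.

    signals: list of (time_ms, sync_id, source)
    waits:   list of (begin_ms, end_ms, sync_id, waiter)
    Returns a list of (source, waiter, blocked_ms) tuples.
    """
    by_id = defaultdict(list)
    for time_ms, sync_id, source in sorted(signals, key=lambda s: s[0]):
        by_id[sync_id].append((time_ms, source))

    arrows = []
    for begin_ms, end_ms, sync_id, waiter in waits:
        # Candidate signals on the same object that fired before the wait ended.
        candidates = [(t, src) for t, src in by_id[sync_id] if t <= end_ms]
        if candidates:
            t, src = candidates[-1]  # latest such signal
            arrows.append((src, waiter, max(0.0, t - begin_ms)))
    return arrows


# Made-up events: one queue-to-queue and one CPU-to-queue relation.
signals = [(5.0, 1, "queue A"), (12.0, 2, "CPU thread 7")]
waits = [
    (2.0, 5.0, 1, "queue B"),
    (10.0, 12.5, 2, "queue A"),
    (1.0, 3.0, 9, "CPU thread 3"),  # no matching signal in this sample
]

arrows = match_waits_to_signals(signals, waits)
```

Each resulting tuple names the signaling side, the resumed side, and how long the waiter was blocked, which is the kind of relation to look for when following an arrow on the trace.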

Next Steps

If you discover that your application is CPU-bound, consider annotating your code with the ITT API so that tasks that take too long are visible directly in Trace Analyzer.
To profile CPU-side issues in more depth, consider using other CPU-side performance analysis tools offered by Intel. Use Intel® VTune Profiler to find hotspots and identify issues related to CPU utilization, or use Intel® Advisor for a deeper focus on threading and vectorization.
If your application is GPU-bound, capture a stream or a frame of the problematic area and analyze rendering performance in depth using Graphics Frame Analyzer.
