User Guide

  • 2021.3
  • 09/23/2021
  • Public Content

Identify Issues in Graphics Application Execution with Trace Analyzer

Use Graphics Trace Analyzer to:
  • Identify problem areas in graphics application execution: analyze calls to graphics APIs such as Microsoft DirectX*, Vulkan*, or OpenGL*, and review user-defined debug markers, threads, and queued GPU commands.
  • Perform a high-level analysis of synchronization and parallelism efficiency, and of the relations between dependent threads and objects.
  • Evaluate workload performance across the CPU and GPU.
Graphics Trace Analyzer captures a trace, which is a record of activity on both the CPU and the GPU during application execution.

Trace Data

A trace represents data captured from graphics applications. During the rendering process, applications submit hundreds of graphics commands from different threads. The graphics driver interprets the commands for the GPU, puts them into command buffers, pushes the buffers into a CPU queue, and schedules the commands for execution on the GPU, forming a final frame on the screen.
GPU Activity Data
GPU activity data includes the GPU queue, Driver queue, Parallel Execution track, OpenCL Execution track, Flip queue, and GPU metrics:
  • GPU queue shows how the GPU executes commands forming a final frame on the screen. The GPU queue indicates whether the GPU is busy or idle.
  • Driver queue shows how the graphics driver schedules graphics commands for execution on the GPU. The driver queue shows how many graphics commands are submitted, and how many of them are waiting for execution.
  • Parallel Execution track shows how the driver parallelizes execution of submitted render commands (draw, clear, dispatch, resource barriers). The track is available for DirectX apps.
  • OpenCL Execution track visualizes execution of OpenCL kernels on a GPU or a CPU.
  • Flip queue reflects the relationship between the application present calls, present packages of GPU/CPU queues, composition work performed by the Desktop Window Manager (DWM), and Vertical Synchronization (VSync) events. Flip queue data allow you to roughly estimate the frame rendering latency, which includes present, flip, and VSync events.
  • GPU metrics show GPU performance for the selected metrics set. Place the metrics track next to the GPU queue to see the correlation between application execution and GPU workload. For example, identify whether the GPU was busy during the processing of a certain package.
CPU Activity Data
CPU activity data includes threads, cores, frames, and metrics:
  • CPU threads track represents the activity of each thread: graphics API calls (draw calls, buffer locks, resource updates, presents), and user-defined debug annotation markers (Microsoft PIX, Instrumentation and Tracing Technology API (ITT API)).
  • CPU cores track shows how threads from different processes, including your profiled application, are executed.
  • CPU frames track shows the range containing graphics commands between two successive frames' buffer swap calls.
  • CPU metrics show CPU performance for the selected metrics set. CPU and GPU metrics help you compare CPU and GPU utilization, and spot problematic areas.

Identify Performance Issues

The workflow described below focuses on game analysis. The steps may differ if you aim to optimize content creation applications.
Define Performance Goals
Set clear optimization goals based on the style, dynamics of the game, and the hardware your audience might use.
Different game types such as shooters or storytelling games imply different optimization goals, for example:
  • Increase frame rate.
    The more dynamic the game is, the shorter the frames should be. At the same time, if the frame rate of the game is too high, the user may not see some of the rendered frames.
  • Optimize visual content representation.
    Depending on the game type, you may be interested in identifying additional GPU resources for better detailing, for example, for elaborate landscapes or textures.
  • Reduce frame duration for cloud gaming.
    Applications developed for cloud gaming have a restricted budget for each frame. In this case, there is a complex process behind the frame rate: receiving the user input, sending it to the server, processing, frame rendering, compression, sending data over the network, decompression, and displaying the frame on the screen.
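As a rough illustration of the per-frame budget in cloud gaming, the following Python sketch sums the latencies of the stages listed above and compares the total with a target. The stage timings and the 50 ms target are hypothetical values chosen for the example; they are not measurements or recommendations.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative values only).
STAGES_MS = {
    "receive user input": 2.0,
    "send input to server": 8.0,
    "processing": 4.0,
    "frame rendering": 12.0,
    "compression": 3.0,
    "send data over network": 10.0,
    "decompression": 2.0,
    "display frame": 4.0,
}


def remaining_budget_ms(stages_ms, target_ms):
    """Return the target minus the sum of all stages (negative means over budget)."""
    return target_ms - sum(stages_ms.values())


def stage_budget_ms(stages_ms, target_ms, stage="frame rendering"):
    """Return the time available for one stage if all other stages stay fixed."""
    others = sum(value for name, value in stages_ms.items() if name != stage)
    return target_ms - others


if __name__ == "__main__":
    target = 50.0  # hypothetical end-to-end target
    print("Remaining budget, ms:", remaining_budget_ms(STAGES_MS, target))
    print("Rendering budget, ms:", stage_budget_ms(STAGES_MS, target))
```

With these assumed numbers, rendering may take longer than it does today only by the amount of the remaining budget, which is why the frame duration rather than the frame rate is the quantity to track.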
Measure Frame Duration
You can use either frame rate or duration as a metric for analysis. For more precise results, start performance profiling with frame duration. The frame duration is measured in milliseconds and shown in curly braces for each frame in the CPU Frames track.
If the CPU Frames track is not available for your application, estimate the frame duration using present tokens. On the Driver queue track, select a range from the right border of a present packet to the right border of the next one. You can see the frame duration on the timeline.
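The same estimate can be reproduced by hand if you note the timestamps of the right borders of successive present packets. The Python sketch below is a minimal illustration; the timestamps are made up, and the function names are hypothetical helpers, not part of any Intel tool.

```python
def frame_durations_ms(present_end_times_ms):
    """Durations between the right borders of successive present packets."""
    return [
        later - earlier
        for earlier, later in zip(present_end_times_ms, present_end_times_ms[1:])
    ]


def frames_per_second(duration_ms):
    """Convert a frame duration in milliseconds to a frame rate."""
    return 1000.0 / duration_ms


if __name__ == "__main__":
    # Hypothetical timestamps (ms) read off the timeline.
    present_ends = [0.0, 16.5, 33.5, 50.0]
    for duration in frame_durations_ms(present_ends):
        print(f"{duration:.1f} ms ~ {frames_per_second(duration):.1f} FPS")
```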
Make sure the frame duration is consistent and meets your performance goals. For example:
  • If frame duration is sufficient, you can analyze whether the GPU is optimally utilized and inspect available GPU resources to incorporate more state-of-the-art graphics in the game.
  • If all frames take longer than expected, you can identify whether your application is GPU-bound and inspect issues with Graphics Frame Analyzer.
  • If frame duration varies greatly, you can spot anomalies with Graphics Trace Analyzer: analyze API calls, parallelization, synchronization, ETW events, and debug API markers in more detail.
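The three cases above can be sketched as a simple classification over a list of measured frame durations. This is only an illustration: the 25% variation threshold, the use of the mean as the "typical" duration, and the function name are assumptions made for the example, and you should pick criteria that match your own performance goals.

```python
from statistics import mean, pstdev


def classify_frames(durations_ms, target_ms, max_variation=0.25):
    """Classify frame durations as 'varies', 'consistently long', or 'sufficient'.

    max_variation is a hypothetical threshold on the ratio of the standard
    deviation to the mean duration.
    """
    average = mean(durations_ms)
    variation = pstdev(durations_ms) / average
    if variation > max_variation:
        return "varies"  # look for anomalies in the trace
    if average > target_ms:
        return "consistently long"  # check whether the application is GPU-bound
    return "sufficient"  # check whether the GPU is optimally utilized


if __name__ == "__main__":
    print(classify_frames([16.0, 16.5, 17.0, 16.2], target_ms=16.7))
    print(classify_frames([33.0, 34.0, 33.5], target_ms=16.7))
    print(classify_frames([16.0, 16.0, 64.0, 16.0, 16.0, 64.0], target_ms=16.7))
```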
Analyze a Game with Sufficient Frame Duration
If the frame duration is sufficient and the GPU is loaded with instructions all the time, your game probably utilizes the GPU optimally.
If the frame duration is sufficient and the GPU is not loaded all the time, visible gaps in the GPU queue may indicate the following:
  • Underutilized GPU resources
  • Improper graphics workload balancing
  • Synchronization issues
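If you have noted the busy intervals of the GPU queue, the gaps between them can be listed with a few lines of code. The Python sketch below assumes the intervals are available as (start, end) pairs in milliseconds; this input format, the 0.5 ms minimum gap, and the function names are hypothetical and do not describe any export format of the tool.

```python
def find_idle_gaps(busy_intervals_ms, min_gap_ms=0.5):
    """Return (gap_start, gap_end) pairs where the GPU queue is idle.

    Gaps shorter than min_gap_ms (a hypothetical threshold) are ignored.
    """
    intervals = sorted(busy_intervals_ms)
    if not intervals:
        return []
    gaps = []
    busy_until = intervals[0][1]
    for start, end in intervals[1:]:
        if start - busy_until >= min_gap_ms:
            gaps.append((busy_until, start))
        # Overlapping packets extend the current busy period.
        busy_until = max(busy_until, end)
    return gaps


def idle_time_ms(gaps):
    """Total idle time across all reported gaps."""
    return sum(end - start for start, end in gaps)


if __name__ == "__main__":
    busy = [(0, 5), (5.2, 9), (12, 16)]  # made-up intervals
    gaps = find_idle_gaps(busy)
    print(gaps, idle_time_ms(gaps))
```

Long or regularly repeating gaps are the ones worth correlating with CPU thread activity and synchronization points in the trace.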
First, analyze how graphics workloads are distributed across CPU threads, and check GPU-CPU and GPU-GPU synchronization. For example, in the screenshot below, the gaps in the GPU and Driver queues indicate that the CPU is waiting for a signal from the GPU to resume processing and prepare the job for the GPU. In this application, the frames are rendered in triplets, and GPU-CPU synchronization increases the duration of the first frame in each triplet nearly fourfold. GPU-CPU synchronization is visualized with green arrows in Graphics Trace Analyzer.
In such cases, check whether there is a good reason for synchronization, whether you can change this and how your improvements will affect the gameplay.
Tip: Refer to the video "What Do I Do If the GPU Shows Idle Time" to learn how to analyze a game where the GPU is underutilized.
If a GPU queue is not full and synchronization works properly, the GPU has resources that you can use to incorporate state-of-the-art graphics effects in a game without decreasing the frame rate. For example, you can add beautiful post-processing or textures.
Analyze a Game with Insufficient Frame Duration
Consistently long frames may indicate that your game is GPU-bound. You can identify a GPU-bound game by the following criteria:
  • The GPU is busy the entire time and the GPU queue has no visible gaps.
  • The Driver queue continuously accumulates command buffers waiting for execution on the GPU. In this case, the Driver queue is long.
  • Average DMA buffer execution time exceeds the desired limit based on the expected frame duration.
  • CPU threads are inactive most of the time. The thread activity zone above the CPU thread track contains green or grey intervals indicating whether the thread was active or inactive during a particular period.
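The criteria above can be written down as a checklist over a few numbers summarized from a trace. The Python sketch below is a hedged illustration only: the TraceSummary fields, the default thresholds, and the function names are all hypothetical, and the values would have to be read from the trace by hand.

```python
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Hypothetical summary of a trace; fields are assumptions for this example."""
    gpu_busy_fraction: float       # share of time the GPU queue is busy, 0..1
    avg_driver_queue_depth: float  # average number of buffers waiting for the GPU
    avg_dma_exec_ms: float         # average DMA buffer execution time
    cpu_active_fraction: float     # share of time CPU threads are active, 0..1


def gpu_bound_checks(summary, dma_limit_ms,
                     busy_min=0.95, queue_min=2.0, cpu_active_max=0.5):
    """Evaluate each GPU-bound criterion; all thresholds are illustrative."""
    return {
        "gpu_always_busy": summary.gpu_busy_fraction >= busy_min,
        "driver_queue_backlog": summary.avg_driver_queue_depth >= queue_min,
        "dma_exceeds_limit": summary.avg_dma_exec_ms > dma_limit_ms,
        "cpu_mostly_inactive": summary.cpu_active_fraction <= cpu_active_max,
    }


def looks_gpu_bound(summary, dma_limit_ms):
    """True only if every criterion holds."""
    return all(gpu_bound_checks(summary, dma_limit_ms).values())


if __name__ == "__main__":
    summary = TraceSummary(0.99, 3.0, 20.0, 0.3)  # made-up values
    print(gpu_bound_checks(summary, dma_limit_ms=16.7))
    print(looks_gpu_bound(summary, dma_limit_ms=16.7))
```

Reporting each criterion separately, rather than a single yes/no answer, makes it easier to see which observation does not fit when a game is only partly GPU-bound.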
If your application is GPU-bound, capture a stream or a frame of a problematic area and analyze rendering performance in-depth using Graphics Frame Analyzer.
In other cases, when frame duration varies, search for anomalies with Graphics Trace Analyzer.
If your case needs more in-depth analysis, use other CPU-side performance analysis tools offered by Intel.

Capture and Open a Trace

Before the analysis, stop all irrelevant applications that use the GPU. You cannot identify performance issues accurately when several applications compete for GPU resources.
To capture trace data during the application run, do the following:
Configure Analysis Settings
  1. Launch Graphics Monitor.
  2. Click the Options button on the lower left of the Graphics Monitor configuration window.
  3. In the Trace tab, configure tracing options as needed: set trace duration, choose data domains, enable data capturing on application startup.
  4. Optionally, configure other analysis settings.
  5. Exit the Options screen by clicking the Back button on the upper left of the screen.
Run Analysis
  1. In the Graphics Monitor Launcher screen, specify an application for analysis.
  2. Choose Trace mode from the launch modes drop-down menu on the lower right.
  3. Click the Start button to launch the application. A window with the game running will open.
Capture a Trace
Choose one of the following methods to capture the trace data to a file:
  • HUD (recommended)
    In the window with the target app running, press Ctrl+Shift+T (default). When the capture is complete, a message is displayed just below the HUD with the filename or possible errors, if any.
    Hot keys may interfere with game keyboard usage. In this case, you can customize shortcuts.
  • System Analyzer
    1. Return to the Graphics Monitor configuration window and click the Connect System Analyzer button next to your application. The button becomes available after you start the application.
    2. Click the Capture Trace button to capture a trace. When the capture is complete, the System Analyzer displays a message with the filename or possible errors, if any.
  • System View Trace
    Use system view capture from System Analyzer if:
    • The methods above do not work for you.
    • You are interested in system data rather than detailed application execution data such as API calls or debug regions. In this mode, only system data, for example, GPU utilization, is available for analysis.
    1. Click the Options button in the Graphics Monitor.
    2. In the Trace tab, set the Trace System View in System Analyzer toggle to ON.
    3. Open System Analyzer.
    4. Start your application:
      • From Graphics Monitor, hover over the application and click the Run button.
      • Run the application from a file manager.
    5. Return to the System Analyzer and click the Capture Trace button.
View Collected Data
To view the collected data:
  1. From the Graphics Monitor context menu, launch Graphics Trace Analyzer.
  2. In the Open Trace Capture window, select and open the captured trace file.
