Intel® Graphics Performance Analyzers 2013 for Windows* OS Getting Started Guide

Contents

About This Document
About Intel® GPA for Windows* OS
Downloading Intel GPA
Installing Intel GPA
Preparing Your Game for Analysis by Intel GPA
An Overview of the Analysis and Optimization Process
System Analysis with Intel GPA: Is My Application CPU or GPU Bound?
What is the Best Way to Analyze My Application on a Netbook or an Ultrabook™ Device?
How Do I Understand What Is Happening Within a Frame?
How Do I Analyze Tasks Across Both the CPU and GPU?
More Information

About This Document

Intel® Graphics Performance Analyzers (Intel® GPA) is a suite of tools for graphics analysis and optimization that can help you make games and other graphics-intensive applications run even faster. Intel® GPA supports the latest platforms based on Intel® Core™ and Intel® Atom™ processor families for applications developed for either Windows* OS or Android* OS.

This document discusses some of the key concepts of the tools that are useful for people who are familiar with performance analysis and optimization, but have not used Intel GPA before. This document has a task-based approach, covering some typical analysis and optimization workflows you will encounter.

This document discusses Intel GPA usage for optimizing DirectX*-based PC games running on Windows* OS. For information on how to start using Intel GPA to analyze Android* OS applications running on Intel® Atom™ based phones, refer to the Intel GPA Getting Started Guide for Android.

For more information about the product, please refer to the Intel GPA Online Help documentation, or check out the Intel GPA Home Page.

About Intel® GPA for Windows* OS

Use Intel GPA for Windows* OS to:

  • collect and display hardware and software metrics data from your application in real time
  • conduct a number of Microsoft* Direct3D pipeline experiments to isolate graphics bottlenecks quickly
  • understand the high-level performance profile of your graphics application, and determine whether your application is CPU bound or GPU bound
  • create frame capture files that contain the entire Microsoft DirectX* context used to render the selected 3D frame
  • understand the performance of your application at the frame level, render target level, and draw call level
  • enable detailed analysis and "what if" optimization experiments without the need to recompile or rebuild your application
  • visualize the execution profile of the various tasks in your code over time

The following table summarizes the supported platforms. For details, please see Intel GPA Release Notes.

Target System (the system where your game runs) | Analysis System (your development system) | Target Application (types of supported applications running on the target platform)
Windows* 7 OS | Windows* 7/8 OS | Microsoft* DirectX* 9/9Ex, 10.0/10.1, 11.0
Windows* 8 OS | Windows* 7/8 OS | Microsoft* DirectX* 9/9Ex, 11.0 or Windows* 8 Store Applications

Intel GPA on Windows* OS contains the following components:

Intel® GPA Monitor

This tool enables you to launch applications and other Intel GPA tools, automatically detect launched applications, configure various Intel GPA options for the Intel® GPA System Analyzer HUD, and capture frame and trace files.

Intel® GPA System Analyzer HUD (Heads-Up Display)

This tool displays hardware and software metrics data from your application in real time, and enables experimentation via Microsoft* Direct3D pipeline state overrides. Use this tool to understand the high-level performance profile of your graphics application, and determine whether your application is CPU bound or GPU bound.

  • If your application is GPU-bound, follow up with the Intel® GPA Frame Analyzer to analyze a captured frame with a low frame rate.
  • If your application is CPU-bound, follow up with the Intel® GPA Platform Analyzer to visualize the interaction between various tasks within your application on both CPU and GPU.

Intel® GPA System Analyzer

Intel GPA System Analyzer, which is a remote version of the Intel GPA System Analyzer Heads-Up Display (HUD), enables client/server analysis over a network connection, which minimizes overhead on platforms with limited resources such as netbooks. Like the Intel GPA System Analyzer HUD, this tool helps you understand the high-level performance profile of your graphics application, and determine whether your application is CPU-bound or GPU-bound. Intel GPA System Analyzer provides some additional features over the HUD, including the ability to graph more metrics simultaneously, and a convenient drag-and-drop interface for adding new metrics.

Intel® GPA Frame Analyzer

This tool provides a detailed view of a captured frame file created with the Intel GPA System Analyzer or the Intel GPA System Analyzer HUD. Frame files contain the entire Microsoft DirectX* context used to render the selected 3D frame. This tool enables you to understand the performance of your application at the frame level, render target level, and even at the individual draw call level. With this tool you can make detailed analysis and “what if” optimization experiments without the need to recompile or rebuild your application.

Intel® GPA Platform Analyzer

This tool visualizes the execution profile of the various tasks in your code over time. Intel GPA collects real-time trace data during the application run and provides a system-wide picture of the code execution on the various CPU cores and the GPU in your system. The tool infrastructure automatically aligns clocks across all cores in the entire system so that you can analyze CPU-based workloads together with GPU-based workloads within a unified time domain. To effectively use this tool, you need to insert calls to the tracing API within your code in order to specify key logical tasks, such as collision detection.

See the rest of this document for more details on these tools and how to use them.

Downloading Intel GPA

If you have not already downloaded and installed the product, now is a great time to do so. This will enable you to follow along as the document introduces key concepts within the product.

To get a copy of Intel® GPA, go to the Intel GPA Home Page and click the Download button. For analyzing games running on Windows* OS, select and download the Windows OS version of the product. The other versions of the product only apply to analyzing workloads running on Android* OS.

Installing Intel GPA

Once you have downloaded the self-extracting .exe file:

  • Open the self-extracting executable file to extract the installation files
  • Run setup.exe located in the top-level directory of the extracted files

Install the product from an Administrator account.

If you use the product in a client/server configuration, install Intel GPA on both systems.

For a better experience with Intel GPA, install the latest graphics drivers and BIOS for your system. If you are using Intel® HD 2000/3000 graphics or Intel® HD 2500/4000 graphics, the Intel GPA Monitor will check the graphics driver version installed on your system, and let you know if a newer version is available.

Preparing Your Game for Analysis by Intel GPA

You do not need to modify your code or use special libraries if you just want to:

  • Determine whether your game is CPU or GPU bound (use the Intel GPA System Analyzer HUD or the Intel GPA System Analyzer), or
  • Figure out what is happening within a specific frame of your game (use the Intel GPA Frame Analyzer)

To visualize the execution profile of the various tasks in your code over time in the Intel GPA Platform Analyzer, you need to instrument your application with Intel® Tracing Technology (ITT) API: just add calls within your game code in order to designate logical tasks in your game. For detailed information regarding the ITT API, refer to the Intel® ITT API Programmer's Guide.

An Overview of the Analysis and Optimization Process

You should have a goal in mind before you start analyzing and optimizing your game. For example, you might want 30 frames per second on a 1280x1024 screen with specific settings (such as including fog and detailed shadows). Intel GPA can help you identify performance bottlenecks within your game, such as excessive vertex shader use or bandwidth limitations.

Intel GPA is also a visualization tool that enables you to:

  • View the performance and visual effect of various "what-if" experiments within the Intel GPA tools, such as modifying the shader code or changing the DirectX* state of various draw calls.
  • Measure detailed performance both before and after specific code changes. For example, Intel GPA Frame Analyzer can show detailed metrics for individual portions of the rendering pipeline down to the draw-call level of your application code.

We recommend analyzing your application as follows:

  • Use the Intel GPA System Analyzer or Intel GPA System Analyzer HUD to determine whether your game or graphics application is CPU or GPU bound.
  • Use one or more of the Intel GPA tools to identify areas for improvement.
  • Change your game code.
  • Re-run Intel GPA to verify that your changes achieve the expected performance improvements.
  • If you still have not met your optimization and visual quality goals, then re-analyze the game with Intel GPA to pinpoint additional "hot spots" for further analysis and optimization.

You may also want to analyze and optimize for specific target platforms. The most common mainstream platforms are laptops or Ultrabook™ devices, where the install base is quite large. On the other hand, hardcore gamers are more likely to buy the highest performing system independent of the cost, so you will want to enable every possible visual effect on those platforms to help increase sales of your game.

Intel GPA can help you understand the best optimizations for various graphics platforms. Once you have optimized the basic scene rendering for your game on one system (such as a laptop), many of those optimizations should carry over to other platforms. Many game developers use Intel GPA to help test their games on various platforms, and for each platform they enable/disable certain features (such as detailed terrain features or additional interactive game elements) until they arrive at the best possible game-playing experience for their customers on that platform.

While Intel GPA can help you with the analysis and optimization tasks, you need to understand how your game partitions work across both the CPU and the GPU, as well as the specific capabilities of the device(s) you are targeting. For example, the Intel Graphics Developer's Guides are a good resource for game developers targeting Intel graphics systems: they provide details on the architecture of the various GPUs, describe how various Microsoft DirectX* functions are implemented, and provide tips and tricks that allow you to get the most from these systems.

The rest of this document will examine the individual Intel GPA tools that can help you achieve your goals. All test cases described in this document use the sample program gpasample.exe that you can find in the install area for Intel GPA.

System Analysis with Intel GPA: Is My Application CPU or GPU Bound?

Let's "Start From the Top"

A "top-down" approach to analysis and optimization suggests that you first of all determine whether your problems are with the CPU or the GPU. Understanding your performance issues at this highest level is extremely important; otherwise you may spend lots of time optimizing parts of your game that end up being secondary to much larger performance issues.

A good rule of thumb is to have a balance between the utilization of the CPU and the GPU resources – do not have the CPU running at 100% while the GPU is only 20% utilized, and vice-versa.

To find out whether your application is CPU bound or GPU bound, use the Intel GPA System Analyzer HUD.
NOTE: You can also use the Intel GPA System Analyzer running in a client/server mode for similar purposes. This section focuses on using Intel GPA System Analyzer HUD only.

To answer the CPU or GPU question, the Intel GPA System Analyzer HUD works "behind the scenes" while your game is running:

  • The tool displays various CPU, DirectX*, and GPU metrics, enabling you to see whether the value of a metric is higher or lower than you expect.
    For example, FPS (frames per second) is important to most developers, and metrics such as the Aggregated CPU Load and GPU EU's Active can help determine whether these resources are underutilized.
    NOTE: "EU" is an abbreviation for "execution unit", the general-purpose GPU processors in recent Intel chipsets.
  • The tool enables you to perform various real-time, "what if" experiments to help identify various bottlenecks without changing the source code.
    For example, use the Null Hardware override (only available on devices with Intel® Processor Graphics) to simulate an infinitely fast GPU to see whether your game is GPU-bound.

Since the Intel GPA System Analyzer HUD displays its results as an overlay on your game, you need to start your game from within the Intel® GPA Monitor.

Using the Intel GPA System Analyzer HUD to Analyze Your Application

  1. If not already running, start the Intel GPA Monitor from the Microsoft Windows* Start Menu.
  2. Double-click the Intel GPA Monitor icon in the taskbar notification area, or click it and select Analyze Application.... The Analyze Application dialog will be displayed.
  3. Select gpasample.exe (you can find this sample program in the install area for the product) in the list of applications and click Run.
    This will run the Intel GPA Sample application with the Intel GPA System Analyzer HUD, enabling you to quickly analyze the performance. Use the keyboard shortcut Ctrl+F1 to switch between HUD modes:
    1. HUD Off
    2. Display FPS only
    3. Display Performance Graphs
    4. Display keyboard shortcut Help
  4. Cycle through the various HUD display modes until you see both the FPS and graphs for four key metrics appearing as an overlay on your game, as shown here:
    sa_hud.png

NOTE: Some games may already be using a particular keyboard-shortcut combination. If a keyboard shortcut does not appear to be working with your game, you can assign new keyboard shortcuts in the Intel GPA Monitor > Profiles dialog box > Keyboard Shortcuts tab. For details, refer to Customizing the Keyboard-shortcut Assignments in the Intel GPA Online Help.

Configuring Metrics to Be Displayed

Intel GPA can display a wide variety of CPU, GPU, DirectX*, and driver metrics. Metrics available for display are dependent upon your specific graphics device. Intel GPA enables you to select up to four metrics that are overlaid on your running application.

To configure the metrics for display, do the following:

  1. In the Analyze Application dialog box click Manage Profiles... or right-click the Intel® GPA Monitor icon in the taskbar notification area and select Profiles....
  2. Go to the HUD Metrics tab to select the metrics to be collected.
  3. Double-click a desired metric to add it to the list of metrics to display in the Intel GPA System Analyzer HUD. To add several metrics, select them in the list and click the Add button. You can select up to four metrics that are overlaid on your running application.
  4. To change which metrics are displayed, first remove a metric from the list of displayed metrics (double-click it, or select one or more metrics and click the Remove button), then repeat the previous step to add a different one.

Performing Various 'What-If' Experiments

With the Intel® GPA System Analyzer HUD you can quickly perform "what if" experiments (also known as overrides) of various portions of the graphics pipeline in order to isolate one or more performance bottlenecks in your application. Override modes provide a method for high-level performance analysis and visual debugging.

These override modes operate "behind the scenes" within the graphics driver to modify one or more of the render states of the graphics pipeline to show the effect of that phase on the rendering process, without requiring any code changes in your game. If using a certain override mode improves performance significantly, then that overridden mode might be a performance bottleneck, and therefore warrants further analysis.

For example, as mentioned before, the Null Hardware override simulates an infinitely fast GPU: if using this override significantly increases your FPS, your game is limited by the GPU.

Other overrides can help isolate where in the rendering pipeline your bottlenecks are: try out Texture 2x2 to see whether your textures are causing memory bandwidth issues ("thrashing"), or even Simple Pixel Shader to check if your shader code is too complex. To use an override, follow this procedure:

  1. While your game and the Intel GPA System Analyzer HUD are running, use Ctrl+F1 to cycle through the HUD display modes until you see the list of keyboard shortcuts available.
  2. Use one of these shortcuts (such as Ctrl+Alt+H to use Null Hardware), and check whether the FPS improves.
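
The check in step 2 comes down to comparing two FPS readings. As a minimal sketch (the function name, the enum, and the 1.2x cutoff for a "significant" increase are assumptions for illustration, not part of Intel GPA), the comparison could be expressed as:

```cpp
// Hypothetical helper: compares FPS values read from the HUD before and
// after enabling the Null Hardware override.
enum class Bottleneck { GpuBound, NotGpuBound };

// fps_normal:  FPS with no override active.
// fps_null_hw: FPS with the Null Hardware override active.
// min_speedup: ASSUMED cutoff for "significantly increases"; tune as needed.
constexpr Bottleneck ClassifyWithNullHardware(double fps_normal,
                                              double fps_null_hw,
                                              double min_speedup = 1.2) {
    // With the GPU cost removed, a large FPS jump suggests the GPU was the limit.
    return (fps_normal > 0.0 && fps_null_hw / fps_normal >= min_speedup)
               ? Bottleneck::GpuBound
               : Bottleneck::NotGpuBound;
}
```

With these assumptions, a game that runs at 30 FPS and rises to 90 FPS under Null Hardware would be classified as GPU-bound, while one that rises only to 31 FPS would not.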

The metrics and override modes that are available depend upon your system. For more details see the section titled Metrics Descriptions in the Intel GPA Online Help. This section also shows how to use different metrics and overrides, describing guidelines for reasonable values and providing suggestions for fixing performance bottlenecks that Intel GPA uncovers. Another great resource for tips and tricks is Practical Game Performance Analysis Using Intel® Graphics Performance Analyzers.

What is the Best Way to Analyze My Application on a Netbook or an Ultrabook™ Device?

If you need to analyze your application running on a netbook, or other platforms with limited resources, you can use the Intel GPA System Analyzer (network mode) instead of the Intel GPA System Analyzer HUD. This tool enables client/server analysis over a network connection, which has the following advantages:

  • both the Intel GPA System Analyzer and application can be viewed full-screen on their respective machines
  • accuracy of the measurements is not affected; when both the Intel GPA System Analyzer and the application run on the same system, the Intel GPA System Analyzer requires CPU and GPU cycles, which can affect the target application and alter the accuracy of the measurements

Analyzing Your Application in Network Mode

To analyze your application in network mode:

  1. Make sure that Intel® GPA is installed on both the analysis and target systems.
  2. Launch the Intel® GPA Monitor on the target system.
  3. Launch your graphics application on the target system via the Intel GPA Monitor, as described in Using the Intel GPA System Analyzer HUD to Analyze Your Application.
    Note that you can also configure the Intel GPA Monitor settings to automatically start analyzing any launched application.
  4. Launch the Intel GPA System Analyzer on the analysis system.
  5. Type the IP address or host name of the target system you want to connect to.
  6. Click Connect.
  7. Select the application from the list.

Inspecting Metrics

Intel GPA System Analyzer enables you to graph several metrics simultaneously and easily add metrics to charts.

To manage a metric and add it to charts, drag and drop the metric from the left panel:

  • to the chart
  • to the space between two charts
  • above the first chart
  • below the last chart

sa_ini.png

To add multiple metrics to a single chart, hold Ctrl while dragging the metric. You can add up to four metrics to a chart.

To delete a chart, click the gray cross in the top left corner of the chart.

To create a frame capture file for deeper analysis, click the frame capture button (sa_fc_b.png) or use the keyboard shortcut, which is Ctrl+Shift+C by default.

To create a trace capture file for deeper analysis, click the trace capture button (sa_tc_b.png) or use the keyboard shortcut, which is Ctrl+Shift+T by default.

To export metric values to a CSV file for post-processing analysis, click the CSV export button (sa_csv.png) or use the keyboard shortcut, which is Ctrl+Shift+E by default.
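
The exported file can then be processed with ordinary tooling. The sketch below averages one metric column. It assumes a layout with a header row of metric names followed by one comma-separated row of numeric samples per line; that layout, the column names, and the function names are assumptions for illustration, so check an actual export before relying on them.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits one line on commas. Quoted fields are not handled in this sketch.
static std::vector<std::string> SplitCsvLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
    std::vector<std::string> cells;
    std::stringstream stream(line);
    std::string cell;
    while (std::getline(stream, cell, ',')) cells.push_back(cell);
    return cells;
}

// Returns the mean of the named column, or -1.0 if the column is missing
// or contains no numeric samples. ASSUMES the first row is a header.
double MeanOfColumn(std::istream& csv, const std::string& column) {
    std::string line;
    if (!std::getline(csv, line)) return -1.0;
    const std::vector<std::string> header = SplitCsvLine(line);

    // Locate the requested column in the header row.
    std::size_t index = header.size();
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == column) { index = i; break; }
    }
    if (index == header.size()) return -1.0;

    double sum = 0.0;
    std::size_t count = 0;
    while (std::getline(csv, line)) {
        const std::vector<std::string> cells = SplitCsvLine(line);
        if (index >= cells.size()) continue;  // skip short or empty rows
        try {
            sum += std::stod(cells[index]);
            ++count;
        } catch (const std::logic_error&) {
            // Skip cells that are not numbers.
        }
    }
    return count > 0 ? sum / static_cast<double>(count) : -1.0;
}
```

Passing an open std::ifstream for the exported file and a metric name such as "FPS" would return the average of that metric over the capture, under the layout assumed here.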

You can also quickly perform "what if" experiments of various portions of the graphics pipeline in order to isolate one or more performance bottlenecks in your application. To apply a desired override, select it from the list in the left-hand side on the Intel GPA System Analyzer window, and see the results of the experiment in the graphs. For more details, refer to Performing Various "What-If" Experiments.

What Can I Do Next?

If all went well, you met your performance and playability goals by using the Intel GPA System Analyzer HUD to identify and resolve issues with your game. If you still have not met your overall objectives, here is what you could do:

  • If you believe that your game is GPU-bound, use the Intel GPA Frame Analyzer to perform a "deep dive" into exactly what is happening within a specific frame: see where in the rendering pipeline your game is spending its time, even down to the render target or draw call level of detail. See How Do I Understand What Is Happening Within a Frame? to learn how to use this tool.
  • If you believe that there are issues in balancing your workloads across the CPU and GPU, use the Intel GPA Platform Analyzer, as this tool uses a trace data file to provide you with a task-based overview of your game across both the CPU and GPU domains. See How Do I Analyze Tasks Across Both the CPU and GPU? to learn how to use this tool.
  • If you know that your game is CPU-bound, you might try using some CPU-specific tools to help improve your performance on the CPU. In particular, with multiple cores now being the rule rather than the exception, parallelizing your code can have a big impact on overall CPU performance. For this situation, see this site for information on parallelizing your code.

Refer to the other sections of this document for more details on using these tools for further analysis and optimization of your game.

How Do I Understand What Is Happening Within a Frame?

As mentioned before, the Intel GPA Frame Analyzer uses a frame capture file to let you understand exactly what is happening within your game on a frame-by-frame basis. For key features available in this tool refer to the Intel GPA Overview.

Capturing a Frame for Analysis

Creating the frame capture file is pretty easy. You can use one of two ways:

Way 1: Use a keyboard shortcut to capture a specific frame:

  1. Start your game as mentioned in the previous section.
  2. When you see a scene in your game you would like to analyze, use the Ctrl+Shift+C keyboard shortcut to capture a frame.

You can capture multiple frames without stopping and restarting the game.

Way 2: Use an option within the Intel GPA Monitor interface to capture a frame whenever a certain "trigger" occurs (such as when the frame rate drops below 10 FPS).

With Intel GPA you get the ability to automatically trigger when to capture a frame, thereby allowing much finer control over which frame capture files you save for later analysis. To define a trigger for frame capture, you need to create a profile for your game, as described in the Intel GPA Online Help. Rather than go through the entire process, let’s quickly modify the Default profile to add a trigger:

  1. In the Analyze Application dialog box click Manage Profiles... or right-click the Intel® GPA Monitor icon in the taskbar notification area and select Profiles....
  2. In the Profiles window select the Default profile.
  3. Go to the Trigger tab and check the Enable trigger check box to enable the triggers option.
  4. Select a metric you want to use and the condition (here we selected "Aggregated CPU Load" equal to "95").
  5. Select what to do when the trigger occurs (here we selected "Frame and Trace Capture" and "Pause").
  6. Click OK.
    The next time you use Intel GPA with the Default profile to attach to your application, Intel GPA will capture the frame and trace, and then pause your game/application, when "Aggregated CPU Load" reaches 95.

Analyzing a Specific Frame

Once you have saved one or more frame capture files, you need to launch the Intel GPA Frame Analyzer tool:

  • Run the Intel® GPA Frame Analyzer from the Windows* Start Menu, or
  • Click the Intel GPA Monitor icon in the taskbar notification area and select Frame Analyzer

The initial dialog box asks which machine you are connecting from and which frame capture file you want to analyze – a thumbnail for each capture file helps you select which frame to analyze.

Overview of Key Features

Once you have selected a file, the frame capture file will be loaded into the tool:

fa_ini_gui.png

As shown above, the GUI includes the following major user interface areas:

  • Menu Bar panel elements enable you to open frame capture files, change the Metric Configuration, reset metrics selection, select ergs, view metric values range, and help you find problem areas in your frame.
  • Visualization Settings Tool Bar helps configure the erg display shown in the Visualization Panel
  • Visualization Panel displays the sequence and duration of captured events in graphical format
  • Scene Overview Panel shows a list of all ergs or regions and also metrics for a single erg.
  • Tabs panel provides numerous options for understanding the composition of your frame. Read further for details.
  • Ergs Information displays information on the selected ergs.
  • Render Target Viewer Panel shows the list of all render targets associated with the erg selection set and with the final rendered frame buffer.

The term erg refers to any work item within that frame that potentially renders pixels, which includes draw calls, clears, and other graphics API calls.

Another key concept of the Intel GPA Frame Analyzer GUI is the erg selection set. When you set the Visualization panel to the Erg Graph view and select one or more items in the Visualization panel or the Scene Overview panel, the results of that selection are immediately synchronized across all panels of the interface. Also, if you have selected a subset of the work items, performing experiments or other modifications in the Tabs panel affects only those items in the erg selection set.

Intel® GPA Frame Analyzer supports the following views of the bar chart:

  • Erg Graph - shows metrics associated with each erg
  • Render Targets Graph - shows metrics associated with sets of ergs, grouped by render target.

On traditional rendering architectures, you can toggle between the two views. On tile-based rendering architectures, only the Render Targets Graph view is available. This is because work is processed in batches, and per-erg metrics are not meaningful in tile-based rendering.

The Erg Visualization panel displays a bar chart representation of your frame. By default, the x-axis represents the sequencing of the work items (with the leftmost being the earliest), and the y-axis represents the actual time spent rendering each work item. To select an item, just click on it, or drag across multiple items to select a contiguous range. Here is a quick tip to help spot trouble areas: set the x-axis to GPU Duration and the y-axis to GPU Breakdown.

In the Scene Overview panel, you will see the ergs represented in a hierarchical tree view form, grouped logically by the frame level, region level, and draw call level. Select items by clicking on one or more of the check boxes on the far left-hand side. You can also sort this list – click on the column header for any item to sort by that item, making it easier to select the topmost items for whatever criteria you want.

The Render Target Viewer provides visual feedback for the items you have selected. You can optionally display a larger version of the image in a separate window for a more detailed view.

The real "workhorse" of this tool, the Tabs panel, provides numerous options for understanding the composition of your frame. Within the Tabs Panel you can try different experiments with the selected ergs without having to change your game's code. Here is an overview of each of these tabs:

  • Frame Overview: displays metrics for the entire frame (not just your erg selection set); useful for seeing detailed activity within the GPU for the entire frame.
  • Details: displays metrics for the items within your erg selection set; useful for seeing activity for the subset of ergs you have selected.
  • Texture: displays useful information about the textures used by your erg selection set; perform "what-if" experiments by clamping the MIP level to a specific value.
  • State: displays the state of various Microsoft DirectX* parameters for your erg selection set; change the values of these parameters on the fly to see the performance and visual impact of your changes
  • Shaders: displays all shader code used within your erg selection set; allows for "on-the-fly" edits to the shaders to see the effects of these changes.
  • Experiments: lets you select various experiments that help you drill-down to identify rendering issues in your frame; experiments include 2x2 textures, 1x1 scissor rectangle, simple pixel shader, and disable erg(s).
  • Pixel History: displays information regarding which ergs "touched" a specific pixel (as selected from the Render Target Viewer); see whether there are issues with the rendering order or the overall complexity of the rendered objects.
  • Geometry: displays the pre-transform vertices, provides different visualization modes, shows vertex position values, and displays performance metrics for individual pipeline stages.
  • StretchRect: displays the surfaces before and after a StretchRect() call.
  • API Log: displays a summary of all Microsoft DirectX* API calls for the items in your erg selection set; especially useful for tracking down "expensive" ergs by seeing what API calls are within one or more ergs.

Once you run any of the experiments listed above, select the Details tab to see the value of GPU metrics collected before and after the experiment. All changes will be highlighted in green or red to show the increase/decrease in performance. The image below shows an example of the before/after results, using the 2x2 Textures experiment on an entire frame:

fa_window.png

With only a 2.2% performance increase by using simple 2x2 Textures in gpasample.exe, it is unlikely that any texture optimization will significantly improve the rendering speed.
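
The percentage quoted above is a relative change between the before and after measurements. As a sketch, assuming the comparison is made on GPU duration for the selected ergs (the function name and the sample durations are hypothetical, not values taken from the tool):

```cpp
// Hypothetical helper: percentage reduction in GPU duration after an
// experiment. A positive result means the frame rendered faster.
// Returns 0.0 when the baseline is not positive, to avoid dividing by zero.
constexpr double PercentReduction(double before_us, double after_us) {
    return before_us > 0.0 ? (before_us - after_us) / before_us * 100.0 : 0.0;
}
```

Under this definition, a frame that takes 10,000 µs before the experiment and 9,780 µs after it has improved by about 2.2%.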

What Can I Do Next?

In this section we have shown just a small sample of the experiments available with the Intel® GPA Frame Analyzer. The best way to become familiar with the full range of capabilities is to "play" with a sample frame or two while using the Intel GPA Online Help as a guide (we recommend starting with gpasample.exe, as this example can be found in your Intel GPA installation directory).

How Do I Analyze Tasks Across Both the CPU and GPU?

The Intel GPA Platform Analyzer is an instrumentation-based tool that helps you visualize the execution profile of the tasks over time across both the CPU cores and GPU of your system. This tool is especially critical now that the latest generation of mainstream compute platforms, even laptops, typically contain four or more CPU cores. Therefore, to maximize the game-playing experience for your customers you will need to optimize the game across all the compute and rendering resources, and this tool can help you achieve this goal.

Instrumenting your Application with Intel® Tracing Technology API

To get the most out of the Intel GPA Platform Analyzer tool you need to add API calls in your game code to designate logical tasks in your game. This will help you visualize the relationship between tasks in your game, including when they start and end relative to other CPU and GPU tasks.

At the highest level, a task is a logical group of work executing on a specific thread, and may correspond to any grouping of code within your game that you consider important. The latest version of the Intel GPA Platform Analyzer makes marking up your code easy: simply identify the beginning and end of each logical task with __itt_task_begin and __itt_task_end calls. For example, you may wish to separately track "smoke rendering" or "detailed shadows", so you would add API tracking calls to the code modules for these specific features.

To get started, you will need to use the following API calls:

  • __itt_domain_create(): Creates a domain, which is required in most Intel ITT API calls. You need to define at least one domain.
  • __itt_string_handle_create(): Creates a string handle for identifying your tasks. String handles are more efficient to store in the trace than strings.
  • __itt_task_begin(): Marks the beginning of a task.
  • __itt_task_end(): Marks the end of a task.

See the Intel® ITT API Programmer's Guide for detailed information on the ITT API calls.
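As an illustration, the four calls listed above might be combined as in the following minimal sketch. The domain name "MyGame.Render", the functions render_smoke, render_shadows, and render_frame, and the GAME_USE_ITT build macro are all hypothetical and are not part of Intel GPA. The no-op stand-ins exist only so that the sketch compiles without the SDK; check the exact signatures against the ittnotify.h header shipped with your Intel GPA installation.

```cpp
#ifdef GAME_USE_ITT        // hypothetical opt-in macro for instrumented builds
#include <ittnotify.h>     // ITT API header from the Intel GPA SDK
#else
// No-op stand-ins (assumption: used only when the SDK is not available).
struct __itt_domain {};
struct __itt_string_handle {};
struct __itt_id {};
static const __itt_id __itt_null = {};
inline __itt_domain* __itt_domain_create(const char*) {
    static __itt_domain d; return &d;
}
inline __itt_string_handle* __itt_string_handle_create(const char*) {
    static __itt_string_handle h; return &h;
}
inline void __itt_task_begin(const __itt_domain*, __itt_id, __itt_id,
                             __itt_string_handle*) {}
inline void __itt_task_end(const __itt_domain*) {}
#endif

// Stand-ins for real rendering work (hypothetical).
static int g_work = 0;
static void render_smoke()   { g_work += 1; }
static void render_shadows() { g_work += 2; }

// Renders one frame, marking each feature as its own task.
// Returns the accumulated amount of stand-in work.
int render_frame() {
    // Create the domain and the string handles once and reuse them.
    static __itt_domain* domain = __itt_domain_create("MyGame.Render");
    static __itt_string_handle* smoke =
        __itt_string_handle_create("smoke rendering");
    static __itt_string_handle* shadows =
        __itt_string_handle_create("detailed shadows");

    __itt_task_begin(domain, __itt_null, __itt_null, smoke);
    render_smoke();
    __itt_task_end(domain);      // ends the task begun on this thread

    __itt_task_begin(domain, __itt_null, __itt_null, shadows);
    render_shadows();
    __itt_task_end(domain);

    return g_work;
}
```

Creating the domain and string handles once, rather than on every frame, keeps the per-task overhead limited to the begin and end calls themselves.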

Creating a Trace Capture File of the Task Activity

Once you have instrumented your code, the next part of the process is to create a trace capture file of the task activity:

  1. Start your game with Intel GPA as described in this section.
  2. Capture a trace file by using the default Ctrl+Shift+T keyboard shortcut. Note that you can also use "profiles" within the Intel GPA Monitor to automatically trigger the capture of a trace file based upon the occurrence of an event; the Intel GPA Online Help file has all the details on defining these triggers. Refer to Capturing a Frame for Analysis for information on how to add a trigger.
    Once you have captured your trace file, exit your game.
  3. Start the Intel GPA Platform Analyzer.
  4. The initial dialog box asks which machine you are connecting from and which trace capture file you want to analyze – a thumbnail for each trace file helps you select which trace to analyze.
  5. The tool displays all instrumented tasks in the Timeline View:

For this example, we have used the "Smoke" demo since this example is multi-threaded (and therefore is a better example for demonstrating the tool's features).

What Information Can You Get?

  1. Check the Task Groups panel at the top of the window. It displays a timeline of all tasks grouped according to the task relationships specified in your ITT API calls.
  2. Check the Task Timeline below the Task Groups panel. It shows a color-coded list of all the tasks for your game.
  3. Select a particular task in the Task Timeline, and you'll see detailed information about that task displayed at the bottom of the window, in the Statistics and Summary tabs.
  4. If you select either the DX CPU or DX GPU tasks, you can view CPU or GPU metrics for that task from the Metadata.
  5. To quickly zoom in or out, use the scroll wheel on your mouse to drill down to a specific task or to view an overview of all your tasks.

Determining why a Task Seems to Run Longer than You Expect It To

There are a few options for determining why a task seems to run longer than you expect it to:

  1. Add more detailed task instrumentation within any long task you are interested in to determine what portion of the long task is consuming the most time.
  2. Mark up any code loops that might be lengthy with the __itt_task_begin and __itt_task_end ITT API calls.
  3. Use the __itt_metadata_add API call at the end of a loop to record an integer counter that denotes how many times the loop was executed.
  4. Re-run the application and take a new capture trace. You can iterate this process as many times as needed in a drill-down fashion.
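The loop markup and iteration counter described in steps 2 and 3 might look like the following minimal sketch. The function resolve_collisions, the domain name "MyGame.Physics", the metadata key "iterations", and the GAME_USE_ITT build macro are hypothetical. The no-op stand-ins exist only so that the sketch compiles without the SDK; check the exact signatures against the ittnotify.h header shipped with your Intel GPA installation.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef GAME_USE_ITT        // hypothetical opt-in macro for instrumented builds
#include <ittnotify.h>     // ITT API header from the Intel GPA SDK
#else
// No-op stand-ins (assumption: used only when the SDK is not available).
struct __itt_domain {};
struct __itt_string_handle {};
struct __itt_id {};
static const __itt_id __itt_null = {};
enum __itt_metadata_type { __itt_metadata_s32 };
inline __itt_domain* __itt_domain_create(const char*) {
    static __itt_domain d; return &d;
}
inline __itt_string_handle* __itt_string_handle_create(const char*) {
    static __itt_string_handle h; return &h;
}
inline void __itt_task_begin(const __itt_domain*, __itt_id, __itt_id,
                             __itt_string_handle*) {}
inline void __itt_task_end(const __itt_domain*) {}
inline void __itt_metadata_add(const __itt_domain*, __itt_id,
                               __itt_string_handle*, __itt_metadata_type,
                               std::size_t, void*) {}
#endif

// Hypothetical long-running task: processes pending collisions one by one
// and returns how many loop iterations were needed.
int resolve_collisions(int pending) {
    static __itt_domain* domain = __itt_domain_create("MyGame.Physics");
    static __itt_string_handle* task_name =
        __itt_string_handle_create("resolve collisions");
    static __itt_string_handle* key =
        __itt_string_handle_create("iterations");

    __itt_task_begin(domain, __itt_null, __itt_null, task_name);

    std::int32_t iterations = 0;
    while (pending > 0) {
        --pending;               // stand-in for the real per-collision work
        ++iterations;
    }

    // Attach the loop count to the current task (__itt_null selects it)
    // as a single signed 32-bit value, then close the task.
    __itt_metadata_add(domain, __itt_null, key, __itt_metadata_s32, 1,
                       &iterations);
    __itt_task_end(domain);

    return iterations;
}
```

Recording the counter once, after the loop finishes, avoids adding instrumentation overhead to every iteration.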

What Can I Do Next?

The Intel GPA Platform Analyzer helps you visually identify synchronization and load balancing issues in your game. Game engines with multiple discrete tasks, such as collision detection and terrain generation, need a tool such as the Intel GPA Platform Analyzer to help visualize the dynamics of resource sharing. The Intel GPA Online Help file contains detailed information about the tool, as well as suggestions for pinpointing synchronization and resource utilization issues. Another good resource for those interested in game design techniques is the Intel White Paper titled Designing the Framework of a Parallel Game.

Back to Top

More Information

To learn more about Intel GPA, be sure to check out the help documentation, which you can access through this link. Also see videos and other product documentation available from the Intel® GPA Home Page. If you have any questions, issues, or feedback on the product, contact your Intel representative or visit the Intel GPA Support Forum.

Back to Top

* Other names and brands may be claimed as the property of others.

For more complete information about compiler optimizations, see our Optimization Notice.
