This article was written using an older version of the Intel GPA product. However, most of the concepts and techniques discussed here are independent of a particular version of Intel GPA. To download the latest release, visit the Intel GPA Home Page.
By Jeff Freeman and Doraisamy Ganeshkumar
To be more godlike, one must appease the masses. Millions of potential gamers are running their systems on Intel® Graphics chipsets. To better reach this mass audience, Gas Powered Games worked with Intel to get its game running more efficiently on integrated graphics.
This case study looks at Gas Powered Games' Demigod using Intel® Graphics Performance Analyzers (Intel® GPA), a utility set for analyzing graphics performance on Intel® Graphics Media Accelerator Series 4 and other supported Intel Graphics chipsets.
Introducing Intel Graphics Performance Analyzers
Intel GPA is a suite of graphics performance optimization tools which enable game developers to quickly and easily identify, isolate, and optimize graphics performance issues for Microsoft DirectX*-based games. The primary tools in the suite are the Intel GPA System Analyzer and the Intel GPA Frame Analyzer.
The Intel GPA System Analyzer is a high-level tool that helps developers assess game performance across the CPU and GPU. Its interactive, real-time displays show various metrics and allow DirectX-level overrides. Key features include:
- Drag-and-drop metric display
- DirectX and graphics driver overrides, including a simple pixel shader, null hardware, null driver, and many more
- Frame capture and transition to the Intel GPA Frame Analyzer
The Intel GPA Frame Analyzer is an interactive, deep single-frame GPU analysis tool that enables developers to analyze performance at the frame, region, and draw call level. Frame captures are file based and can be shared between developers and different GPUs for analysis. The major features of the frame analyzer are:
- Draw call bar chart visualization. A visualization of any selected metric for each draw call in the frame. The default metric is GPU duration.
- Scene overview. A sortable tree view of performance metrics at the frame level, region level (default region = render target change), and draw call level. All metrics are available in this view.
- Render target viewer. A thumbnail and full-sized view of all render targets associated with the current draw call selection set, including highlighting options for selected draw calls.
- Experiments tab. A set of selectable experiments including a simple pixel shader, 2x2 textures, and 1x1 scissor rect that modifies the current draw call selection set. The performance impact of these changes can be viewed in the bar chart and scene overview.
- Texture tab. A thumbnail and full-sized view of all textures associated with the current draw call selection.
- Shader tab. A shader viewer and on-the-fly editor. Includes the ability to modify a shader through in-line editing, cut and paste, or a file change. Modifies all shaders within the current draw call selection set. The performance impact of these changes can be viewed in the bar chart and scene overview.
- State tab. A view that displays and allows modification of all DirectX states for the current draw call selection set.
- API log. A chronological view of all DirectX APIs organized by draw call.
- System information. Provides important information from the system that rendered the captured frame, including information about the driver, operating system, DirectX, and GPU versions.
Case Study: Gas Powered Games' Demigod
Intel actively engaged with members of the Intel® Software Partner Program. Participation in the program offered independent software vendors, who develop commercial software applications that use Intel® technology, a portfolio of benefits to support them across the entire product development cycle: from planning and developing to marketing and selling their applications.
This program provided Intel engineering support to a long list of commercial applications through a wide range of work focused on performance optimizations, most recently in the multi-core and graphics areas. Games have long been a focus as a high-performance class of software running in the consumer space. The following case study examines one such engagement with game developer Gas Powered Games, based in Redmond, Washington. Demigod, its action-oriented and role-playing real-time strategy game, was analyzed using Intel GPA on Intel Graphics with the Intel® G45 Express Chipset and Intel® GM45 Express Chipset.
Stage 1: A GPU-Bounded Problem
In the Demigod game, several different approaches were taken to localize GPU workload as a performance-sensitive area when running on Intel Graphics. The goal of this performance analysis was to yield the greatest performance increase with the least amount of fidelity loss to bring the frame rate within a playable range. To do this, low fidelity settings were selected as a base case. A test level was selected for the game, and performance sampling was started with the Intel GPA System Analyzer. Figure 1 illustrates some interesting metrics that noted a low frame rate and fairly significant graphics utilization.
Given the relatively low overall processor utilization and memory bandwidth load noted in Figure 1, we can presume that the low frame rate is not caused by a single slow frame but rather by an overall GPU-bounded performance problem with the scene itself.
Figure 1. Intel® GPA System Analyzer: Sampling of a scene in Demigod* indicating a high load on the graphics processing unit.
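The reasoning in Stage 1 can be written down as a simple heuristic. The following Python sketch is only an illustration: the `Sample` fields, the threshold values, and the function name are assumptions made for this example and are not part of Intel GPA, which presents these metrics interactively rather than through code like this.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampling interval; all field names are illustrative."""
    fps: float
    cpu_util: float     # 0.0 to 1.0, aggregate processor utilization
    gpu_util: float     # 0.0 to 1.0, graphics utilization
    mem_bw_util: float  # 0.0 to 1.0, memory bandwidth load


def looks_gpu_bound(samples, target_fps=30.0, busy=0.8, idle=0.5):
    """Return True if most samples show low FPS, a busy GPU, and CPU/memory headroom.

    The thresholds are placeholders chosen for illustration, not measured values.
    """
    if not samples:
        return False
    hits = sum(
        1
        for s in samples
        if s.fps < target_fps
        and s.gpu_util >= busy
        and s.cpu_util < idle
        and s.mem_bw_util < idle
    )
    # Require a clear majority so a single slow frame does not decide the verdict.
    return hits / len(samples) > 0.75
```

Requiring a majority of samples mirrors the distinction drawn above between one slow frame and a scene that is consistently limited by the GPU.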
Stage 2: Scene Selection
Further analysis required selecting a specific scene with a low frame rate. The GPU workload remained high in other scenes as well, as indicated by the overall frame rate and low processor utilization. The scene in Figure 2 was selected because it is representative of a typical environment in the game, rendered at lower graphics settings, in terms of visuals, level of detail, characters, props, and graphical workload. The red square in the upper-left corner indicates the presence of Intel GPA, and the frame rate, shown in yellow, is 14 frames per second (FPS) for this scene.
Figure 2. A typical scene in Demigod*: Graphics detail is shown using the lowest game settings.
Stage 3: Isolating the Cause
In some cases, enabling efforts are supported by access to source code. Demigod was one such case, allowing for a detailed exploration of the code and how it matched up to what was going on in the rendered scene. Much like the Intel® VTune™ Performance Analyzer can identify hot spots in CPU code, the Intel GPA Frame Analyzer identifies hot spots at the DirectX API level as they map to GPU performance. The Intel GPA Frame Analyzer displays GPU performance data at the following levels: frame, render target, individual draw call, and the tool's current selection set of draw calls. This is how it is able to quickly identify GPU bottlenecks and map them back to DirectX constructs that make sense to a game developer, and it allowed us to explore the GPU performance of the game in greater detail. When we first started the analysis, we did not see high bus utilization, which would be expected in cases where the GPU is handling too large a vertex buffer, so it was likely that the issue was elsewhere on the GPU side. The Intel GPA Frame Analyzer provides a window into what is going on in the scene.
Figure 3 shows an extra long draw call (based on GPU duration) within the GPU processing of the frame. This call is immediately shown to be a Clear operation.
Figure 3. Intel® GPA Frame Analyzer's GPU bar chart indicating an outlier Clear call.
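The bar chart makes such an outlier visible at a glance. As a rough illustration of the same idea in code, the Python sketch below flags draw calls whose GPU duration is far above the median for the frame. The function name, the factor of 5, and the duration values are all made up for this example; they are not data from Demigod or output from Intel GPA.

```python
from statistics import median


def find_outlier_calls(durations_us, factor=5.0):
    """Return (index, name, duration) for calls whose GPU duration exceeds factor x median."""
    if not durations_us:
        return []
    typical = median(d for _, d in durations_us)
    return [
        (i, name, d)
        for i, (name, d) in enumerate(durations_us)
        if d > factor * typical
    ]


# Made-up durations in microseconds, not measurements from Demigod.
frame = [
    ("DrawIndexedPrimitive", 120.0),
    ("DrawIndexedPrimitive", 95.0),
    ("Clear", 2400.0),
    ("DrawIndexedPrimitive", 110.0),
    ("DrawPrimitive", 130.0),
]
outliers = find_outlier_calls(frame)
```

With these sample values, only the `Clear` entry is flagged, because its duration is many times the median of the other calls.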
Unnecessary calls to Clear have a performance impact on Intel Graphics. Because we had selected low fidelity mode, we double-checked the code and determined that although Shadows were disabled in that mode, the sizeable texture buffer was still being allocated and cleared. A simple condition to avoid this call when Shadows were disabled yielded a slight performance boost to 15 FPS, as noted in Figure 4, without changing the rendered scene.
Figure 4. After disabling the Clear call when Shadows are disabled.
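The shape of this fix can be sketched as follows. This is not Demigod's source code: `FakeDevice`, `render_frame`, and the buffer names are hypothetical stand-ins written in Python so the control flow can be run and inspected without a graphics API. The point is only that the shadow buffer's Clear sits behind the same condition as the shadow pass that uses it.

```python
class FakeDevice:
    """Stand-in for a graphics device that just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def set_render_target(self, name):
        self.calls.append(("SetRenderTarget", name))

    def clear(self, name):
        self.calls.append(("Clear", name))

    def draw(self, what):
        self.calls.append(("Draw", what))


def render_frame(device, shadows_enabled):
    """Render one frame, touching the shadow buffer only when shadows are on."""
    if shadows_enabled:
        # The shadow pass is the only consumer of this buffer, so the Clear
        # (and the pass itself) is skipped entirely when shadows are off.
        device.set_render_target("shadow_map")
        device.clear("shadow_map")
        device.draw("shadow_casters")

    device.set_render_target("back_buffer")
    device.clear("back_buffer")
    device.draw("scene")
    return device.calls
```

Calling `render_frame(FakeDevice(), shadows_enabled=False)` records no call that touches the hypothetical `shadow_map` target, while the back buffer is still cleared and drawn as before.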
Now that one problem was diagnosed, returning to the Intel GPA System Analyzer yielded numbers similar to the previous result, shown in Figure 5. This is somewhat expected because the first fix was not the root cause, but was still worth evaluating as a possible change.
Figure 5. Intel® GPA System Analyzer after skipping the Clear call in low fidelity mode.
This supports the assumption that the GPU is still busy doing other work. The Intel GPA Frame Analyzer also details shader activity within a sample set and a single frame. Returning to the Intel GPA Frame Analyzer, we looked at the shader activity during that frame and analyzed the shaders themselves and the number of instructions executed by each. We found that a specific shader was consuming a lot of time on the GPU. The Intel GPA Frame Analyzer offers useful "comment this out" functionality: it overrides a shader to short-circuit it so that it only outputs a single color, yellow. This provides a visual indication of what that shader does. Best of all, this can be done without editing a file. Just change a setting in the Intel GPA Frame Analyzer, and the frame is recomputed on the fly, with the test shader's output applied, to render the same scene we saw before. The frame rate increase indicated in Figure 6 is a result of applying the simplified Intel GPA Frame Analyzer yellow shader.
Figure 6. Same scene with a high GPU load shader outputting yellow.
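To make the override idea concrete, here is a toy model in Python. It is not how Intel GPA is implemented and it uses no Intel GPA API: shaders are modeled as plain functions, and every name (`metallic_shader`, `terrain_shader`, `shade_draw_calls`, the draw call labels) is hypothetical. It only shows the principle of swapping a constant-color shader into selected draw calls while leaving the others untouched.

```python
YELLOW = (255, 255, 0)


def metallic_shader(u, v):
    """Placeholder for an expensive pixel shader; returns an RGB tuple."""
    shade = int(255 * (0.5 + 0.5 * u * v))
    return (shade, shade, shade)


def terrain_shader(u, v):
    """Placeholder for a cheaper shader used by other draw calls."""
    return (int(255 * u), int(255 * v), 0)


def flat_yellow_shader(u, v):
    """Diagnostic replacement: ignore the inputs and output one constant color."""
    return YELLOW


def shade_draw_calls(draw_calls, overrides=None):
    """Run each draw call's shader at one sample point, applying overrides by call name."""
    overrides = overrides or {}
    result = {}
    for name, shader in draw_calls.items():
        active = overrides.get(name, shader)
        result[name] = active(0.5, 0.5)
    return result


calls = {"structures": metallic_shader, "terrain": terrain_shader}
baseline = shade_draw_calls(calls)
experiment = shade_draw_calls(calls, overrides={"structures": flat_yellow_shader})
```

In the experiment, only the selected draw call turns yellow, which is what makes the affected geometry easy to spot in the rendered frame.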
It looks as if this particular shader is not significantly affecting the scene in Low Fidelity mode. Figure 7 shows a pixel-by-pixel comparison of the key differences between this altered scene and the original. Setting aside some differences in the lava's post-processing effects, which are explained later, the code shows that the shader applies a metallic feature to the structures in the scene.
Figure 7. Pixel-by-pixel image comparison of the Intel® GPA Frame Analyzer's yellow shader and the original.
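A pixel-by-pixel comparison of this kind is straightforward to reproduce on two captured images. The Python sketch below assumes each image is a list of rows of RGB tuples; the function names and the `tolerance` parameter are choices made for this example, and the article does not say which tool produced Figure 7.

```python
def diff_mask(image_a, image_b, tolerance=0):
    """Return a 2D list of booleans marking pixels whose channels differ by more than tolerance."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have identical dimensions")
    return [
        [
            any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb))
            for pa, pb in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(image_a, image_b)
    ]


def changed_fraction(mask):
    """Fraction of pixels flagged as different."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

A small changed fraction, concentrated on the structures the shader touches, is the kind of result that supports dropping an effect in a low fidelity mode.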
Removing this shader from processing bumped the frame rate up again to roughly 18 FPS, while only removing a relatively low visual fidelity attribute in the scene. Returning to the Intel GPA System Analyzer with the change to skip the Clear call and not use the metallic shader yielded the results shown in Figure 8.
Figure 8. Intel® GPA System Analyzer after applying the Clear and shader changes.
Based on this new sampling from the Intel GPA System Analyzer, the processor load is roughly the same as in the previous sample sets, and, as is evident in the game, the frame rate was still low, indicating that a GPU-bounded problem still exists. Returning again to the Intel GPA Frame Analyzer, it appears that two post-processing effects on the lava in the scene were consuming a good deal of resources for integrated graphics. As Figure 9 shows, by disabling Bloom and Blur in the code that Gas Powered Games provided, the frame rate jumps up to 26 FPS, but a great deal of visual fidelity is lost, which is not desirable.
Figure 9. Light shaft Blur and Bloom disabled: clearly not a desirable change.
After noticing the fidelity loss caused by disabling both effects, Bloom was left on, but the Blur post-processing effect from the light shafts was disabled, yielding nearly the same performance gain (24 FPS versus the 26 FPS found when both effects were disabled). Figure 10 shows this result.
Figure 10. Final result: Clear call and metallic shader removed; light shaft Blur disabled with Bloom left on.
The final tally is a net increase from 14 to 24 FPS running on Intel Graphics in low fidelity mode, achieved simply by removing a few high-end effects while preserving as much of the scene as possible. Reflecting back on the pixel-by-pixel comparison that included the Blur effect's removal, you'll see that the net total difference was relatively small compared to the rendered scene, bringing the game within a more playable range.
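Frame rates are easier to compare across stages when converted to time per frame. The short Python sketch below does that conversion for the frame rates reported above; the function names and stage labels are ours, and because the source figures are on-screen FPS readings (some given only as "roughly"), the resulting milliseconds should be read as approximate.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# Frame rates reported in the case study, in the order the changes were applied.
stages = [
    ("baseline", 14),
    ("skip shadow Clear", 15),
    ("remove metallic shader", 18),
    ("disable light shaft Blur", 24),
]


def savings_per_stage(stages):
    """Return (label, ms saved relative to the previous stage) for each change."""
    out = []
    for (_, prev_fps), (label, fps) in zip(stages, stages[1:]):
        out.append((label, frame_time_ms(prev_fps) - frame_time_ms(fps)))
    return out
```

With these inputs, the three changes save roughly 4.8 ms, 11.1 ms, and 13.9 ms per frame respectively, taking the frame from about 71.4 ms down to about 41.7 ms.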
To learn more about Intel GPA and Intel Graphics, please read the Intel® 4 Series Chipsets Integrated Graphics Developer's Guide: /en-us/articles/intel-graphics-media-accelerator-developers-guide
About the Authors
Jeff Freeman is a software engineer in the Intel Software and Services Group, where he supports Intel Graphics solutions in the Visual Computing Software Division. He holds a B.S. in Computer Science from Rensselaer Polytechnic Institute.
Doraisamy Ganeshkumar is a software engineer in the Intel Software and Services Group, where he supports Intel Graphics solutions in the Visual Computing Software Division.
- Intel® Graphics Media Accelerator Developer's Guide: /en-us/articles/intel-graphics-media-accelerator-developers-guide
- Demigod*: http://www.demigodthegame.com/
- Gas Powered Games: http://www.gaspowered.com/
- Stardock: http://www.stardock.com/index_demigod.asp