For the most recent installment of the Tom Clancy’s* Splinter Cell series, engineers from Ubisoft and Intel analyzed the game to make it run smoothly and perform as well as possible on Intel® hardware. Using Intel® Graphics Performance Analyzers (Intel® GPA), we identified several bottlenecks in a captured frame. Ubisoft then optimized the draw calls we identified, tripling the frame rate. The expensive rendering passes we found were the lighting environment pass and the shadow pass. On a 4th generation Intel® Core™ i7 processor-based desktop, the game reached 43 frames per second (fps) at 1366x768 resolution with low settings, and 35 fps with medium settings.
Specifying a Workload
Optimizing a game is full of experiments (changing shaders, using different textures, trying new approaches) aimed at finding troublesome components or rendering passes and increasing speed. As with any experiment, an analytical approach is vital to judging the quality of the results. Choose an in-game workload, a representative scene where performance is lacking, for the analysis. Figure 1 shows one such scene: numerous objects, multiple light sources, and several characters.
Figure 1. Scene chosen for analysis.
Identifying and Addressing the Problems
To see what’s going on here, we used the Intel GPA Monitor to remotely capture a frame for analysis. After loading the capture in the Intel GPA Frame Analyzer, we could identify the problematic ergs (units of work, from the Greek ἔργον (ergon), meaning “work”) by their height in the chart, which reflects the amount of time the GPU spends on each one. By inspecting the ergs individually and mapping them to stages in the game’s pipeline, we isolated two main issues (figure 2): the SeparableSSS pass and the lighting environment pass. The SeparableSSS pass was removed because of its extremely high cost.
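The Frame Analyzer surfaces expensive ergs visually, as tall bars. If per-erg timings were available as a plain list, the same first-pass triage could be sketched as a sort by GPU time. This is a minimal sketch, not an Intel GPA API: the `Erg` record, the `tallest_ergs` helper, the stage names, and the timings are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Erg:
    """One unit of GPU work from a captured frame (hypothetical record layout)."""
    index: int      # position of the erg in the frame
    name: str       # pipeline stage the erg was mapped to
    gpu_us: float   # GPU time spent on the erg, in microseconds


def tallest_ergs(ergs, top_n=3):
    """Return the top_n ergs by GPU time, i.e. the tallest bars in the erg chart."""
    return sorted(ergs, key=lambda e: e.gpu_us, reverse=True)[:top_n]


# Placeholder stage names and invented timings, for illustration only.
frame = [
    Erg(0, "stage_a", 300.0),
    Erg(1, "stage_b", 4200.0),
    Erg(2, "stage_c", 850.0),
    Erg(3, "stage_b", 3900.0),
]
for erg in tallest_ergs(frame, top_n=2):
    print(erg.index, erg.name, erg.gpu_us)
```

Ranking by individual erg height is only the first step; the next step is mapping each tall erg back to the pipeline stage that issued it.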
Figure 2. Erg view of Intel® GPA frame capture before optimization
Removing it clearly improved overall performance. With these two ergs out of the way, the lighting environment pass was our next area to tackle. It was problematic not only because of the time spent on each erg, but also because of the large number of ergs it used, which together added up to a prohibitive amount of GPU time: ~100 × 1500 μs = 150,000 μs (150 ms)! Figure 3 makes this painfully clear.
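A pass made of many moderately priced ergs can cost more in total than a single tall erg, so it helps to sum GPU time per pass as well as rank individual ergs. The sketch below groups hypothetical `(pass_name, gpu_us)` pairs and reproduces the arithmetic above for roughly 100 ergs at about 1500 μs each. The `gpu_time_by_pass` helper and the `"LightingEnvironment"` label are illustrative assumptions, not names from the game or from Intel GPA.

```python
from collections import defaultdict


def gpu_time_by_pass(ergs):
    """Group (pass_name, gpu_us) pairs by pass; return {name: (erg_count, total_us)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for name, gpu_us in ergs:
        totals[name][0] += 1        # how many ergs the pass issued
        totals[name][1] += gpu_us   # their combined GPU time, in microseconds
    return {name: (count, total) for name, (count, total) in totals.items()}


# The lighting environment case described above: ~100 ergs at ~1500 us each.
lighting = [("LightingEnvironment", 1500.0)] * 100
count, total_us = gpu_time_by_pass(lighting)["LightingEnvironment"]
print(count, total_us, total_us / 1000.0)  # 100 150000.0 150.0
```

Sorting the resulting totals, rather than individual erg heights, is what exposes a pass whose cost comes from its erg count.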
Figure 3. Lighting environment pass grows prohibitive
Figure 4 shows the gains after optimizing the lighting environment pass.
Figure 4. Removing the lighting environment pass sped things up significantly
The two ergs tied for second place in height belong to the ShadowPass.Composite function, and these were also optimized.
After these changes, combining the highly effective shadow compositing into the rendering passes brought processing costs still lower (figure 5).
Figure 5. Intel® GPA frame capture after optimization
Overall, the frame rate approximately tripled, reaching 43 fps at 1366x768 resolution with low settings (35 fps with medium settings) on a 4th generation Intel Core i7 processor-based desktop.
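Frame rates are often easier to reason about as per-frame time budgets, since pass costs like those above are measured in time. This small sketch converts the reported frame rates to milliseconds per frame; the `frame_time_ms` helper is a hypothetical name used only for illustration.

```python
def frame_time_ms(fps):
    """Convert a frame rate (frames per second) to time per frame in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates reported above for the 4th generation Intel Core i7 desktop at 1366x768.
for setting, fps in [("low", 43), ("medium", 35)]:
    print(setting, round(frame_time_ms(fps), 1))  # prints "low 23.3", then "medium 28.6"
```

Because frame time is the reciprocal of frame rate, tripling the frame rate cuts the time available per frame to one third.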
Approaching optimization as a series of experiments is useful for finding the root causes of your performance problems. The techniques outlined here are just a few of the ways to use Intel GPA to streamline your applications. Many more can be found in the documentation, online forums, and articles on the Intel® Developer Zone.
About the Author
Brad Hill is a Software Engineer at Intel in the Developer Relations Division. Brad investigates new technologies on Intel hardware and shares the best methods with software developers via the Intel Developer Zone and at developer conferences. He also runs Code for Good Student Hackathons at colleges and universities around the country.
Intel, the Intel logo, and Core are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2013 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.