Eugen Systems* and Intel worked together so that Wargame: European Escalation looks and runs great on Intel® platforms. We used Intel® Graphics Performance Analyzers (Intel® GPA) to quickly find graphics bottlenecks and let engineers at both companies try out fixes in the game. After identifying the bottlenecks, Eugen Systems added level of detail (LOD) support to the vegetation and grass, and sped up the shadow code. Now the game runs 1.3x faster, and no longer has low frame rates on close zoom.
The game also has excellent touch support on Windows* 8 desktop, and here we study how that was added to the existing game. Wargame uses the WM_TOUCH Windows messages, in order to have complete control when interpreting touch events.
Eugen Systems makes real-time strategy (RTS) games that are quick to learn and have thorough game mechanics. Their latest game, Wargame: European Escalation, looks and runs great on Intel platforms running Windows 8 Desktop. This case study shows how Eugen Systems and Intel ensured that the game looks great on systems with Intel® HD Graphics. Intel Graphics Performance Analyzers (Intel GPA) were essential for finding graphics bottlenecks, quickly experimenting with possible fixes, and verifying the final speedup.
After adding LOD support to the vegetation and grass, making several performance changes to shadows, and other optimizations, the game runs much faster than before on 3rd Generation Intel® Core™ processors with Intel® HD Graphics. The average speedup is 1.3x, with some cases yielding much larger improvements.
The game also uses touch on Intel Ultrabooks™ with touch screens. Here, we look at how touch support was built deeply into the game, for a first-class touch gaming experience. Starting with some existing Windows 7 touch support, Eugen used the WM_TOUCH Windows messages to extend touch support throughout the game. All parts of the game now support touch on Windows 8 Desktop (backwards-compatible with Windows 7). Supporting a rich set of interactions with each unit is complicated; we explain how the gamer can now give basic and expanded commands to all units with touch. The menu systems also didn’t easily support the game’s mouse-over tooltips when playing with touch. We list the design choices that let the gamer easily navigate the menu system, while still getting tooltip support when it’s needed.
The game also detects your GPU and picks the best settings for you “out of the box”, without any manual configuration required.
Together, these features and performance give a great gaming experience on Intel platforms. Let’s see how they did it.
For DirectX 11 performance on Intel’s latest platforms, early builds of the game ran a bit slower than they could. Here’s the starting data at common settings, running a typical scenario, at an average zoom height.
Figure 1: Baseline performance, before changes, for Intel® HD Graphics 2500 and Intel® HD Graphics 4000 at 1280x720, Low quality and 1366x768, Medium (Mid) quality
The game ran great under some configurations. While the average results were OK, the game didn’t run as well as we hoped at medium quality settings, or on the HD Graphics 2500 part. We knew the game could run great on all of these configurations, and run consistently at different zoom levels. Most of our attention in this case study focuses on HD Graphics 4000 and 1366x768 resolution, at medium quality.
Although it’s not shown here, performance would also drop significantly when the game was zoomed all the way in. The game already used a level of detail (LOD) technique to simplify rendering when you zoom out, by switching off grass and vegetation rendering above a certain camera elevation. This works well and prevents the frame rate from dropping when the camera is zoomed out, but the close zoom levels needed attention to avoid frame rate problems.
Looking for bottlenecks 1: Was it CPU or GPU bound?
Now it was time to find the bottlenecks. We started with the overrides in Intel GPA, such as NULL Hardware and Disable Draw Calls, and compared the frame rate with each one enabled.
Figure 2: Experiments with overrides in GPA
There was a large speedup when we used the NULL Hardware override, so we knew the game was spending a lot of time executing in the graphics hardware. Going the extra step and disabling draw calls gave another speedup, which told us that the game was also spending some time in DirectX and the graphics driver. But the NULL Hardware test made it clear that the majority of the time was being spent in the hardware.
This game was GPU-bound. We confirmed this with GPUView.
Figure 3: CPU queue in GPUView, averaging ~3 frames queued
Here, we see the queue of draw activity over time, with the boundary of each frame visible in red. There were ~3 frames queued up all the time, so the game was consistently waiting for the GPU. The game was fully GPU-bound.
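The reasoning behind the override experiments can be sketched as a small helper. This is only an illustration: the function name, the `min_gain` threshold, and the frame rates in the usage lines are hypothetical and are not measurements from the game or settings in Intel GPA.

```python
def classify_bottleneck(baseline_fps, null_hw_fps, no_draw_fps, min_gain=1.2):
    """Interpret two override experiments (simplified, illustrative logic).

    baseline_fps: frame rate with no overrides.
    null_hw_fps:  frame rate with GPU execution skipped (NULL Hardware).
    no_draw_fps:  frame rate with draw calls disabled as well.
    min_gain:     hypothetical threshold for calling a speedup "large".
    """
    # Work in milliseconds per frame so the differences are additive.
    base_ms = 1000.0 / baseline_fps
    null_hw_ms = 1000.0 / null_hw_fps
    no_draw_ms = 1000.0 / no_draw_fps

    gpu_ms = base_ms - null_hw_ms       # time removed by skipping GPU work
    driver_ms = null_hw_ms - no_draw_ms  # time removed by skipping draw calls

    hw_gain = null_hw_fps / baseline_fps
    gpu_bound = hw_gain >= min_gain and gpu_ms >= driver_ms
    return {
        "gpu_ms": gpu_ms,
        "driver_ms": driver_ms,
        "verdict": "GPU-bound" if gpu_bound else "CPU-bound",
    }


# Hypothetical numbers, chosen only to exercise the helper.
result = classify_bottleneck(baseline_fps=20.0, null_hw_fps=60.0, no_draw_fps=90.0)
print(result["verdict"], round(result["gpu_ms"], 1), round(result["driver_ms"], 1))
```

A large gain from NULL Hardware with a smaller additional gain from disabling draw calls matches the pattern described above: most of the frame time is in the hardware, with some in DirectX and the driver.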
Looking for bottlenecks 2: Why was it GPU-bound?
Now, let’s go deeper, to find areas for possible improvement.
Studying a typical frame in more detail, we checked for the most time-consuming activities. At first glance, the Erg view displays a lot of Draw calls, with a few larger spikes of activity but no clear place to investigate first.
The Render Target view made it clear that a single render target accounted for about 60% of the frame time. With that render target selected, the erg view highlighted the draw calls that affect it, and the frame buffer view showed that this was the vegetation (select wireframe to see the geometry more easily).
Figure 4: Render Target view, showing most of the frame time dominated by a single RT
Figure 5: Erg view of the same selection, showing all the Draw calls that affect the same Render Target
Frame Analyzer reported that the vegetation rendering took about 60% of the frame time (about 300 draw calls). This points out the value of using Frame Analyzer to study your game once in a while; we regularly learn surprising facts about the games we study. Since there is always some bottleneck, the GPA Analyzers are a very good way to regularly find and understand them.
Figure 6: Wireframe view of the same selection, showing grass and vegetation
The wireframe view makes it clear that this is grass and vegetation. So this part of rendering was a good place to look closer.
Frame summary and high-level observations
To get a better understanding of the frame and its grass and vegetation bottlenecks, we reviewed the frame summary in Frame Analyzer. Here’s the most interesting data:
Figure 7: Frame summary, including Post-GS Primitive Count
Several things become clear from this data:
- There were lots of small primitives. Because the skybox used simple geometry, the rest of the geometry had to be very small, which you can also see in the wireframe view above. That view also made it clear that the vegetation geometry had a constant density, which was denser than necessary for the far vegetation.
- There were lots of PS invocations, ~11 million. This number was far too high to be efficient.
- The EU Active count was very low; between EUs that were idle and EUs that were stalled, much could be done to improve utilization of the execution units.
- There was a clear split between large ergs and small ergs; selecting one group showed that the split corresponds to near vs. far vegetation geometry (see below).
Let’s step through these issues.
While the issues may all be related, let’s start with the difference between near and far vegetation geometry.
The vegetation geometry was very dense. Especially for farther distances, that can have a large effect on GPU performance, with very little quality gain. In addition, when rendering small overlapping alpha-tested triangles, performance can suffer, and dense geometry will tend to exaggerate this effect.
Although the game had a good LOD system based on camera height, perhaps there was more that could be done to make it easier to render this far geometry.
Within the draw calls for the grass and vegetation, there were larger (slower) and smaller (faster) ergs. Selecting one group at a time made the difference clear.
Figure 8: Small vegetation ergs selected
With those ergs selected, it was clear that they were rendering the near vegetation.
Figure 9: The geometry drawn by the small vegetation ergs
These near geometry calls were relatively small and fast, and rendered 4 million pixels (out of 6.4 million total rendered). Together, they accounted for about half of the PS invocations but took only 13% of the frame time.
Selecting the other vegetation ergs confirms what we have already started to see.
Figure 10: Large vegetation ergs selected
Figure 11: The geometry drawn by the large vegetation ergs
The calls for the far vegetation were much slower. Together, they accounted for 32.2% of the frame time, while rendering only 2.1 million pixels (of 6.4 million total).
These numbers suggested that the far vegetation could be rendered more efficiently.
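To make the comparison concrete, the figures quoted above can be turned into a rough cost per rendered pixel. This is simple arithmetic on the numbers in the text (4 million pixels in 13% of the frame for near vegetation, 2.1 million pixels in 32.2% for far vegetation); the variable and function names are ours.

```python
# Figures quoted in the text for the two groups of vegetation ergs.
near = {"pixels_millions": 4.0, "frame_share_pct": 13.0}
far = {"pixels_millions": 2.1, "frame_share_pct": 32.2}


def cost_per_million_pixels(group):
    """Percent of frame time spent per million rendered pixels."""
    return group["frame_share_pct"] / group["pixels_millions"]


near_cost = cost_per_million_pixels(near)
far_cost = cost_per_million_pixels(far)
ratio = far_cost / near_cost

print(f"near: {near_cost:.2f}% of frame per million pixels")
print(f"far:  {far_cost:.2f}% of frame per million pixels")
print(f"far vegetation costs about {ratio:.1f}x more per pixel")
```

By this rough measure, each far-vegetation pixel cost between four and five times as much frame time as a near-vegetation pixel, which is why the far geometry was the first target.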
To render the vegetation faster, the geometry could be simplified, perhaps by adding LODs. Rendering fewer primitives, especially for the far vegetation, could speed up the game significantly. As long as the primitives didn’t get too large, there would not be a major impact on rendering quality. The vegetation could also be rendered with depth writes disabled, with little change in rendering quality. Finally, the grass shaders could be simplified so they run faster.
With these details, Eugen made several changes to make the grass/vegetation render much faster. They implemented LOD for vegetation, disabled depth writes on far grass LODs, and sped up the grass/vegetation shaders.
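A minimal sketch of distance-based LOD selection with depth writes disabled on far grass LODs follows. It is not Eugen's implementation: the distance thresholds, the `far_lod_start` cutoff, and all names are hypothetical values chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical distance thresholds (world units) between LOD 0, 1, 2 and 3.
LOD_DISTANCES = (50.0, 150.0, 400.0)


@dataclass(frozen=True)
class DrawSettings:
    lod: int            # 0 is the most detailed mesh
    depth_write: bool   # whether this draw writes to the depth buffer


def select_vegetation_lod(distance, is_grass, far_lod_start=2):
    """Pick a LOD from the camera distance; far grass LODs skip depth writes."""
    # bisect_right counts how many thresholds the distance has passed.
    lod = bisect_right(LOD_DISTANCES, distance)
    depth_write = not (is_grass and lod >= far_lod_start)
    return DrawSettings(lod=lod, depth_write=depth_write)


for d in (10.0, 200.0, 1000.0):
    print(d, select_vegetation_lod(d, is_grass=True))
```

Coarser meshes at larger distances reduce the number of tiny primitives, which is the problem the frame summary pointed to.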
After vegetation, shadows were the second large part of rendering. The game has relatively complex shadow rendering, and includes multiple shadow passes plus post processing.
Figure 12: Shadow passes
Shadows took ~17% of the frame time. If shadows could be simplified, using shadow mapping or by simplifying or removing the post-processing, there could be a significant speedup. This is especially interesting because players will often spend much of the game zoomed out. Since high-quality shadows are really only visible when zoomed in close, the overhead of shadows was unnecessary much of the time.
On a typical map, much of the geometry is horizontal and does not cast shadows at all. Perhaps this meant that grass and terrain could use projective textures, instead?
After studying the data, shadow shaders were sped up by moving instructions from pixel shaders to vertex shaders. Moving a linear calculation so that it runs per-vertex instead of per-pixel is often a good optimization.
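Why this is safe for linear calculations can be checked numerically. The sketch below uses plain Python in place of shader code and assumes simple barycentric interpolation across a triangle; the triangle, weights, and coefficients are arbitrary illustrative values.

```python
def interpolate(values, weights):
    """Barycentric interpolation, as the rasterizer does for vertex outputs."""
    return sum(v * w for v, w in zip(values, weights))


def linear_term(position, coeffs=(0.5, -0.25, 2.0), offset=1.0):
    """An affine function of position: dot(coeffs, position) + offset."""
    return sum(c * p for c, p in zip(coeffs, position)) + offset


def nonlinear_term(position):
    """A non-linear function, for contrast."""
    return position[0] ** 2


triangle = [(0.0, 0.0, 0.0), (4.0, 0.0, 1.0), (0.0, 3.0, 2.0)]
weights = (0.2, 0.3, 0.5)  # barycentric weights of one pixel, summing to 1

# Position of the pixel, interpolated from the three vertices.
pixel_pos = tuple(
    interpolate([v[axis] for v in triangle], weights) for axis in range(3)
)

# "Pixel shader" version: evaluate at the interpolated position.
per_pixel = linear_term(pixel_pos)
# "Vertex shader" version: evaluate at each vertex, then interpolate.
per_vertex = interpolate([linear_term(v) for v in triangle], weights)
print(abs(per_pixel - per_vertex) < 1e-9)  # the two agree

# The same move is NOT valid for a non-linear calculation.
nl_pixel = nonlinear_term(pixel_pos)
nl_vertex = interpolate([nonlinear_term(v) for v in triangle], weights)
print(abs(nl_pixel - nl_vertex) < 1e-9)  # the two differ
```

Because the interpolation weights sum to 1, an affine function gives the same value either way, so the work can be done once per vertex rather than once per pixel.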
Shadows were also dramatically sped up by preventing very small geometry from rendering to shadow maps.
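One way to keep very small geometry out of the shadow map is to estimate how many shadow-map texels an object would cover and skip it below a threshold. The article does not describe Eugen's exact test, so the sketch below is an assumption throughout: it assumes an orthographic shadow projection, bounding spheres, and a hypothetical `min_texels` threshold.

```python
def shadow_footprint_texels(radius, shadow_extent, shadow_map_size):
    """Approximate width, in texels, of a bounding sphere in the shadow map.

    radius:          bounding-sphere radius in world units.
    shadow_extent:   world-space width covered by the shadow map.
    shadow_map_size: shadow map resolution in texels along one side.
    """
    return (2.0 * radius / shadow_extent) * shadow_map_size


def select_shadow_casters(objects, shadow_extent, shadow_map_size, min_texels=2.0):
    """Keep only objects large enough to matter in the shadow map."""
    return [
        name
        for name, radius in objects
        if shadow_footprint_texels(radius, shadow_extent, shadow_map_size) >= min_texels
    ]


# Hypothetical scene objects: (name, bounding radius in world units).
scene = [("tank", 3.0), ("grass_tuft", 0.2), ("tree", 5.0)]
print(select_shadow_casters(scene, shadow_extent=2000.0, shadow_map_size=1024))
```

Objects that would cover less than a couple of texels contribute almost nothing visible to the shadows but still cost a draw call in each shadow pass.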
After applying the geometry and shadow changes, the game renders much faster. At the same resolution and quality settings (1366x768, Medium quality), the game now runs at 27.7 FPS, a 1.3x speedup for the average case.
On top of that, even when zooming in to the closest camera settings, the game maintains a playable frame rate; it no longer drops to unacceptable levels.
Between these two sets of changes, the overall gaming experience is much better. Thanks to some analysis with Intel GPA and some development work by Eugen Systems, the game runs great on Intel platforms.
Wargame: European Escalation can be completely played with touch. You can use touch exclusively, use just the keyboard and mouse, or use a mix of the two; whatever works for you!
The code already had some touch support, from earlier releases of the game engine. Because the code had already run on Windows 7, it was important to Eugen Systems to maintain backward compatibility with Windows 7 while adding support for Windows 8 Desktop.
They used WM_TOUCH Windows messages to receive individual touch events, and added code to recognize gestures. They chose to not use WM_GESTURE (the other option for supporting touch on both Windows 7 and Windows 8), because it didn’t support all of the touch interactions they needed, and the gestures are not as configurable as the custom gestures that Eugen built with WM_TOUCH. For example, single-tap events are not available through WM_GESTURE. They’re presented to your app as mouse events instead. Eugen Systems dedicated one engineer for about a month to fine-tune the multi-touch code.
The game fully supports touch for map navigation (move, zoom, and rotate), unit selection and commands, and menu navigation.
There were several challenges:
- Some unit selection and commands used the left and right mouse clicks. This required both the single click and double hold+click gestures.
- The existing menu system used tooltips to display details during mouse-over events. This didn’t map well to touch input. Although it’s possible with most touch message interfaces on Windows to detect when a finger is near the screen but not yet touching it, it’s very difficult to fine-tune and difficult for the player to use properly. Instead, Eugen used a quick touch to trigger the menu selection and touch+hold to trigger the tooltip, which lets the player navigate through the menu properly.
- Touch events cause touch messages to be issued to your game, but your game also gets duplicate mouse clicks. There is a technique for checking whether a mouse click came from the mouse or from a touch input device, but they discovered this technique did not work all the time. Eugen engineers implemented a heuristic to track the mouse location, to double-check if a mouse message might have come from the mouse. Spurious mouse clicks, which were already handled as touch events, were ignored.
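The duplicate-click problem in the last item can be sketched as a two-stage filter. The article names neither the standard check nor Eugen's heuristic, so both stages are assumptions: the first uses the signature that Windows documents for mouse messages synthesized from pen or touch (read with `GetMessageExtraInfo`), and the second is a hypothetical proximity test with made-up thresholds. The logic is written in Python so it runs anywhere; a real handler would live in the window procedure.

```python
import math

# Signature documented by Microsoft for mouse messages generated from pen/touch.
MI_WP_SIGNATURE = 0xFF515700
SIGNATURE_MASK = 0xFFFFFF00
TOUCH_FLAG = 0x80  # set for touch, clear for pen


def is_touch_generated(extra_info):
    """True if a mouse message's extra info marks it as synthesized from touch."""
    return (extra_info & SIGNATURE_MASK) == MI_WP_SIGNATURE and bool(
        extra_info & TOUCH_FLAG
    )


class SpuriousClickFilter:
    """Hypothetical fallback: drop clicks that closely follow a handled touch."""

    def __init__(self, max_distance_px=30.0, max_delay_s=0.5):
        self.max_distance_px = max_distance_px  # illustrative threshold
        self.max_delay_s = max_delay_s          # illustrative threshold
        self._last_touch = None                 # (x, y, time) of last touch

    def on_touch(self, x, y, t):
        self._last_touch = (x, y, t)

    def should_ignore_click(self, x, y, t, extra_info=0):
        if is_touch_generated(extra_info):
            return True
        if self._last_touch is None:
            return False
        tx, ty, tt = self._last_touch
        recent = 0.0 <= t - tt <= self.max_delay_s
        near = math.hypot(x - tx, y - ty) <= self.max_distance_px
        return recent and near


click_filter = SpuriousClickFilter()
click_filter.on_touch(100, 100, t=1.0)
print(click_filter.should_ignore_click(105, 102, t=1.1))  # near the touch
print(click_filter.should_ignore_click(400, 300, t=1.1))  # elsewhere on screen
```

The fallback only matters when the signature check misses a synthesized click; a click far from any recent touch is still treated as real mouse input.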
After they had written and fine-tuned the custom gesture recognizers for all of the gestures mentioned above, the game had robust touch support for all elements of gameplay and configuration.
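As an illustration of what a custom recognizer has to decide, the sketch below classifies one finger's contact as a quick touch, a touch+hold, or a drag. It is not Eugen's recognizer: the thresholds and names are hypothetical. In a real WM_TOUCH handler the two samples would come from the `TOUCHINPUT` records flagged `TOUCHEVENTF_DOWN` and `TOUCHEVENTF_UP`.

```python
import math
from dataclasses import dataclass

HOLD_SECONDS = 0.5        # hypothetical: contact longer than this is a hold
MOVE_TOLERANCE_PX = 10.0  # hypothetical: movement beyond this is a drag


@dataclass(frozen=True)
class TouchSample:
    x: float
    y: float
    t: float  # seconds


def classify_contact(down, up):
    """Classify a single-finger contact from its down and up samples."""
    moved = math.hypot(up.x - down.x, up.y - down.y)
    if moved > MOVE_TOLERANCE_PX:
        return "drag"   # e.g. map navigation
    if up.t - down.t >= HOLD_SECONDS:
        return "hold"   # e.g. show the tooltip
    return "tap"        # e.g. trigger the menu selection


print(classify_contact(TouchSample(50, 50, 0.00), TouchSample(52, 51, 0.12)))
print(classify_contact(TouchSample(50, 50, 0.00), TouchSample(51, 50, 0.90)))
print(classify_contact(TouchSample(50, 50, 0.00), TouchSample(200, 80, 0.30)))
```

A shipping recognizer would likely fire the hold from a timer while the finger is still down, so that the tooltip appears before release; classifying on release keeps this sketch short.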
While the game ran well on Intel graphics hardware initially, the vegetation and shadow systems were limiting the game’s performance. A number of vegetation LOD and shadow changes let the game run much faster. The game now includes best-in-class touch support, too. We hope you are inspired by this success, to speed up your game and add touch support!
About the Authors
Philipp Gerasimov is a Senior Game/Graphics Application Engineer in Developer Relations at Intel.
Paul Lindberg is a Senior Software Engineer in Developer Relations at Intel. He helps developers all over the world to ship kick-ass games and other apps that shine on Intel platforms.
Touch development was done and performance measurements were taken on an Ultrabook running Microsoft Windows 8. It had a 3rd Generation Intel Core i7 processor at 2.0 GHz (model 3667U), with Intel HD Graphics 4000, a touch screen, and 4 GB of memory.
Eugen Systems - http://www.eugensystems.com/
Intel Graphics Performance Analyzers (Intel GPA) - http://www.intel.com/software/gpa
Graphics Developer Guide - http://software.intel.com/en-us/articles/intel-graphics-developers-guides
Microsoft GPUView - http://msdn.microsoft.com/en-us/library/windows/desktop/jj585574(v=vs.85).aspx
Comparing Touch Coding Techniques - Windows 8 Desktop Touch Sample - http://software.intel.com/en-us/articles/comparing-touch-coding-techniques-windows-8-desktop-touch-sample
Intro to Touch on Ultrabook: Touch Messages from the Windows 7 interface - http://software.intel.com/en-us/blogs/2012/08/16/intro-to-touch-on-ultrabook-touch-messages-from-the-windows-7-interface