by Brad Hill and Leigh Davies.
Sonic & All-Stars Racing Transformed is a game from Sumo Digital* published by Sega* on the PC and several other gaming platforms including Xbox* 360, PS3*, PS Vita* and Wii* U. Optimizing the PC version of the game proved a sizable task for Sumo Digital that yielded additional benefits for other target platforms. Intel® engineers worked with Sumo Digital to ensure the PC version runs on par with the other platforms and is optimized to take full advantage of PC technology. This case study describes the techniques used to identify and overcome some of the obstacles encountered in adding sensor control, touch support, and Intel® 3rd Generation Core™ optimization. GPUView and Intel® Graphics Performance Analyzers (Intel® GPA) were used to identify GPU stall periods and track down their causes. Eliminating these stalls nearly doubled the average frame rate from 13 FPS to over 25 FPS on our test PC.
Figure 1. Screenshot of initial gameplay displaying frame rate of 13 FPS in upper-left corner
Sonic & All-Stars Racing Transformed is a fast-paced cross-platform multiplayer racing game. This case study illustrates how the issues in performance and creating a balanced control experience were identified, addressed, and resolved. Similar approaches can be useful in your own game development.
To maximize the reach of this game, Sumo Digital needed to port it to PC and touch-enabled and sensor-enabled devices including tablets and Ultrabooks. Some touch and tilt support was available from another game, but time constraints had limited previous development. This project allowed us to build on that foundation.
In developing a PC title, it’s most efficient to support the widest range of platforms and environments possible. Ideally, the game should run on all versions of the Windows* OS that are still used by a considerable number of users - everything from XP* up to and including Windows 8. This was an important factor in the choice of touch API, as the Windows 7 version was more widely supported than that of Windows 8. Since the Windows 7 API is ‘lower level’ with access to the raw touch data, this allowed existing touch functionality from the console versions to be repurposed.
The goal of setting a high bar for a PC product targeting processor graphics gave us a new opportunity: to go beyond simply tweaking it for generic PC gaming and instead fully optimize it for Intel 3rd Generation Core. The main tools we used for performance analysis were GPUView, a tool to monitor GPU and CPU activity, and Intel Graphics Performance Analyzers (Intel GPA). These utilities proved invaluable in identifying and addressing the GPU stalls that were causing drops in frame rate.
None of the console versions were purely controlled by touch, so the design starting point was a UI that treated touch as an ‘additive’ control system. The need for a touch-only front end led to the addition of back buttons, large touch zones, and the rework of many screens to enlarge buttons for touch. A more advanced in-game control settings screen was added, as seen in Figure 2.
Figure 2. Custom controls for touch
With the removal of a requirement to use a gamepad or keyboard, racing controls are primarily driven by a virtual joystick or tilting the device, augmented by buttons displayed on the touch screen as seen in Figures 2 and 3.
Figure 3. In-game controls showing virtual joystick and primary action buttons
To implement touch in a Windows 8 Desktop app with backwards compatibility, there are two event models to choose from: WM_GESTURE and WM_TOUCH. A detailed article on Windows 8 touch input is available in References.
WM_GESTURE has many gestures already defined, but it is more appropriate for navigation and manipulation than real-time game control for multiple reasons. It uses a time delay to determine whether a touch is a single press or the start of a gesture, such as a pan gesture, and it doesn’t allow for multiple simultaneous gestures to be tracked. Sumo Digital had designed the touch interface to use both hands for independent controls and wanted to repurpose as much of their existing console code as possible.
The more suitable method was WM_TOUCH, which registers the raw touch events themselves. This allows not only finer control but also more robust options, as multiple individual fingers can be tracked, limited only by the touch screen hardware itself. The tradeoff is more control at the cost of a more complex implementation.
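As a rough illustration of tracking raw touches per finger, the sketch below keeps one entry per contact ID and updates it on down, move, and up events. It is not Sumo Digital's code. On Windows, a WM_TOUCH handler would typically obtain TOUCHINPUT records through GetTouchInputInfo and read each record's dwID and TOUCHEVENTF_DOWN / TOUCHEVENTF_MOVE / TOUCHEVENTF_UP flags; the `TouchTracker` class and the string event names here are hypothetical stand-ins for that plumbing.

```python
from dataclasses import dataclass, field


@dataclass
class TouchTracker:
    """Hypothetical per-contact tracker; each finger is followed independently."""

    # Maps a contact ID to its latest (x, y) position.
    active: dict = field(default_factory=dict)

    def handle(self, event: str, contact_id: int, x: float, y: float) -> None:
        if event in ("down", "move"):
            # A new or moving contact simply records its latest position.
            self.active[contact_id] = (x, y)
        elif event == "up":
            # Lifting a finger removes only that contact.
            self.active.pop(contact_id, None)
        else:
            raise ValueError(f"unknown touch event: {event}")


# Two thumbs land, one moves, the other lifts.
tracker = TouchTracker()
tracker.handle("down", 1, 100, 500)
tracker.handle("down", 2, 900, 520)
tracker.handle("move", 1, 120, 480)
tracker.handle("up", 2, 900, 520)
```

Because every contact carries its own ID, a steering thumb and a button thumb can be interpreted separately, which is the property the game's two-handed layout relies on.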
Due to a wide range of devices (and hands!), Sumo Digital opted to go with dynamically repositionable controls, tied to the dominant steering hand’s touch contact point. Using dynamically repositionable controls meant the key controls were always in a place suitable for both the hand size and the posture of the person holding the device and could adapt if the player changed grip on the device while playing.
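One way to picture a repositionable control is a virtual joystick whose center is set wherever the steering thumb first touches down. The following is a minimal sketch under that assumption; the `FloatingJoystick` class, its `radius` parameter, and the pixel values are illustrative and are not taken from the game.

```python
import math


class FloatingJoystick:
    """Hypothetical joystick anchored at the first contact point."""

    def __init__(self, radius: float):
        self.radius = radius  # travel, in pixels, that maps to full deflection
        self.center = None    # set on touch-down, cleared on touch-up

    def touch_down(self, x: float, y: float) -> None:
        # The stick re-centers under the thumb, wherever it lands.
        self.center = (x, y)

    def touch_move(self, x: float, y: float) -> tuple:
        """Return (sx, sy), each in [-1, 1], relative to the anchor point."""
        if self.center is None:
            return (0.0, 0.0)
        nx = (x - self.center[0]) / self.radius
        ny = (y - self.center[1]) / self.radius
        length = math.hypot(nx, ny)
        if length > 1.0:
            # Clamp to the unit circle so dragging far away cannot over-steer.
            nx /= length
            ny /= length
        return (nx, ny)

    def touch_up(self) -> None:
        self.center = None


stick = FloatingJoystick(radius=100.0)
stick.touch_down(200.0, 300.0)
half_right = stick.touch_move(250.0, 300.0)  # halfway to full right deflection
```

If the player shifts grip, the next touch-down simply moves the anchor, so no fixed on-screen position has to suit every hand.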
Using the older Windows 7 touch API meant limited touch points, but Sumo Digital had already intended to keep the controls simple and intuitive. The number of buttons was reduced by implementing auto-accelerate and using drift to act as a brake when not turning. Clever clustering of key controls allows one touch zone to be used to detect multiple button presses. Sumo Digital also added simple gesture support; the player can begin a swipe gesture on the stunt button to control in-game stunts, which removed the need for a second analogue joystick.
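A swipe that starts on a button can be turned into a direction with very little logic, which is roughly how a single button could stand in for a second stick. The sketch below is an assumption about how such a classifier might look; the function name, the 40-pixel threshold, and the four-direction result are hypothetical.

```python
def classify_swipe(start, end, min_distance=40.0):
    """Classify a drag from `start` to `end` (pixel coordinates).

    Returns "tap" for short movements, otherwise the dominant direction.
    Screen y is assumed to grow downward.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if max(abs(dx), abs(dy)) < min_distance:
        return "tap"  # too short to count as a swipe
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# A thumb pressed on the stunt button and flicked upward.
direction = classify_swipe((100, 100), (100, 20))
```

The dead zone given by `min_distance` keeps an ordinary button press from being mistaken for a stunt request.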
To make use of the Ultrabook PC’s additional input methods, control was expanded beyond simple touch to include inclinometers for steering the vehicles. These sensors measure the tilt of the device on all 3 axes. Since the vehicles in the game transform among land, sea, and air modes, this tilt control is ideal to seamlessly transform 2-dimensional racing into 3 dimensions when the players take to the air.
Much of the underlying sensor code was created by Intel. The sensor library directly polls the sensor once per frame providing a very fast response time when detecting changes in the device orientation. Details on the sensor library can be found in the "Blackfoot Blade" case study listed in References with many of the lessons learned in that title benefiting Sonic & All-Stars Racing Transformed. An article with sample code that uses the library is also listed in References.
With the technical challenges of adding sensor support mostly solved using the Intel libraries, gameplay issues became the next area in need of attention, and sensor control raised some interesting problems. First, users would hold the same device in different ways; some held the device like a wheel, some like a tray. They would also steer by turning the device around different axes. The problem wasn’t too hard to solve for the car and boat, as the game only has to worry about steering in one axis, but planes were another story. This required dynamic recalibration of the default sensor position, both during play and at key points, for example when the vehicle transforms or when the game is paused. Dynamic recalibration also handled ‘fringe’ cases, such as when the device is passed to a friend so they can have a go, when the player lies back in bed, or when the player pauses the game, puts the device down, then comes back and holds the device in a different way.
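The recalibration idea can be sketched for a single axis as a neutral angle that is reset outright at key points and nudged slowly toward the current reading during play. This is only an illustration: the article does not describe the actual algorithm, and the `TiltSteering` class, the 30-degree range, and the `drift_rate` smoothing factor are all assumptions.

```python
class TiltSteering:
    """Hypothetical single-axis tilt steering with a movable neutral angle."""

    def __init__(self, max_angle: float = 30.0, drift_rate: float = 0.01):
        self.max_angle = max_angle    # degrees of tilt that give full steering
        self.drift_rate = drift_rate  # how quickly neutral follows the player
        self.neutral = 0.0            # angle currently treated as "straight"

    def recalibrate(self, angle: float) -> None:
        # Called at key points such as unpausing or a vehicle transformation.
        self.neutral = angle

    def update(self, angle: float) -> float:
        """Return steering in [-1, 1] for this frame's inclinometer angle."""
        steer = (angle - self.neutral) / self.max_angle
        steer = max(-1.0, min(1.0, steer))
        # Slowly move neutral toward the reading so gradual posture changes
        # are absorbed without the player noticing.
        self.neutral += self.drift_rate * (angle - self.neutral)
        return steer


tilt = TiltSteering()
tilt.recalibrate(45.0)           # player holds the device like a tray
steer_now = tilt.update(60.0)    # 15 degrees past neutral
```

Resetting the neutral angle on pause or transform covers the case where the device changes hands, while the slow drift covers a player who gradually slumps or leans back.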
A second problem occurred when comparing the responsiveness of touch and sensor controls to the gamepad experience. With touch and sensor input, minor user errors lead to exaggerated negative results. This was remedied by adding a steering assistance mechanic. This is an ‘additive’ system that purely adds a varying amount of input to what the player is inputting, but in a way that doesn’t ‘fight’ the player or play the game for them.
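The article does not say how the assistance is computed, so the following is only one possible reading of an additive assist: blend a fraction of the gap between the player's input and an ideal steering value into the result, and add nothing when doing so would oppose the direction the player is actively steering. The function name, the `ideal` input, and the `strength` factor are hypothetical.

```python
def assisted_steering(player: float, ideal: float, strength: float = 0.3) -> float:
    """Return the player's steering plus a small additive assist.

    `player` and `ideal` are steering values in [-1, 1]; `ideal` is assumed
    to come from elsewhere in the game, for example the racing line.
    """
    correction = ideal - player
    # Never push against the direction the player is actively steering.
    if player != 0.0 and correction * player < 0.0:
        return player
    total = player + strength * correction
    return max(-1.0, min(1.0, total))


# The player under-steers slightly to the right; the assist adds a little more.
helped = assisted_steering(player=0.2, ideal=0.6)
```

Because the assist is only ever added on top of the player's own input, releasing or reversing the controls always has an immediate effect.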
Once the game’s component systems were largely complete, GPUView was used to check the GPU performance. We noticed that there were significant gaps in the GPU hardware queue, where the graphics are processed and rendered, as depicted in Figure 4. Ideally, the GPU would be running all the time unless deliberately limited to conserve power, constantly queuing new frames while the current frame is being rendered to maximize the frame rate.
Figure 4. GPUView shows GPU stalls as gaps among the top green bars
The bars on the top represent GPU activity, with recurring patterns indicating frames. The GPU should be constantly active when running, but here we see gaps of 5-6 milliseconds per frame. It may seem like a small delay, but this constitutes about 20% of the total frame drawing time and has a significant impact on the game’s frame rate. These gaps coincided with stop-start behavior on the CPU. The CPU and GPU were thus virtually serialized, each waiting on the other, which caused the delays. Using GPUView to investigate the DirectX events around the stall points, we found Lock Allocation events that were configured to make the CPU wait for the GPU to complete its work. See Figure 5 for the event details that line up with the red line in Figure 4.
Figure 5. GPUView metric for the Lock Allocation event most likely to be the cause of the stall.
GPUView also allows the developer to show details of the memory that was being locked at this point. This is shown below in Figure 6. Note the allocation handle is the same at 0xFFFFFFA800B784330. The important thing to notice is the lock is on a resource that is a D3DDDIFMT_R32F texture format and is 1x1 pixels in size.
Figure 6. GPUView metric for the memory being locked.
This was enough information to investigate the likely cause of the lock in GPA. In GPA we could view all the render targets and textures to find anything that was 1x1 and in a 32-bit float format. We could also look at the API log to find problematic “LockRect” calls that caused the CPU to wait on the GPU. The Lock calls are shown below in Figure 7.
Figure 7. GPA API log call showing all LockRect calls
The problem was traced to the CPU polling the GPU for data every frame. In this case, the CPU was waiting until the GPU had rendered data into a 1x1 texture that was being used to calculate the average luminosity of the screen for a technique called tone mapping. The GPU would then sit idle while the CPU calculated the data needed for the tone mapping post-processing effect and built up enough information to create a new DMA packet to send to the GPU and restart the hardware queue. Ideally, the GPU should always have data prepared so that the CPU does not have to wait to retrieve it.
This problem was fixed by ensuring the CPU worked on data from a previous frame. The GPU resource was first copied to a CPU readable resource using the DirectX function StretchRect. Two frames later, this resource was locked, ensuring the GPU had completed the work before the CPU requested it. The CPU lockable rendering surface would be selected from several spare surfaces in a “round robin” manner, ensuring that the CPU was never asking for data that the GPU had not yet calculated.
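The ordering behind this fix can be modeled without any graphics API: each frame writes its result into the next surface of a small ring, and the CPU reads the surface that was written two frames earlier. The sketch below models only that bookkeeping; the `DelayedReadback` class and its parameters are hypothetical, and no DirectX calls are represented.

```python
class DelayedReadback:
    """Hypothetical model of reading GPU results a fixed number of frames late."""

    def __init__(self, surface_count: int = 3, delay_frames: int = 2):
        if delay_frames >= surface_count:
            # Otherwise the slot being read would be overwritten in the same frame.
            raise ValueError("need more surfaces than frames of delay")
        self.surfaces = [None] * surface_count  # stand-ins for CPU-readable surfaces
        self.delay = delay_frames
        self.frame = 0

    def end_frame(self, gpu_value):
        """Queue this frame's copy, then return the value from `delay` frames ago.

        Returns None until enough frames have been queued.
        """
        count = len(self.surfaces)
        # Round-robin: this frame's copy goes into the next surface in the ring.
        self.surfaces[self.frame % count] = gpu_value
        result = None
        if self.frame >= self.delay:
            # Read a surface the GPU finished with long ago, so nothing waits.
            result = self.surfaces[(self.frame - self.delay) % count]
        self.frame += 1
        return result


readback = DelayedReadback()
luminance_per_frame = [0.1, 0.2, 0.3, 0.4, 0.5]
seen_by_cpu = [readback.end_frame(v) for v in luminance_per_frame]
```

The CPU works with a value that is two frames old, which is usually acceptable for an average-luminance input because it changes gradually from frame to frame.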
Figure 8. Optimized code metrics show smoother performance
As shown in Figure 8, the result is a much smoother frame workload from having removed the gaps in both the GPU and CPU processing.
The optimization was further enhanced when Sumo Digital streamlined the post-processing by combining techniques. The original shadow and lighting calculation system generated and used a stencil buffer for a three-pass system. A new platform-specific version of the code was created using a different set of shader and z-buffer commands that streamlined the processing to a two-pass system without any visual compromise.
In addition, GPA hardware metrics showed the pixel shaders to be bandwidth-limited in the texture samplers. This code was reworked to allow some of the less complex shaders to pre-calculate the values and store these into unused alpha channels. This allowed use of fewer textures in the post-process shaders, giving a better ratio of math instructions to texture fetch instructions (which introduce latency).
The combination of the improved post-processing, the new shadow and lighting system, and the elimination of the GPU stall, together with many other smaller optimizations, resulted in the frame seen in Figure 9. Not only has the frame rate more than doubled, the visual quality has also improved, with higher quality lighting and additional post-processing effects including Ambient Occlusion.
Figure 9. Screenshot of completed gameplay, with frame rate of 29 FPS denoted in the upper-left corner
This case study demonstrates some solutions for typical obstacles in creating and optimizing touch-based games. The work done on the PC version allowed Sumo Digital to back port many of the control improvements to other versions of the game. PC devices are larger and heavier than phones and handhelds such as the PS Vita, and this raised control issues that had not previously been noted. Solving these problems benefited all devices. The self-calibration of the inclinometer was completed in time to ship with the PS Vita version and made a big difference to the control. Making the right decisions in the implementation of sensors and touch can solve many problems in performance and user experience. Tools such as Intel GPA are vital for finding and capitalizing on opportunities for optimization, preventing unnecessary delays and taking full advantage of the hardware.
About the Authors
Brad Hill is a Software Engineer at Intel in the Developer Relations Division. Brad investigates new technologies on Intel hardware and shares the best methods with software developers via the Intel Developer Zone and at developer conferences. He is currently pursuing a Master of Science degree in Computer Science at Arizona State University.
Leigh Davies is a senior application engineer at Intel with over 15 years of programming experience in the PC gaming industry. He is a member of the European Visual Computing Software Enabling Team, providing technical support to game developers. His areas of expertise include 3D graphics and, more recently, touch and sensors.
References
Comparing Touch Coding Techniques - Windows 8 Desktop Touch Sample: http://software.intel.com/en-us/articles/comparing-touch-coding-techniques-windows-8-desktop-touch-sample
Implementing Touch and Sensors for Windows* 8 Desktop Games: Confetti Interactive’s* experiences developing "Blackfoot Blade": http://software.intel.com/en-us/articles/implementing-touch-and-sensors-for-windows-8-desktop-games-confetti-interactive-s
Accessing Microsoft Windows* 8 Desktop Sensors: http://software.intel.com/en-us/articles/accessing-microsoft-windows-8-desktop-sensors
Test PC Specifications
Ultrabook, Intel® Core™ i7-3667U CPU @ 2.00 GHz with HD 4000 Graphics, 4 GB memory. Windows 8 Pro 64-bit OS. 5-point touch support.
Ultrabook™ products are offered in multiple models. Some models may not be available in your market. Consult your Ultrabook™ manufacturer. For more information and details, visit http://www.intel.com/ultrabook
*Other names and brands may be claimed as the property of others.
Copyright© 2013 Intel Corporation. All rights reserved.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks