Few computer games have the audacity to encompass the entire span of human civilization within their boundaries or to resurrect famous leaders across centuries of human existence to offer guidance to gamers. Sid Meier’s Civilization* franchise doesn’t shirk from giving players an epic journey through the victories and perils of building an empire, whether through diplomacy, ingenuity, or physical force.
Civilization V (Civ5), the latest version of this long-running, turn-based computer strategy game, introduces a brand new architecture built from the ground up. This architecture substantially improves the performance of the game engine and uses parallelism to enhance the responsiveness of gameplay on a wide range of hardware platforms. Given a processor with two, four, or six cores, Civ5 will distribute tasks and workloads to maximum advantage. Efficient threading is the key to rapid-fire gameplay (with up to 12 threads available on some Intel® processors), and tools to streamline threading proved to be an essential element of Firaxis’ development process.
As a key tool in building this ground-breaking enterprise, Intel® Graphics Performance Analyzers (Intel® GPA) 3.0 gave Firaxis developers a window into the game’s processor core usage, platform operations, and frame complexity. This enabled Firaxis developers to tune and optimize the game and graphics performance for a wide variety of hardware platforms. The result is a monumental advance in the Civilization franchise and a gameplay experience that scales well whether played on a laptop or a scorchingly fast desktop gaming rig. The new Intel® Core™ processor family featuring Intel® HD Graphics offers a choice of gaming platforms for prospective empire builders, spanning entry-level, mainstream, and enthusiast PCs.
New Game, New Architecture, New Engine
No game publisher can expect to thrive in the competitive landscape of the PC games industry today without knowing how to take advantage of the latest software tools to exploit the ever-evolving platform capabilities. With games growing more complex each year, coupled with new APIs and hardware platforms, three cardinal rules guide developers:
Achieve scalable game performance
Optimize for a range of platforms to increase the size of the target audience
Create a roadmap to accommodate future platforms
During the development of Civ5, Firaxis took those factors into account and re-architected its game engine to take maximum advantage of multi-core processing, with an eye to the future as many-core processors become even more prevalent. Intel GPA, Intel® Threading Building Blocks, and the Intel® VTune™ Performance Analyzer proved valuable in maximizing threading performance. Firaxis was able to tune Civ5 for a wide range of PCs, eliminating graphics bottlenecks that hurt performance and distracted from the gameplay experience. Intel Threading Building Blocks 3.0, preinstrumented for the Intel GPA 3.0 Platform View, effectively complemented the depth of performance visualization provided by Intel GPA and helped Firaxis developers improve parallel code efficiency.
The throughput challenges presented by Civ5 were particularly demanding because of the number of character interactions taking place at any given time. For example, in a first-person shooter, a typical maximum of 20 fully animated, unique characters might interact in a scene. In a massively multi-player online role-playing game, the number of animated characters could climb to 100. In real-time strategy games, up to 500 characters might be interacting. In comparison, Civ5 had to support up to 10,000 characters at a time. To accommodate this massive level of throughput, the development team embarked on a mission to increase the scalability of the game engine and to take better advantage of multi-threading.
Identifying and Eliminating CPU and GPU Bottlenecks
“When you are writing a game,” Dan Baker, graphics lead at Firaxis Games, said, “two things can potentially slow you down: the CPU or the GPU. You can’t run any faster than whichever one of those two is slowing you down. We have to use a two-prong approach and worry about what is happening on both the CPU and the GPU.”
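Dan's point can be put in a few lines of arithmetic. The following is a minimal sketch, not Firaxis or Intel GPA code: it assumes CPU and GPU work for a frame overlap fully, so the slower of the two sets the frame time. The function names and the 10 percent tolerance are hypothetical choices made for illustration.

```python
def classify_bottleneck(cpu_ms: float, gpu_ms: float, tolerance: float = 0.10) -> str:
    """Report which processor limits the frame, given per-frame busy times in ms.

    The tolerance (an assumed value) treats near-equal timings as balanced.
    """
    if cpu_ms <= 0 or gpu_ms <= 0:
        raise ValueError("timings must be positive")
    slower = max(cpu_ms, gpu_ms)
    if abs(cpu_ms - gpu_ms) / slower <= tolerance:
        return "balanced"
    return "cpu" if cpu_ms > gpu_ms else "gpu"


def frame_rate_ceiling(cpu_ms: float, gpu_ms: float) -> float:
    """Best-case frames per second when the slower processor sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)
```

With 10 ms of CPU work and 40 ms of GPU work per frame, the sketch reports a GPU-bound frame capped at 25 fps, so speeding up the CPU side alone would not raise the frame rate.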
With the addition of the new Platform View feature, Intel GPA 3.0 now provides developers with a system-level view of operations, offering a clear, visual depiction of CPU and GPU activities on the same scale. This capability, coupled with the ability to zoom down to a low level to pinpoint performance issues, gives developers both a macro-level and a micro-level perspective. Firaxis capitalized on these features during Civ5 development to scope out large-scale problems and then address them at the code level.
Graphics Performance Example
As part of the graphics performance tuning for Civ5, Dan and the development team optimized the game to run smoothly on the integrated graphics built into Intel® Core™ i5 processors.
“We analyzed what was slowing it down,” Dan said. “With graphics, the [performance cost] of something is almost never related to its visual impact. Sometimes a really subtle visual effect is actually very expensive. And something that everyone thinks is awesome is actually relatively cheap [in performance terms]. You find out real fast that some things that are providing a little bit of visual quality are very expensive. If you turn those down, you can really get the gameplay performance running a lot better.”
The team ran Intel GPA to figure out where the biggest performance hits were taking place and quickly determined that effects such as water simulation (with its heavy shaders) and distance fog were strongly impacting performance. Figure 1 shows one of the frames included in the analysis, before the code was changed to simplify the graphics processing tasks.
“Intel GPA lets you go in,” Dan said, “and experimentally try different approaches. First you find out where the performance hotspots are. Then you can try different techniques to reduce the graphics processing tasks for that spot to see if it still holds up visually. For example, we looked at the shadow and overlay resolutions for the user interface. By reducing them from 2K x 2K to 1K x 1K, the game speeds up tremendously.”
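The size of the resolution change Dan describes is easy to quantify. This is a rough sketch that assumes "2K" and "1K" mean 2048 and 1024 texels per side and that each texel takes 4 bytes; the article does not state the actual texture format, so the memory figures are illustrative only.

```python
def texel_count(side: int) -> int:
    """Number of texels in a square texture with the given side length."""
    return side * side


def texture_mib(side: int, bytes_per_texel: int = 4) -> float:
    """Uncompressed size in MiB; 4 bytes per texel is an assumed format."""
    return texel_count(side) * bytes_per_texel / (1024 * 1024)


before = texel_count(2048)  # assumed meaning of "2K x 2K"
after = texel_count(1024)   # assumed meaning of "1K x 1K"
reduction_factor = before // after
```

Halving each side cuts the texel count by a factor of four, which is why a change that looks modest on screen can remove a large share of the fill and bandwidth work.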
Dan pointed out the changes between two equivalent frames (Figure 1): the original frame that was analyzed and the same frame after some adjustments were made based on analysis results.
“You can see [in the optimized frame] that the selection icons on the units are a little less crisp, so that looks a little different. The country borders are also a little less crisp, but, by and large, often these things are very subtle. Even though a high-end graphics card doesn’t have any problem with these issues, when you start to talk about an integrated graphics chip, some things can slow it down a lot.”
This is an area where Intel GPA delivers exceptional value: detecting and resolving the kinds of issues that can slow gameplay on more modest platforms, and so opening games up to a much wider range of PCs. For most developers, it should seem obvious that reducing a 2K x 2K texture would improve performance. The value of Intel GPA is that it provides context to developers, so that they can focus optimization efforts on the performance bottlenecks that really matter. Using Intel GPA, developers can visualize which specific draw calls are eating up a disproportionate amount of frame rendering time and target those draw calls in the order of priority that makes the most sense.
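The prioritization described above amounts to ranking draw calls by their share of frame time. The sketch below works on a plain dictionary of per-draw-call timings; the names and numbers are invented for illustration and do not reflect Intel GPA's export format or any measured Civ5 data.

```python
from typing import Dict, List, Tuple


def rank_draw_calls(timings_us: Dict[str, float], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n draw calls as (name, share of total frame time)."""
    total = sum(timings_us.values())
    if total <= 0:
        return []
    ranked = sorted(timings_us.items(), key=lambda item: item[1], reverse=True)
    return [(name, duration / total) for name, duration in ranked[:top_n]]


# Hypothetical timings in microseconds for one captured frame.
sample_frame = {"water": 500.0, "terrain": 200.0, "units": 150.0, "ui": 100.0, "fog": 50.0}
worst_offenders = rank_draw_calls(sample_frame)
```

In this made-up frame the water pass accounts for half of the total, so it would be the first candidate for simplification.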
As a result of the information acquired from the analysis, the team made several changes to improve performance on the Intel Core i5 processor. These changes boosted the frame rate from a virtually unplayable 5 frames per second (fps) to a respectable 30 fps with a good overall visual experience. The development team implemented the following changes:
Set the target resolution to approximately 720p
Turned off idle animations
Reduced terrain tessellation and texture detail slightly
Reduced shadowmap/border overlay from 2K x 2K to 1K x 1K
Turned off LEAN mapping
Turned off distance fog
Enabled the cloaking driver

The resulting frame rendering (Figure 1, right) shows the distinct changes that boosted playback performance considerably.
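Expressed as time per frame, the jump from 5 fps to 30 fps shows how much work the changes listed above had to remove. The conversion is plain arithmetic; only the two frame rates come from the article.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate into the time available for each frame, in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


before_ms = frame_time_ms(5)       # 200 ms per frame
after_ms = frame_time_ms(30)       # about 33.3 ms per frame
saved_ms = before_ms - after_ms    # about 166.7 ms removed from every frame
speedup = before_ms / after_ms     # roughly a sixfold improvement
```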
Using Intel® GPA Effectively
Configuring gameplay for different platforms is easier with the analytical insights provided by Intel GPA. “When we launch the game,” Dan said, “we have a whole host of settings, and we detect what configuration you have and then make adjustments. In several cases we can shift which processor—CPU or GPU—we do something on, moving code from the CPU to the GPU as required.” The analysis provided by Intel GPA makes it possible to balance workloads effectively, giving gamers the best experience possible given the capabilities of their PC.
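A launch-time configuration step of the kind Dan describes might look like the following. This is a hypothetical sketch: the class, field names, and the high-end values are assumptions, and only the integrated-graphics values mirror the changes listed earlier in this article. It does not show how Civ5 actually detects hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicsSettings:
    target_height: int       # vertical resolution in pixels
    overlay_resolution: int  # side length of the shadowmap/border overlay
    idle_animations: bool
    distance_fog: bool
    lean_mapping: bool


# Assumed defaults for a discrete card (hypothetical values).
HIGH_END = GraphicsSettings(1080, 2048, True, True, True)
# Mirrors the article's list of changes for integrated graphics.
INTEGRATED = GraphicsSettings(720, 1024, False, False, False)


def choose_settings(integrated_gpu: bool) -> GraphicsSettings:
    """Pick a preset from a single detected property of the machine."""
    return INTEGRATED if integrated_gpu else HIGH_END
```

A real game would key off more than one property (GPU model, memory, core count), but the structure stays the same: detect once at launch, then select a preset.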
“I use the graphics side of [Intel] GPA primarily for two things,” Dan noted. “One is that you can get very accurate timings on different systems. For example, you can get the amount of cost that the water takes up. We ran Civ5 through [Intel] GPA when we did a breakdown of a scene and immediately saw that the water was 50 percent of our frame time on the integrated chip.”
“That meant,” he continued, “that by dealing with that problem, we could double our frame rate. So, we said, OK, let’s back off what we are doing on water and sure enough, the frame rate jumped way up. You can quickly find your performance issues at a very granular level with just a few minutes of work.”
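The reasoning behind "we could double our frame rate" can be written out. If one effect takes a given share of the frame time, removing it entirely leaves only the remaining share, and the frame rate rises by the inverse of that. The sketch below assumes the rest of the frame is unaffected by the change; the function name and the partial-reduction parameter are illustrative additions.

```python
def fps_after_reduction(fps: float, share: float, reduction: float = 1.0) -> float:
    """Estimate the new frame rate after cutting one effect's cost.

    share:     fraction of frame time the effect uses (0.5 for the water example)
    reduction: fraction of that cost removed (1.0 means the cost is eliminated)
    """
    if not 0.0 <= share <= 1.0 or not 0.0 <= reduction <= 1.0:
        raise ValueError("share and reduction must be between 0 and 1")
    remaining = 1.0 - share * reduction
    if remaining <= 0.0:
        raise ValueError("cannot remove the entire frame time")
    return fps / remaining
```

Eliminating an effect that takes half the frame doubles the frame rate (15 fps becomes 30 fps); halving its cost instead gives a smaller gain (15 fps becomes 20 fps), which is why backing off an effect rather than removing it yields a proportionally smaller jump.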
Dan explained that software optimization has two halves of roughly equal difficulty: figuring out what is slowing the code down, and then optimizing it. Intel GPA helps with the first half. Although the tool doesn’t perform the optimization itself, it pinpoints what is causing the code to run slowly, which is not an easy problem to solve in some cases.
“Secondly, as a debugging tool,” Dan said, “Intel GPA is pretty useful for just walking through everything that happened and finding out why something didn’t render correctly. I’ll get a pixel on the screen that is wrong or something is broken and it is not rendering correctly. [Intel] GPA can capture the whole frame so I can get a tree and walk through everything that happened on that pixel or set of pixels and figure out what could be causing the problem.”
Yannis Minadakis, a senior graphics engineer at Intel, has followed the progress on the game engine enhancements and provided engineering assistance on the redesign. “The thing that impressed me the most about Firaxis is,” he said, “they took the long-term view on developing their engine. They said, well, we want to scale to any number of cores and divide up our work in such a way that it doesn’t matter if you have six cores, four cores, eight cores. It will all run consistently and at the maximum throughput of your system. That is the kind of architecture that we need going forward across all applications on the PC because you will have these very different configurations and the experience must scale with the customer’s expectations.”
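The idea of dividing work so that the core count does not matter can be illustrated with a small sketch. It is not Firaxis code and does not use Intel Threading Building Blocks: a standard-library thread pool stands in for the task scheduler, `update_unit` is a hypothetical placeholder for per-unit work, and Python threads do not run CPU-bound code in parallel, so this shows the decomposition rather than the speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous, near-equal chunks."""
    chunk_count = max(1, min(chunk_count, len(items)))
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for index in range(chunk_count):
        end = start + base + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def update_unit(unit_id):
    """Hypothetical stand-in for the per-unit work done each turn."""
    return unit_id * 2


def update_all(unit_ids, worker_count=None):
    """Update every unit, sizing the work split to the available cores."""
    workers = worker_count or os.cpu_count() or 1
    chunks = split_into_chunks(unit_ids, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda chunk: [update_unit(u) for u in chunk], chunks))
    # Flatten in chunk order so the output does not depend on the worker count.
    return [value for chunk in results for value in chunk]
```

Calling `update_all` with two, four, or six workers returns the same result; only the way the 10,000 units are partitioned changes, which is the property the quote describes.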
Because of this tuning and optimization work, when Civ5 hits the streets in late 2010, PC gamers worldwide (casual, mainstream, and enthusiast) will enjoy a great Civ5 experience.
Other Intel® Tools for Building Efficient Code
Besides Intel GPA, Firaxis regularly relies on other tools from the Intel Software Development Products suite to increase the productivity of the development team and tighten the code. Intel VTune Performance Analyzer, an invaluable contributor to this development effort, streamlined the threading process and provided fine-grained data on task performance that helped the Firaxis team re-engineer their architecture to better scale to current and future processor hardware.
“We use [Intel] VTune [Performance Analyzer] quite extensively throughout the project,” Dan Baker said. “Our Lead Systems Engineer, Tim Kip, basically runs an analysis every week. Every system that comes in, we run it through VTune and we look at it extensively to make sure that we are executing high instruction efficiencies. We have tried to make sure that we are always retiring large numbers of instructions.
“If we see code that comes in,” Dan said, “and we are not retiring at least one instruction per cycle, we ask: what is wrong with this code? Are we missing cache? Are we arranging our data badly? Are we doing a lot of unnecessary branching? We go through a bit of analysis [with VTune]. We found that by doing this very early in the project and as every system comes in, we have been able to eliminate large numbers of performance traps that most studios fall into.”
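The check Dan describes comes down to dividing instructions retired by clock cycles and flagging anything below one. The sketch below assumes those two counters have already been collected per subsystem; the subsystem names and counts are invented, and the code does not read or reproduce VTune output.

```python
def instructions_per_cycle(instructions_retired: int, cycles: int) -> float:
    """Average number of instructions retired per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


def flag_low_ipc(samples, threshold=1.0):
    """Return the names of subsystems retiring fewer than threshold instructions per cycle."""
    return sorted(
        name
        for name, (retired, cycles) in samples.items()
        if instructions_per_cycle(retired, cycles) < threshold
    )


# Hypothetical counters: name -> (instructions retired, cycles).
weekly_samples = {
    "pathfinding": (800_000, 1_000_000),
    "animation": (2_400_000, 1_200_000),
    "ai_diplomacy": (450_000, 900_000),
}
needs_review = flag_low_ipc(weekly_samples)
```

Subsystems that land on the list are the ones to examine for the causes Dan names: cache misses, poorly arranged data, or unnecessary branching.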
“The wonderful thing about [Intel®] GPA 3.0 is that it works on all the PC platforms, so whether you are optimizing for Intel® architecture and Intel® graphics or you are optimizing for Intel architecture and other graphics solutions, this information is valuable for you and helps you optimize for all the platforms you are targeting for your game.”
—YANNIS MINADAKIS, SENIOR GRAPHICS ENGINEER, INTEL
Intel Threading Building Blocks also proved useful to the Firaxis development team for optimizing the parallelization of the game code to achieve maximum scalability. “We do use everything that we can get our hands on to make sure that our game is running as fast as we can,” Dan noted.
Intel GPA Prospects for the Future
Game developers trying to identify where performance hotspots or visual effects may be slowing graphics rendering performance can benefit from the analytics delivered by Intel GPA. Experts and novice developers alike can take advantage of the analysis and optimization capabilities of Intel GPA, making it possible to extend games to a wider range of PCs and appeal to customers on a variety of platforms.
The development team at Intel envisions ongoing enhancements to Intel GPA and intends to offer features and new tools so that developers can fully exploit the capabilities of today’s Intel® graphics products as well as future ones. Feedback from and collaboration with the development community help shape each successive release of Intel GPA to better meet the performance and optimization needs of developers. Among the product enhancements underway for Intel GPA are improved support for OpenGL* and Microsoft DirectX* 11. If you have features or additions that you’d like to see in Intel GPA, let your voice be heard on the Intel GPA Support Forum (http://software.intel.com/en-us/forums/intel-graphics-performance-analyz...).