You face a constant balancing act between features and performance as you build your game. The GPU is the most obvious bottleneck you'll encounter as you add graphical effects and features to your game, but your game can also become CPU-bound.
In addition to the usual CPU loads from game logic, physics and artificial intelligence (AI) calculations, the graphical effects that make your game feel immersive can be CPU-intensive, typically making the bottleneck shift back and forth between the GPU and the CPU throughout development of the game itself.
Modern microprocessors have great single-core performance, but depend on multiple CPU cores to give better overall performance. To use all of that available CPU compute power, applications must be multithreaded so that code runs concurrently on all available CPU cores.
This video showcases Ashes of the Singularity*, a recent real-time strategy (RTS) game from Oxide Games and Stardock Entertainment. You'll see how it delivers excellent gameplay and performance on systems with more CPU cores.
Figure 1: Ashes of the Singularity* shows how a well-threaded game can get better frame rates on systems with more CPU cores.
By building a new engine and using Direct3D* 12, Oxide made it possible for Ashes of the Singularity to use all available processor cores. It runs great on a typical gaming system and scales up to run even better on systems with more cores. You can use these same techniques in your game to get the best performance from your CPU.
To get the best frame rates in Ashes of the Singularity, the Oxide team used Direct3D* version 12. Earlier versions of Direct3D run well, but have a few bottlenecks that tend to slow down games. In version 12, the API incorporated several changes that remove these bottlenecks: multiple state objects are now combined into pipeline state objects, and a smaller hardware abstraction layer minimizes the overhead in the API and makes it possible to remove resource barriers from graphics drivers.
It's possible to create commands from multiple threads in Direct3D 11. However, there's so much serialization required that games never got much speedup from multithreading with the older version. With the API changes in Direct3D 12, this fundamental limit doesn't exist anymore. Without that serialization, it's now practical for games to fill command lists from multiple threads and have much better overall threading.
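The pattern of filling command lists from multiple threads can be sketched in portable C++. This is a minimal illustration, not Direct3D 12 code and not Oxide's implementation: CommandList, record_range, and record_frame are hypothetical stand-ins. In a real Direct3D 12 renderer, each thread would typically record into its own command list backed by its own command allocator, and the lists would then be submitted together with ID3D12CommandQueue::ExecuteCommandLists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a graphics command list (NOT the Direct3D 12 type).
struct CommandList {
    std::vector<std::string> commands;
};

// Record draw commands for objects [begin, end) into a list owned by one thread.
void record_range(CommandList& list, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        list.commands.push_back("draw object " + std::to_string(i));
    }
}

// Split object_count objects across worker_count threads. Each thread fills
// only its own list, so recording needs no locks. The lists come back in
// submission order, ready to be handed to the queue in one call.
std::vector<CommandList> record_frame(std::size_t object_count,
                                      std::size_t worker_count) {
    assert(worker_count > 0);
    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    workers.reserve(worker_count);

    // Round up so every object lands in exactly one chunk.
    const std::size_t chunk = (object_count + worker_count - 1) / worker_count;
    for (std::size_t w = 0; w < worker_count; ++w) {
        const std::size_t begin = std::min(w * chunk, object_count);
        const std::size_t end = std::min(begin + chunk, object_count);
        workers.emplace_back(record_range, std::ref(lists[w]), begin, end);
    }
    for (auto& t : workers) {
        t.join();
    }
    return lists;
}
```

Because no list is shared between threads, the only synchronization point is the join before submission, which is the property that makes this approach scale with core count.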
By taking advantage of these changes to the API, Ashes of the Singularity runs best on Direct3D 12.
Oxide wanted to create a more complex RTS than any built before, with support for larger armies with more units and larger maps. To build this next-generation RTS game, the development team knew that they needed a new game engine; existing game engines couldn't support the unit counts or map sizes they wanted. They started from scratch to build the Nitrous Engine* to make Ashes of the Singularity possible.
Any new engine must first deliver high-performance rendering. With that in mind, the Nitrous Engine has well-tuned support for the latest graphics APIs, multiple GPUs, and async compute.
The game supports many units for each player, as well as large maps. Simulating the physics of this many in-game objects across a large terrain generates a large CPU load. More importantly, the AI workload is massive since the behavior of each unit needs to be simulated. There's also an emergent property from the large number of units the game supports. With more units, it becomes harder for the player to directly manage units. Oxide built a layered approach to AI where armies cooperate in sensible ways that use the relative strengths of each unit while paying attention to their relative weaknesses.
To make this kind of scale possible, Oxide threaded the Nitrous Engine by breaking work up into small jobs. The job system is designed for flexibility, and the small jobs can be spread out among as many CPU cores as possible. Oxide carefully tuned the job scheduler for speed. Since most Intel® processors include Intel® Hyper-Threading Technology, the scheduler also looks for locality between jobs. Jobs that share cached data are scheduled on different logical cores of the same physical CPU core, an approach that yields the best performance and job throughput.
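The idea of breaking work into small jobs and spreading them across cores can be sketched as follows. This is an assumption-laden illustration rather than the Nitrous Engine's scheduler: Job, run_jobs, and update_units are hypothetical names, the workers simply pull the next job from a shared atomic counter, and the sketch does not attempt the locality-aware placement on sibling logical cores described above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Job = std::function<void()>;  // hypothetical job type

// Run every job exactly once, spread over worker_count threads.
// Workers claim jobs through an atomic counter, so an idle core always
// picks up the next piece of work without any lock.
void run_jobs(const std::vector<Job>& jobs, unsigned worker_count) {
    assert(worker_count > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&jobs, &next] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= jobs.size()) return;
            jobs[i]();
        }
    };
    std::vector<std::thread> threads;
    for (unsigned w = 1; w < worker_count; ++w) {
        threads.emplace_back(worker);
    }
    worker();  // the calling thread works too
    for (auto& t : threads) {
        t.join();
    }
}

// Example workload: move every unit by delta, one small batch per job.
// Batches touch disjoint index ranges, so jobs never write the same element.
void update_units(std::vector<float>& positions, float delta,
                  std::size_t batch, unsigned worker_count) {
    assert(batch > 0);
    std::vector<Job> jobs;
    for (std::size_t begin = 0; begin < positions.size(); begin += batch) {
        const std::size_t end = std::min(begin + batch, positions.size());
        jobs.push_back([&positions, delta, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                positions[i] += delta;
            }
        });
    }
    run_jobs(jobs, worker_count);
}
```

Small batches keep all workers busy until the end of the frame, at the cost of slightly more scheduling overhead per job; the right batch size is something to tune by measurement.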
Regardless of approach, there will always be a bottleneck somewhere when you add complexity to a game. As you develop your game, think about the relative CPU and GPU loads that you might expect. Understand how your game will work when the GPU is the bottleneck and consider how it will behave when the CPU is the bottleneck.
Intel® Core™ processors can help make your game shine like Ashes of the Singularity. As you design and optimize your game, target mid-range processors and design for scalability up to the most powerful processors.
By using the techniques we describe here, Ashes of the Singularity runs great on the best-in-class Intel® Core™ i7-6950X processor Extreme Edition, which has 10 physical CPU cores and a large cache for the best overall performance. With the work divided into jobs and a great GPU, the game's frame rate increases with more CPU cores. On identical systems with varying numbers of CPU cores, the frame rate improves steadily up to the max of 10 physical cores.
The game also includes some massive maps. Since the player and unit counts get so large, the AI burden for a fully outfitted set of players becomes huge. After careful tuning, Ashes of the Singularity allows these maps only on systems with large numbers of CPU cores (six or more), where the job scheduler automatically puts work on all available cores. This is a great approach for you to pursue: detect your system's core count with a function like GetSystemInfo() if you need to selectively enable features.
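A minimal sketch of gating a feature on core count might look like the following. It uses the portable std::thread::hardware_concurrency() instead of the Windows-specific GetSystemInfo(); both report logical processors, so on a processor with Intel Hyper-Threading Technology the value is typically twice the physical core count (on Windows, GetLogicalProcessorInformation can tell the two apart). The names detect_core_count and allow_massive_maps are hypothetical; the six-core threshold is the one quoted above.

```cpp
#include <thread>

// Threshold from the text above: massive maps need six or more CPU cores.
constexpr unsigned kMinCoresForMassiveMaps = 6;

// Portable core-count query. hardware_concurrency() counts logical
// processors and may return 0 when the value is unknown, so fall back to 1.
unsigned detect_core_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Decide whether to offer the massive maps for a given core count.
// Pass a physical core count here if you want to match the text literally.
bool allow_massive_maps(unsigned core_count) {
    return core_count >= kMinCoresForMassiveMaps;
}
```

Keeping the decision in a small pure function such as allow_massive_maps makes the threshold easy to test and to retune separately from the platform-specific detection code.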
Although this game is mostly focused on ever-faster frame rates with more cores, there was a little extra CPU room for some bonuses. With more cores, Ashes of the Singularity will automatically enable advanced particle effects on some units as well as temporal motion blur.
Figure 2: Advanced particle effects on two large dreadnought units look awesome.
The particle effects give added visual impact, but they don't affect gameplay.