The Inner Game of Concurrency Programming: Optimizing for Intel's Dual Cores

by Alexandra Weber Morales


Abstract

While much has been written about the whys, hows, and whether-or-not-to's of threading, there's been little focus on the most productive way to pound a paradigm shift like this one into place. Here's a peek into the Zen of threading for game developers.


Introduction

In the Spring of 2003, object-oriented programming and refactoring guru Martin Fowler published "Errant Architectures" in my old, now dead, magazine, Software Development. In it, he posits his "First Law of Distributed Object Design: Don't distribute your objects!" Rather, he recommends, try clustering on multiple processors: "Put all the classes into a single process and then run multiple copies of that process on the various nodes. That way, each process uses local calls to get the job done and thus does things faster. You can also use fine-grained interfaces for all the classes within the process and thus get better maintainability with a simpler programming model."

For much of the last decade, the developers I met felt that code maintainability, as Fowler suggests, was more important than optimization. "Hardware is today, software is forever," is the truism many clung to as codebases spread their tentacles ever wider. Once the OO paradigm had been digested, however, it became clear that no single concept could constrain complexity's curve. Tools, methodologies and new abstractions such as the Unified Modeling Language and design patterns were proposed. Rivalries sprang up between those who were technology focused and those who saw human interaction and learning as the greatest challenge. And just three years ago, optimizing code was a lower priority than enhancing programmer productivity through tools or abstractions, at least in enterprise development circles.


Closed for Renovation

All that's about to change. As Herb Sutter put it in his 2005 article, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," developers can no longer merely ride Moore's Law to streamline application performance. While Fowler's patterns for enterprise application development continue to be critical, a generation of developers, Sutter and others believe, must shed their distaste for distribution and embrace threaded architectures. Nowhere is this imperative more obvious than in the gaming space. While much has been written about the whys, hows, and whether-or-not-to's of threading, there's been little focus on the most productive way to pound a paradigm shift like this one into place.

"People will say, 'this is not new; it's been around for decades,'" says Paul Lindberg, a Portland, Ore.-based senior software engineer in Intel Global Developer Relations, Client Entertainment Enabling. "High-performance computing and enterprise developers have been doing distributed computing for a long time, but the consumer app developer community has not been involved. That's a huge body of developers compared to those programming enterprise server-class applications. There's an existing body of knowledge, but it's not all in the hands of the game developers."

Nor are current languages and tools necessarily up to the task. Ask Tim Sweeney, CEO and chief architect of Epic Games. While Sweeney has been an eager proponent of optimization for Intel dual core technology, he warns things will only get more unwieldy for programmers who, in just a few years, will face CPUs with scores of cores and hardware threads. At the 2006 POPL conference, Sweeney begged for new programming languages better equipped for concurrency: "If we are to program these devices productively, you are our only hope!"


Words to Code By, Concurrently

There are a number of "naïve first-time threading mistakes," according to Intel's Paul Lindberg. Avoid these and you're on your way:

  • Trying to thread everything: "There's some coordination overhead involved; it's silly to create 200 threads," Lindberg says.
  • Ignoring dependencies throughout the code. "Say you've got variable a, and later something depends on it. In some cases you can use classic code optimization techniques to improve those things, such as loop unrolling."
  • Not carefully using critical sections. "There's this notion that you can have separate regions of code. You can protect that by wrapping a critical section of code around that region. You have to watch the ownership of that lock, too."
  • Threading at very low or high levels of abstraction. "In gaming code, often we would like to thread at the highest level of abstraction, but fundamental dependencies make that impossible. You end up with unsafe use of buffers and global variables. But you also don't want to thread at the very lowest level because there are startup and per-instance overheads for threading," Lindberg explains.
  • Not understanding messaging. "Work queues and message queues are our primary building blocks," says Epic CEO Tim Sweeney. "Most people think about threading in terms of locks and mutexes, but you really don't want to scatter those throughout thousands of objects in your program, lest deadlock scenarios become hopelessly difficult to visualize and guard against."


Meanwhile, Epic's not letting the grass grow under its Unreal Engine. It's a big leap, Sweeney tells me, but worth it: "Concurrency will require a much more tumultuous transition than object orientation. Large-scale concurrency requires moving away from the imperative procedural programming model, away from the 'programming by side-effects' style that pervades mainstream programming today. The natural setting for concurrency is a pure functional language, with infrequent imperative and transactional features layered on top, as is done in Haskell. That requires a huge change, not just in abstraction facilities, but in the very way we formulate algorithms. I expect the move will be as dramatic as it was a couple of decades ago when programmers moved from assembly language to C."

So how did his team achieve threading expertise? "It was a learning experience! Our rendering architect began by writing a new 3D rendering interface that runs concurrently with the other engine systems, while a couple of other programmers tackled concurrency in a number of simpler systems."

"This was our team's first major experience with multithreading, and it was a success in that we scaled our engine up to three to four threads with linear performance gains," Sweeney enthuses, with a caveat: "In all we only touched about 10 percent of our existing codebase. Achieving much greater concurrency (the tens of threads that will be available early next decade) would be a very different experience."


Patterns for Performance

The paradigm may be novel, but the multi-core optimization payoff is oh-so-sweet, game developers find. And as more makers find ways to send physics engines or AI to separate threads, producing spine-chilling visual effects and behaviors, knowledge about how concurrency works best is spreading among developers.

"It's funny you should call," says Douglas C. Schmidt, author of "Pattern-Oriented Software Architecture, Volume 2: Patterns for Concurrent and Networked Objects" and associate chair of computer science at Vanderbilt University in Nashville. "Just last week I was giving a tutorial for a massive multiplayer gaming company. They're really interested in patterns for concurrency. It turns out that the Proactor pattern is a nice model or design template of how to go about building high performance for gaming on Windows platforms. It's a thread pool concurrency model with overlapped I/O on Windows. Listening to I/O completion ports using the Proactor pattern can be a very effective design."

If patterns are useful, could UML diagrams be far behind? I ask modeling expert and author Scott Ambler if state charts could be the key to simplifying concurrency concepts. The intrepid globe-trotter e-mails me back instantly from his current position (Siberia):

"The answer is, it depends on the developer. If they are visual thinkers, and if they understand state charts, then there is a good chance that they can help. The challenge is that everyone thinks differently and has different backgrounds, so there's no one right answer. This is a fundamental concept that I focus on in Agile Modeling, but many traditionalists keep striving for the 'one right methodology to rule them all.' Good luck with that."


Are Languages the Answer?

Is Fowler's first law of distribution contradicted by Sutter's "concurrency revolution"? I ask Ambler. "I think that his first law should be modified for concurrency-do it only if you absolutely have to. Remember when Java introduced threading in the mid-1990s? A lot of people thrashed on it and came to the conclusion that they should use it only when it's absolutely needed. However, game programming might be one of those few situations where it is absolutely needed."

But the usability, type safety and garbage collection inherent in Java and C# have not played a role in game concurrency. "We haven't seen a lot of signs that people are using interpreted languages for game programming," Lindberg says. "It's mostly C++. There's Managed DirectX, but we haven't seen mainstream games built on it yet. Managed runtimes don't solve any threading problems. Say I asked a game team to rewrite a game in C#. This doesn't change the nature of the problem in any way. Threading done well is all the way at the other level of abstraction. We'll see in five years if there will be some .NET games."

There are movements afoot to raise the abstraction level and reduce the complexity of concurrency, however. "Transactional memory is by far the most interesting and practical abstraction here. Locks and mutexes are useful for very low-level programming, but don't scale to high-level. Java's synchronized methods were a horrible mistake," Epic's Sweeney argues. The other direction, he claims, is the move toward "a more concurrency-friendly set of building blocks for programs." The language he's partial to here is the aforementioned Haskell.

"This approach looks less attractive on the surface because they take away significant features you're accustomed to and make up for the lost power with additional features (such as far more versatile recursion capabilities). So this requires a complete change rather than an incremental refinement. But, ultimately, this is the only way software will scale up to the hundreds of cores of eventual future CPUs."


Shift Happens

While increased discussion and concurrency experience will help, game developers can also turn to today's tools to prevent deadlocks, race conditions and the like. Generic debuggers work for many; beyond those, OpenMP simplifies expressing parallelism, while the Intel VTune Performance Analyzer and the Intel Threading Tools help profile threaded code and detect threading errors.

It's funny how the software world has struggled to find a post-OO paradigm. Until recently, the invention of the World Wide Web was what most computer scientists shruggingly hailed as the next big thing. Could multi-core processors and concurrent programming be the big thing they were really waiting for?

