nulstein v2 plog - divide and surrender
By Jérôme Muffat-Méridol (Intel) on August 27, 2010 at 8:05 am

(note: this is slide 4 of the nulstein plog)
I like calling the time when I started writing games "the good old days". It was in the nineties, DOOM's era: I had quit doing IT development work for hire to join a crazy team doing creative stuff, pushing machines and people to the limits of what they could achieve. Everything felt heroic; there were no ready-made bricks and you'd start almost from scratch every time. As Hervé Lange used to put it: "imagine you're making movies and building the camera is part of the process". It felt good, and it feels like a long time ago now (it is). To be fair, the only part of that statement that no longer holds is that you have to build the camera (but you still can...).
Back on topic. The sequence of operations in a video game is very straightforward: frame, frame, frame, frame... We keep rendering frames in a loop, each one evolving from the players' inputs and the state of the previous frame (i.e. "render-farm" style, where frames are computed independently, is not the way to go). The work for a frame splits into two main parts: updating the game state (advancing time) and drawing it. Inside these parts, we find modules that are loosely connected: collision/physics, AI, audio, scripted mechanisms, visibility/culling, the actual rendering and more... When everything was sequential, we could use the order in which these were processed to make our lives simpler, but that order wasn't critical. So, when CPUs with multiple cores started to appear, the natural thing to try was to split work per module and spread it across cores that way. GPUs had been around for a while too, and the idea of a pipeline going from one core to the next and then on to the GPU seemed to make sense.
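As a minimal sketch of that frame loop (the `GameState`, `update`, `draw` and `game_loop` names are hypothetical, not taken from nulstein), each iteration advances time from the previous frame's state and the current input, then draws the result. That chain from one frame to the next is what rules out computing frames independently, render-farm style.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    # Hypothetical, deliberately tiny game state: one object on a line.
    position: float
    velocity: float


def update(state: GameState, player_input: float, dt: float) -> GameState:
    """Advance time by dt: the player's input changes the velocity."""
    velocity = state.velocity + player_input
    return GameState(position=state.position + velocity * dt, velocity=velocity)


def draw(state: GameState) -> str:
    """Stand-in for rendering: describe what would be drawn this frame."""
    return f"object at x={state.position:.2f}"


def game_loop(initial: GameState, inputs: list[float], dt: float = 1.0) -> list[str]:
    """Frame, frame, frame...: each frame depends on the one before it."""
    frames = []
    state = initial
    for player_input in inputs:
        state = update(state, player_input, dt)  # needs the previous frame's state
        frames.append(draw(state))               # needs this frame's updated state
    return frames
```

For example, `game_loop(GameState(0.0, 0.0), [1.0, 0.0, -1.0])` returns `["object at x=1.00", "object at x=2.00", "object at x=2.00"]`: frame three cannot be produced without first producing frames one and two.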
This approach sort of works with two cores: one core runs the update side of things, the second deals with drawing, and off everything goes to the GPU. It's a nice and easy breakdown, with data always flowing in the same direction, and it looks cool on paper. Until you start tuning... In the really old days, when the term "3D accelerator" hadn't even been invented, everything happened on the same chip: it was like juggling with one ball (I can do that). Add a GPU and, depending on what you do, you might be CPU bound or GPU bound; the optimal use of the machine is to get both to spend about the same amount of time working, which you can achieve by varying the amount of graphics detail. It's as easy as juggling with two balls (I can do that too!). Now, in a pipelined model like the one I was describing, you want the core that updates the game, the one that draws the frame, and the GPU rendering polygons all to spend about the same amount of time working, and I'll call that juggling with three balls (which I can't do, but hope to achieve one day). But then come four cores, six cores, hyper-threading, and it's clear that pipelining won't work: there are more cores than stages to give them.
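The "juggling" problem can be put in numbers with an idealised steady-state model (an assumption: it ignores latency, synchronisation cost and frame-to-frame variation). In a pipeline, a new frame comes out each time the slowest stage finishes, so every other stage idles for the difference, and with one stage per core the number of busy CPU cores is capped by the number of CPU stages. The function names below are hypothetical.

```python
from __future__ import annotations


def pipeline_frame_time(stage_ms: list[float]) -> float:
    """Steady-state time between frames: the slowest stage sets the pace."""
    return max(stage_ms)


def stage_utilisation(stage_ms: list[float]) -> list[float]:
    """Fraction of each frame interval that each stage spends working."""
    pace = pipeline_frame_time(stage_ms)
    return [t / pace for t in stage_ms]


def idle_cpu_cores(cpu_stages: int, cpu_cores: int) -> int:
    """With one stage per core, cores beyond the number of stages get no work."""
    return max(0, cpu_cores - cpu_stages)
```

With made-up timings of 10 ms for update, 6 ms for drawing and 8 ms on the GPU, a frame leaves the pipeline every 10 ms and the utilisations are `[1.0, 0.6, 0.8]`; tuning means pushing them all towards 1.0. On a six-core machine, `idle_cpu_cores(2, 6)` is 4: the two CPU stages leave four cores with nothing to do.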
The next logical step is to keep the principle of functional decomposition and drop only the pipeline idea. Also, because some modules can leverage parallelism internally (physics, for instance, can process islands separately), it becomes a little easier to spread work. Data flow is more complicated, but the real difficulty is synchronisation between the various modules. Together with the need to balance work between all cores, this pushes the engine to split work into ever smaller jobs, the ideal chunk of work being one that executes in a reasonably short time and never needs to wait for any other...
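A rough illustration of why balancing pushes towards smaller jobs, and of the price that comes with it. The sketch below uses greedy list scheduling (longest job first, each to the least-loaded core), a standard textbook heuristic swapped in for illustration rather than how any particular engine schedules; `makespan` and its `overhead` parameter (a hypothetical fixed cost per job) are made up for this example, and dependencies are ignored.

```python
from __future__ import annotations

import heapq


def makespan(job_costs: list[float], cores: int, overhead: float = 0.0) -> float:
    """Time until the busiest core is done, using greedy list scheduling.

    Jobs are taken longest first and each goes to the currently least-loaded
    core (the classic "LPT" heuristic). `overhead` is a fixed cost charged per
    job for managing it. Dependencies between jobs are ignored here.
    """
    loads = [0.0] * cores  # a list of equal values is already a valid min-heap
    for cost in sorted(job_costs, reverse=True):
        lightest = heapq.heappop(loads)  # least-loaded core so far
        heapq.heappush(loads, lightest + cost + overhead)
    return max(loads)
```

Take 120 units of work on 4 cores. Three module-sized jobs of 60, 40 and 20 finish at 60, with one core doing nothing. Twelve jobs of 10 finish at 30, or 33 with an overhead of 1 per job. Split the same work into 120 jobs of 1 with that overhead and it is back to 60: past a point, smaller jobs stop paying for themselves.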
This is how one goes from moving AI to a separate thread, to splitting it into sub-groups, and so on until reaching the level of the individual, and then on to splitting behaviours into separate "aspects". Only then do you finally have jobs that never wait (and are small). Cores can be kept busy, and dependencies are implicitly managed by jobs firing as their prerequisites complete. The main difficulty that arises now is complexity: you have to think in terms of objects at a level of granularity below the individual unit (from the player's point of view). The resulting jobs are also likely to end up so small that the overhead of managing them starts to weigh on performance. Like I said last time: this is hard!
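Here is a minimal sketch of "jobs firing as their prerequisites complete", assuming a dependency-counter scheme on top of Python's standard `ThreadPoolExecutor`. It is a generic illustration, not nulstein's or any engine's scheduler; `run_job_graph` and the job names are hypothetical. Python threads show the scheduling logic here, not real speed-ups, because of the interpreter lock.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_job_graph(
    jobs: dict[str, Callable[[], None]],
    deps: dict[str, list[str]],
    workers: int = 4,
) -> list[str]:
    """Run each job once all of its prerequisites have finished.

    `deps` maps a job name to the names it waits for. Each job keeps a count of
    unfinished prerequisites; when a job completes it decrements its dependents'
    counts and submits those that reach zero. Returns the completion order.
    Assumes the graph is acyclic and every name in `deps` is a key of `jobs`.
    """
    if not jobs:
        return []
    dependents: dict[str, list[str]] = {name: [] for name in jobs}
    pending = {name: len(deps.get(name, [])) for name in jobs}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(name)
    roots = [name for name, count in pending.items() if count == 0]

    finished: list[str] = []
    errors: list[Exception] = []
    lock = threading.Lock()
    all_done = threading.Event()

    def execute(name: str) -> None:
        try:
            jobs[name]()
        except Exception as exc:  # record the failure and stop waiting
            with lock:
                errors.append(exc)
            all_done.set()
            return
        with lock:
            finished.append(name)
            ready = []
            for dependent in dependents[name]:
                pending[dependent] -= 1
                if pending[dependent] == 0:
                    ready.append(dependent)
            if len(finished) == len(jobs):
                all_done.set()
        for dependent in ready:  # fire jobs whose last prerequisite just completed
            pool.submit(execute, dependent)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name in roots:
            pool.submit(execute, name)
        all_done.wait()
    if errors:
        raise errors[0]
    return finished
```

With jobs `physics`, `ai`, `culling`, `animation` and `render`, where `animation` waits for `physics` and `ai`, and `render` waits for `animation` and `culling`, nobody ever blocks waiting: `animation` is only submitted once its counter hits zero, and `render` is always the last to complete. The cost is the bookkeeping per job, which is exactly the overhead that grows as jobs shrink.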
I think the best description available online of a system that works is the one about DICE's Frostbite engine. On slide 26, the CPU job graph looks scary at first (it probably isn't that bad), but it illustrates really well both the need to break things down into small blocks and the need to keep track of dependencies. Slide 29 shows that even in one of the most elaborate engines today, it is unavoidable to have cores going idle while waiting for other cores to finish.
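Why some idling is unavoidable can be made concrete with a standard bound from scheduling theory (not taken from the Frostbite slides): a frame can't finish faster than its longest chain of dependent jobs, the critical path, nor faster than the total work divided by the number of cores. A sketch using the standard-library `graphlib` module (Python 3.9+); the function names, job names and durations are invented for illustration.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def critical_path(durations: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Length of the longest chain of dependent jobs: no core count beats it."""
    finish: dict[str, float] = {}
    graph = {job: deps.get(job, []) for job in durations}
    # static_order() yields every job after all of its prerequisites.
    for job in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in deps.get(job, [])), default=0.0)
        finish[job] = start + durations[job]
    return max(finish.values(), default=0.0)


def min_idle_core_time(
    durations: dict[str, float], deps: dict[str, list[str]], cores: int
) -> float:
    """Lower bound on core time spent idle during one frame.

    The frame can't be shorter than the critical path, nor than the total work
    spread perfectly over all cores; core time not filled by work is idle.
    """
    total_work = sum(durations.values())
    frame_time = max(critical_path(durations, deps), total_work / cores)
    return cores * frame_time - total_work
```

With durations of 4, 3, 1, 2 and 3 ms for physics, AI, culling, animation and rendering, where animation waits for physics and AI and rendering waits for animation and culling, the critical path is 9 ms. On 4 cores that is 36 ms of core time for 13 ms of work, so at least 23 ms of it is spent idle, however clever the scheduler.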
But do we have to go down to that level of complexity? In this project, I'm showing an alternative approach that attacks the problem from a different angle. However, this post is already long, so you'll have to wait for next time...
Next time we'll see how work is subdivided in nulstein.
Spoiler (slides+source code): here
Categories: Game Development, Parallel Programming
Tags: nulstein