nulstein v2 plog - parallelizing at the outer loop

(note: this is slide 10 of the nulstein plog)

The traditional approach to tuning code is to look for hot spots and squeeze them. It usually turns out to be the inner loop of some iterative process: loops or recursions. Parallelizing code is the contrary: looking for large iterations and breaking them into chunks, at the outer loop (opposites are exactly the same, but for one thing...)

If you look at how you code for the GPU, you provide shader code for what ultimately are inner loops of bigger processes. You don't loop on vertices, you don't rasterize triangles or loop on pixels, all the outer work remains hidden and this is what makes it possible to have everything run in parallel efficiently. Here, we want to do the same: the engine is responsible for running the outer loops and the game code only deals with each entity's behaviour. This implies that there must be no special-case: the world must be an entity, rooms must be entities, sounds, the UI, everything must be implemented as entities. And we shouldn't be scared about that, the notion of entity in nulstein is really a boundary, an interface between the engine and the game. The definition of an entity comes down to "something that updates and/or draws".

In nulstein, each entity holds two pieces of data. First, its State, which together with all the other entities represents all of the game's state. And second, it's Mind, which is an area totally private to itself. I'm not talking "C++ private" here, where every instance of the same class could have access to the data, I'm saying data that only this instance would have access to. I'm also not implying that there is any language construct to help enforce this rule, it must be thought of as part of a contract: "nobody reads anybody else's mind, ever". This rule makes it safe for an entity to read from and write to its Mind at any point in time because, the entity being the unit of subdivision, we can assume there is only one thread accessing it at a time.

With this, you have the three pillars that serve as foundation for this engine:

  • a task scheduler to dispatch work

  • entities that constitute the game state and define its behaviour

  • dissociation between State that is shared, and Mind that is not


Next time, the Update Phase
Spoiler (slides+source code): here



For more complete information about compiler optimizations, see our Optimization Notice.


jerome-muffat-meridol (Intel)'s picture

Are you suggesting we could, knowing the update time of an entity, predict its update time for the next frame? That's not true at all. The typical kind for of situation you have in a game is that entities have a minimal update time, carrying out some routine task, then something happens and they need to make a decision (and that can consume orders of magnitude more CPU), then they are in a new routine. Even during these transitions, the amount of work to do is usually not worth spreading over several threads, it's easier to load-balance at the entity level and, in practice, TBB does a really good job at spreading the work.

Now, it is true that we could collect information about how long these calls last, and use that information later to try and process the entities that are known to spike earlier than the others, but is the time spent sorting entities going to be significantly less than the time saved? This is less than certain... My recommendation is to keep the engine simple, and let the game developer identify tasks that turn out to be too long and decide if he wants to parallelize his code, or spread the work over several game frames (which might be both easier and result in a more fluid frame rate).

note: entities in nulstein are not really "nested", they do have dependencies but these are not carried over from one frame to the next, as we'll see in a couple of slides. Forcing everything to be dynamic is more powerful than forcing everything to be static, and trying to provide the two possibilities seems to me like added complexity without a clear benefit (let alone the fact that any game designer saying "yup, always like that, never changes" is bound to come back once it seems all nicely implemented "it never changes except, of course, when [shot][wielder dead][pick your favourite exception]" ;) ).

jimdempseyatthecove's picture

Since entities are apparently nested, and the application is a simulation (many iterations advancing entity state), the entity itself (when appropriate) can keep track of the compute time for the entity (and/or completion time). During simulation the entity can gather heuristics (e.g. your guard looking at sky or horizon). Based on these heuristics, the entity itself can decide if to run as one thread or multiple threads (number, if any, dependent nature of entity). Due to the nesting nature of the entities, each entity can perform a reduction operator (without interlocks) to construct the task list for the next iteration. Once "universe" entity has completed a time state, it now contains the task lists for the next iteration. IOW, there is no thread scheduling in the sense of a tasking model.

Jim Dempsey

Add a Comment

Have a technical question? Visit our forums. Have site or software product issues? Contact support.