(note: this is slide 20 of the nulstein plog)
Once we have an up to date world, we move on to rendering it. Let's start by looking at how I attacked the problem for DX9, in the previous version of nulstein. In this context, all draw calls, all render state changes, everything needs to be submitted through the main thread, so we know there has to be a render phase that is purely serial. Still, we can spread the work related to deciding what to draw and prepping it: occlusion culling, LOD selection and setup, matrix palettes, particle systems, etc.
It all starts with Christer Ericson's solution to keeping draw calls in order ("Order your graphics draw calls around!"), which is really about generating them in no particular order and sorting them afterwards. Please go read that article if you haven't already; there is no point in me paraphrasing all its good wisdom.
The result is three subphases:
- subphase 1: entities register (key, parm) pairs for what they need to draw
- subphase 2: sort
- subphase 3: talk to DX
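To make the (key, parm) idea concrete, here is a minimal sketch of one possible key layout, packed into a 64-bit integer so that plain integer comparison gives the draw order. The field widths and their order (render target, then translucency type, then material id, then quantized depth) are assumptions chosen for illustration, loosely in the spirit of Ericson's article; they are not nulstein's actual layout. `sortkey`, `DrawItem` and the meaning given to `parm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

namespace sortkey {
// Hypothetical layout, most significant field first:
//   bits 56..63  render target id   (8 bits)
//   bits 54..55  translucency type  (2 bits)
//   bits 30..53  material id        (24 bits)
//   bits  0..29  quantized depth    (30 bits)
constexpr int kDepthBits = 30, kMaterialBits = 24, kTranslucencyBits = 2, kTargetBits = 8;
constexpr int kDepthShift = 0;
constexpr int kMaterialShift = kDepthShift + kDepthBits;
constexpr int kTranslucencyShift = kMaterialShift + kMaterialBits;
constexpr int kTargetShift = kTranslucencyShift + kTranslucencyBits;
static_assert(kTargetShift + kTargetBits == 64, "fields must fill exactly 64 bits");

constexpr std::uint64_t mask(int bits) { return (std::uint64_t{1} << bits) - 1; }

// Packs the fields into one key; each value must fit in its field.
inline std::uint64_t make(std::uint64_t target, std::uint64_t translucency,
                          std::uint64_t material, std::uint64_t depth) {
    assert(target <= mask(kTargetBits) && translucency <= mask(kTranslucencyBits));
    assert(material <= mask(kMaterialBits) && depth <= mask(kDepthBits));
    return (target << kTargetShift) | (translucency << kTranslucencyShift) |
           (material << kMaterialShift) | (depth << kDepthShift);
}

// Extracts one field, e.g. field(k, kMaterialShift, kMaterialBits).
constexpr std::uint64_t field(std::uint64_t k, int shift, int bits) {
    return (k >> shift) & mask(bits);
}
}  // namespace sortkey

// One draw-list entry: the key decides the order, parm tells the engine what
// to call back (for instance an entity index; hypothetical).
struct DrawItem {
    std::uint64_t key;
    std::uint32_t parm;
};
```

Sorting raw integers keeps subphase 2 cheap; the price is that the bit budget for each field has to be planned up front.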
The first subphase can be a big set of nested loops, depending on how your engine works, and these loops have a property we like: each iteration is totally independent of the others, because an entity doesn't need information from another entity to decide what it wants to draw where (all of that should have been calculated during the update phase). Also, because we're going to sort the list anyway, we can keep one list per thread and add items to it without any locks. Same pattern as before: zero contention, and everything happily happens simultaneously.
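A minimal sketch of subphase 1 with one list per thread, assuming entities live in a flat array and each exposes a hypothetical `registerDraws` callback whose key was computed during the update phase. It uses plain `std::thread` with a static split of the array only to stay self-contained; an engine would normally hand these iterations to its own task scheduler instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct DrawItem {
    std::uint64_t key;
    std::uint32_t parm;
};

// Hypothetical entity: its draw key was computed during the update phase, so
// registering only reads the entity's own state.
struct Entity {
    std::uint64_t key;
    std::uint32_t id;
    void registerDraws(std::vector<DrawItem>& out) const { out.push_back({key, id}); }
};

// Subphase 1: worker w walks its own contiguous slice of the entity array and
// appends to lists[w] only, so no locks are needed.
std::vector<std::vector<DrawItem>> collectDraws(const std::vector<Entity>& entities,
                                                unsigned workerCount) {
    assert(workerCount > 0);
    // Sized up front so the outer vector is never resized while workers run.
    std::vector<std::vector<DrawItem>> lists(workerCount);
    std::vector<std::thread> workers;
    const std::size_t n = entities.size();
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&entities, &lists, n, w, workerCount] {
            const std::size_t begin = n * w / workerCount;
            const std::size_t end = n * (w + 1) / workerCount;
            for (std::size_t i = begin; i < end; ++i)
                entities[i].registerDraws(lists[w]);
        });
    }
    for (std::thread& t : workers) t.join();
    return lists;
}
```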
Sorting a bunch of such small keys is an easy job, especially since you don't really want to be sending tens of thousands of draw calls through DX anyway. So not only can this be spread over the available cores, but the item count is also small enough that this phase is unlikely to cause performance issues.
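One simple way to spread the sort, sketched under the assumption that the per-thread lists from subphase 1 are handed over as-is: sort each list on its own thread, then merge the sorted runs. `std::thread`, `std::sort` and `std::merge` stand in for whatever the engine actually uses; `sortDraws` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <thread>
#include <vector>

struct DrawItem {
    std::uint64_t key;
    std::uint32_t parm;
};

inline bool byKey(const DrawItem& a, const DrawItem& b) { return a.key < b.key; }

// Subphase 2: sort every per-thread list on its own thread, then merge the
// sorted runs one after another on the calling thread.
std::vector<DrawItem> sortDraws(std::vector<std::vector<DrawItem>> lists) {
    std::vector<std::thread> sorters;
    for (auto& l : lists)
        sorters.emplace_back([&l] { std::sort(l.begin(), l.end(), byKey); });
    for (std::thread& t : sorters) t.join();

    std::vector<DrawItem> merged;
    for (const auto& l : lists) {
        std::vector<DrawItem> next;
        next.reserve(merged.size() + l.size());
        std::merge(merged.begin(), merged.end(), l.begin(), l.end(),
                   std::back_inserter(next), byKey);
        merged.swap(next);
    }
    assert(std::is_sorted(merged.begin(), merged.end(), byKey));
    return merged;
}
```

Merging the lists one by one costs O(k·n) for k lists, which is fine for a handful of worker lists; with many lists, a heap-based k-way merge or a radix sort on the integer keys would be the usual next step.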
The last subphase is serial. The important thing to note is how the engine can manage most of the pipeline state simply by comparing the current key with the last one sent:
- If the render target fields differ, change the render target.
- If the translucency type changes, change states.
- If the material ids differ, set the material up.
Then, when the engine calls the entity back, all the entity has to do is set up and dispatch its DrawPrimitive call(s), which keeps its job really simple.
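The key comparisons of the serial subphase might look like the sketch below. It assumes the same hypothetical layout as before (8 bits of render target at bit 56, 2 bits of translucency type at bit 54, 24 bits of material id at bit 30) and replaces the Direct3D device with a hypothetical `Backend` that only counts the calls it would have made; `drawEntity` stands for the entity callback that issues its DrawPrimitive(s).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawItem {
    std::uint64_t key;
    std::uint32_t parm;
};

// Field extraction for the hypothetical key layout.
inline std::uint32_t targetOf(std::uint64_t k)       { return std::uint32_t(k >> 56); }
inline std::uint32_t translucencyOf(std::uint64_t k) { return std::uint32_t((k >> 54) & 0x3); }
inline std::uint32_t materialOf(std::uint64_t k)     { return std::uint32_t((k >> 30) & 0xFFFFFF); }

// Hypothetical stand-in for the device: it only counts what would be sent.
struct Backend {
    int targetChanges = 0, stateChanges = 0, materialChanges = 0, draws = 0;
    void setRenderTarget(std::uint32_t) { ++targetChanges; }
    void setTranslucencyStates(std::uint32_t) { ++stateChanges; }
    void setMaterial(std::uint32_t) { ++materialChanges; }
    // In the engine, this would be the entity callback issuing DrawPrimitive(s).
    void drawEntity(std::uint32_t /*parm*/) { ++draws; }
};

// Subphase 3 (serial): walk the sorted list and only change the state whose
// key field differs from the previously submitted key.
void submit(const std::vector<DrawItem>& sorted, Backend& dev) {
    bool first = true;
    std::uint64_t last = 0;
    for (const DrawItem& item : sorted) {
        const std::uint64_t k = item.key;
        assert(first || last <= k);  // the list must already be sorted
        if (first || targetOf(k) != targetOf(last)) dev.setRenderTarget(targetOf(k));
        if (first || translucencyOf(k) != translucencyOf(last))
            dev.setTranslucencyStates(translucencyOf(k));
        if (first || materialOf(k) != materialOf(last)) dev.setMaterial(materialOf(k));
        dev.drawEntity(item.parm);
        last = k;
        first = false;
    }
}
```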
Things get rendered in order, state changes are kept to a minimum, and we still give entities the flexibility to run whatever code makes sense for them, which in most cases comes down to a few generic functions (mesh, light, billboard...).
An extension to this approach is auto-generating instanced draws, and it would make perfect sense in this demo, as it features only six models (cube-hi, cube, cube-lo, UFO-hi, UFO, UFO-lo). Add a few bits to the key to identify the template, and the engine would just need to look ahead in the key list and call an alternate rendering routine when it detects a run of matching keys. This would both make instancing relatively transparent to programmers and keep the benefit of ordering, especially when translucency is involved.
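The look-ahead for automatic instancing could be sketched as follows, assuming the template id occupies the key bits directly above a 30-bit depth field, so that masking off the depth leaves exactly what two draws must share to go into one instanced call. `Run`, `Renderer`, `drawOne` and `drawInstanced` are hypothetical names, not nulstein code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawItem {
    std::uint64_t key;
    std::uint32_t parm;
};

// Hypothetical layout: depth in the low 30 bits, template id just above it.
// Masking depth off leaves target, translucency, material and template.
constexpr std::uint64_t kDepthMask = (std::uint64_t{1} << 30) - 1;
inline std::uint64_t batchOf(std::uint64_t k) { return k & ~kDepthMask; }

struct Run {
    std::size_t first;
    std::size_t count;
};

// Look ahead in the sorted list and group consecutive items sharing the same
// batch bits. Only adjacent items are grouped, so the sorted order is kept.
std::vector<Run> findInstanceRuns(const std::vector<DrawItem>& sorted) {
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < sorted.size()) {
        std::size_t j = i + 1;
        while (j < sorted.size() && batchOf(sorted[j].key) == batchOf(sorted[i].key)) ++j;
        runs.push_back({i, j - i});
        i = j;
    }
    return runs;
}

// Hypothetical callbacks: one regular entity draw vs. one instanced draw.
struct Renderer {
    int singleDraws = 0, instancedDraws = 0;
    void drawOne(std::uint32_t /*parm*/) { ++singleDraws; }
    // A real routine would gather per-instance data (e.g. world matrices) from items.
    void drawInstanced(const DrawItem* items, std::size_t count) {
        assert(items != nullptr && count > 1);
        ++instancedDraws;
    }
};

void submitWithInstancing(const std::vector<DrawItem>& sorted, Renderer& r) {
    for (const Run& run : findInstanceRuns(sorted)) {
        if (run.count == 1) r.drawOne(sorted[run.first].parm);
        else r.drawInstanced(&sorted[run.first], run.count);
    }
}
```

Because only adjacent items are grouped, instancing never changes the order the sort produced, which is what lets it keep the benefit of ordering.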
It's all very simple, and that's why I like this approach so much. But how well does it perform?
Next time, performance of nulstein 1
Spoiler (slides+source code): here