In Part 3 we discussed the new efficiencies for resource binding in Microsoft Direct3D* (D3D) 12, and how the tracking of resource hazards, lifetime, and memory management has been simplified. We are not done discussing the changes to resource binding, however. There is one more important change to talk about, at the end of which we will finally see what the new D3D 12 render context looks like. That takes us closer to the “console API efficiency and performance” goal we talked about in Part 1, with its focus on more efficient use of the CPU’s multiple cores and threads.
Redundant Resource Binding:
After analyzing several games, the D3D development team observed that games typically use the same sequence of commands from one frame to the next. Not only the commands but also the bindings tend to be the same frame over frame. Say the CPU generates a series of 12 bindings to draw an object in a frame. Often, on the next frame, the CPU has to generate the same 12 bindings again. Why not cache those bindings, and give developers a command that points to the cache so the same bindings can be reused?
In Part 3 we talked about queuing. When a call is made, the game believes the API executes that call immediately. That is not the case, however. The commands are put in a queue, where everything is deferred and executed at a later time by the GPU. So if you make a change to one of those 12 bindings we talked about earlier, the driver copies all 12 bindings to a new location, edits the copy, and then tells the GPU to start using the copied bindings. Among those 12 bindings, many are probably static values, with only a few dynamic ones requiring updates. Yet when the game wants to make a partial change to those bindings, all 12 have to be copied, even if only one needs to change. That is a heavy CPU cost for a small change. What’s the solution? Read on to find out.
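To make the cost concrete, here is a minimal sketch of the copy-everything behavior described above. It is a toy model in plain C++, not D3D or driver code: `Binding`, `BindingSet`, and `VersioningDriver` are hypothetical names invented for illustration, and a real driver's versioning is certainly more involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one binding; not a D3D type.
struct Binding { int resourceId; };

constexpr std::size_t kBindingsPerDraw = 12;
using BindingSet = std::array<Binding, kBindingsPerDraw>;

// Toy model (an assumption, not real driver code) of a driver that must
// keep older versions alive because queued GPU work may still read them.
struct VersioningDriver {
    std::vector<BindingSet> versions;  // every version written so far
    std::size_t bindingsCopied = 0;    // running total of copied bindings

    explicit VersioningDriver(const BindingSet& initial) {
        versions.push_back(initial);
    }

    // Changing a single slot copies the whole set to a new location,
    // edits the copy, and makes the copy the current version.
    void setBinding(std::size_t slot, Binding b) {
        assert(slot < kBindingsPerDraw);
        BindingSet copy = versions.back();
        copy[slot] = b;
        versions.push_back(copy);
        bindingsCopied += kBindingsPerDraw;
    }

    const BindingSet& current() const { return versions.back(); }
};
```

In this model, one call to `setBinding` adds 12 to `bindingsCopied` even though only one slot changed, which is the overhead the rest of this article sets out to remove.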
What is a descriptor? Simply put, it is a piece of data that defines resource parameters. Essentially, it is what sits behind the D3D 11 view object. There is no operating system lifetime management; it is just opaque data in GPU memory. It contains type and format information, the mip count for textures, and a pointer to the pixel data. Descriptors are the center of the new resource binding model, or as Max McMullen, D3D Development Lead at Microsoft, puts it, “the atom”.
In D3D 11, setting a view copies its descriptor to the current location in GPU memory that descriptors are being read from. If you set a new view in the same spot, D3D 11 copies the descriptors to a new memory location and tells the GPU, in the next draw command, to read from that new location. As has often been repeated in this series, D3D 12 gives the game or application explicit control over when descriptors are created, copied, and so on.
A heap is just a giant array of descriptors. You can reuse descriptors from previous draws or frames, and you can also stream in new ones as needed. The layout is owned by the game, and there is little overhead in manipulating the heap (it is just data copies, after all). Heap size depends on the GPU architecture: on older and low-power GPUs the size may be limited to 65k, while higher-end GPUs are limited only by memory. On lower-power GPUs, exceeding the heap is a distinct possibility, so D3D 12 allows for multiple heaps and for switching from one descriptor heap to the next. However, switching between heaps causes a flush on some GPUs, so it is a feature best used sparingly.
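As a rough mental model of the two ideas so far, a descriptor can be pictured as a small record and a heap as a flat array of them. The sketch below is plain C++, not the D3D 12 API: the `Descriptor` fields mirror the contents listed above, but the layout is an assumption (real descriptors are opaque and hardware specific), and `DescriptorHeap` is a hypothetical class written only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor layout. Real descriptors are opaque data whose
// format depends on the GPU; these fields only echo the article's list.
struct Descriptor {
    std::uint32_t type;        // kind of resource view
    std::uint32_t format;      // pixel or element format
    std::uint32_t mipCount;    // mip levels, for textures
    std::uint64_t gpuAddress;  // stand-in for the pointer to pixel data
};

// Toy heap: a fixed-size array of descriptors whose layout is owned by
// the application.
class DescriptorHeap {
public:
    explicit DescriptorHeap(std::size_t capacity) : slots_(capacity) {}

    std::size_t capacity() const { return slots_.size(); }

    // Writing a descriptor is just a data copy into a slot.
    void write(std::size_t index, const Descriptor& d) {
        assert(index < slots_.size());
        slots_[index] = d;
    }

    const Descriptor& read(std::size_t index) const {
        assert(index < slots_.size());
        return slots_[index];
    }

private:
    std::vector<Descriptor> slots_;  // zero-initialized on construction
};
```

A descriptor written once, for example with `heap.write(20, Descriptor{1, 2, 3, 0x1000})`, stays in its slot and can be referenced again by later draws or frames without being rewritten.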
We have covered what descriptors and heaps are. The next question is how do we associate shader code with specific descriptors or sets of descriptors? The answer? Tables.
A table is a start index and a size in the heap. Tables are essentially context points, but they are not API objects. You can have one or more tables per shader stage, as required. For example, the vertex shader for a draw call can have a table pointing to the descriptors at offsets 20 through 32 in the heap. When work begins on the next draw, the table may change to offsets 32 through 40.
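The start-plus-size idea can be sketched in a few lines of plain C++. `DescriptorTable` and `resolve` are hypothetical names, not D3D 12 API, and the sketch assumes the offset ranges in the example are half-open, so that "20 through 32" covers slots 20 to 31 and the next table can begin at 32.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table: nothing more than a window into the heap.
struct DescriptorTable {
    std::size_t start;  // index of the first descriptor in the heap
    std::size_t count;  // number of descriptors the table covers
};

// Translate a table-relative slot into an absolute heap index.
inline std::size_t resolve(const DescriptorTable& table, std::size_t slot) {
    assert(slot < table.count);
    return table.start + slot;
}
```

Under that assumption, the first draw would use `DescriptorTable{20, 12}` and the next draw `DescriptorTable{32, 8}`. Moving to the next draw changes two integers rather than copying any descriptors.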
On modern hardware, D3D 12 supports multiple tables per shader stage in the PSO. You could have one table with just the things that change frequently from call to call, and a second table containing the things that are static from call to call, frame to frame, and so on. Doing this avoids copying all the descriptors from one call to the next; only the descriptors that change frequently are copied. However, older GPUs are limited to one table per shader stage. Multiple tables are only possible on modern and future hardware.
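The split between a static table and a dynamic table can be illustrated with a small simulation. This is plain C++ under stated assumptions, not D3D 12 code: `Heap`, `Table`, `DrawBindings`, and `recordDraw` are hypothetical names, descriptors are reduced to `int` values, and the heap counts writes so the saving is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table: a start index and a size in the heap.
struct Table {
    std::size_t start;
    std::size_t count;
};

// Toy heap that stores ints in place of descriptors and counts writes.
struct Heap {
    std::vector<int> slots;
    std::size_t writes = 0;

    explicit Heap(std::size_t n) : slots(n, 0) {}

    void write(std::size_t index, int value) {
        assert(index < slots.size());
        slots[index] = value;
        ++writes;
    }
};

// The two tables one shader stage uses for a single draw.
struct DrawBindings {
    Table staticTable;
    Table dynamicTable;
};

// Record one draw: the static table is reused untouched, and only the
// dynamic descriptors are written into a fresh region of the heap.
DrawBindings recordDraw(Heap& heap, Table staticTable,
                        std::size_t dynamicStart,
                        const std::vector<int>& dynamicValues) {
    for (std::size_t i = 0; i < dynamicValues.size(); ++i) {
        heap.write(dynamicStart + i, dynamicValues[i]);
    }
    Table dynamicTable{dynamicStart, dynamicValues.size()};
    return DrawBindings{staticTable, dynamicTable};
}
```

With 10 static descriptors and 2 dynamic ones, each draw in this model costs 2 heap writes instead of 12, and both draws point at the same static table.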
Bindless and Efficient:
Descriptor heaps and tables are the D3D team’s take on bindless rendering, except that this approach scales across PC hardware. From low-end SoCs to high-end discrete cards, D3D 12 supports them all. This unified approach opens up many binding flow possibilities for game developers, everything from D3D 11-style streaming of changes every frame to caching static bindings for reuse across multiple frames. Additionally, the new model supports multiple update frequencies, allowing for a cached table of static bindings that is reused and a dynamic table with data that changes with each draw. That removes the need to copy all the bindings with each new draw.
Render Context Review:
Below is the render context with the D3D 12 changes discussed so far. It shows the new PSO and the removal of the Gets, yet it still has the D3D 11 explicit bind points.
Now let’s remove the last of the D3D 11 render context and add in descriptor tables and heaps. Now we have a table per shader stage, or multiple tables as shown by the pixel shader.
The diagram above is the D3D 12 render context. Gone are the fine-grained state objects, replaced with a Pipeline State Object. Hazard tracking and state mirroring are removed. Explicit bind points are gone in favor of memory objects managed by the application or game. Still, there is more to discuss about D3D 12.
Next up in Part 5: Bundles
Diagrams from BUILD 2014 presentation. Created by Max McMullen, D3D Development Lead at Microsoft.