In Part 2 we discussed the Pipeline State Object, or PSO, and the benefits of removing the hardware-mismatch overhead. Now we move on to resource binding and how the Microsoft Direct3D* (D3D) 12 team plans to reduce CPU overhead in this area. To begin this discussion we need to quickly review the resource binding model used in D3D 11. Below is the render context diagram, with the D3D 12 PSO on the left and the D3D 11 resource binding model on the right:
Looking to the right of each shader we see explicit bind points. An explicit bind model means that each stage in the pipeline has specific resources it can refer to. Those bind points in turn reference resources in GPU memory: textures, render targets, buffers, UAVs, and so on. Resource binding has been around a long time; in fact, it predates D3D. The idea is to handle multiple properties behind the scenes and help the game efficiently submit rendering commands. However, the system needs to do many binding inspections in 3 key areas. Let's review these areas and how the D3D team optimized them for D3D 12:
Resource hazards:
A hazard is usually a transition, like moving from a render target to a texture. Say a game is rendering a frame into an environment map of the scene. The game finishes rendering the environment map and now wants to use it as a texture. During this work both the runtime and the driver track when something is bound as either a render target or a texture. If they ever see something bound as both, they unbind the oldest setting and respect the newest. This way a game can make the switch as needed and the software stack manages it behind the scenes. In addition, the driver must flush the GPU pipeline before the render target can be used as a texture; if the pixels are read before they are retired in the GPU, you will not have a coherent state. In essence, a hazard is anything that requires extra work in the GPU to ensure coherent data. This is only one example of many possible ones; for the sake of brevity we will use just this one.
As with other features and enhancements in D3D 12, the solution here is giving more control to the game. Why should the API and driver do all this work and tracking when the transition happens at one known point in a frame? We are talking about roughly a 60th of a second in which the game switches from one resource to another. By giving control back to the game, all that overhead is removed and the cost only has to be paid once, when the game wants to make a resource transition. Below is the actual resource barrier API added to D3D 12:
D3D12_RESOURCE_BARRIER_DESC Desc;
Desc.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
Desc.Transition.pResource = pRTTexture;
Desc.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
Desc.Transition.StateBefore = D3D12_RESOURCE_USAGE_RENDER_TARGET;
Desc.Transition.StateAfter = D3D12_RESOURCE_USAGE_PIXEL_SHADER_RESOURCE;
pContext->ResourceBarrier( 1, &Desc );
The API is rather straightforward: it declares a resource along with its source and target usage, followed by the call that tells the runtime and driver about the transition. Instead of something tracked across the frame render with lots of conditional logic, the transition becomes something explicit, something the game already knows, with the added benefit of pulling out all that additional logic. It happens once per frame, or at whatever frequency the game needs to make transitions.
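To make the "explicit, paid-once" idea concrete, here is a minimal sketch of the pattern, not the real D3D 12 API: all names (`Usage`, `Resource`, `resourceBarrier`) are hypothetical stand-ins. The resource carries a single tracked state, and the barrier call simply validates the state the game declares it is leaving, much as a debug layer would, instead of the runtime inspecting bindings across the whole frame.

```cpp
#include <cassert>

// Hypothetical model of an explicit resource transition (not the real
// D3D 12 API): a resource carries one usage state, and a barrier call
// validates the game-declared "before" state against the tracked state.
enum class Usage { RenderTarget, PixelShaderResource };

struct Resource {
    Usage state;
};

// Mirrors the shape of the barrier call above: the game names the
// resource, the state it is leaving, and the state it is entering.
// The assert plays the role of a debug-layer check; no per-draw
// hazard tracking is needed anywhere else.
void resourceBarrier(Resource& r, Usage stateBefore, Usage stateAfter) {
    assert(r.state == stateBefore && "game-declared StateBefore is wrong");
    r.state = stateAfter;  // the transition cost is paid exactly once, here
}
```

In this model, the environment-map example becomes two cheap calls per frame: one transition to a shader resource after rendering the map, and one back to a render target before the next frame.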
Resource lifetime:
D3D 11 (and older versions) behave as if calls execute immediately. When the game makes a call, it believes the API executes that call right away. That is not the case, however: there is a queue of commands where everything is deferred and executed later by the GPU. While this allows for greater parallelism and efficiency between the GPU and CPU, it requires a lot of reference counting and tracking, and all that counting and tracking costs CPU effort.
To fix this, the game gets explicit control over resource lifetime. D3D 12 no longer hides the queued nature of the GPU; the game knows it submits a list of commands that will be executed at a later time. A fence API has been added to track GPU progress. The game can check at a given point (maybe once per frame) which resources are no longer needed, and that memory can then be freed, instead of the runtime tracking resources for their whole duration with additional logic to release them.
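The fence pattern can be sketched with plain counters. This is a hedged model, not the real API: `Fence`, `ResourceTracker`, `retireAfter`, and `freeCompleted` are invented names; in D3D 12 the analogous mechanism is a fence value signaled on the command queue and a completed-value query. The game tags each submitted batch with a monotonically increasing fence value, then frees resources only once the GPU has passed that value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// "GPU progress" reduced to a single number: the last fence value the
// GPU has reached. In real code the GPU advances this as command lists
// retire; here the test advances it by hand.
struct Fence {
    uint64_t completed = 0;
};

struct ResourceTracker {
    // fence value -> resources that may be freed once that value completes
    std::map<uint64_t, std::vector<std::string>> pending;

    // The game, not the runtime, records when a resource is last used.
    void retireAfter(uint64_t fenceValue, std::string resource) {
        pending[fenceValue].push_back(std::move(resource));
    }

    // Called at a point the game chooses (e.g., once per frame), replacing
    // per-call reference counting: free everything whose fence value the
    // GPU has already passed. Returns how many resources were freed.
    int freeCompleted(const Fence& f) {
        int freed = 0;
        for (auto it = pending.begin();
             it != pending.end() && it->first <= f.completed;) {
            freed += static_cast<int>(it->second.size());
            it = pending.erase(it);
        }
        return freed;
    }
};
```

The design point is that the check is cheap and happens at one place the game controls, rather than being smeared across every API call as reference counting.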
Resource residency management:
GPUs like to use a lot of video memory, often more than is actually available. This can be more of a problem with discrete video cards, which have a fixed amount of memory. So we have resource residency management, which pages things in and out of memory as the commands flow. To the game it looks like there is unlimited memory, when really it is just memory management. Once again this comes at the cost of reference counting and tracking.
Just like resource lifetime, the game gets explicit control over resource residency. In D3D 11, the operating system tracks the residency counts and control flow. Typically, though, the game already knows that a sequence of rendering commands refers to a set of resources. In D3D 12 the game can now explicitly tell the operating system to move them into memory, and later, once the commands have executed, have the resources removed from memory.
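The residency flow can be sketched the same way. Again a hedged model with invented names (`ResidencyManager`, `makeResident`, `evict`, `canSubmit`), standing in for the device-level make-resident and evict calls D3D 12 exposes: the game pages in the set of resources a command sequence will touch before submitting, and evicts them after the commands retire.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of explicit residency control. The game, which
// already knows which resources a command sequence references, pages
// them in up front and evicts them when done; no OS-side reference
// counting is needed.
struct ResidencyManager {
    std::set<std::string> resident;

    void makeResident(const std::vector<std::string>& resources) {
        resident.insert(resources.begin(), resources.end());
    }

    void evict(const std::vector<std::string>& resources) {
        for (const auto& r : resources) resident.erase(r);
    }

    // Submission checks rather than tracks: every resource the command
    // sequence references must already be resident.
    bool canSubmit(const std::vector<std::string>& referenced) const {
        for (const auto& r : referenced)
            if (!resident.count(r)) return false;
        return true;
    }
};
```

The illusion of unlimited memory goes away, but so does the bookkeeping that maintained it; the game trades a transparent pager for two explicit calls it makes at points it already understands.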
Once the 3 areas above were optimized, a 4th spot emerged that could be made more efficient (though with smaller performance gains). When a bind point is set, the runtime tracks it so the game can later call a Get to find out what is bound to the pipeline; the bind point is mirrored, or copied. This feature was designed to make things easier for middleware, so componentized software could find out the current state of the rendering context. Once resource binding was optimized, mirrored copies of state were no longer needed. So in addition to the flow control removed from the previous 3 areas, the Gets for state mirroring have also been removed.
This covers the improvements and efficiencies in D3D 12 resource binding. A lot of churn has been removed. Resource binding previously required control flow and logic to track, set, and get resources within the runtime and driver. All that has been removed in favor of giving control back to the game. No longer does D3D hide the queued nature of the GPU. All that granularity in resource management now happens in the game, as the game, and in turn the game developer, deems necessary.
Next up in Part 4: Heaps & Tables
Diagrams from BUILD 2014 presentation. Created by Max McMullen, D3D Development Lead at Microsoft.