Swap chains in D3D12 are more complex to use than in D3D11. Only flip model swap chains can be used with D3D12, and many parameters must be chosen, such as the number of buffers, the number of in-flight frames, the Present SyncInterval, and whether a waitable object is used. We developed this application internally to help understand how these parameters interact and to find the most useful combinations.
The application is an interactive visualization of rendered frames as they progress from the CPU, to the GPU, through the present queue, and onto the display. All parameters can be changed in real time, and their effects on frame rate and latency can be observed in the on-screen statistics display.
Figure 1. An annotated screenshot of the sample application
These are the parameters used to investigate D3D12 swap chains.
| Parameter | Description |
|---|---|
| Fullscreen | True if the window covers the screen (i.e., borderless windowed mode). NOTE: This is different from SetFullscreenState, which is used for exclusive mode. |
| Vsync | Controls the SyncInterval parameter of the Present() function. |
| Use Waitable Object | Whether the swap chain is created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. |
| Maximum Frame Latency | The value passed to SetMaximumFrameLatency. Ignored if “Use Waitable Object” is not enabled; without the waitable object, the effective Maximum Frame Latency is 3. |
| BufferCount | The value specified in DXGI_SWAP_CHAIN_DESC1::BufferCount. |
| FrameCount | The maximum number of “game frames” generated on the CPU before it waits for the earliest one to complete. A game frame is a user data structure whose completion on the GPU is tracked with D3D12 fences. Multiple game frames can point to the same swap chain buffer. |
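A minimal C++ sketch of how these adjustable parameters might be grouped, and how the effective Maximum Frame Latency follows from them, is shown below. `SwapChainParams`, `EffectiveMaxFrameLatency`, and the default field values are hypothetical illustrations, not names from DXGI or the sample's source; the fallback value of 3 comes from the table above.

```cpp
#include <cassert>

// Hypothetical grouping of the sample's adjustable parameters. The names and
// default values are illustrative only; they are not DXGI or sample identifiers.
struct SwapChainParams {
    bool fullscreen = false;         // borderless window covering the screen
    unsigned syncInterval = 1;       // SyncInterval passed to Present() ("Vsync")
    bool useWaitableObject = false;  // DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT
    unsigned maxFrameLatency = 1;    // value passed to SetMaximumFrameLatency
    unsigned bufferCount = 3;        // DXGI_SWAP_CHAIN_DESC1::BufferCount
    unsigned frameCount = 2;         // in-flight game frames tracked with fences
};

// Per the table: without the waitable object the Maximum Frame Latency
// setting is ignored and the effective limit is 3 queued presents.
constexpr unsigned kDefaultMaxFrameLatency = 3;

unsigned EffectiveMaxFrameLatency(const SwapChainParams& p) {
    if (!p.useWaitableObject) {
        return kDefaultMaxFrameLatency;
    }
    // This sketch expects a latency of at least one queued present.
    assert(p.maxFrameLatency >= 1);
    return p.maxFrameLatency;
}
```

For example, enabling `useWaitableObject` with `maxFrameLatency = 2` yields an effective limit of 2, while leaving it disabled always yields 3 regardless of `maxFrameLatency`.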
The following parameters were also part of the investigation but were held at fixed values. For each one, we explain why its value was fixed.
| Parameter | Fixed value and rationale |
|---|---|
| Exclusive mode | SetFullscreenState is never called in the sample, because the present statistics mechanism does not work in exclusive mode. |
| SwapEffect | The value specified in DXGI_SWAP_CHAIN_DESC1::SwapEffect, always set to DXGI_SWAP_EFFECT_FLIP_DISCARD. DISCARD is the least specified behavior, which gives the OS the most flexibility to optimize presentation. The only other flip-model choice, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL, is useful only for operations that reuse image regions from previous presents (e.g., scroll rectangles). |
BufferCount is the number of buffers in the swap chain. With flip model swap chains, the operating system may lock one buffer for an entire vsync interval while it is displayed, so only BufferCount - 1 buffers are actually available for the application to write to. If BufferCount = 2, there is only one buffer to write to until the OS releases the other one at the next vsync. As a consequence, the frame rate cannot exceed the refresh rate.
When BufferCount >= 3, at least two buffers are available for the application to cycle between, so (assuming SyncInterval = 0) the frame rate is not limited by the refresh rate.
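The buffer-count reasoning above can be captured in a couple of small C++ functions. This is a simplified model of the behavior described in the text, not a DXGI API; `WritableBuffers` and `CappedAtRefreshRate` are hypothetical names.

```cpp
#include <cassert>

// Simplified model, not a DXGI API: while a buffer is on screen the OS may
// hold it for the whole vsync interval, leaving BufferCount - 1 buffers that
// the application can render into.
unsigned WritableBuffers(unsigned bufferCount) {
    assert(bufferCount >= 2 && "flip-model swap chains need at least 2 buffers");
    return bufferCount - 1;
}

// Returns true if, under this model, the frame rate cannot exceed the
// display refresh rate.
bool CappedAtRefreshRate(unsigned bufferCount, unsigned syncInterval) {
    if (syncInterval >= 1) {
        return true;  // each present is synchronized to at least one vsync
    }
    // With a single writable buffer the application must wait for the OS to
    // release the displayed buffer at the next vsync before rendering again.
    return WritableBuffers(bufferCount) < 2;
}
```

With SyncInterval = 0, `CappedAtRefreshRate(2, 0)` is true and `CappedAtRefreshRate(3, 0)` is false, matching the two cases in the text.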
FrameCount is the maximum number of in-flight “game frames,” where a game frame is the set of resources and buffers the GPU needs to perform the rendering. If FrameCount = 1, the CPU does not build the next game frame until the previous one has been completely processed. FrameCount must therefore be at least 2 for the CPU and GPU to work in parallel.
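The effect of FrameCount on CPU/GPU overlap can be illustrated with a small timing model. This is a sketch under stated assumptions: fixed per-frame CPU and GPU costs, a fence wait on frame `i - FrameCount` before the CPU reuses that game frame, and no presentation or vsync effects. `TotalTimeMs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified timing model (assumption, not measured behavior): the CPU spends
// cpuMs building each game frame and the GPU spends gpuMs executing it. Before
// building frame i, the CPU waits on the fence of frame i - frameCount, so at
// most frameCount game frames are ever in flight. Presentation is ignored.
// Returns the time at which the GPU finishes the last frame.
double TotalTimeMs(unsigned frameCount, unsigned numFrames, double cpuMs, double gpuMs) {
    assert(frameCount >= 1);
    std::vector<double> cpuDone(numFrames), gpuDone(numFrames);
    for (unsigned i = 0; i < numFrames; ++i) {
        double cpuStart = (i > 0) ? cpuDone[i - 1] : 0.0;
        if (i >= frameCount) {
            // Fence wait: the game frame being reused must be complete on the GPU.
            cpuStart = std::max(cpuStart, gpuDone[i - frameCount]);
        }
        cpuDone[i] = cpuStart + cpuMs;
        // The GPU starts once the work is submitted and the previous frame is done.
        double gpuStart = std::max(cpuDone[i], (i > 0) ? gpuDone[i - 1] : 0.0);
        gpuDone[i] = gpuStart + gpuMs;
    }
    return numFrames > 0 ? gpuDone.back() : 0.0;
}
```

With 4 ms of CPU work and 6 ms of GPU work per frame, ten frames take 100 ms when FrameCount = 1 (CPU and GPU strictly alternate) but only 64 ms when FrameCount = 2, because the CPU builds the next frame while the GPU executes the current one.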
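Latency is the time between when a frame is generated and when it appears on screen. Because a display with fixed refresh intervals (vsyncs) can show a new frame only at a vsync, a frame generated early simply waits longer. To minimize latency, frame generation must therefore be delayed as long as possible while still finishing before the target vsync.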
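The "as late as possible" argument is simple arithmetic. The sketch below assumes vsyncs every `intervalMs` starting at time 0 and that a frame is displayed at the first vsync strictly after it finishes rendering; both assumptions and the function names are illustrative, not part of any API.

```cpp
#include <cassert>

// Simplified model (assumption): vsyncs occur every intervalMs starting at 0,
// and a frame is displayed at the first vsync strictly after its rendering
// completes. Times are integer milliseconds for clarity.
long long DisplayTimeMs(long long startMs, long long renderMs, long long intervalMs) {
    assert(intervalMs > 0 && renderMs >= 0 && startMs >= 0);
    long long done = startMs + renderMs;
    return (done / intervalMs + 1) * intervalMs;  // next vsync after `done`
}

// Latency: time from the start of frame generation to its display.
long long LatencyMs(long long startMs, long long renderMs, long long intervalMs) {
    return DisplayTimeMs(startMs, renderMs, intervalMs) - startMs;
}
```

With a 16 ms interval and 5 ms of rendering, starting at 0 ms gives 16 ms of latency, starting at 10 ms still hits the same vsync and gives 6 ms, but starting at 12 ms misses it and gives 20 ms. Starting late helps only up to the point where the target vsync would be missed.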
The maximum number of queued present operations is called the Maximum Frame Latency. When an application tries to queue an additional present after reaching this limit, Present() will block until one of the previous frames has been displayed.
Any time the render thread spends blocked in the Present function falls between frame generation and frame display, so it directly adds to the latency of the frame being presented. This is the latency that the “waitable object” eliminates.
Conceptually, the waitable object can be thought of as a semaphore that is initialized to the Maximum Frame Latency and signaled whenever a present is removed from the present queue. If an application waits on this semaphore before rendering, the present queue will not be full when it calls Present, so Present does not block and that latency is eliminated.
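The interaction between the Maximum Frame Latency, a blocking Present, and the waitable object can be explored with a small discrete-time simulation. This is a simplified model built on several assumptions, not a description of DXGI internals: SyncInterval = 1 with one queued present displayed per vsync, a fixed render time on the render thread, GPU time and buffer availability ignored, and the waitable object modeled as "wait until the queue has room" before rendering. `LastFrameLatencyMs` is a hypothetical name.

```cpp
#include <cassert>
#include <deque>

// Simplified discrete-time model (assumptions, not DXGI internals):
//  * Each vsync (every intervalMs) displays one queued present.
//  * Rendering takes renderMs of render-thread time; GPU time is ignored.
//  * At most maxLatency presents may be queued; Present() blocks otherwise.
//  * With the waitable object, the thread first waits until the queue has
//    room (the semaphore count is maxLatency minus the queue length).
// Returns the latency (display time - render start) of the last frame.
long long LastFrameLatencyMs(bool useWaitable, unsigned maxLatency,
                             long long renderMs, long long intervalMs,
                             unsigned numFrames) {
    assert(maxLatency >= 1 && numFrames >= 1 && intervalMs > 0);
    std::deque<long long> queue;  // render start times of queued presents
    long long now = 0;
    long long nextVsync = intervalMs;
    long long lastLatency = 0;
    unsigned displayed = 0;

    // Process every vsync at or before t; each one displays a queued present.
    auto advanceTo = [&](long long t) {
        while (nextVsync <= t) {
            if (!queue.empty()) {
                lastLatency = nextVsync - queue.front();
                queue.pop_front();
                ++displayed;
            }
            nextVsync += intervalMs;
        }
        now = t;
    };
    // Block until the present queue has room.
    auto waitForRoom = [&]() {
        while (queue.size() >= maxLatency) {
            advanceTo(nextVsync);
        }
    };

    for (unsigned i = 0; i < numFrames; ++i) {
        if (useWaitable) {
            waitForRoom();          // wait on the "semaphore" before rendering
        }
        long long start = now;
        advanceTo(now + renderMs);  // render the frame
        waitForRoom();              // Present() blocks only if the queue is full
        queue.push_back(start);
    }
    while (displayed < numFrames) {  // drain the remaining presents
        advanceTo(nextVsync);
    }
    return lastLatency;
}
```

In this model, with a 16 ms refresh interval, 5 ms of render time, and a Maximum Frame Latency of 2, the last frame's latency is 48 ms without the waitable object and 32 ms with it; with a Maximum Frame Latency of 1 it drops from 32 ms to 16 ms. Because the model ignores GPU execution and buffer availability, these numbers are only illustrative.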
Our investigation produced three different “best” parameter combinations, depending on your requirements. These are the combinations we consider best suited for games.
Game mode is a balanced tradeoff between latency and throughput.
Classic Game mode
This configuration matches what D3D11 does implicitly with triple buffering, hence “classic.” Classic game mode prioritizes throughput: the extra frame queueing absorbs frame-time spikes better, but at the expense of latency.
The third combination provides the absolute minimum latency achievable without VR-style vsync racing techniques. If the application misses a vsync, the frame rate immediately drops to half the refresh rate. The CPU and GPU operate serially rather than in parallel.
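The drop to half the refresh rate follows from how frame time is rounded up to whole vsync intervals when nothing is queued ahead. The sketch below assumes SyncInterval = 1, serial CPU/GPU work, and that the next frame starts only after the previous one is shown; `EffectiveFrameRateHz` is a hypothetical name.

```cpp
#include <cassert>

// Simplified model (assumption): with SyncInterval = 1 and no frames queued
// ahead, each frame is shown at the first vsync after it completes, and the
// next frame starts only after that. A frame that takes longer than one
// refresh interval therefore occupies two (or more) intervals.
double EffectiveFrameRateHz(double refreshHz, double frameTimeMs) {
    assert(refreshHz > 0.0 && frameTimeMs > 0.0);
    const double intervalMs = 1000.0 / refreshHz;
    unsigned intervals = 1;
    while (intervals * intervalMs < frameTimeMs) {
        ++intervals;  // missed a vsync: wait for the next one
    }
    return refreshHz / intervals;
}
```

On a 60 Hz display, a 15 ms frame runs at 60 fps, but a 17 ms frame misses the vsync and runs at 30 fps, and a 34 ms frame runs at 20 fps.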
The source code includes a project file for building the sample as a Windows 10 Universal App. The only difference in the Direct3D code is calling CreateSwapChainForCoreWindow instead of CreateSwapChainForHWND.
If you wish to try the app version without compiling it yourself, here is a link to the Windows Store page: https://www.microsoft.com/store/apps/9NBLGGH6F7TT