Mike Apodaca, Sumit Bhatia, Matthew Goyder, John Gierach, Rama Harihara, Anupreet Kalra, Yaz Khabiri, Pavan Lanka, Katen Shah, Bill Sadler, Prasoon Surti, and Travis Schuler contributed to the developer guidelines provided in this document.
This document provides guidelines for developing and designing a virtual reality (VR) application and obtaining optimal performance. This guide is based on performance characterizations across several VR workloads and describes common bottlenecks and issues. It provides solutions for bandwidth-bound scenarios, such as the choice of texture formats and the fusing of shader passes, and instructions on how to use post-process anti-aliasing techniques to improve the performance of VR application workloads.
| Metric | Guideline |
|---|---|
| Triangles / Frame | 200‒300 K visible triangles in a given frame.1 Use aggressive view-frustum, back-face, and occlusion culling to reduce the number of triangles actually sent to the GPU. |
| Draws / Frame | 500‒1000.1 Reduce the number of draw calls to improve performance and power. Batch draws by shader, and draw front to back with 3D workloads (refer to the 3D guide section). |
| Target Refresh | At least 60 frames per second (fps); 90 fps for the best experience. |
| Resolution | The head-mounted display (HMD) resolution can be downscaled if needed to hit 60 fps but cannot go below 80 percent of the HMD resolution.1 Dynamic scaling of the render target resolution can also be considered to meet frame rate requirements.1 |
| Memory | 180‒200 MB per frame (DDR3, 1600 MHz) for 90 fps.1 |

1This data is a work in progress and is to be used as a placeholder.
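The dynamic render target scaling mentioned above can be sketched as a simple per-frame controller that lowers the render scale when measured GPU time exceeds the vsync budget and raises it when there is headroom, never dropping below the 80 percent floor. The function name, step size, and headroom threshold below are illustrative assumptions, not values from any particular runtime.

```cpp
#include <algorithm>

// Illustrative dynamic-resolution controller: nudge the render scale each
// frame based on measured GPU time versus the vsync budget (e.g., 11.1 ms
// at 90 fps), clamped to [0.8, 1.0] per the guideline above.
float UpdateRenderScale(float currentScale, float gpuFrameMs, float budgetMs) {
    const float step = 0.05f;
    if (gpuFrameMs > budgetMs) {
        currentScale -= step;            // over budget: shrink the render target
    } else if (gpuFrameMs < 0.85f * budgetMs) {
        currentScale += step;            // comfortable headroom: grow back
    }
    return std::clamp(currentScale, 0.8f, 1.0f);
}
```

In practice the resulting scale would be applied to the swap-chain viewport each frame; a smoothed GPU-time average avoids oscillation.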
Uncompressed formats—sRGB and HDR—consume greater bandwidth. Use linear formats if the app becomes heavily bandwidth-bound.
We recommend compressing the textures applied to scene geometry using block-compression formats, such as DXT.
For example, in Unity* you can compress textures via the texture import settings (Project > Models > Textures).
For non-shared resources, do not set the D3D11_RESOURCE_MISC_SHARED flag; omitting it allows compression to be enabled on the resource, saving memory bandwidth.
To avoid unnecessary read-modify-write operations, set the Render Target Write Mask (in D3D11_RENDER_TARGET_BLEND_DESC) to all channels (RGBA) rather than enabling individual color components.
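As a sketch (assuming the standard Direct3D 11 headers on Windows), enabling the full write mask in the blend description looks like this:

```cpp
#include <d3d11.h>

// Write all RGBA channels so the hardware does not have to read back and
// merge the untouched components of the render target.
D3D11_RENDER_TARGET_BLEND_DESC rtBlend = {};
rtBlend.BlendEnable = FALSE;
rtBlend.RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;
// Avoid partial masks such as D3D11_COLOR_WRITE_ENABLE_RED |
// D3D11_COLOR_WRITE_ENABLE_GREEN unless the algorithm truly needs them.
```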
Lighting mode controls the lighting precomputation and composition. To save on computation during runtime, we recommend using Mixed or Baked mode instead of Realtime. For more information, refer to the Unity documentation: https://docs.unity3d.com/Manual/LightModes.html
The use of R10G10B10A2 over R16G16B16A16 and floating point formats is encouraged.
Filtering modes, like anisotropic filtering, can significantly impact performance, especially with uncompressed formats and HDR formats.
Anisotropic filtering is a trade-off between performance and quality. Generally, anisotropic level 2 is recommended based on our performance and quality studies. Mipmapped textures combined with higher anisotropic levels add overhead to the filtering hardware pipeline. If you choose anisotropic filtering, we recommend using BC1‒BC5 formats.
Temporally stable anti-aliasing is crucial for a good VR experience. Multisample anti-aliasing (MSAA) is bandwidth intensive and consumes a significant portion of the rendering budget. Temporally stable post-process anti-aliasing algorithms can provide equivalent quality at half the cost and should be considered as alternatives.
For applications using both the media and 3D pipelines, we recommend disabling MSAA for media workloads, because it has no impact on media quality.
Gen hardware supports object-level preemption, which usually translates into preemption on triangle boundaries. For effective scheduling of the compositor, it is important that primitives are able to be preempted in a timely fashion. To enable this, draw calls that take more than 1 ms should usually have more than 64‒128 triangles. Typically, full-screen post-effects should use a grid of at least 64 triangles as opposed to 1 or 2.
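The full-screen grid suggestion can be illustrated by generating an N×N grid of cells over (N+1)×(N+1) vertices instead of a single full-screen quad; an 8×8 grid yields 128 triangles, comfortably above the 64-triangle guideline, giving the hardware frequent triangle boundaries at which to preempt. The function name below is illustrative.

```cpp
#include <cstdint>
#include <vector>

// Build triangle indices for an n x n full-screen grid laid out over
// (n + 1) x (n + 1) vertices. Each cell contributes two triangles, so an
// 8 x 8 grid produces 128 triangles instead of the usual 2 for a quad.
std::vector<uint32_t> BuildGridIndices(uint32_t n) {
    std::vector<uint32_t> indices;
    indices.reserve(n * n * 6);
    const uint32_t stride = n + 1;              // vertices per row
    for (uint32_t y = 0; y < n; ++y) {
        for (uint32_t x = 0; x < n; ++x) {
            uint32_t i0 = y * stride + x;       // top-left of the cell
            uint32_t i1 = i0 + 1;               // top-right
            uint32_t i2 = i0 + stride;          // bottom-left
            uint32_t i3 = i2 + 1;               // bottom-right
            indices.insert(indices.end(), {i0, i2, i1, i1, i2, i3});
        }
    }
    return indices;
}
```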
In the ideal case for a given frame, the app will have ample time to complete its rendering work between the vsync and before the Late Stage Reprojection (LSR) packet is submitted. In this case, it is best that the app synchronize on the vsync itself, so that rendering is performed on the newest HMD positional data. This helps to minimize motion sickness.
When the frame rendering time no longer fits within this interval, all available GPU time should be reclaimed for rendering the frame before the LSR occurs. If this interval is not met, the compositor can block the app from rendering the next frame by withholding the next available render target in the swap chain. This results in entire frames being skipped until the present workload for that frame has finished, degrading the app’s frames per second (fps). The app should synchronize with its compositor so that new rendering work is submitted as soon as the present or LSR workload is submitted. This is typically accomplished via a wait behavior provided by the compositor API.
In the worst case, when the frame rendering time exceeds the vsync, the app should submit rendering work as quickly as possible to fully saturate the GPU, allowing the compositor to use the newest frame data available, whenever that might occur relative to the LSR. To accomplish this, do not wait on any vsync or compositor events to proceed with rendering, and if possible build your application so that the presentation and rendering threads are decoupled from the rest of the state update.
For example, on the Holographic API, pass DoNotWaitForFrameToFinish to PresentUsingCurrentPrediction, or in Microsoft DirectX*, pass SyncInterval=0 to Present.
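One way to decouple the presentation and render threads from state updates, sketched in standard C++ (the type and method names are illustrative, not part of any compositor API): the update thread publishes the latest state into a mailbox, and the render thread always grabs the newest snapshot without ever blocking on the updater.

```cpp
#include <mutex>

// Hypothetical "mailbox" handoff between the game-state update thread and
// the render/present thread: the renderer always takes the newest state and
// never waits for the updater, so presents can be submitted as fast as the
// GPU allows.
struct FrameState {
    unsigned frameId = 0;
    float headPose[3] = {0.0f, 0.0f, 0.0f};
};

class StateMailbox {
public:
    void Publish(const FrameState& s) {       // called by the update thread
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = s;
        hasState_ = true;
    }
    bool TryLatest(FrameState* out) {         // called by the render thread
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasState_) return false;
        *out = latest_;                       // newest snapshot, no blocking
        return true;
    }
private:
    std::mutex mutex_;
    FrameState latest_;
    bool hasState_ = false;
};
```

With this arrangement the render thread can spin on rendering and presenting while the update thread runs at its own cadence.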
Use GPU analysis tools, such as GPUView, to see which rendering performance profile you have encountered, and then make the necessary adjustments detailed above.
Half float versus float: for compute-bound workloads, half floats can be used to increase throughput whenever precision is not an issue. Mixing half and full precision results in conversion penalties and should be minimized.
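Half precision also halves storage and bandwidth when used for vertex or texture data. As an illustration only (not from the original guide), a minimal CPU-side pack/unpack sketch that truncates the mantissa and flushes subnormals to zero:

```cpp
#include <cstdint>
#include <cstring>

// Minimal float -> half -> float round trip for normal, in-range values.
// Truncates the mantissa (no rounding) and flushes subnormals to zero,
// purely to illustrate the 2x storage saving of 16-bit floats.
static uint16_t FloatToHalf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t exponent = static_cast<int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    uint32_t mantissa = (bits >> 13) & 0x3FFu;
    if (exponent <= 0) return static_cast<uint16_t>(sign);            // flush to zero
    if (exponent >= 31) return static_cast<uint16_t>(sign | 0x7C00u); // clamp to infinity
    return static_cast<uint16_t>(sign | (static_cast<uint32_t>(exponent) << 10) | mantissa);
}

static float HalfToFloat(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits = (exponent == 0)
        ? sign                                           // zero (subnormals flushed)
        : sign | ((exponent - 15 + 127) << 23) | (mantissa << 13);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

On the shader side, the equivalent step is declaring values with reduced-precision types where the language supports them.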
Based on GPUView analysis, we recommend using fixed-function (FF) decode, which delivers performance and power gains compared to software decode.
Leverage the video post-processing (VPP) fixed-function unit to convert decoded output from NV12 to RGB.
Perform the color space conversion (CSC) in sync with the decoded content's frame rate to save power.
The following tools will help you identify issues with VR workloads.
GPUView: GPUView is a tool in the Microsoft Windows* Performance Toolkit that is installed by the Windows software development kit. GPUView provides specifics on identifying issues with scheduling and dropped frames.
Intel® Graphics Performance Analyzers: Provides specifics on analyzing VR workloads and the patterns to expect, for example, two sets of identical draw calls for the left and right eyes.
A Graphics API Developer Guide for 6th Generation Graphics Processors: https://software.intel.com/en-us/articles/6th-gen-graphics-api-dev-guide
A Unity Optimization Guide for Processor Graphics: /content/www/us/en/develop/articles/unity-optimization-guide-for-x86-android-part-1.html
Compute Architecture for 6th Generation Graphics Processors: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf
The biggest challenge for VR workload performance comes from being bandwidth-bound. Choosing the right texture formats, fusing shader passes, and using post-process anti-aliasing techniques all help reduce the pressure on bandwidth.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.