In terms of processing power and user experience, a virtual reality (VR) system falls into three types: premium, mainstream, and entry-level. Premium VR represents high-end VR and includes products on the market with high configuration, high-performance PCs, or game consoles. The main VR peripherals that support premium VR are HTC Vive*, Oculus Rift*, and Sony PlayStation* VR.
Mainstream VR hardware does not match the performance of high-end VR, but it still uses PC processors for VR computing power. Entry-level VR comprises mobile VR devices, such as Gear* VR and Google Cardboard*, and includes VR glasses and all-in-one machines that use mobile phone chips as computing devices.
This article describes methods of testing and profiling VR games based on HTC Vive* and Oculus Rift* on a PC. Compared to traditional PC games, VR games differ in gameplay design, input mode, and performance requirements. Gameplay and input are not within the scope of this tutorial. Instead, we look at how the performance requirements of a VR game differ from those of a traditional game.
Pixel throughput, the number of pixels processed per second, is an important measure of the VR experience. The screen resolution of the current HTC Vive and Oculus Rift CV1 is 2160x1200, and the actual rendering needs a larger render target to offset the resolution loss caused by lens distortion. For HTC Vive and Oculus Rift CV1, the render target is scaled up to as much as 140 percent in each dimension. At 90 FPS, the pixel throughput for VR then reaches a surprising 457 million pixels per second.
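The 457 million figure can be checked with a short calculation. The sketch below assumes that the 140 percent value is a scale factor of 1.4 applied to each axis of the render target and that the display refreshes at 90 Hz; the function and constant names are illustrative only.

```python
# Display parameters taken from the text above.
PANEL_WIDTH = 2160
PANEL_HEIGHT = 1200
REFRESH_HZ = 90
# Assumption: the render target is scaled by 1.4x along each axis.
SCALE_PER_AXIS = 1.4


def pixels_per_second(width, height, scale, refresh_hz):
    """Return the number of pixels rendered per second at the given settings."""
    render_width = width * scale
    render_height = height * scale
    return render_width * render_height * refresh_hz


vr_rate = pixels_per_second(PANEL_WIDTH, PANEL_HEIGHT, SCALE_PER_AXIS, REFRESH_HZ)
print(f"{vr_rate / 1e6:.0f} million pixels per second")
```

Under these assumptions the result rounds to 457 million, which matches the number quoted above.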
Performance testing and analysis are important parts of VR game development. These tasks help the game meet the necessary requirements and make full use of the CPU and GPU processing capabilities. Before testing, disable Asynchronous Spacewarp (ASW) in the Oculus runtime and asynchronous reprojection in SteamVR* so that compensation by the VR runtime does not distort the performance analysis that follows.
To disable ASW, in the Oculus SDK, you can open and run Program Files\Oculus\Support\oculus-diagnostics\OculusDebugTool.exe.
Figure 1. Asynchronous Spacewarp configuration.
To disable the reprojection function in SteamVR, use Settings/Performance.
Figure 2. SteamVR* configuration.
Software tools play a vital role in testing and analyzing VR games. The main tools for these tasks include Fraps*, GamePP*, Unreal Engine* console commands, Windows* Assessment and Deployment Kit (ADK), SteamVR frame timing, and Intel® Graphics Performance Analyzers (Intel® GPA).
The Fraps FPS (frames per second) counter is a traditional test and frame-time tool, which developers can use to test the maximum frame rate, minimum frame rate, and the average frame rate over a period of time. The results can be easily imported into an Excel* file to generate graphics (see the top of Figure 3). As shown in the graph, we can see whether the frame rate change is smooth throughout the entire process. In addition, Fraps is handy for taking screenshots, which can be saved for reporting purposes.
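The summary statistics described above are simple to reproduce once the per-second FPS values have been exported. The following is a minimal sketch that assumes the samples are already loaded into a list; the sample values are hypothetical and are not measurements from this article.

```python
from statistics import mean


def summarize_fps(fps_samples):
    """Return (minimum, maximum, average) for a list of per-second FPS samples."""
    if not fps_samples:
        raise ValueError("need at least one FPS sample")
    return min(fps_samples), max(fps_samples), mean(fps_samples)


# Hypothetical samples for illustration only.
samples = [90, 88, 90, 47, 45, 45, 46, 45]
low, high, avg = summarize_fps(samples)
print(f"min={low} max={high} avg={avg:.1f}")
```

An average on its own can hide a long stretch of low frame rate, which is why the frame rate over time is plotted as well.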
Figure 3. (Top) Fraps FPS (frames per second) shows the frame rate change over time. (Bottom) The maximum frame rate, minimum frame rate, and average frame rate of the game’s frame rate change generated in Fraps over time.
The bottom of Figure 3 shows the maximum frame rate, minimum frame rate, and average frame rate of the game's frame rate change generated in Fraps over time.
As the data shows, the frame rate of the scene in this VR game is low most of the time, only about 45 FPS, and does not meet HTC's requirements. With this kind of performance, the player will experience dizziness or motion sickness. (To prevent this discomfort when playing a VR game, headset manufacturers require a stable frame rate of 90 FPS.) At this point, we can use GPUVIEW in the Windows ADK to determine whether the problem is due to the GPU or the CPU.
Although Fraps is freely available, it has not been updated for a long time. GamePP* is a similar benchmark utility from China (http://gamepp.com/) that you can use. When you run this utility, a tool window automatically displays at the top of the game window. This window displays FPS, CPU temperature, GPU utilization, and CPU, graphics card, and memory usage, among other data, in real time (see Figure 4). Another disadvantage of Fraps is that it cannot be used to test DirectX* 12 games. For those games, you can use another tool, PresentMon, to collect FPS data.
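PresentMon writes its results to a CSV file, from which an average frame rate can be derived. The sketch below assumes that the capture has a per-frame column named MsBetweenPresents, as in PresentMon 1.x captures; check the header of your own file, because column names can differ between versions. The capture text is hypothetical.

```python
import csv
import io


def average_fps(csv_text, column="MsBetweenPresents"):
    """Average FPS = number of frames / total elapsed time in seconds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    frame_times_ms = [float(row[column]) for row in reader]
    if not frame_times_ms:
        raise ValueError("no frames in capture")
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


# Hypothetical capture containing only the columns this sketch needs.
capture = (
    "Application,MsBetweenPresents\n"
    "game.exe,10.0\n"
    "game.exe,20.0\n"
    "game.exe,30.0\n"
)
print(round(average_fps(capture), 1))
```

Dividing the frame count by the total time, rather than averaging the per-frame FPS values, keeps long frames from being underweighted.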
Figure 4. GamePP* real-time data display interface.
The tool window at the top of the game window provides real-time monitoring of the running game and its performance. But both Fraps and the GamePP utility are designed for traditional games and can only be displayed on a monitor.
VR gamers wearing helmets or head-mounted displays (HMDs), who cannot see a game’s data changes in real time on the monitor, have two options if they want to see real-time performance data in a helmet:
Figure 5 shows the SteamVR frame timing data.
Figure 5. The missed frame in the head-mounted display.
When the game drops frames, a Missed Frames box displays in the HMD, as shown in Figure 5. The denser the red bars in this box, the more frequently frames were dropped.
Figure 6. The CPU and GPU running on the PC.
Figure 6 shows more detailed data on the PC display. You can also show this display in the HMD by using the show in headset option in the window above. In the data readout, blue indicates the GPU rendering time and tan indicates the GPU idle time.
As shown in Figure 6, the GPU rendering of some frames takes longer than 11.11 milliseconds (ms), so those frames miss the Vsync deadline and are dropped. These frames cannot reach 90 FPS. The SteamVR frame timing tool tells us more about whether a frame is GPU bound, but it cannot tell us whether the GPU rendering time itself is too long or whether the CPU render thread failed to submit rendering commands in time and left a bubble (an idle gap) in the GPU.
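The 11.11 ms limit is the frame interval at 90 Hz (1000 ms / 90). As a rough sketch, the check that the frame timing graph lets you do by eye can be written as follows; the GPU times are hypothetical values, not data from Figure 6.

```python
REFRESH_HZ = 90
FRAME_BUDGET_MS = 1000.0 / REFRESH_HZ  # about 11.11 ms per frame


def missed_frames(gpu_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Return the indices of frames whose GPU time exceeds the frame budget."""
    return [i for i, t in enumerate(gpu_times_ms) if t > budget_ms]


# Hypothetical GPU rendering times in milliseconds.
gpu_times = [9.8, 10.5, 12.3, 11.0, 13.7]
late = missed_frames(gpu_times)
print(f"{len(late)} of {len(gpu_times)} frames exceed {FRAME_BUDGET_MS:.2f} ms")
```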
If the game was developed using Unreal Engine 4 and is a development build, not a release build, you can view the real-time performance data of the game using the Unreal Engine console commands.
You can press the ~ key in the game to display the command line window. The following are some of the common console commands:
Figure 7. Screenshot of the Stat Unit command.
Stat SceneRendering: Shows the various parameter values on the game’s render thread (see Figure 8).
Figure 8. Screenshot of the stat SceneRendering command.
Stat Game: Shows the real-time view of parameter values running on the game logic thread, such as artificial intelligence (AI), physics, blueprint, memory allocation, and so on (see Figure 9).
Figure 9. Screenshot of the Stat Game command.
Stat GPU: Shows the time parameters of the GPU main render content in each frame in real time (see Figure 10).
Figure 10. Screenshot of the Stat GPU command.
Stat InitViews: Shows the time and efficiency data that culling takes (see Figure 11).
Figure 11. Screenshot of the Stat InitViews command.
Stat LightRendering: Displays the render time required for lighting and shading (see Figure 12).
Figure 12. Screenshot of the Stat LightRendering command.
Additional commands, such as Stat A and Stat, can be referenced from the Unreal official webpages:
Further analysis can be done using GPUVIEW and Windows Performance Analyzer (WPA) in the Windows ADK. GPUVIEW is a powerful tool (for more information, refer to https://graphics.stanford.edu/~mdfisher/GPUView.html).
Of the above commands, Stat Unit gives a preliminary indication of whether a frame is GPU bound or CPU bound. Sometimes the results are inaccurate, such as when a CPU thread causes a bubble in the middle of a GPU frame. If this happens, the GPU rendering time seen in the Stat Unit command is actually the sum of the real rendering time and the bubble time. In this case, analysis with both GPUVIEW and WPA is needed.
For example, as shown in Figure 13, the middle of each frame has a 2 ms bubble during which the GPU is not working. Without the bubble, the frame rendering time would be less than 11 ms, but with the bubble it exceeds the 11.11 ms limit required for 90 FPS, so the following frame misses the Vsync deadline. As a result, a frame is dropped.
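The effect of a bubble on the reported GPU time can be illustrated with a small calculation. This sketch treats the reported time as rendering work plus bubble, as described above; the 12.5 ms input is a hypothetical value, and only the 2 ms bubble comes from the example.

```python
FRAME_BUDGET_MS = 1000.0 / 90  # frame interval at 90 FPS


def split_gpu_time(reported_ms, bubble_ms):
    """Separate actual GPU work from idle bubble time in a reported frame time."""
    work_ms = reported_ms - bubble_ms
    return {
        "work_ms": work_ms,
        "fits_as_reported": reported_ms <= FRAME_BUDGET_MS,
        "fits_without_bubble": work_ms <= FRAME_BUDGET_MS,
    }


# Hypothetical reported time of 12.5 ms that contains a 2 ms bubble.
result = split_gpu_time(reported_ms=12.5, bubble_ms=2.0)
print(result)
```

A frame that fits only once the bubble is removed points to the CPU side, not to the GPU workload, as the place to optimize.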
Figure 13. GPUVIEW interface: The middle of each frame has a 2 ms bubble
At this point, we can open the same Merged.etl file using WPA and find the time window of the bubble on the timeline to see which CPU thread is busy and what is running on that thread at that time (see Figure 14).
Figure 14. Windows* Performance Analyzer interface: the time window of the bubble through the timeline
If the rendering time of a GPU frame in GPUVIEW is more than 11.11 ms, the frame is GPU bound, and Intel® GPA can then be used to analyze which parts of the pipeline are overloaded.
Intel GPA is a powerful, free graphics performance analyzer tool, which can be downloaded at https://software.intel.com/content/www/us/en/develop/tools/graphics-performance-analyzers.html. Intel GPA includes the following independent tools:
Intel GPA Graphics Frame Analyzer is used in conjunction with GPUVIEW. It can view the draw call, render target, texture map, overdraw, and shader of a certain frame in a game. By simplifying the shader, you can design an experiment to detect which part of the rendering affects performance, in order to identify the key part to optimize (see Figure 15).
Figure 15. Interface of the Intel® Graphics Performance Analyzers Graphics Frame Analyzer.
Let’s use an example to showcase how we can test and analyze a VR game.
You can use the dxdiag command to view the machine configuration before testing:
|CPU||Intel® Core™ i7-6700K processor 4.00 GHz|
|GPU||NVIDIA GeForce* GTX 1080|
|Memory||1x8 GB DDR3|
|OS||Windows* 10 Pro 64-bit (10.0, Build 10586)|
First we run the Fraps test for a period of time and draw frame rate changes. As shown in Figure 16, we can see that during the first half of the test, there are some scenes that can reach 90 FPS, but in most of the latter half, the frame rate is fluctuating around 45, which does not meet the required standard. Further analysis is required.
Figure 16. FPS display.
Use the Unreal Engine console command stat FPS to find a scene with a lower frame rate to analyze. If the game changes too fast to capture the data, use the console command PAUSE to pause the game, which makes it easy to open the tools you need. The parameters of the stat Unit command suggest that there are bottlenecks in both the CPU render thread and the GPU (see Figure 17).
Figure 17. Screenshot of stat unit command.
We must use GPUVIEW and WPA for simultaneous analysis.
Figure 18. GPU rendering time.
The first thing we can see from GPUVIEW is that the rendering time of a frame is 13.69 ms, over 11.11 ms, so the performance is not likely to reach 90 FPS (see Figure 18).
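A frame time of 13.69 ms also helps explain why Fraps reports about 45 FPS and not something near 73 FPS. The sketch below assumes that, with ASW and reprojection disabled, a frame that misses one Vsync is shown at the next one, so each frame occupies a whole number of refresh intervals. This is a simplified model, not a description of the runtime's exact behavior.

```python
import math


def effective_fps(frame_time_ms, refresh_hz=90):
    """Frame rate when every frame waits for the next Vsync after it finishes."""
    interval_ms = 1000.0 / refresh_hz
    # Number of refresh intervals the frame occupies (at least one).
    intervals = max(1, math.ceil(frame_time_ms / interval_ms))
    return refresh_hz / intervals


print(effective_fps(13.69))  # 45.0
print(effective_fps(10.0))   # 90.0
```

Under this model, any frame time between 11.11 ms and 22.22 ms yields 45 FPS, which is consistent with the frame rate observed in the second half of the Fraps test.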
Next, we see about 10 ms of CPU time during which only the audio thread is running (see Figure 19). Other threads are basically idle, which means the game does not make full use of CPU resources. This leaves room to use the CPU for additional effects, such as more AI, physics, materials, or particle effects.
Figure 19. CPU idle time.
WPA confirms this: the game and render threads are basically idle during this time.
Figure 20. CPU running thread displayed on the Windows* Performance Analyzer.
Using Intel GPA, you can see that there are fewer than 1,000 draw calls, which is a reasonable number.
Figure 21. All the draw calls in the Intel® Graphics Performance Analyzers Frame Analyzer.
Select all the render targets and run the experiments to get a rough view of where the frame time is spent.
|Test Target||Before the Test||After the Test|
1x1 Scissor Rect
Simple Pixel Shader
The 2x2 textures experiment uses simple textures instead of the textures in the real scene. In this experiment, simple textures do not improve performance significantly, so texture optimizations can be ignored.
The 1x1 scissor rect experiment removes the pixel processing stage from the rendering pipeline. In this experiment, performance improves significantly.
The simple pixel shader experiment, as the name suggests, replaces the original shader with a simplified pixel shader. In this experiment, performance also improves greatly.
The experiments above show that the pixel processing task in the GPU rendering pipeline is rather heavy. Another way to see which operation in a frame takes the most time is to enter ToggleDrawEvents at the console in Unreal Engine 4.
This command attaches the specific function name to each draw call. When you then capture a frame using Intel GPA, the Intel GPA Frame Analyzer shows the time spent by each draw call.
Figure 22. Intel® Graphics Performance Analyzers shows the execution functions of each Draw call.
Below is a table of a few time-consuming modules. This information will help you focus on the modules that have a high time ratio and selectively do the optimization.
|Modules||Time Spent Ratio|
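A time-ratio table of this kind can be derived from per-draw-call timings once each draw call carries a function name. The following is a minimal sketch that assumes the timings have been exported as (module name, duration) pairs; the module names and durations are hypothetical placeholders, not results from this test.

```python
from collections import defaultdict


def time_ratios(draw_calls):
    """Return (module, share of frame time) pairs, largest share first.

    draw_calls: iterable of (module_name, duration_us) pairs.
    """
    totals = defaultdict(float)
    for module, duration_us in draw_calls:
        totals[module] += duration_us
    frame_total = sum(totals.values())
    if frame_total == 0:
        raise ValueError("no draw call time recorded")
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(module, total / frame_total) for module, total in ranked]


# Hypothetical per-draw-call durations in microseconds.
calls = [("ModuleA", 300.0), ("ModuleB", 100.0),
         ("ModuleA", 500.0), ("ModuleC", 100.0)]
ratios = time_ratios(calls)
for module, share in ratios:
    print(f"{module}: {share:.0%}")
```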
For more detailed Intel GPA analysis, please refer to another article from Intel® Developer Zone: https://software.intel.com/en-us/android/articles/analyze-and-optimize-windows-game-applications-using-intel-inde-graphics-performance
Optimization lets players experience a high-quality game even when the hardware is not yet fully capable of the high performance that an immersive VR experience requires. Finding the bottlenecks during game optimization requires combining the various tools and methods described in this article. This article also showed how a variety of experiments and parameter adjustments can locate the CPU or GPU performance bottlenecks of a game and so improve the game experience.