Virtual Reality (VR) is an amazing experience. However, it’s also a solo experience that can be hard to describe to anyone yet to don a headset and make the leap into that virtual world. As VR continues to expand its horizons in games, art, and a whole string of commercial applications from real estate to health, one of the enduring challenges for its cheerleaders is effectively showcasing the experience to those without access to VR hardware. Regular 2D videos shot from the first-person perspective of the user don’t do justice to the real experience; their limited field of view prevents them from conveying a true sense of immersion in a 360-degree world.
To tackle this, VR hardware producers, application developers, and video makers have created a new VR video production paradigm, with green-screen, mixed-reality video. Shot from a third-person perspective, the technique allows the production of 2D videos that show the user in the heart of the experience—immersed in a virtual world, and interacting with the elements in it—in a way that first-person perspective videos simply can’t. Until the day when every home has a VR headset, green-screen, mixed-reality video is likely to remain the best way to share the incredible VR experiences developers around the world are creating.
Figure 1: Screenshot from the mixed reality HTC Vive* VR demo trailer released in April 2016.
This paper introduces developers (and anyone else interested in the medium) to the basic principles and techniques for the creation of green-screen, mixed-reality videos for VR experiences. It will look at the hardware and software stack, and the process of enabling and producing mixed-reality video for VR games and applications, with a view to equipping developers to take their first steps with the technique. Additional companion articles and videos will follow in the future, and readers can stay up-to-date with developments by joining the Intel® Game Dev program at software.intel.com/gamedev.
Trailers and Streamers
Even though virtual reality has benefited from waves of hype and attention over recent years, it is still a relatively young technology. In that context, the technique of creating VR mixed-reality video is very new, having only come to the fore since early 2016, when HTC Vive* released its own VR mixed-reality demo video, and mixed-reality trailers appeared for games—including Job Simulator* from Owlchemy Labs, and Fantastic Contraption* from Northway Games.
The applications of VR will continue to be explored for many years to come, but, right now, the ability to capture the essence of the experience in a 2D video using mixed reality is of great interest to developers—such as those behind Owlchemy Labs’ Rick and Morty: Virtual Rick-ality*—as they seek to communicate the appeal of their experience to a broad audience, and to differentiate from the growing number of VR games hitting the market. It’s also a vital tool in effectively showcasing VR apps like Google’s 3D virtual art creation tool Tilt Brush*, as demonstrated in season two of SoulPancake’s Art Attack* series on YouTube*.
Figure 2: Screenshot from a mixed-reality video for Art Attack* in which artist Daron Nefcy uses Google Tilt Brush*.
Mixed-reality video is, without doubt, the best technique to use in a promotional trailer for a VR experience—as the work of one of the genre’s leading trailer producers, Kert Gartner, amply demonstrates. Legions of streamers and YouTubers—including Barnacules Nerdgasm* and DashieGames*—have begun to exploit this technology to great effect in their videos of playing VR experiences, bringing a new dimension to the presentation, and leaving increasing numbers of VR converts in their wake.
Figure 3: Screenshot from Kert Gartner’s mixed-reality trailer for Space Pirate Trainer* by I-Illusions.
From a developer’s point of view, enabling a VR app or game for mixed-reality video creation requires at least a degree of forethought, and potentially some more serious programming. In the context of a developer’s limited resources, it may not always seem the highest priority—but it’s worth making time for in the schedule.
“Being able to show the experience to people outside of the headset is the biggest benefit of this technique,” said Josh Bancroft, Community Manager in the Developer Relations Division at Intel. “A secondary benefit is that you’re enabling that vast army of content creators, streamers, and YouTubers by making it easy and attractive for them to stream your VR game. That helps you increase your reach and get more people seeing, and hopefully playing, your game.”
Josh and his colleague Jerry Makare, who runs the Developer Relations Division video team at Intel, have been working with VR mixed-reality video for over a year, including setting up a live demo of the technology for developers to try at the 2017 Game Developers Conference. Josh and his team’s natural attraction to any exciting new technology drew them to the first examples of the technique, with the ultimate goal of helping to make it more accessible and to facilitate its adoption by their community of developers.
Building the Mixed Reality Stack
Mixed-reality video of a VR experience requires two central components that must be perfectly synchronized with one another, in high quality: live footage of the user interacting with the app, and footage of the virtual environment generated by the VR application or game. The tasks required include running the VR app (including the generation of an additional third-person camera view), capturing the live green-screen video, chroma key processing, compositing, encoding, and output of the final mixed-reality video for recording or streaming.
The first stage requires instructing the VR app or game software to add a virtual third-person, in-game camera, which points toward the player’s virtual position in the app environment. This is vital to being able to show footage of the user actually within the virtual game environment, and is in addition to the standard first-person camera that produces the immersive 360-degree image the user sees in the headset. In software terms, this additional camera is implemented in much the same way that any virtual camera is placed in a game, namely by setting the seven variables that decide where it is, and what it sees, in the virtual 3D volume: X, Y, and Z position; X, Y, and Z rotation; and field of view (that is, how narrow or wide the shot is).
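In code terms, those seven values can be sketched as a small data structure. This is a minimal illustration only—the structure and field names are assumptions, not a specific engine’s API:

```python
from dataclasses import dataclass

# A minimal sketch of the seven values that define the extra virtual camera.
# The class and field names are illustrative, not a specific engine's API.
@dataclass
class VirtualCamera:
    x: float      # position in the virtual 3D volume
    y: float
    z: float
    rx: float     # rotation (Euler angles, in degrees)
    ry: float
    rz: float
    fov: float    # field of view, in degrees (how narrow or wide the shot is)

# The first-person camera is driven by headset tracking; the third-person
# camera is simply another instance with its own seven values, placed to
# frame the player within the environment.
third_person = VirtualCamera(x=0.0, y=1.6, z=-2.5,
                             rx=10.0, ry=0.0, rz=0.0,
                             fov=60.0)
```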
In a VR application, the user’s first-person camera position and rotation is controlled by the movement of the headset, as registered by the sensors in the physical space around the user, while the field of view is set by the developer depending on what the experience is, and how much of the environment is optimal for them to see. Defining an additional, virtual third-person camera is a relatively straightforward task in most game engines; the new camera must be adjusted to give the desired view of the user within the environment—neither so close that we easily lose sight of them when they move, nor so wide that they’re too far away, or start to become lost in the environment.
In addition to the first-person headset view and the third-person view, the app also needs to be instructed to output two further views, namely background and foreground views, as seen from the new virtual third-person camera. These are the layers that will be combined with the live-action footage of the user for compositing the final mixed-reality output.
Once the app’s virtual cameras and video outputs have been dealt with, it’s time to look at what needs to be done in the real world. The first prerequisite is a suitable space that’s big enough for the user to move around in, as required by the app or game. To create the full mixed-reality video experience, where the user is shown fully immersed in the environment of the app, it’s necessary to shoot the user on a green screen, which can then be removed in the video-processing stage using a chroma key filter.
Figure 4: The green-screen studio setup used by the Intel team at Computex in Taipei, May 2017.
The area covered by the green screen needs to be large enough that the physical camera can move around the space without getting in the way of the user, and without reaching the edges of the volume—otherwise, a sudden burst of real-life studio interior can appear on the edges of the final composited image, breaking the immersive illusion. The larger the closed green space, the more options the camera has to move around the user. The lighting of the green space also needs to be as even as possible across all surfaces in order that the chroma key filter can accurately register and remove all the green color from the image, without any patches remaining.
It is possible, however, to create very effective mixed-reality video without using green screen, depending on the VR experience in question. For example, a VR art application such as Tilt Brush is exclusively about the creation of foreground elements, so there is no requirement regarding the background. This means that while the user could be shot on a green screen with any background added in virtually, they could just as easily be shot in any physical environment with a suitable background. The foreground objects would appear superimposed on the real environment (as in augmented reality), and the user would be able to interact with them.
Figure 5: Screenshot from Kert Gartner’s mixed-reality trailer for Fantastic Contraption*, made without green screen.
When the new in-app third-person camera and the physical shooting space are figured out, the next step is setting up the physical camera. A webcam could be used if it’s the only thing available, but the quality of the final video will be directly impacted by the quality of the camera used; a DSLR or professional video camera outputting a signal at a resolution of 1080p or higher will deliver a significantly better result. The crucial part at this stage is binding the physical camera to the newly created virtual camera so that they see the same thing. This requires an exacting calibration process.
Figure 6: The upper image is the raw green-screen footage; the lower image is the final composited image with background and foreground layers added in real time.
The third-person image from the app needs to be seamlessly composited with the image from the physical camera, so that the elements align precisely—particularly the player’s hands, which will be holding the two VR controllers that are tracked in space. If this isn’t done correctly, it’s possible to end up with virtual in-game hands, or other handheld items that are anything from a few centimeters to a couple of feet away from the user’s real hands. This, of course, completely breaks the illusion of immersion and makes the resulting video look very odd.
In the case of the HTC Vive that Josh and his team have worked with extensively, the virtual and physical cameras are bound together by attaching a Vive Tracker* (a hockey puck-like sensor) or a Vive controller to the physical camera so that the Vive sensors can map space to sync with the virtual in-game camera. This third tracker, or controller, attached to the physical camera is the linchpin between the virtual and physical worlds, facilitating the entire process.
The calibration process is concerned with lining up the in-game virtual camera with the physical camera. The seven key variables (as described previously for the virtual third-person camera) need to be perfectly matched so that the two cameras are pointed in exactly the same direction, see the same thing with the same field of view, and track perfectly with each other when they move. These variables are the X, Y, and Z positions; the X, Y, and Z rotation; and the field of view.
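The underlying idea is a rigid link: once the fixed offset between the tracker and the physical lens is known, the virtual camera can follow the tracker through that same offset every frame. A minimal sketch, assuming 4x4 homogeneous transforms (translation only, for brevity; this is illustrative, not a specific SDK’s API):

```python
import numpy as np

# Illustrative sketch of the rigid-link idea behind calibration, using
# 4x4 homogeneous transforms with translation only for brevity.
def rigid(tx, ty, tz):
    """Build a 4x4 transform carrying just a translation."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

# Calibration solves for the fixed tracker->camera offset once:
tracker_pose = rigid(1.0, 1.5, 0.0)    # where the system sees the tracker puck
camera_pose  = rigid(1.0, 1.4, -0.1)   # where the physical lens actually is
offset = np.linalg.inv(tracker_pose) @ camera_pose

# Afterwards, every frame, the virtual camera follows the tracker through
# that same fixed offset, so the two cameras stay in sync as the rig moves:
new_tracker_pose = rigid(0.5, 1.5, 0.3)
virtual_camera = new_tracker_pose @ offset
```

As long as the tracker stays physically locked to the camera body, the offset remains valid, which is why the article stresses not changing their physical relationship after calibration.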
The calibration process can be done by simple trial and error: changing the values and iterating until the position of the controllers matches that of the virtual hands in the app. Manual calibration is extremely difficult, however, and as the efficacy of the mixed-reality illusion relies on the accuracy of this process, it’s better to enlist help where possible.
Figure 7: Screenshot from MixCast VR Studio* showing the seven values for position, rotation, and field of view.
A number of tools are available, but, for the various demos and related work produced to date, Josh and Jerry have been using MixCast VR Studio* by Blueprint Tools. It’s a purpose-built suite of tools developed to help those producing VR and mixed-reality videos, and includes good calibration support. It works by first being told which input device your camera is, then using its green-screen chroma key functionality to support the calibration process.
According to Josh, calibration works best when it is a two-person process. Prior to starting, it’s important to ensure that the physical camera and the attached tracker are as level as possible. “Use a bubble level, or a phone with a compass app, to tell you whether it’s level or not,” said Josh. “The process will be more precise, and much easier, if you start with the camera and the tracker as close to perfectly level and square as you can.”
Next, one of the hand controllers, or a tracker puck, is fixed to the body of the camera, and MixCast is told which device is attached to it. Quick Setup is then launched, and two sets of crosshairs appear on screen, which need to be lined up. Once that’s done, a click begins the calculations for the field of view and the rough position and approximate alignment of the virtual and real cameras, followed by a fair amount of necessary fine tuning to make it perfect.
Josh also emphasized the importance of the user position when going through the calibration process. “When you’re standing in front of the camera, try to stand perfectly square to it, so your shoulders are square and you’re lined up with the center of the lens,” he explained. “It makes things that much easier in terms of not having to compensate for those positional differences when you’re trying to line up everything in three dimensions.”
Figure 8: Using MixCast VR Studio* to perform the calibration process.
Once everything is lined up inside MixCast VR Studio, it will look right when the user picks up and interacts with objects—and when the physical camera is moved around in the real world, the virtual in-app camera will move with it in perfect sync. The calibration values can then be copied from MixCast VR Studio to the configuration file and reused later, or used with any application that has the same kind of mixed-reality enablement (at the time of writing, this includes most Unity* titles that use the SteamVR* plugin). As long as the camera and the tracker stay physically locked together, and their physical relationship doesn’t change in space, those values will remain accurate, although fine tuning may be required. In live demo environments, Josh and Jerry recalibrate the cameras at least once a day to ensure accuracy.
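For reference, the SteamVR plugin for Unity typically reads these calibration values from a plain-text configuration file (commonly named externalcamera.cfg, placed alongside the game executable). The values below are placeholders, not a working calibration:

```
x=0.0
y=1.4
z=-1.2
rx=0.0
ry=0.0
rz=0.0
fov=60.0
near=0.01
far=100.0
```

The x/y/z and rx/ry/rz entries are the position and rotation offsets between the tracked device and the physical lens, and fov is the matched field of view—the same seven values discussed throughout the calibration process.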
At this point, the app is providing background and foreground visual feeds, and the position, rotation, and field of view of the virtual and physical cameras are synchronized, allowing the user to interact with the VR app and to have their actions accurately rendered in space from a third-person perspective. The next task is to bring the physical camera feed (green screen) into the PC using a video capture device, along with the two feeds from the app (background and foreground).
The image feeds can be brought into any software program capable of performing the chroma key and compositing operations required to produce a single mixed-reality video output. A number of software suites designed for streamers and video producers can perform the necessary tasks, including XSplit Broadcaster* and OBS Studio*, with Josh and Jerry having worked primarily with the latter.
The third-person view from the app is displayed in a single window divided into quarters, consisting of the third-person background and foreground views; the standard first-person headset view; and a fourth view, which is an alpha mask. The first-person view is not required for the mixed-reality compositing, but provides a useful reference of what the user is actually seeing, and can be used to cut to during live streaming and recording situations. The alpha mask is also non-essential unless there is black in the foreground image, but, while being relatively complex to implement, can improve the overall visual quality by making edges smoother.
Figure 9: Screenshot from OBS Studio* showing the four quadrants from the Rick and Morty: Virtual Rick-ality* game. Top-left is foreground, bottom-left is background, top-right is the mask, and bottom-right is the first-person view.
An important point regarding the window that displays the quadrant screen is that it needs to be at least four times the resolution of the final video output: twice the width and twice the height (for example, 4K for a final 1080p output). This is because the compositing software takes the individual quadrants of the screen into the compositing process, meaning that each image used for compositing is a quarter of the entire screen area. Anything less than 4K, and the final video will fall below 1080p HD resolution.
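The quadrant arithmetic is easy to verify; a minimal sketch:

```python
# Each composited layer is one quadrant of the capture window, i.e. half
# its width and half its height.
capture_w, capture_h = 3840, 2160            # a 4K (UHD) quadrant window
layer_w, layer_h = capture_w // 2, capture_h // 2
# A 4K window therefore yields full 1080p HD per layer; anything smaller
# drops each layer below HD resolution.
```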
The individual screen quadrants, and the live camera green-screen feed, are then all brought into OBS Studio. The chroma key filter is applied to the live footage to remove the green background (much as when YouTubers show only their head and shoulders superimposed onto game footage), giving a cutout of the user, which can be placed directly on top of the app background layer. The foreground layer from the app engine consists of the foreground elements on a black background, so a key-color filter is applied to remove the black, creating a foreground layer with a transparent background that can be directly applied onto the image as the final third layer. This will work unless there is black in the foreground image, in which case the key-color filter will remove foreground elements that need to be retained. In this case, the alpha mask layer should be used instead of the key-color filter. The final composited video can then be encoded and output for recording and/or a live stream.
Figure 10: Using the key-color filter in OBS Studio* to remove the unwanted black area of the foreground layer.
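The three-layer stacking can be sketched in a few lines, assuming float RGB frames as NumPy arrays in [0, 1]. The thresholds here are illustrative placeholders; real keyers, such as the chroma key and color key filters in OBS Studio, are considerably more robust:

```python
import numpy as np

# Illustrative three-layer composite: app background, keyed user cutout,
# then keyed foreground. Thresholds are placeholders, not production values.
def chroma_key_mask(img, dominance=0.3):
    """True where a pixel is 'green enough' to count as green-screen backdrop."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (g - np.maximum(r, b)) > dominance

def composite(app_background, camera_feed, app_foreground):
    out = app_background.copy()
    # Layer 2: the user, cut out of the green-screen camera footage.
    user = ~chroma_key_mask(camera_feed)
    out[user] = camera_feed[user]
    # Layer 3: foreground elements, keyed against their black backdrop
    # (the key-color approach; an alpha mask would replace this test when
    # the foreground itself contains black).
    fg = app_foreground.sum(axis=-1) > 0.05
    out[fg] = app_foreground[fg]
    return out
```

The ordering mirrors the workflow in the article: background first, then the chroma-keyed user, then the keyed foreground on top.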
The entire process of producing mixed-reality VR video is extremely hungry when it comes to processing power. Josh describes it as extreme megatasking—a collection of processes that go well beyond the average requirements for a PC. Individually, running a VR game and video compositing are already extremely demanding tasks. With this process, the system needs to do both in parallel, at a minimum of 4K resolution, while maintaining a high frame rate of 90 frames per second (to lower the risk of motion sickness), and simultaneously encoding and outputting the signal for streaming and recording.
The load on the CPU and graphics processing unit (GPU) is such that, in normal circumstances, it’s too much for a single computer to handle at anything better than low quality. At the Game Developers Conference 2017, Josh built a custom PC to the highest possible specification without going into the realms of the extreme. Equipped with a sixth-generation, water-cooled, quad-core Intel® Core™ i7-6700K processor overclocked to run at 4.2 gigahertz, a top-of-the-range Nvidia GTX* 1080 GPU, and fast solid-state drive storage, “it could handle any game you threw at it with ease, including VR titles,” said Josh.
However, it wasn’t enough to handle the end-to-end workflow for creating VR mixed-reality videos without seriously compromising video quality. A second system was brought in to handle the encoding and streaming, while the main rig ran the VR app and performed the capture and compositing.
There is, however, a new option emerging, in the form of the recently announced Intel® Core™ X-series processors with 12 and 18 cores, which are now rolling out commercially. Josh and his team have run the entire green-screen, VR mixed-reality video workflow successfully on the systems, both at the world premiere of the Intel Core i9 processors at Computex Taipei in May 2017, and a couple of weeks later at the Electronic Entertainment Expo (E3) in Los Angeles. These powerful processors reduce the need to split tasks across multiple machines while maintaining the quality—greatly simplifying the process, and allowing creators to replace multiple PCs and specialized equipment with a single, very powerful PC.
Figure 11: Intel stage presentation at Computex in Taipei, May 2017, showing the green screen studio (top-left), and the live mixed-reality stream (top-right).
Enabling for Mixed Reality
Josh, Jerry, and the team at Intel have the most experience working with HTC Vive VR applications built in Unity using the SteamVR plugin, which facilitates the requirements for mixed-reality video—the third-person virtual camera, binding it to a controller or tracker, and outputting the background and foreground layers. Using the Unity 5* engine with the SteamVR plugin on HTC Vive ensures that the heavy lifting of mixed-reality enablement is already done for the developer.
Because of the number of different platforms and engines, and the youth of the technology, there is currently no single standard or consistent way to implement the technique and workflow across all of them, and a greater or lesser amount of programming may be required depending on which is used. While it’s understood that other platforms and engines—such as Unreal Engine* and Oculus Rift*—are developing their mixed-reality offerings, Josh recommends checking directly with the makers for up-to-date information regarding their specific capabilities and requirements.
One useful tip from Josh for developers who want to optimize their app for mixed reality is to avoid drawing hands in the game (or at least make it possible to switch them on and off), and stick to objects that the hand can hold instead. This is because if there is a hand, it’s never going to be completely matched with the user’s real hand, with the result being that it simply looks off, and breaks the illusion of immersion in the virtual environment. “If there’s no virtual hand, and your real hand is seen picking up objects or interacting with the world, it’s usually close enough that the illusion works,” explained Josh.
Figure 12: In this example, the user is holding blasters and no virtual hands are visible, which supports the overall illusion.
The Intel team sees enormous future potential for the kind of VR mixed-reality video-production techniques that have been pioneered over the past 18 months by Vive, Google*, Kert Gartner, and an increasing number of independent developers, streamers, and YouTubers.
“There’s a lot of cool potential here for filmmaking and storytelling,” enthused Josh. “I can’t imagine it’s going to be very long before we start seeing the first films that are produced in VR mixed reality, with a virtual environment that people can be immersed in, and mixed reality used to tell a story inside that world.”
Josh can also envisage its adoption in journalism and weather reporting, with a reporter pictured live in an environment that would in reality be uninhabitable—for example, the eye of a storm, or a contaminated site. Meanwhile, Jerry has his eyes on a different prize: “I wonder how episodic TV shows would look if you could insert yourself into them somehow; for example, live shows that you join in VR, and stream your own version of.”
Running with that thought, Josh expects to see the adoption of mixed reality in the world of eSports, as more VR titles follow in the footsteps of magical dueling simulator The Unspoken*. “Imagine being able to essentially put yourself in the arena through remote VR technology, then doing your own commentary that you stream live, in mixed reality,” said Josh. “I think there’s a ton of potential.”
As developers begin to understand the value of mixed-reality video to communicate their VR experience to a wider audience, and its uses extend beyond the gaming and app world, the market is going to open up for specialist video production companies using the technology to show those experiences to their absolute best advantage.
“In the Developer Relations Division of the Software and Service Group at Intel, we live by the idea that these advanced processors—with billions of transistors—really aren’t good for much more than converting electricity into heat without providing great software experiences,” said Josh.
To help ensure those experiences get made, Josh, Jerry, Bob, and the team are committed to working closely with the developers lighting up Intel’s silicon. In the field of VR, there is probably no better way to showcase a new experience than with a mixed-reality video, which is why the team has been exploring the technique’s potential, and working to share their knowledge and inspirations with as many developers as possible.
“We want to work with VR developers to make their VR experiences enabled for mixed reality, and blow people away with these trailers,” said Josh. “We’re trying to help developers make the most amazing software experience possible, because those amazing software experiences are what unlock the potential of the hardware products.
“We’re always talking to developers large and small, working with them, and getting their input,” continued Josh. “We listen, and we try to make the things that will help them improve their software. That is the heart of what we do.”
More stories, tutorials, case studies, and other related materials are planned around VR mixed-reality technologies. To stay up-to-date with all the latest news, join the Intel developer program at: https://software.intel.com/gamedev.