This article explains techniques for using advanced audio hardware and virtual reality (VR) microphones to collect real-time binaural audio samples. Binaural recording uses two microphones to capture and later recreate a 3D soundscape. The article describes these field-recording techniques, specifically for use in VR applications and games, as well as the process of recording, editing, mastering, and creating environmental sound assets for Unreal Engine* 4. Walkthroughs and procedures are provided, using software examples with MAGIX ACID* Pro 8, MAGIX SOUND FORGE Audio Studio* 12, and Unreal Engine 4.
It's not every day you get to travel all over the world collecting spatial-audio from dozens of countries for VR projects, in much the same way as you might take photos on a vacation. I was tasked with capturing the 3D sonic space of jungles, castles, catacombs, caves… all kinds of fascinating and mysterious places around the planet (some of these travel sites are shown in Figure 1). These atmospheric recordings, impulse responses (reverb), and foley (everyday sound effects) for sound design were all captured in high-resolution audio, specifically for use in VR applications and experiences. Think of the ambient space of places as sonic textures; this spatial-audio helps drive the realism (and hyperrealism) of sound design for interactive VR.
Figure 1. Author Justin Lassen using a variety of recording equipment in different settings around the world.
The equipment I brought along for this expedition—which involved traveling through more than 17 countries—included:
My field-recording expedition took me through Southeast Asia, Europe, and North America. I acquired multiple terabytes of new textures, sounds, audio, and content. To help VR sound designers create soundtracks for virtual worlds, I recorded natural indoor and outdoor environmental audio, impulse responses (for creating custom reverbs), foley design (raw sounds), sound effects sources, musical performances, and other audio content in many different places and conditions (see below).
Figure 2. Recording on a jungle river in Kanchanaburi, Thailand.
I captured original, high-resolution resonances in castles, cathedrals, catacombs, churches, caves, forests, jungles, temples, cities, markets, mansions, halls, concert venues, rivers, streams, tunnels, passages, and other unique venues. The expedition proved to be a real adventure.
Figure 3. Toting gear through the Underground in Prague, Czech Republic.
VR popularity has accelerated over the past few years, with the introduction of SteamVR*, HTC Vive*, Oculus Rift*, and others. All these interactive programs use sound in ways that standard AAA games never had to. Sure, in a first-person shooter (FPS) game, you would want footstep sounds to be cinematic, or be able to hear bullets flying off in the distance. Multichannel audio is used to add reality to a standard video-game experience; in VR, however, new rules and expectations have been established. For the player wearing headphones, enjoying a full immersive experience, realistic sounds make the virtual world come to life. Visuals can only go so far, and audio must take us the rest of the way through the uncanny valley. Surround audio, binaural audio, and hyperrealism in sound design have now become a necessity to help those dreams and graphics come to life.
Figure 4. Recording VR audio on top of a sacred mountain overlooking Seoul, South Korea.
As an audio director and sound designer on several high-profile VR projects (including Spider-Man*: Homecoming Virtual Reality Experience with Sony, Shapesong with Intel, and Space Dock VR demo with Qualcomm), I have had to learn and adapt to the best practices for audio content creation on many different platforms and engines (Unreal Engine, Unity*, and so on). Because of these experiences, I have successfully engaged in numerous projects involving custom sound editing, performance foley work, and ambient sound design for various storylines and narratives in VR worlds.
Figure 5. Shapesong VR (Intel), Space Dock VR Mobile Experience (Qualcomm), and Spider-Man*: Homecoming Virtual Reality Experience (Sony Pictures VR)
Surround sound, at its core, is a method of using many different audio channels to surround the listener. This could mean movie-theater surround (such as hearing bullets whiz behind your head while watching the movie), or more interactive surround in video games that changes the positioning of audio sources in real time (via the game engine) and in relation to the player. It can also refer to the kinds of interactive audio that you find at exhibitions and installations.
Binaural audio is a method of recording sound that uses two microphones arranged to mimic the natural human form to create 3D stereo sound. This gives the listener the sensation of being on the scene, in the place, or in the room that was being recorded (like hearing instruments or conversations, up close). This effect is sometimes created with a mannequin head that has microphones placed in each ear or with handheld recorders that have in-ear monitoring, with microphones fitted to the outside of each in-ear headphone.
Ambisonic audio is a 3D, surround-sound format with full-sphere coverage. It includes the horizontal plane as well as sounds below and above the listener. The channels in this format do not carry speaker signals. They contain a speaker-independent representation of a sound field, called B-format, which can later be decoded to the listener's speaker setup. This very flexible format can give the sound designer or audio producer options to use the audio for speakers, headsets, or VR.
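To make the idea of "decoding" B-format more concrete, here is a minimal Python sketch that points two virtual microphones into a horizontal sound field to produce a stereo pair. It assumes the traditional B-format convention in which the W channel is scaled by 1/sqrt(2), and it ignores the height channel Z. The function names (`virtual_mic`, `decode_to_stereo`, `encode_source`) are hypothetical, and a production decoder would be more sophisticated than this.

```python
import math


def virtual_mic(w, x, y, azimuth_deg, pattern=0.5):
    """Point a virtual first-order microphone into a horizontal B-format field.

    w, x, y: equal-length lists of samples (assumes W is scaled by 1/sqrt(2)).
    azimuth_deg: look direction, counter-clockwise from straight ahead.
    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-eight.
    """
    az = math.radians(azimuth_deg)
    cos_az, sin_az = math.cos(az), math.sin(az)
    return [
        pattern * math.sqrt(2.0) * wi
        + (1.0 - pattern) * (cos_az * xi + sin_az * yi)
        for wi, xi, yi in zip(w, x, y)
    ]


def decode_to_stereo(w, x, y, spread_deg=90.0):
    """Decode to stereo with two virtual cardioids aimed left and right."""
    left = virtual_mic(w, x, y, +spread_deg)
    right = virtual_mic(w, x, y, -spread_deg)
    return left, right


def encode_source(signal, azimuth_deg):
    """Demo helper: encode a mono signal arriving from one horizontal direction."""
    az = math.radians(azimuth_deg)
    w = [s / math.sqrt(2.0) for s in signal]
    x = [s * math.cos(az) for s in signal]
    y = [s * math.sin(az) for s in signal]
    return w, x, y
```

With a source encoded directly to the listener's left (azimuth 90 degrees), the left virtual cardioid picks it up at full level while the right one rejects it. Changing `spread_deg` or `pattern` shows how the same recording can be re-aimed after the fact, which is the flexibility the format is known for.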
There are several different brands of VR microphones on the market today, equipped with differing qualities and characteristics. Generally, these VR microphones capture 360-audio for video content producers by default. However, with the conversion software that is included with many mics, the audio can be changed into other formats, or separated for further augmentation and editing. With the wide selection of VR microphones available, you have many options for capturing beautiful soundscapes and surround content. The mic I brought with me on my last adventure was the Sennheiser AMBEO VR mic, which includes a suite of conversion tools and software. This microphone has four capsules arranged in a 360-degree pattern to capture sound waves below, above, and horizontal to the placement of the microphone. The recording device I used (shown below) was the Zoom F4 Field Recorder that has ambisonic recording formats incorporated in the firmware of the device.
Figure 6. Microphones and a recorder used to capture 360-degree surround audio.
An easy and affordable way to capture 3D audio is to use ambisonic headsets, or binaural recorders with binaural headphone and microphone combinations (examples shown in Figure 7). The products I brought with me on the trip were off-the-shelf recorders and mics, as well as the AMBEO Smart Headset that connects directly to my Apple iPhone via a Lightning* cable. Using the built-in software, I could easily capture realistic binaural content “on the go.”
Figure 7. Binaural recorder and ambisonic headsets
Capturing audio during field-recording missions is a fun exercise that almost anyone can do. The equipment is easily accessible for most teams and what you capture is completely up to your imagination. Given the wide variety of audio capture tools and recorders, you can find efficient ways to use them in very different settings and circumstances, as shown in the following examples.
If the day is windy, or you are traveling at higher elevations (such as hiking in the mountains), use a windscreen on the VR microphone. When using in-ear binaural microphones, I found that my hat/beanie was effective as a windscreen in almost every situation. In jungle settings, because of the canopy and trees, it was more effective to remove the windscreens from the microphone (see Figure 8).
Figure 8. Recording with an AMBEO* VR mic in the jungle.
The binaural microphones were much more useful in certain indoor locations. Because they are hidden microphones—easy to prepare and trigger quickly—I found that I had them ready to use almost all the time, no matter where I was, even if I had planned to use the VR microphone (see Figure 9). For certain locations—such as the Sistine Chapel—it was much easier to record the stellar binaural ambience respectfully, when using the in-ear microphones.
Figure 9. Justin with in-ear binaural 3D audio mics/headphone inside a temple in Taiwan.
In underground or enclosed locations, it was easy to use the Bluetooth® technology capability of the Roland binaural recorder to set the recorder in place and trigger from afar. Alternatively, I could hold it closely and quietly while descending into narrow corridors (see Figure 10).
Figure 10. Descending into a narrow corridor.
When recording foley design, I was able to remove the shielding from the microphone and capture resonances up close. An example is when I was recording Asia's top female chef Bee Satongun (see Figure 11), capturing the sound and atmosphere of her Michelin-starred restaurant.
Figure 11. Justin capturing up-close audio resonances of chef Bee Satongun's award-winning cuisine (Bangkok, Thailand).
When I was challenged by larger-than-life instruments or bells, I was able to use a combination of handheld binaural recorders and in-ear AMBEO 3D microphones to capture quality sound (see Figure 12).
Figure 12. Recording earth-shattering temple bells, cathedral rings, and traditional ceremonial bells in Korea, Thailand, and Austria.
To realistically model sound propagation for VR applications, the recording of impulse responses, ambiences, and reverb in different environments is a very helpful addition to a sound designer's toolkit. I recorded spatial room impulse-responses in a wide variety of settings across several different countries (see Figure 13).
Figure 13. Recording impulse responses, ambiences, and reverb in Italy, Czech Republic, Taiwan, Korea, and Ireland.
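A recorded impulse response becomes a reverb when a dry sound is convolved with it. The following Python sketch shows that operation in its simplest direct form, on plain lists of samples. The function name `convolve` is hypothetical, and real convolution reverbs use much faster FFT-based methods; this is only meant to show what the operation does.

```python
def convolve(dry, impulse_response):
    """Apply an impulse response to a dry signal by direct convolution.

    Both arguments are lists of float samples at the same sample rate.
    The result is len(dry) + len(impulse_response) - 1 samples long,
    because the reverb tail rings on after the dry sound ends.
    """
    if not dry or not impulse_response:
        return []
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        if sample == 0.0:
            continue  # silent input samples contribute nothing
        for j, tap in enumerate(impulse_response):
            wet[i + j] += sample * tap
    return wet
```

Each input sample triggers a scaled copy of the whole impulse response, and the copies add up. That is why a single clap recorded in a cathedral is enough to place any other dry sound in that same space.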
(OPTIONAL) Binaural Audio Examples (4K Videos with Binaural Sound):
NOTE: Grab your headphones so that you can experience raw captured AMBEO binaural 3D audio content from the smart headset.
Forest in Kanchanaburi, Thailand (4K)
Jungle pond in Kanchanaburi, Thailand (4K)
Binaural is, by its nature, two channels: left and right. Because of this, it can be used as-is in most applications. It is most effective if it accompanies video; however, it can be edited and changed in most digital audio workstations (DAWs), including PreSonus Studio One* and MAGIX ACID Pro 8. The following sequence shows how to separate the stereo/binaural file into two mono files, with MAGIX SOUND FORGE Audio Studio 12. This can be accomplished in a similar way using Audacity* (a free, open source, cross-platform audio editor).
Step one: Open the binaural .WAV file. In this example, I used a binaural recording of a dark street environment. As you can see (below), there are two audio channels (left and right).
Figure 14. Two channels of audio
Step two: Select the first channel (the left channel) by double clicking (Figure 15). Press CTRL-C (Copy).
Figure 15. Selecting the left channel
Step three: In SOUND FORGE 12, press CTRL-E (Paste to New). This takes the left-channel data and automatically pastes it into a new MONO file (Figure 16). Repeat this step for the right channel so that you have two individual .WAV files (left and right, respectively).
Figure 16. Creating a new MONO file
Step four: When opened in ACID Pro 8 (see Figure 17), you now have the left and right channels of the binaural file as separate files that you can edit and add effects to, individually or together. It is best to keep the effects symmetrically applied. Also, because they are standard .WAV files, you can re-export them with your edits and effects into game engine-friendly formats. In this case, the binaural recorder captures audio at a resolution of 24-bit/96 kHz, which is too high for the supported Unreal Engine 4 format. When saving your edits and changes in the DAW, make sure that you convert to PCM 16-bit/44.1 kHz for best results in game engines (Unity, Unreal).
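For readers who prefer scripting to an editor, the channel split from steps two and three can be sketched in Python using only the standard-library `wave` module. The function name `split_stereo_wav` and the file paths are hypothetical. This sketch copies the samples unchanged, so it does not perform the 16-bit/44.1 kHz conversion; that is still left to the DAW export.

```python
import wave


def split_stereo_wav(stereo_path, left_path, right_path):
    """Split a two-channel PCM .WAV file into two mono .WAV files."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo/binaural) file")
        width = src.getsampwidth()   # bytes per sample (2 = 16-bit, 3 = 24-bit)
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Each frame holds one left sample followed by one right sample.
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for start in range(0, len(frames), frame_size):
        left += frames[start:start + width]
        right += frames[start + width:start + frame_size]

    # Write each channel out with the original bit depth and sample rate.
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```

A call such as `split_stereo_wav("dark_street.wav", "dark_street_L.wav", "dark_street_R.wav")` (file names invented for illustration) would produce the same pair of mono files as the Paste to New procedure.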
By default, ambisonic audio is a good format for videographers who want to showcase 360-degree audio for 360/VR video content, or for mixing to surround speaker applications (such as concert venues). It is, of course, also good raw audio for interactive games and VR projects. There are a few extra steps required for using these recordings in VR or a DAW. Ambisonic audio can be as simple or as complex as you need it to be. The software that is included with the Sennheiser AMBEO VR mic (see Figure 18) is simple and efficient at converting the recorded A-format into spatial B-format for use as 3D audio in videos or VR, or at simply mixing the four channels down to stereo.
Figure 18. Interface of the AMBEO* software provided by Sennheiser.
The great thing about these AMBEO audio processors from Sennheiser is that they are free, and they come with a lot of tutorials, documentation, techniques, and videos. They are, surprisingly, very simple to use in almost any DAW software (I used them in ACID Pro 8). The AMBEO Orbit plugin enables you to change reflections, clarity, width, elevation, and binaural pan of ambisonic audio. The AMBEO A/B converter enables straightforward conversion from A-format to B-format.
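To give a feel for what an A-to-B conversion does, here is a Python sketch of the textbook sum-and-difference matrix for a tetrahedral microphone. It is not the Sennheiser implementation: it assumes the four capsules face front-left-up, front-right-down, back-left-down, and back-right-up, it uses an arbitrary 0.5 scaling, and it leaves out the correction filtering and orientation handling that a real converter applies. The function name `a_to_b_format` is hypothetical.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert four capsule signals (A-format) to W, X, Y, Z (B-format).

    flu, frd, bld, bru: equal-length lists of samples from the capsules
    assumed to face front-left-up, front-right-down, back-left-down,
    and back-right-up.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # omnidirectional pressure
        x.append(0.5 * (a + b - c - d))  # front minus back
        y.append(0.5 * (a - b + c - d))  # left minus right
        z.append(0.5 * (a - b - c + d))  # up minus down
    return w, x, y, z
```

A sound that reaches only the two upward-facing capsules ends up in W and Z and cancels out of X and Y, which is the expected result for a source directly overhead. The matrix also shows why the microphone's orientation on location matters: if the capsules are not facing the directions the conversion assumes, the axes come out swapped or inverted.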
Note: When recording with the AMBEO VR mic, make sure to note the position you were holding the microphone when on location (upright, upside down, endfire/sideways).
With VR environments, the most effective way to make a scene come to life is by combining ambisonic and binaural audio in-engine. By separating the channels into their own MONO objects/files (as demonstrated in the previous discussion), I can attach them to interesting but symmetrical positions in a game environment (see Figure 19). Players experiencing this in-game feel as though they can actually move around in the space. If you put a “baked-in” spatial-audio object into a scene, the audio would stay in the position of the user's head. However, if you attach the individual channels to sound emitters/objects in the scene, you can create a sense of space and environment. The more MONO objects with surround definitions and parameters used in a scene, the more the player feels they are immersed in a real world.
Figure 19. Adding MONO files into an environment.
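The symmetrical placement can be worked out with a little geometry before the emitters are positioned in the editor. This Python sketch is engine-agnostic and purely illustrative: the helper name `symmetric_emitter_positions` is hypothetical, and it assumes a plan view in which yaw is measured counter-clockwise from the +X axis and "left" is 90 degrees counter-clockwise from the facing direction. Check your engine's own axis conventions before reusing the numbers.

```python
import math


def symmetric_emitter_positions(center, facing_deg, half_width):
    """Return (left, right) positions mirrored about a facing direction.

    center: (x, y, z) point the pair is centered on, e.g. where the
        recordist stood in the scene.
    facing_deg: yaw of the scene's "front", counter-clockwise from +X
        (an assumed convention, not any engine's).
    half_width: distance from the center to each emitter.
    """
    cx, cy, cz = center
    yaw = math.radians(facing_deg)
    # Unit vector pointing to the left of the facing direction.
    left_x, left_y = -math.sin(yaw), math.cos(yaw)
    left = (cx + left_x * half_width, cy + left_y * half_width, cz)
    right = (cx - left_x * half_width, cy - left_y * half_width, cz)
    return left, right
```

The left-channel mono file would go on the emitter at `left` and the right-channel file on the emitter at `right`, so that the pair stays balanced when the player stands at `center` and shifts naturally as the player walks toward either side.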
Note: Import the individual .WAV (MONO) files into Unreal Engine. This enables them to be used independently from each other in a scene to create a pseudo-binaural experience (Figure 20).
Figure 20. Creating a pseudo-binaural experience.
Note: Adding sound files to a VR project in Unreal with attenuation settings that fit their scenario and placement in the scene (along with other sound effects, foley, and music design) allows you to create a blend of hyperrealism composed of many sources of surround, binaural, and mono audio content (see Figure 21).
Figure 21. Blending sounds for a hyper-realistic effect.
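Attenuation settings decide how loud each emitter is at a given distance from the player. As a rough illustration, the Python sketch below models a simple linear falloff with an inner radius (full volume inside it) and a falloff distance (volume fades to silence across it). It is a simplified model written for this article, not the engine's actual implementation, and the function name `linear_attenuation` is hypothetical.

```python
def linear_attenuation(distance, inner_radius, falloff_distance):
    """Return a volume multiplier between 0.0 and 1.0 for a listener.

    distance: listener's distance from the emitter.
    inner_radius: within this distance the sound plays at full volume.
    falloff_distance: span beyond the inner radius over which the
        volume fades linearly to silence.
    """
    if distance <= inner_radius:
        return 1.0
    if falloff_distance <= 0.0:
        return 0.0  # no fade region: silent outside the inner radius
    fraction = (distance - inner_radius) / falloff_distance
    return max(0.0, 1.0 - fraction)
```

With an inner radius of 200 units and a falloff distance of 800 units, a listener 600 units away hears the emitter at half volume. Small, tight values suit close-up foley, while large values suit broad ambiences that should fill a whole area.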
Over the course of this article, I've explained how to capture audio in field-recording missions, bring it back to the lab, convert it, and edit the audio with effects. Finally, the files are imported into Unreal Engine for use in VR applications. The complexity of a scene will be very different based on your project or individual needs. Some people will only use several mono sources, while others will use a mix of binaural and ambisonic audio. It can be as simple or as complex as your imagination guides you.
I have had VR projects with literally hundreds of individual mono audio files and sound effects, and others that were more streamlined and abstract, with just a few files. The tools and hardware that are available to me are just as easily available to you. Get out there in the world, travel, capture the surroundings—not only in video and photos, but also in audio.
You can check out more video footage, pictures, stories, and images on my official VR Instagram account (@nihilstudios) where I documented the entire trip around the world for this VR audio library project. My VR audio library will become available next year to developers and sound designers, so watch for it!
Composer. Remixer. Sound Designer. Visionary. With over 20 years of experience in the music, film, tech, and video game industries, Justin is currently working as an award-winning spatial audio designer for VR/AR/MxR projects. He has lent his production talents to iZotope*, Cakewalk*, Intel, DTS*, Sony*, Disney*, Konami*, Skybound*, Hasbro*, Lakeshore*, Interplay*, the United States Department of Defense, and many more.