For more than a decade, we've enjoyed the evolution of the First Person Shooter (FPS) genre, looking at games through the eyes of the protagonist and experiencing that world firsthand. To exercise control, we've had to communicate with our avatar through keyboard, mouse, and controller. Thanks to Perceptual Computing, we now have additional modes of communication that bring us much closer to that world. This article not only covers the theory of perceptual controls in FPS games but also demonstrates actual code that allows the player to peek around corners by leaning left or right. We will also look at using voice control to select options in the game and even converse with in-game characters.
A familiarity with the Intel® Perceptual Computing SDK is recommended but not essential. Although the code is written in Dark Basic Professional (DBP), the principles are equally suited to C++, C#, and Unity*. Most of this article covers the theory and practice of augmenting the First Person experience. It applies not only to games but also to simulations, tours, and training software.
In this article, we’ll be looking at augmenting the FPS game genre, a popular mainstay of modern gaming and one that has little to no Perceptual Computing traction. This situation is partly due to the rigid interface expectations players have of such games and partly due to the relative newness of Perceptual Computing as an input medium.
As you read this article, you will see that with a little work, any FPS can be transformed into something much richer. In a simple firefight or a horror-thriller, you don't want to be looking down at your keyboard to find the correct key; you want to stay immersed in the action. Figuring out the combination of keys to activate shields, recharge health, duck behind a sandbag, and reload within a heartbeat is the domain of the veteran FPS player. These days, however, games belong to the whole world, not just the elite. Perceptual Computing has the power to provide this level of control without requiring extensive practice or lightning-fast hand/eye coordination.
Figure 1. When reaction times are a factor, looking down at the keyboard is not an option
We’ve had microphones for years, but it has only been recently that voice recognition has reached a point where arbitrary conversations can be created between the player and the computer. It’s not perfect, but it’s sufficiently accurate to begin a realistic conversation within the context of the game world.
Figure 2. Wouldn’t it be great if you could just talk to characters with your own voice?
You’ve probably seen a few games by now that use non-linear conversation engines to create a sense of dialog through multiple choices. You may also have seen weapons with three or four modes of fire. Both of these features can be augmented with voice control to create a much deeper sense of immersion and a more human interface with the game.
This article will look at detecting what the human player is doing and saying while playing a First Person experience, and converting that into something that makes sense in the gaming world.
2. Why Is This Important?
Gaming is one of the youngest media industries on the planet and is now one of the largest. The potential for advancement in its technology is incredible, and bridging the gap between user and computer is one of its most exciting frontiers. One step in this direction is a more believable, immersive experience that relies on our natural modes of interaction instead of the artificial ones created for us.
With a camera that can sense what we are doing and a microphone that can pick up what we say, you have almost all the ingredients to bridge this gap entirely. It only remains for developers to take up the baton and see how far they can go.
For developers who want to push the envelope and innovate around emerging technologies, this subject is vitally important to the future of the First Person experience. There is only so much a physical controller can do, and as long as we depend on it for all our game controls, we will be confined to its limitations. For example, a controller cannot detect where we are looking in the game. That information has to be fed in, which means more controls for the player. A controller also cannot detect the intention of the player; the game has to wait until a sequence of button presses has been correctly entered before it can proceed. Now imagine a solution that eliminates this middle-man of the gaming world, and ask yourself how important it is for the future of gaming.
Figure 3. Creative* Interactive Gesture Camera; color, depth and microphone – bridging the reality gap
Imagine the future of FPS gaming. Imagine all your in-game conversations being conducted by talking to the characters instead of selecting buttons on the screen. Imagine your entire array of in-game player controls commanded via a small vocabulary of commonly spoken words. The importance of these methods cannot be overstated, and they will surely form the basis of most, if not all, FPS game interfaces in the years to come.
3. Detect Player Leaning
You have probably played a few FPS games and are familiar with the Q and E keys to lean left and right to peek around corners. You might also have experienced a similar implementation where you can click the right mouse button to zoom your weapon around a corner or above an obstacle. Both game actions require additional controls from the player and add to the list of things to learn before the game starts to feel natural.
With a perceptual computing camera installed, you can detect where the head and shoulders of your human player lie in relation to the center of the screen. By leaning left and right in the real world, you can mimic this motion in the virtual game world. No additional buttons or controls are required, just lean over to peek around a corner, or sidestep a rocket, or dodge a blow from an attacker, or simply view an object from another angle.
Figure 4. Press E or lean your body to the right. Which one works for you?
In practice, however, you will find this solution has a serious issue. Your gaming experience will be disrupted by a constantly moving (even jittering) perspective as the human player naturally shifts position during play. This movement can disrupt some elements of the game, such as cut-scenes, and fine-grained controls, such as using the crosshair to select small objects. There are two solutions. The first is to create a series of regions that signal a shift to a more extreme lean angle. The second is to disable the feature altogether in game modes like those mentioned above.
Figure 5. Dividing a screen into horizontal regions allows better game leaning control
With these regions defined, most of the gameplay takes place in the center zone. The augmentation only kicks in when the player makes extreme leaning motions, and it then shifts the game perspective accordingly.
Implementing this technique is very simple and requires just a few commands. You can use the official Intel Perceptual Computing SDK, or you can create your own commands from the raw depth data. Below is the initialization code for a module written for the DBP language, which reduces the actual coding to just a few lines.
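rem Init PC
rem (initialize the camera with the module's init command here and check that it is present)
pc update
normalx#=pc get body mass x()
normaly#=pc get body mass y()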
The whole technique can be coded with just three commands. The first initializes the perceptual computing camera and returns whether the camera is present and working. The second, pc update, asks the camera to take a snapshot and perform some common background calculations on the depth data. The third, used on the last two lines, grabs something called a Body Mass Coordinate. This is the average coordinate of any foreground object in the depth camera's field of view. For more information on the Body Mass Coordinate technique, read the article on Depth Data Techniques (http://software.intel.com/en-us/articles/perceptual-computing-depth-data-techniques).
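To make the Body Mass Coordinate idea concrete, here is a minimal C++ sketch that averages the pixel positions of every depth sample nearer than a cut-off distance. It assumes a row-major 16-bit depth image in millimetres, where 0 means "no reading". The function name, the BodyMass struct, and the fixed threshold are illustrative only. They are not part of the Intel SDK or the author's DBP module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Average image coordinate of the foreground pixels.
struct BodyMass {
    double x;
    double y;
};

// Hypothetical helper, not an SDK call. depth holds width*height samples in
// row-major order, in millimetres, with 0 meaning "no reading".
// Any valid pixel nearer than maxDepthMm counts as foreground.
std::optional<BodyMass> bodyMassCoordinate(const std::vector<std::uint16_t>& depth,
                                           int width, int height,
                                           std::uint16_t maxDepthMm) {
    assert(width > 0 && height > 0);
    assert(depth.size() == static_cast<std::size_t>(width) * height);

    double sumX = 0.0;
    double sumY = 0.0;
    long count = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint16_t d = depth[static_cast<std::size_t>(y) * width + x];
            if (d != 0 && d < maxDepthMm) {  // valid reading and close enough
                sumX += x;
                sumY += y;
                ++count;
            }
        }
    }
    if (count == 0) return std::nullopt;  // nobody in front of the camera
    return BodyMass{sumX / count, sumY / count};
}
```

Dividing the returned x by the depth image width gives a 0 to 1 value, comparable to the normalized value the lean-mode listing below tests against its thresholds. This assumes you work in depth-image pixels rather than the screen coordinates the DBP module appears to use.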
Detecting the horizontal zones takes a few more simple lines of code. They compute an integer value that denotes the lean mode, then choose an appropriate angle and shift vector that can be applied to the player camera.
rem determine lean mode
do
   leanmode=0
   normalx#=pc get body mass x()/screen width()
   if normalx#<0.125
      leanmode=-2
   else
      if normalx#<0.25
         leanmode=-1
      else
         if normalx#>0.875
            leanmode=2
         else
            if normalx#>0.75
               leanmode=1
            endif
         endif
      endif
   endif
   leanangle#=0.0
   leanshiftx#=leanmode*5.0
   select leanmode
      case -2 : leanangle#=-7.0 : endcase
      case -1 : leanangle#=-3.0 : endcase
      case 1 : leanangle#= 3.0 : endcase
      case 2 : leanangle#= 7.0 : endcase
   endselect
   pc update
loop
Applying these lean vectors to the player camera is simplicity itself, and disabling it when the game is in certain modes will ensure you get the best of both worlds. Coding this in C++ or Unity simply requires a good head tracking system to achieve the same effect. To get access to this DBP module, please contact the author via twitter at https://twitter.com/leebambertgc. The buzz you get from actually peering around a corner is very cool, and is similar to virtual/augmented reality, but without the dizziness!
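The same zone logic carries over directly to C++. The sketch below is a straight port of the DBP listing, assuming the head or body position has already been reduced to a normalized X value between 0 and 1. It adds a hypothetical leanEnabled flag so the effect can be switched off in game modes such as cut-scenes. Applying the resulting angle and shift to the camera is left to your engine.

```cpp
#include <cassert>

// Outcome of the zone test for one frame.
struct Lean {
    int mode;        // -2 far left, -1 left, 0 centre, 1 right, 2 far right
    float angleDeg;  // roll to apply to the player camera
    float shiftX;    // sideways offset to apply to the player camera
};

// normalizedX: body mass X divided by the image width (0.0 to 1.0).
// leanEnabled: false during cut-scenes or fine aiming, which forces centre.
// Thresholds, angles and the 5.0 shift factor follow the DBP listing above.
Lean computeLean(float normalizedX, bool leanEnabled) {
    Lean lean{0, 0.0f, 0.0f};
    if (!leanEnabled) return lean;

    if (normalizedX < 0.125f)      lean.mode = -2;
    else if (normalizedX < 0.25f)  lean.mode = -1;
    else if (normalizedX > 0.875f) lean.mode = 2;
    else if (normalizedX > 0.75f)  lean.mode = 1;

    // Lookup table replacing the select/case block: index = mode + 2.
    static const float kAngles[5] = {-7.0f, -3.0f, 0.0f, 3.0f, 7.0f};
    assert(lean.mode >= -2 && lean.mode <= 2);
    lean.angleDeg = kAngles[lean.mode + 2];
    lean.shiftX = static_cast<float>(lean.mode) * 5.0f;
    return lean;
}
```

The wide center band (0.25 to 0.75) is what absorbs the player's natural fidgeting. Only deliberate leans cross into the outer zones.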
4. Detect Player Conversations
Earlier versions of the Intel® Perceptual Computing SDK had some issues with accurate voice detection, and even when detection worked, it only understood a U.S. accent. The latest SDK, however, is superb: it can deal with multiple language accents and detects British speech very well. Running the sample code in the SDK and parroting sentence after sentence shows just how uncannily accurate it now is, and you find yourself grinning at the spectacle.
If you’re a developer old enough to remember the ‘conversation engines’ of the 8-bit days, you will recall the experimental applications that involved the user typing anything they wanted, and the engine picking out specific trigger words and using those to carry on the conversation. It could get very realistic sometimes, but often ended with the fall-back of ‘and how do you feel about that?’
Figure 6. A simple conversation engine from the adventure game “Relics of Deldroneye”
Roll the clock forward about 30 years and those early experiments could actually turn out to be something quite valuable for a whole series of new innovations with Perceptual Computing. Thanks to the SDK, you can listen to the player and convert everything said into a string of text. Naturally, it does not get it right every time, but neither do humans (ever play Chinese whispers?). Once you have a string of text, you can have a lot of fun with figuring out what the player meant, and if it makes no sense, you can simply get your in-game character to repeat the question.
A simple example would be a shopkeeper in an FPS game opening with the sentence, “What would you like, sir?” The Intel® Perceptual Computing SDK also includes a text-to-speech engine, so you can even have your characters speak aloud. Spoken dialog is generally preferred in modern games over the ‘text-on-screen’ method. Normally in an FPS game, you would either press a key to continue the story or pick from a small multiple-choice menu of responses. Let’s assume the choices are “nothing,” “give me a health kit,” or “I want some ammo.” In the traditional interface, you would select a button representing the choice you wanted through some sort of user interface mechanism.
Using voice detection, you could parse the text of what the player said and look for any words that indicate which of the three responses was intended. The player does not have to use an exact word or sentence. Expecting that would be almost impossible and would only frustrate the player. Instead, you look for keywords in the sentence that indicate which of the three responses is most likely.
NOTHING = “nothing, nout, don’t anything, bye, goodbye, see ya”
HEALTH = “health, kit, medical, heal, energy”
AMMO = “ammo, weapon, gun, bullets, charge”
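A minimal C++ sketch of this keyword approach follows. It assumes the speech engine has already returned the player's sentence as plain text; how you obtain that string from the SDK is not shown here. The code lower-cases the sentence, splits it into words, and scores each response by how many of its keywords appear. Multi-word entries from the lists above, such as “see ya” and “don’t anything,” are reduced to single keywords, and the Intent names are illustrative. If nothing matches, the result is Unknown and the shopkeeper can simply repeat the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Illustrative responses for the shopkeeper example.
enum class Intent { Unknown, Nothing, Health, Ammo };

// Lower-case the sentence and split it into words, treating punctuation
// (other than apostrophes, so "don't" survives) as a separator.
std::set<std::string> splitWords(const std::string& text) {
    std::string cleaned;
    for (unsigned char c : text) {
        const bool keep = std::isalpha(c) || c == '\'';
        cleaned += keep ? static_cast<char>(std::tolower(c)) : ' ';
    }
    std::istringstream in(cleaned);
    std::set<std::string> words;
    for (std::string w; in >> w;) words.insert(w);
    return words;
}

// Score each intent by the number of its keywords the player used.
// Unknown means nothing matched, so the character should ask again.
Intent classify(const std::string& spoken) {
    static const std::map<Intent, std::set<std::string>> keywords = {
        {Intent::Nothing, {"nothing", "nout", "don't", "anything", "bye", "goodbye", "ya"}},
        {Intent::Health, {"health", "kit", "medical", "heal", "energy"}},
        {Intent::Ammo, {"ammo", "weapon", "gun", "bullets", "charge"}},
    };
    const std::set<std::string> said = splitWords(spoken);
    Intent best = Intent::Unknown;
    std::ptrdiff_t bestScore = 0;
    for (const auto& [intent, keys] : keywords) {
        const std::ptrdiff_t score = std::count_if(keys.begin(), keys.end(),
            [&said](const std::string& k) { return said.count(k) > 0; });
        if (score > bestScore) {
            best = intent;
            bestScore = score;
        }
    }
    return best;
}
```

With the strict comparison, a tie goes to the intent that comes first in the map's order. A game could instead treat ties as Unknown and ask the question again.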
If the transaction is important in the game, you would confirm the choice with a second question, such as “I have some brand new ammo, direct from the factory, will that do?” A YES or NO answer can be detected very reliably, which allows the game to proceed as the player intended.
This is the most complex form of voice detection and would require extensive testing and a wide vocabulary of detections to make it work naturally. The payoff is a gaming experience beyond anything currently available, allowing the player to engage directly with characters in the game.
5. Detect Player Commands
An easier form of voice control is the single command method, which gives the player advance knowledge of a specific list of words they can use to control the game. The Intel® Perceptual Computing SDK has two voice recognition modes: “dictation” and “command and control.” The former suits the conversation system described above, and the latter suits the technique below.
A game has many controls above and beyond simply moving and looking around, and depending on the type of game, can have nested control options dependent on the context you are in. You might select a weapon with a single key, but that weapon might have three different firing modes. Traditionally this would involve multiple key presses given the shortage of quick-access keys during high octane FPS action. Replace or supplement this with a voice command system, and you gain the ability to select the weapon and firing mode with a single word.
Figure 7. Just say the word “reload”, and say goodbye to a keyboard full of controls
The “command and control” mode responds very quickly to short words and sentences, but it requires the words you speak to match the entries in the command list exactly. You may also find that certain words, when spoken quickly, are detected as a slight variation of the word you intended. A good trick is to add those variations to the database of detectable words, so that a misinterpreted word still yields the action you wanted. To keep recognition accurate, it is recommended that you limit the database to as few words as that part of the game requires. For example, if you have not collected the “torch” in the game, you do not need to add “use torch” to the list of voice controls until it has been collected.
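One way to organize such a command list is sketched below in C++. Each spoken phrase, including known mis-hearings, maps to a game action. Actions the player has not unlocked yet are left out of the active vocabulary. The class, the Action values, and the “reloads” variant are hypothetical. Handing activeVocabulary() to the SDK's command-and-control grammar is not shown, because that call is SDK-specific.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Illustrative game actions that can be triggered by voice.
enum class Action { Reload, UseTorch };

class VoiceCommands {
public:
    // Register a phrase, or a common mis-hearing of it, for an action.
    void add(const std::string& phrase, Action action) {
        assert(!phrase.empty());
        phrases_[phrase] = action;
    }

    // Lock or unlock an action, e.g. unlock UseTorch once the torch is found.
    void setAvailable(Action action, bool available) {
        if (available) {
            locked_.erase(action);
        } else {
            locked_.insert(action);
        }
    }

    // Phrases for unlocked actions only: the list to give the recognizer.
    std::set<std::string> activeVocabulary() const {
        std::set<std::string> out;
        for (const auto& [phrase, action] : phrases_)
            if (locked_.count(action) == 0) out.insert(phrase);
        return out;
    }

    // Translate a recognized phrase back into an action, if it is active.
    std::optional<Action> lookup(const std::string& heard) const {
        const auto it = phrases_.find(heard);
        if (it == phrases_.end() || locked_.count(it->second) > 0) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, Action> phrases_;
    std::set<Action> locked_;
};
```

Because several phrases can point at the same action, adding a mis-hearing is a single add() call, and it never needs special handling elsewhere in the game.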
It is also recommended that you remove words that are too similar to each other so that the wrong action is not triggered at crucial moments in the game play. For example, you don’t want to set off a grenade when you meant to fire off a silent grappling hook over a wall to escape an enemy.
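A rough automated check for such clashes is to compare every pair of command words using an edit distance. The C++ sketch below uses the standard Levenshtein distance. It measures spelling rather than pronunciation, so it only approximates how alike two words sound, and the threshold is a value you would tune. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum insertions, deletions and substitutions
// needed to turn a into b, computed with two rolling rows.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Report every pair of commands within maxDistance edits of each other,
// as candidates to rename or remove from the vocabulary.
std::vector<std::pair<std::string, std::string>>
tooSimilar(const std::vector<std::string>& commands, std::size_t maxDistance) {
    std::vector<std::pair<std::string, std::string>> clashes;
    for (std::size_t i = 0; i < commands.size(); ++i)
        for (std::size_t j = i + 1; j < commands.size(); ++j)
            if (editDistance(commands[i], commands[j]) <= maxDistance)
                clashes.emplace_back(commands[i], commands[j]);
    return clashes;
}
```

Running a check like this over each context's vocabulary at build time catches obvious pairs such as “fire” and “hire” before a player does.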
If the action you want to perform does not depend on quick reaction times, you can switch to “dictation” mode and support more sophisticated commands, such as “reload with armor piercing.” The parser would detect “reload,” “armor,” and “piercing.” The first word would trigger a reload, and the remaining two would indicate a change of weapon firing mode and trigger it.
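The dictation parse described above can be sketched in C++ as follows, assuming the recognizer has already returned the sentence as text. The FireMode values and the extra “explosive” keyword are hypothetical. The British spelling “armour” is accepted alongside “armor.”

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Illustrative firing modes for the example weapon.
enum class FireMode { Unchanged, ArmorPiercing, Explosive };

struct WeaponCommand {
    bool reload = false;
    FireMode mode = FireMode::Unchanged;
};

// Break a dictated sentence into lower-case words and look for keywords.
WeaponCommand parseWeaponCommand(const std::string& sentence) {
    std::string cleaned;
    for (unsigned char c : sentence)
        cleaned += std::isalpha(c) ? static_cast<char>(std::tolower(c)) : ' ';
    std::set<std::string> said;
    std::istringstream in(cleaned);
    for (std::string w; in >> w;) said.insert(w);

    WeaponCommand cmd;
    cmd.reload = said.count("reload") > 0;  // first keyword: trigger a reload
    const bool armor = said.count("armor") > 0 || said.count("armour") > 0;
    if (armor && said.count("piercing") > 0) {  // remaining keywords: fire mode
        cmd.mode = FireMode::ArmorPiercing;
    } else if (said.count("explosive") > 0) {
        cmd.mode = FireMode::Explosive;
    }
    return cmd;
}
```

Returning both parts in one struct lets the game perform the reload and the mode change as a single response to one sentence.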
When playing the game, using voice to control your status will start to feel like you have a helper sitting on your shoulder, making your progress through the game much more intuitive. Obviously, there are some controls you want to keep on a trigger finger, such as firing, moving, looking around, ducking, and other actions that require split-second reactions. The vast majority, however, can be handed over to the voice control system. The more controls a game has, the more this new method wins over the old keyboard approach.
6. Tricks and Tips
- Using awareness of the player’s real-world position and motion to control elements within the game will create an immediate sense of connection. Deciding when to demonstrate that connection will be the key to a great integration of Perceptual Computing.
- Use “dictation” for conversation engines and “command and control” for instant-response voice commands. They can be mixed, provided reaction time does not impede gameplay.
- If you are designing your game from scratch, consider developing a control system around the ability to sense real player position and voice commands. For example, a spell-casting game would benefit in many ways from Perceptual Computing as the primary input method.
- When you are using real-world player detection, specify a depth image stream of 60 frames per second to give your game the most responsive tracking possible.
- Do not feed raw head tracking coordinates directly to the player camera, as this will create uncontrollable jittering and ruin the smooth rendering of any game.
- Do not use voice control for game actions that require instantaneous responses. As accurate as voice control is, there is a noticeable delay between speaking the word and getting a response from the voice function.
- Do not detect whole sentences in one string comparison. Parse the sentence into individual words and run string comparisons on each one against a larger database of word variations of similar meaning.
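As an illustration of the tip about raw head tracking coordinates, the C++ sketch below low-pass filters each incoming coordinate with an exponential moving average. The filtered value can then be passed to the camera or the zone test. This particular filter and its alpha value are assumptions to tune per game, not something the SDK or the article prescribes.

```cpp
#include <cassert>

// Exponential moving average: each new sample moves the output only part of
// the way towards it, which damps frame-to-frame jitter.
class SmoothedValue {
public:
    // alpha in (0, 1]: smaller values are smoother but respond more slowly.
    explicit SmoothedValue(double alpha) : alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Feed one raw sample per frame and get the filtered value back.
    double update(double sample) {
        if (!initialised_) {  // first sample: start from it directly
            value_ = sample;
            initialised_ = true;
        } else {
            value_ += alpha_ * (sample - value_);
        }
        return value_;
    }

    double value() const { return value_; }

private:
    double alpha_;
    double value_ = 0.0;
    bool initialised_ = false;
};
```

Smoothing and lean zones complement each other. The filter removes small tremors, while the zones ensure that only a large, deliberate lean changes the view.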
7. Final Thoughts
A veteran of the FPS gaming experience may well scoff at the concept of voice-activated weapons and real-world acrobatics to dodge rockets. The culture of modern gaming has created a total dependence on the mouse, keyboard, and controller as lifelines into these gaming worlds. Naturally, any alternative will be viewed with incredulity until the technology is firmly established in mainstream gaming. The same can be said of virtual reality technology, which for 20 years attempted to gain mainstream acceptance without success.
The critical difference today is that this technology is now fast enough and accurate enough for games. Speech detection 10 years ago was laughable and motion detection was a novelty, and no game developer would touch them with a barge pole. Thanks to the Intel Perceptual Computing SDK, we now have a practical technology to exploit and one that’s both accessible to everyone and supported by peripherals available at retail.
An opportunity exists for a few developers to really pioneer in this area, creating middleware and finished products that push the established model of what an FPS game actually is. It is said that among all the fields of computing, game technology is the one most likely to push all aspects of the computing experience. No other software pushes the hardware as hard as games do, driving logic and graphics processing to the maximum (and often beyond) in an attempt to create a simulation more realistic and engaging than the year before. It’s fair to suppose that this notion extends to the very devices that control those games, and it’s realistic to predict that we’ll see many innovations in this area in the years to come. The great news is that one of those innovations is already here, right now, and it only remains for us to show the world what amazing things it can do.
About The Author
When not writing articles, Lee Bamber is the CEO of The Game Creators (http://www.thegamecreators.com), a British company that specializes in the development and distribution of game creation tools. Established in 1999, the company and surrounding community of game makers are responsible for many popular brands including Dark Basic, FPS Creator, and most recently App Game Kit (AGK).
The application that inspired this article and the blog that tracked its seven-week development can be found here: http://ultimatecoderchallenge.blogspot.co.uk/2013/02/lee-going-perceptual-part-one.html
Lee also chronicles his daily life as a coder, complete with screenshots and the occasional video, here: http://fpscreloaded.blogspot.co.uk
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2014 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.