Perceptual Computing and why we are excited

By Chris Skaggs (Founder and CTO of Code Monkeys)

Back in mid-2012 we had the privilege of seeing some cutting-edge hardware and software many months before they were widely announced, a perk of being an Intel® Black Belt Developer. In a tight little room with about a dozen other developers, we watched an Intel code genius demonstrate his newest toy: a slick little package about the size of an Altoids* box that packed a surprising number of sensors. It was the Creative* Interactive Gesture camera, and we watched in real time as the multi-lens package “saw” things I didn’t know were possible at that size or at such a short range (about 8 to 24 inches): facial position and orientation, hand motions, gestures, and even the implied inverse-kinematics (IK) ‘bones’ in the user’s hand. It saw fingertips, smiles, and frowns, and was able to respond to gestures like pointing, thumbs-up, hand waves, and peace signs.

It was a quick demonstration, just 20 minutes or so, but it was impressive and I was intrigued. The tools, a camera package and an SDK, seemed to offer a fascinating glimpse of the future. The developers in the room immediately started brainstorming applications and use cases. American Sign Language? Virtual keyboards? Virtual CAD design? It was a fun free-for-all about what this kind of technology could provide once it was really tuned. That’s by far the best part of the Black Belt community: the wide-open collaboration with a bunch of exceptionally talented programmers and visionaries, all of whom approach their work with a geek’s eye for making the world a better (and more fun) place through technology. Several months later, with that demonstration still percolating in my mind, we were invited to participate in the Intel Ultimate Coder Challenge: Going Perceptual, and without hesitation we said, “Where’s my camera?!”

We were definitely excited.

Enthusiasm aside, the challenge was one of those ‘simple but hard’ propositions: take the tools and do whatever you can to make an app ‘perceptual.’ In our case, we were in the last stages of developing a touch-based game called Stargate SG1 Gunship, and we thought it would be fun to make the game playable hands-free. After a quick look at the SDK’s Unity* 3D plug-in and some experimentation to see what was possible, we confirmed that decision and started writing scripts.

First Thoughts

It’s fair to say that the availability of a Unity 3D plug-in was an important factor in our decision to participate. While we have some great coders on the team, time constraints would have made a native coding project a lot less attractive. Instead, we were able to start ‘playing’ with the camera right out of the box. The plug-in was easy to install, and we had our ‘hello world’, that first taste of success, running in less than six hours. In my opinion that ‘Eureka moment’ is underrated: when it arrives early, it builds an ephemeral but encouraging ‘we can do this’ confidence.

I must say that Intel’s decision to have a Unity plug-in available right out of the chute was a shrewd play. The game development community has a very high tolerance for experimentation and bleeding-edge experiences. Likewise, there is a core group of gamers willing to try anything new. Between the two, I think perceptual computing can reach the critical mass it needs to go mainstream. If folks were willing to buy a set of bongos to play music with Donkey Kong, you can bet there are folks willing to buy an experimental camera to explore never-before-seen interface methods.

Still, initial hooting aside, we quickly found that we would have to scale back our Buck Rogers expectations of what the still-rough beta SDK could do. For instance, the Holy Grail for us, and indeed for many others in the competition, was gaze-tracking: the ability to detect what the user was looking at on the screen. Simply put, as cool as the tech was, gaze-tracking was not possible with it. So most of us started working on head-tracking instead, which also proved frustratingly elusive and difficult to deliver reliably at a high frame rate. Other unrealistic expectations included dynamic dual-hand controls, rapid gesture recognition, and high-resolution depth perception.

But while a few features were out of immediate reach, the things that did work, and worked well, proved to be a powerful, catalytic set of tools that led to some fascinating insights into our entire game development methodology.

Vestigial UI/UX

We were having a lot of fun with the challenge, and Stargate SG1 Gunship was coming along nicely, but at some point we all sensed that something just wasn’t working. It was hard to put your finger on (so to speak), but the feeling was that the player was jumping through plainly inefficient gestural and interactive hoops, and that was dissatisfying in a kind of existential way. The UI wasn’t ‘working’ even though it worked just fine. We were all adjusting to the UI in a way that felt contrived, instead of it adjusting to or accommodating us. Written down, that sounds like a juvenile expectation, but if this thing was good enough to know what I was pointing at, why wasn’t it doing something more meaningful with that information?

The aha moment came when we were working on a menu system that really had very little to do with the game. We were trying to organize various layers of menus and information and connect them with a typical chain of forward and back buttons. It was the button that really did it. It sat there, dumbly asking, “Why am I here?”… and we didn’t have an answer. Or rather, all we had was a dumb answer that went something like, “That’s the way we do this thing.”

But the more we looked at this menu problem, the more we realized that our entire line of habitual thinking was built on at least 40 years of UI design, and probably closer to 100 years of industrial design. A button is a metaphorical link to a function we want to perform. The ‘E’ button on an old typewriter is a mechanical link that actuates a gear that swings a lever that bangs a ribbon that adds ink to paper that indicates a vowel. A button is always an abstraction of a more complicated process or action. At its simplest, perhaps it dings a bell or closes a circuit. At its most abstract, it’s the “Go” button that makes some complicated magical thing happen. But it is never the thing itself. We understand the metaphor only because our society has trained itself to; it is not the way we experience the world. So why, in a fully realized 3D virtual world coupled with an input that allows the world to experience us as other 3D objects, are we letting this 2D abstraction get in the way? Why are we organizing information in 2D hierarchical layers of menus and submenus with a complex system to ‘drill down’ when we can organize our information spatially, the way we naturally organize the physical world?

The conversation about menus and buttons led to other conversations about touches and clicks and a whole slew of UI conventions that we came to see as leftovers from previous technologies. For example, why were we clicking anything in a game with no mouse? We’d had some of these conversations a few years prior as we were exploring the impact of touch technology, but what was intriguing with touch was compelling with Perceptual Computing.

Robust touch interfaces allow us to interact with our virtual stuff as if it were made of 2.5D objects: not exactly flat, since we intuitively understand that these things can be stacked and the stacks can be rearranged. But because the interface object itself, a screen, is two-dimensional, our interactions are naturally flattened out. Still, it’s a huge leap from even the best mouse-driven GUIs, because touch gets us one layer of abstraction closer to what we’re trying to manipulate. When Bill Gates infamously stated that the iPad* was missing a stylus, it made me think he’d failed to grasp the real potential of touch interfaces done well. What made the iOS* touch interface work so well was that it worked the way we intuitively wanted it to work. It was built from the ground up FOR touch interaction. The alternative, and what several less successful UIs have done, is to take tried-and-true mouse-and-keyboard GUIs and turn a fingertip into an inaccurate mouse. The virtual thumbsticks that appear in many mobile shooter games are an example of this kind of thinking. Those UIs are passable; we get how they work because they remind us of UIs we’re used to, like mice and joysticks, but they fail to meet our desire for something that feels natural. Perceptual computing offers a way to remove yet another level of abstraction by bridging our real 3D world and the virtual 3D world, using the communication and interaction tools we have been most comfortable with since birth: our faces and hands.

It’s no overstatement to say that this train of thought radically changed the way we pursued the Intel Ultimate Coder Challenge: Going Perceptual, and it has also radically changed the way we approach UI design on all platforms from this point forward.

The Unbearable Light of Clicklessness

Once we started looking seriously at what could be done with perceptual input, we began to treat input as a core mechanic rather than the gimmick we had been tinkering with before. It was new territory, and it required us to rebuild the entire game and meta-game UI to reflect the new paradigm.

As mentioned above, we started with the menu system. Instead of creating a flow chart of two-dimensional panes of information, we started collecting information into ‘rooms.’ There was the Team Room that held all the data related to the team of soldiers a player sent into the level: their gear, their upgrades, their buffs, etc. We had a ‘Ship’ room to configure your gunship, a ‘Lab’ for your tech research, and so on. Then we arranged the rooms spatially around a central ‘mission object’: the sum of the other rooms, a discrete kit for each specific mission. The player spun, grabbed, pushed, and zoomed around rooms and objects that felt distinct, non-abstract, and ‘real.’

The most immediate benefit was ease of comprehension. We found our minds were eminently capable of holding this kind of information within a spatial organizational structure. The kinesthetic sense of touching and moving things with the Perceptual Computing (PerC) camera made the experience even more accessible.
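The ‘rooms around a central mission object’ arrangement described above can be pictured as a simple ring layout. The following is a minimal sketch under our own assumptions, not the layout code from Stargate SG1 Gunship: RoomRing and Layout are hypothetical names, the rooms are spaced evenly on a horizontal circle, and System.Numerics.Vector3 stands in for Unity’s Vector3 so the snippet compiles outside Unity.

```csharp
using System;
using System.Numerics;

// Hypothetical layout helper (an illustration, not the game's actual code):
// places N "room" anchors evenly on a horizontal ring around a central
// mission object. Y is treated as the up axis.
public static class RoomRing
{
    // center:    world position of the central mission object
    // radius:    distance from the mission object to each room
    // roomCount: number of rooms (for example Team, Ship, Lab, ...)
    public static Vector3[] Layout(Vector3 center, float radius, int roomCount)
    {
        if (roomCount < 1)
            throw new ArgumentOutOfRangeException(nameof(roomCount));
        if (radius <= 0f)
            throw new ArgumentOutOfRangeException(nameof(radius));

        var positions = new Vector3[roomCount];
        for (int i = 0; i < roomCount; i++)
        {
            // Equal angular spacing: room i sits i/roomCount of a full turn around the ring.
            double angle = 2.0 * Math.PI * i / roomCount;
            positions[i] = center + new Vector3(
                radius * (float)Math.Cos(angle),
                0f,
                radius * (float)Math.Sin(angle));
        }
        return positions;
    }
}
```

For example, Layout(Vector3.Zero, 2f, 4) would place four rooms 2 units from the mission object, a quarter turn apart. With equal spacing, spinning the whole ring by 2π/roomCount radians brings the next room to the spot the previous one occupied, which is one simple way to support the ‘spin’ interaction.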

Building on this experience, we went back to the core game UI and reconsidered things there as well. One massive aha was recognizing a convention we’d inherited from console first-person shooters: the hard link between where I look and where I aim. By decoupling the two, we discovered a completely unlooked-for mechanic, and the pace of the game shot up dramatically, with fewer cycles, a better frame rate, and controls that felt far more natural.

What I found particularly intriguing was how these UI conversations made our touch version of the game much better, too. Almost everything that worked well for PerC translated easily to touch, making the commercial product far more fun to play.

Ultimately, the improvement in both interface worlds came down to one thing: fighting the urge to automatically lean on conventions and metaphors from previous UI schemes. To be honest, it’s a tricky thing to do. The thing about assumptions is that you’ve stopped even noticing what you’re assuming, so it’s not easy to get underneath them. We had to keep asking ourselves, ‘What do I want to do?’ and ‘How do I want to interact with this thing?’ In most cases we found that our UI desires leaned toward the simple and kinesthetic. We also found that, more often than not, there was a way to do that with the tools we had, or at least something satisfyingly close. A great example is the retreat function in a particular level of the game. The goal of the level is to hold a central position as long as possible as foes increasingly press in. Part of your job is to call the retreat so your soldiers can get to safety. What I want to do is scream “Go! Go! Go!” like I see in the movies, and with the PerC kit’s voice commands I can do exactly that.

Some Nitty-Gritty Details

Planning great perceptual UI is fun. Building artwork for a new kind of UX is really fun. But it’s all hypothetical if you can’t make the code work, and we had very few tools or precedents to look at as we sought to build a new way of interacting with our game world.

The ability to sense and differentiate between hand gestures was a core component of our plan and one of the tasks that took the lion’s share of implementation effort. What I’d seen in prototype those months before was a controlled lab setting where the demonstrator was already familiar with the software’s limitations and boundaries. As a result, he intuitively avoided pushing it beyond its abilities, whether through motion speed or viewing angle. We, on the other hand, started with our sci-fi visions of perceptual control and quickly found that our expectations were a significant distance from what was realistic.

Ironically, one of the challenges was that the camera was too accurate and sent us too much data. For one thing, parsing all those values every frame put a significant burden on the processor, but ‘jiggle’ proved an especially serious problem for every hand-driven function. Most people don’t have the steady hands of a surgeon, and a certain amount of jiggle is normal even when they are trying to hold their hands still. The camera is quite capable of picking that up, and when we tried a simple 1:1 mapping of hand position to reticle position, the result was horribly jerky. Our solution was a function that averaged the player’s hand position over the last several frames (a moving average) to smooth out the motion.

Sample Code:

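   // Running-sum moving average (assumes the *Array histories are ArrayLists pre-filled
   // with totalArrayLerpStorage samples): add newest, subtract oldest, shift history.
   leftHandScreenTotal += leftHandScreenPositionTemp;
   leftHandScreenTotal -= (Vector2)leftHandScreenArray[0];
   leftHandScreenArray.RemoveAt(0);
   leftHandScreenArray.Add(leftHandScreenPositionTemp);
   leftHandScreenPosition = leftHandScreenTotal / totalArrayLerpStorage;
   rightHandScreenTotal += rightHandScreenPositionTemp;
   rightHandScreenTotal -= (Vector2)rightHandScreenArray[0];
   rightHandScreenArray.RemoveAt(0);
   rightHandScreenArray.Add(rightHandScreenPositionTemp);
   rightHandScreenPosition = rightHandScreenTotal / totalArrayLerpStorage;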
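The fragment above keeps a running total, so each frame costs one addition, one subtraction, and one division no matter how many frames are averaged. For readers who want something they can run on its own, here is a minimal sketch of the same moving-average idea under our own assumptions: the class name MovingAverage2D is hypothetical, it uses a fixed-size ring buffer instead of shifting an array, it averages over however many samples it has until the window fills, and System.Numerics.Vector2 stands in for Unity’s Vector2.

```csharp
using System;
using System.Numerics;

// Hypothetical helper (an illustration, not the game's actual code):
// fixed-window moving average of 2D positions. A running total keeps each
// update O(1) regardless of window size.
public sealed class MovingAverage2D
{
    private readonly Vector2[] _samples; // ring buffer holding the most recent samples
    private Vector2 _total;              // sum of the samples currently in the window
    private int _next;                   // slot to write next; holds the oldest sample once full
    private int _count;                  // number of real samples stored so far

    public MovingAverage2D(int windowSize)
    {
        if (windowSize < 1)
            throw new ArgumentOutOfRangeException(nameof(windowSize));
        _samples = new Vector2[windowSize];
    }

    // Adds the newest raw sample and returns the smoothed position.
    public Vector2 Add(Vector2 sample)
    {
        if (_count == _samples.Length)
            _total -= _samples[_next]; // window full: drop the oldest sample
        else
            _count++;                  // still warming up: window not yet full

        _samples[_next] = sample;
        _total += sample;
        _next = (_next + 1) % _samples.Length;
        return _total / _count;
    }
}
```

Typical use would be one instance per hand, calling Add once per frame with the raw hand position and driving the reticle from the returned value. The window size is the main tuning knob: a larger window smooths out more jiggle but makes the reticle lag further behind the hand. Because the total is updated incrementally, floating-point error can slowly accumulate over a long session; occasionally recomputing the sum from the buffer is one way to bound it.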

Looking Forward

Our experience with the Intel Perceptual Computing Challenge Phase II, and with perceptual computing in general, is just beginning. We are still finding our way and don’t have anything even remotely like best practices yet, but it’s fun. The potential for PerC UI schemes is a wide-open field, and we can’t wait to see what novel approaches the next generation of programmers comes up with to leverage this potentially paradigm-shifting technology.

Author Bio

Chris Skaggs is a 13-year veteran of the web and mobile software industry. As founder and CTO of Code-Monkeys and Soma Games LLC, Chris has delivered software applications to some of the country’s most discerning clients, such as Intel, Four Seasons, Comcast, MGM, and Aruba Networks. In addition to corporate customers, Code-Monkeys has programmed many casual games for the Apple iPhone* and iPad* and the Android* and Mac*/PC platforms. Inside Code-Monkeys is a group of technicians and artists dedicated to cutting-edge game design and development. A Black Belt in the Intel® Software Developer Network, Chris also writes and speaks on topics surrounding the rapidly changing mobile application environment at venues like GDC Next, CGDC, Casual Connect, TechStart, and Serious Play.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2013 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.