The availability of touch devices and simple communications protocols is shifting the way we interact with our computing devices. At the same time, the introduction of faster hardware running at lower power levels is lowering the interface hurdles of modern computing tasks. Perceptual computing—the next big wave of technology—is about to hit, bringing with it the promise of deeply engaging computing experiences. The Intel® Perceptual Computing Software Development Kit (SDK) and Intel’s work with Nuance and other companies are helping to turn this promise into a reality.
Reaching beyond the touch-enabled screen, mouse, and keyboard, perceptual computing steps into the sensory world of voice commands, gesture control, and facial recognition, where computers can understand our voices not only through a set of specific commands but also through our tone and phraseology. It also means 3D-object recognition, and hand and finger tracking at close range. The technologies that enable such interpretation will transform human-computer interaction.
ENGINEERING CHALLENGES FOR A NEW WORLD
Today, most users communicate with their computers using the familiar keyboard and mouse, which offer a direct and recognizable set of inputs for computers and present simple data points for the software environment to evaluate. When a user presses a key or clicks a mouse button, little about those actions can be misconstrued or misinterpreted. However, these devices also limit users to a single, narrow mode of interaction.
Developers are now working to make computers as aware of their surroundings as we are (or at least as aware as we should be), so they can process much of the information around them and arrive at a logical conclusion about a user’s intent. Extracting information from the environment will involve data points such as the directionality of a voice (for example, is the user talking to the computer or to a friend nearby?), the ambient background noise, facial recognition for automatic user selection (and security), 3D maps of the environment for object and gesture recognition, and more.
“We believe the SDK will drive the ecosystem in helping bring up new human-computer interface experiences.”
— Anil Nanduri, Director of Products and Solutions in the Intel Perceptual Computing Group
To reach that goal, many important engineering tasks must be addressed, starting with the integration of new and improved sensors on PCs, Ultrabook™ devices, tablets, and smartphones. Cameras that can evaluate depth, microphones that understand directional audio, and touchpads with pressure sensitivity need to be standardized, miniaturized, and implemented across the ecosystem. For perceptual computing to take a firm hold, devices must be equipped with the next level of intelligence and capabilities.
Advances in environment awareness are being enabled by higher-performance computing platforms, including the 3rd generation Intel® Core™ processor family found in Ultrabook devices. Intel engineers have developed improved process technology and CPUs that can maintain real-time connections with different communication interfaces while providing a high-quality user experience. Although many hardware challenges remain—including how to handle enormous data sets and further miniaturize sensors—Moore’s Law and Intel will continue to unlock the power of perceptual computing.
Perceptual computing is not about changing and remapping current interfaces, so don’t expect the keyboard and mouse to vanish from the world of PCs any time soon. Perceptual computing aims instead to create new modalities that redefine computing interactions.
THE INTEL® PERCEPTUAL COMPUTING SDK
In October 2012, Intel introduced a comprehensive SDK that focuses on several aspects of perceptual computing, including facial recognition, voice commands, and gestures. The SDK, which includes manuals, code samples, algorithms, example applications, and tutorials, helps developers integrate perceptual computing interfaces in as simple a way as possible. Intel has always been a proponent of improved software development; in this case, the company found that the combination of sensor technology and hardware computing capability created the perfect opportunity to promote perceptual computing.
The Intel Perceptual Computing SDK supports several of the most popular human-to-computer communication modalities and focuses on those that interact with a user within 6 inches to 3 feet of the device. While the SDK works with many platforms, it targets Ultrabook devices, clamshell notebooks, and tablets equipped with embedded microphones and add-on gesture-recognition cameras. (High-quality microphones and depth-sensing webcams are expected to be widely adopted; however, they are not yet miniaturized enough for small-device implementation.)
Combining the efforts of Intel’s internal software teams and the work of industry leaders, the Intel Perceptual Computing SDK supports a wide range of interaction types and gives software developers many integration methods. Close-range finger tracking will allow developers to define usages based on a person’s hand, whether for augmented reality (grasping an object in 3D space) or for recognizing static and dynamic hand signals. Object tracking will allow developers to combine camera images with depth data in real time so that “markerless” real-world objects can be used in virtual experiences. (See the “Now See This” article on augmented reality in this magazine.)
To handle face recognition and analysis, the SDK includes seven-point landmark detection and “attribution” detection, including smiles, blinks, and even a user’s age. Finally, the SDK’s speech-recognition capabilities permit voice command and control, as well as dictation and text-to-speech synthesis.
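The shape of an application’s face-analysis logic can be sketched roughly as follows. The seven landmark names, the attribute dictionary, and the function itself are illustrative assumptions for this article, not the SDK’s actual data structures; consult the SDK reference manual for the real types.

```python
# Hypothetical sketch: consuming seven-point landmark data and attribute
# scores from a face-analysis module. All names here are assumptions
# made for illustration, not the SDK's actual API.

LANDMARK_NAMES = (
    "left_eye", "right_eye", "nose_tip",
    "mouth_left", "mouth_right", "left_eye_outer", "right_eye_outer",
)

def summarize_face(landmarks, attributes):
    """landmarks: dict mapping the seven point names to (x, y) pixels.
    attributes: dict of detector scores in [0.0, 1.0], e.g. "smile".
    Returns simple boolean judgments an application could act on."""
    missing = [name for name in LANDMARK_NAMES if name not in landmarks]
    if missing:
        raise ValueError("incomplete landmark set: " + ", ".join(missing))
    return {
        "smiling": attributes.get("smile", 0.0) >= 0.5,
        "blinking": attributes.get("blink", 0.0) >= 0.5,
    }
```

An application might poll this per frame, for example to pause a video when the user looks away or blinks for an extended period.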
Anil Nanduri, director of products and solutions in the Perceptual Computing Group at Intel, sees the company’s development and support of the Intel Perceptual Computing SDK pushing the industry forward. “We are about driving the natural user-interaction computing capabilities and working with the ecosystem for continued advancement. We believe the SDK will drive the ecosystem in helping bring up new human-computer interface experiences.”
The SDK is the first to address and combine several perceptual computing technologies, the result of Intel’s collaboration with other industry leaders. Total Immersion built the computer-vision tools, Nuance provided the speech and voice components, and SoftKinetic developed the depth-tracking component. Intel combined these tools into the SDK and linked them in such a way that developers can use them easily, and simultaneously when their applications require it. The SDK additionally includes a program for putting Creative Interactive Gesture* cameras into the hands of developers.
The gaming industry was an early adopter of motion tracking and gesture control. Today, the SDK helps PC developers learn how to use and integrate these new interface technologies in a simple and straightforward way, enabling applications in education, business, 3D modeling, 3D printing, and more.
For more information on the SDK, please visit: intel.com/software/perceptual
NUANCE DRAGON ASSISTANT*: A CASE STUDY IN PERCEPTUAL COMPUTING
Voice-control technology is one aspect of perceptual computing that is already experiencing significant progress. Nuance, famous for its Dragon NaturallySpeaking* voice-recognition software, helped guide and facilitate implementation of the SDK’s speech and voice capabilities for Dragon Assistant. Nuance Dragon Assistant software will integrate with Ultrabook devices to enable voice commands spanning several applications, including Media Player*, Facebook*, Twitter*, and Web browsers. This early implementation allows users to speak a command to the computer. For example, after a user says, “Search Amazon for lawn chairs,” the Web browser opens, locates the Amazon.com site, and searches on the phrase “lawn chairs.” Sharing that result or a specific URL with Facebook friends or Twitter followers is also a simple phrase away: “Share this page on Facebook.”
Peter Mahoney, chief marketing officer at Nuance, explained to Intel® Software Adrenaline that, “Voice recognition is an extremely complex task—you must handle many different things, from audio, to voice models, to integrating with apps. We tried to make the tasks available at fairly high-level chunks to help make the developer’s job of integration easier.” With more power available, Nuance has greatly improved both the speed and accuracy of voice recognition by using multiple cores for better efficiency. That efficiency is becoming more important for Nuance and the Intel Perceptual Computing SDK as users increasingly expect longer battery life from their mobile devices.
As part of the SDK, Nuance’s technology will be made available to developers, who can use it to expand the number of voice-controlled applications on Microsoft Windows* and Intel® platforms. Software developers can pass nearly all of the audio processing and handling to the development platform, providing Nuance an audio file and getting back a basic text result of the recognized terms. The application can then parse that result and use it in several ways. The goal is to make voice-control technology simple and intuitive for developers to use.
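The application-side parsing step might look something like the following sketch: the recognizer hands back plain text, and the application matches it against its own command grammar. The patterns and action names here are illustrative assumptions, not part of the SDK or Nuance’s API.

```python
import re

# Hypothetical command grammar: each entry maps a spoken-phrase pattern
# to an application action. These patterns and action names are
# illustrative assumptions, not part of the SDK.
COMMAND_PATTERNS = [
    (re.compile(r"^search (?P<site>\w+) for (?P<query>.+)$", re.IGNORECASE),
     "web_search"),
    (re.compile(r"^share this page on (?P<network>\w+)$", re.IGNORECASE),
     "share_page"),
]

def parse_recognized_text(text):
    """Turn the plain-text recognition result into (action, arguments)."""
    text = text.strip()
    for pattern, action in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    # No command matched: treat the utterance as free-form dictation.
    return "dictation", {"text": text}

# The recognizer returned this text for the article's example command.
action, args = parse_recognized_text("Search Amazon for lawn chairs")
```

Keeping the grammar in application code (rather than in the recognizer) is what lets each application define its own vocabulary on top of the same shared speech engine.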
As both Nuance’s technology and the Intel Perceptual Computing SDK evolve, expect voice recognition to expand and become more refined, too. The next big push will come in the form of natural-language processing, with its requirement for more processing horsepower and much larger, cloud-based databases. Natural-language analysis will allow computers not only to understand commands but also to understand a speaker’s intent, a vastly more complex problem. Instead of being restricted to specific commands, natural language will allow users to say, for example, “I need to find a new lawn chair,” and then see a search result in their favorite shopping application.
MULTI-MODAL INTERACTION AND THE FUTURE OF PERCEPTUAL COMPUTING
The modalities of voice, facial recognition, and gesture are important. But the true power and potential of perceptual computing reside in the ability to combine these interfaces and sensors, increasing both the amount of data the computer can access from the user and the amount of information it can receive at one time. Imagine a computer that can “see” whether you are looking away from the screen while talking, to determine whether it should interpret your voice as a command or as ambient noise.
A natural interface will allow an understanding of true intent, which can be acted upon in a fluid manner. Many visionaries believe that extensive perceptual computing is an attainable goal in the near future, and the Intel Perceptual Computing SDK will help developers push toward that goal. Much like the way touch technology on mobile devices quickly gained momentum, wide-scale multi-modal perceptual computing is likely to be available soon. The ecosystem has evolved rapidly over the last 10 years, thanks to a surge of tools and computing power—what Nanduri calls “an innovation hockey stick.”
Intel’s support and creation of the Intel Perceptual Computing SDK shows a dedication to the expansion of computing technology. Because Intel and companies such as Nuance, Total Immersion, and SoftKinetic created the SDK to be powerful and easy to use, application developers in all areas of expertise will be able to integrate and experiment with voice and gesture technology.
The combination of voice, facial recognition, and gesture will truly revolutionize the human-computer interface and help remove many of today’s computing boundaries. Users of all skill and knowledge levels will be able to interact with their devices effectively and efficiently, extracting information and transparently getting the results they want. The power and potential of perceptual computing are now beginning to take hold. In the coming years, watch for exciting developments that will transform the way we interact with and use our computers and mobile devices.
INNOVATION AND PERCEPTUAL COMPUTING CONTEST
Intel created a USD 1 million innovation contest to feature the best examples of what’s possible with the Intel® Perceptual Computing Software Development Kit. This contest invites developers to be wildly creative with human-computer interaction designs. The ecosystem has many creative ideas for new interface technologies, and Intel is hoping to channel them into a public forum of collaboration. As with any new computing technology, Intel is looking for the handful of applications that will pop out of the screen, impress new consumers, and draw them into the scaling benefits of perceptual computing. To learn more about the contest, see: www.intel.com/software/perceptual
ABOUT THE AUTHOR
A contributing writer, roving reporter, podcast host, and blogger for the popular technology site PC Perspective, Ryan Shrout researches and writes product reviews for today’s leading-edge hardware and software products. When he’s not tracking down the pending changes to the various CPU and compute architectures, Ryan is on tap to assist the RH+M3 group when the need for a hardcore computer-geek writer arises.