The Intel® RealSense™ SDK has been discontinued. No ongoing support or updates will be available.
By Dominic Milano
Reaching deep into the video stream, Intel® RealSense™ technology redefines the model of interaction between humans and their machines
In today’s connected world, more people are grabbing their mobile devices and logging in from wherever they are to socialize with friends and work with colleagues. Video, voice, and text chat—along with file and screen sharing—have proven effective in enabling remote communication and collaboration. But imagine the possibilities if instead of viewing a PowerPoint* presentation on a shared desktop or looking at a concept drawing that you and your teammates downloaded from the cloud, you could all virtually “step into the presentation” and point to (or even manipulate) critical details in real time. Now imagine you and your collaborators doing all of that while sitting in front of your devices, waving your arms and pointing your fingers. Until recently, that might have sounded complicated and inordinately expensive. By the end of 2014, such remote, immersive collaboration will not only be possible, it will be facilitated by inexpensive apps and browser plug-ins that tap into Intel® RealSense™ technology built into next-gen detachables, notebooks, 2 in 1s, all-in-ones, and Ultrabook™ devices.
Intel RealSense technology and the Intel® RealSense™ SDK 2014 (beta) combine several perceptual computing technologies and capabilities:
- Best-in-class depth sensing cameras give devices the ability to perceive depth the same way our eyes do
- Hand and finger tracking for controlling devices more naturally
- Voice and facial recognition for more intuitive control and biometric-based secure access
- 3D scanning of physical objects for combining real and virtual worlds as well as 3D printing
- Emotion recognition
- Heart-rate detection
Together, these technologies are enabling innovative new use cases in gaming, education, entertainment, content creation, communication, and collaboration. This article, the second in a series of five, provides insights into perceptual computing from two trailblazing ISVs that are using the Intel RealSense SDK 2014 (beta) and second-generation Intel® RealSense™ 3D cameras to develop immersive collaboration applications.
Enriching Collaboration with Depth Data
The Intel RealSense camera (F200) is 12.5mm high and just 3.75mm thick. “This front camera, which is designed to face the user and can understand distance and movement, will start to appear in devices in late 2014. Many OEMs, including Acer, Asus, Dell, Fujitsu, HP, Lenovo, NEC, and Toshiba have committed designs,” said Michael Liu, immersive collaboration product manager for Intel RealSense technology. “The technology can determine what’s background, what’s foreground, and where things are relative to the shoulder, hand, and face of the person in view.” Using depth data, precise regions of the video image can be segmented and then removed, replaced, or enhanced algorithmically in real time. Intel calls this process background segmentation. Think of it as a dynamic, real-time green-screen effect like those used by TV weather reporters, but without the need for a green-screen backdrop.
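The SDK’s segmentation library is far more sophisticated, but the core idea of separating foreground from background with per-pixel depth data can be sketched in a few lines. Everything here (the function name, the depth format, and the 1.2 m cutoff) is illustrative, not the SDK’s actual API:

```python
import numpy as np

def segment_foreground(color_frame, depth_frame, max_depth_mm=1200):
    """Zero out background pixels using a simple depth threshold.

    color_frame: (H, W, 3) uint8 camera image
    depth_frame: (H, W) uint16 per-pixel distance in millimeters
    max_depth_mm: pixels farther than this are treated as background
    """
    # Pixels with no depth reading (0) or beyond the cutoff are background.
    mask = (depth_frame > 0) & (depth_frame <= max_depth_mm)
    # Keep foreground color values; zero everything else.
    segmented = np.where(mask[..., None], color_frame, 0)
    return segmented.astype(np.uint8), mask
```

A production segmenter combines this depth cue with color information and edge refinement to follow hair and finger contours, as Patel describes below; a hard threshold alone produces ragged silhouettes.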
“We use a complicated set of algorithms to identify and segment a person,” Sanjay Patel explained. Patel is CEO and co-founder of Personify, the company that created the background segmentation library included in the Intel RealSense SDK 2014. The segmentation algorithms look at color and depth data together, determine what to display and extract, and then deliver the results in 60-fps HD video.
“Getting details like the user’s hair or following the exact contour of a finger in real time is a huge challenge,” Patel continued. “It’s even more complicated on a small form-factor computing device, although the new Intel® Core™ M processors will be able to handle the workload.”
Transforming Virtual Meetings: Background Segmentation
For video chat providers like ooVoo and Personify, background segmentation promises to transform video calls and virtual meetings by giving users the ability to replace or remove their background and project users into a collaborative virtual environment.
While you’ll never have to worry about joining a video conference call from a cluttered room again, the use cases for being immersed in a virtual meeting span a far broader range. “Being onscreen with a PowerPoint presentation is one obvious use case,” Patel said. “Imagine watching a movie with your friends or playing a game and being able to see their faces, not by looking at them in separate video windows but together and overlaid in the same space, sharing the experience as a group.”
For consumers and professional users alike, Personify is an immersive video chat app that leverages the Intel RealSense 3D camera and background segmentation to transport users to virtual environments where they can play games, watch videos, study together, and more. Personify and Intel worked closely together to optimize the app for the Intel RealSense 3D camera.
In addition to sharing a virtual space together, Personify allows users to present their personas on top of a PowerPoint presentation or live demo to a client for a more engaging presentation. This feature is currently available as a standalone product. “For IT departments,” Patel explained, “having control over what’s visible in the background of a video image is key where security is concerned. What better way to have confidence that an employee isn’t accidentally revealing trade secrets than to automatically replace a background with a logo or a whiteboard that’s blank instead of covered with sensitive info?”
Additionally, existing users of Skype* and other video-conferencing clients that support the Intel RealSense 3D camera driver can leverage the background segmentation capability through Personify Cameo*, which is currently under development. According to Patel, Personify Cameo allows Skype users with Intel RealSense 3D cameras to replace their backgrounds to look like they are sitting at the beach or on a snowy mountain top.
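Once a per-pixel mask for the user exists, replacing the background (the beach or mountaintop effect described above) reduces to standard alpha compositing. This is a minimal sketch with hypothetical names, unrelated to Personify’s actual implementation; it assumes a soft mask with values in [0, 1]:

```python
import numpy as np

def replace_background(color_frame, alpha_mask, new_background):
    """Composite a segmented persona over a replacement background.

    color_frame:    (H, W, 3) uint8 camera image
    alpha_mask:     (H, W) float in [0, 1]; 1.0 = person, 0.0 = background
    new_background: (H, W, 3) uint8 replacement scene (beach, logo, ...)
    """
    a = alpha_mask[..., None]  # add a channel axis to broadcast over RGB
    blended = a * color_frame + (1.0 - a) * new_background
    return blended.astype(np.uint8)
```

A soft (fractional) mask matters at the edges: blending hair and finger contours rather than cutting them hard is what makes the composite look natural.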
Beyond Background Removal
ooVoo bills itself as the world’s largest privately owned video chat service provider. It currently supports up to 12 simultaneous HD video streams, real-time text chat, and sharing features that, for example, allow users to watch movies together. “Our platform is about real-time video communication,” said ooVoo CTO Chang Feng. “Intel RealSense technology gives us the ability to enhance the multi-user experience in a number of very interesting ways.”
Initially, background segmentation will play a central role in ooVoo’s implementation of Intel RealSense technology. Feng envisions taking advantage of other Intel RealSense technology-enabled capabilities to build an even more compelling communication and collaboration experience. In the future, ooVoo hopes to give users the ability to interact with the operating system to control UI elements with gestures and voice commands while being projected onto the desktops of friends and colleagues. “We also want to add a layer of intelligence. Our goal is to move beyond rendering what the webcam sees, into extracting and exchanging data that enhances the video in other ways.” For example, ooVoo users often complain that they need to prepare for a video chat session to look their best. Using depth data, ooVoo hopes to develop algorithms that enhance a user’s appearance or improve the appearance of a company’s logo.
With more than 100 million users, ooVoo aims to provide the best possible experience on legacy systems as well as on the latest mobile devices. To do so, they offer tiered services based on client machine capabilities. “Background replacement and other new features built on the Intel RealSense SDK will be available on the latest Intel platforms and SoCs to give users the best experience,” Feng said.
ooVoo and Intel engineers have been working together for several years, fine-tuning the ooVoo platform’s code and integrating it with the Intel® Media SDK, using its APIs to accelerate video capture and rendering. Details of how the Intel® VTune™ Analyzer, Intel Media SDK, and Intel® Integrated Performance Primitives were used to optimize performance on systems running the Intel® Atom™ processor and 4th generation Intel® Core™ processors can be found here.
Moving into the Cloud
For ooVoo, background segmentation and other computationally intensive operations are currently handled on the client side (and will require Intel® Core™ i7 or Intel Core M processors), but the company is working on moving those workloads into the cloud. ooVoo has a cloud infrastructure for handling video that analyzes client-side capabilities, network conditions, packet-loss rate, and jitter. The system then adjusts dynamically to those conditions, delivering a state-of-the-art quality-of-service framework that Intel Software and Services application engineers helped optimize for Intel platforms.
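ooVoo hasn’t published its tiering logic, but a service that degrades quality under packet loss and jitter can be sketched as a simple quality ladder. All tier names and thresholds below are invented for illustration:

```python
# Hypothetical quality ladder, ordered best-first: (label, bitrate in kbps).
QUALITY_TIERS = [
    ("hd_1080p", 4000),
    ("hd_720p", 2000),
    ("sd_480p", 800),
    ("audio_only", 64),
]

def pick_tier(packet_loss_pct, jitter_ms, bandwidth_kbps):
    """Pick the highest tier the current network conditions can sustain."""
    # Step down one rung for lossy links, another for high jitter.
    penalty = 0
    if packet_loss_pct > 2.0:
        penalty += 1
    if jitter_ms > 50:
        penalty += 1
    # Only tiers the measured bandwidth can carry are candidates.
    usable = [t for t in QUALITY_TIERS if t[1] <= bandwidth_kbps]
    if not usable:
        return QUALITY_TIERS[-1][0]  # fall back to the lowest tier
    index = min(penalty, len(usable) - 1)
    return usable[index][0]
```

In a real deployment these measurements would be sampled continuously and smoothed before a tier switch, to avoid oscillating between qualities on a noisy link.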
Integrating ooVoo and Intel SDKs
Considering itself both an ISV and a service provider, ooVoo offers its ooVoo Communications SDK for developers to build advanced services on top of the ooVoo platform. “We’re working with Intel to integrate the Intel RealSense SDK capabilities—support for face tracking, gesture recognition, and all those good things—into the ooVoo multi-point video communication and collaboration framework. We have plans for our own products that use background replacement and blurring, motion detection, and other new features, but we believe the impact of Intel RealSense technology on immersive collaboration is so huge that ooVoo won’t be able to cover all of the possible use cases on our own.”
According to Feng, they combined SDKs so that developers can implement whatever use case they have in mind to leverage the Intel RealSense technology capabilities, while simplifying the communication layers. As a result, developers won’t need to set up their own server environment or handle video streaming.
Extracting Intelligence from the Video Stream
Feng sees immersive collaboration as the beginning of a new wave of innovation based on intelligence that was previously locked within pixels. “Remember when Google released the PageRank algorithm? A wealth of untapped information was suddenly unlocked,” Feng said. “Intel RealSense technology lets us reach into and between the pixels of a video stream. Looking at the real user through the device’s interaction camera, and accounting for all the other sensors, it’s a powerful new model of interaction between humans and our machines.”
Patel agrees, envisioning a time when perceptual computing, particularly emotion detection, is used in conjunction with big data analytics. “Imagine you’re playing a game with friends and you want to transform into a virtual character. Intel RealSense technology lets you control your avatar, moving its eyes when you look away, talking when you talk, gesturing when you gesture. Taking the idea further, an emotion-recognition engine kicks in, and your avatar changes color or the music becomes upbeat and rays of sunshine burst forth—all based on how you and your friends are feeling. When you combine big data analytics with Intel RealSense technology, it will be a golden opportunity for developers to deliver even more incredible experiences.”
Explore Intel RealSense technology further, learn about Intel RealSense SDK for Windows beta, and download a Developer Kit here.
Is your project ready to demonstrate? Join the Intel® Software Innovator Program. It supports developers who have forward-looking projects and provides speaking and demo opportunities.
Check out this video demonstrating how Intel RealSense technology tracks emotion and recognizes 10-finger gestures.
Want more details on Intel RealSense technology? Read this.