Hi! I am the director of University Collaborative Research at Intel Labs, and I am starting a blog series featuring the latest highlights of our collaborative research with universities around the world. I am not going to be doing “computer research for dummies,” but I will be writing about the latest exciting research topics in an easy-to-understand form. No jargon? OK, maybe just a little. But no matter what your tolerance for engineer-speak, I think you will find these topics interesting, informative, and thought-provoking. After reading about them, you will also understand why the topics are important and how they may affect hardware, software, and computing products in the near future.
In this post, I will describe how Intel Labs is collaborating with several leading universities on research into a new class of depth cameras, called RGB-D cameras because they capture both color (RGB) and per-pixel depth (D), and where their development might go next. From 3D mapping and modeling to object and activity recognition, RGB-D cameras show promise for a wide variety of applications, and researchers are only beginning to scratch the surface. Several RGB-D cameras have been introduced to the market since late 2010, including Kinect, PrimeSense, the Creative* Interactive Gesture Camera, and others. These have generated a huge amount of interest in software developer communities, spawning a whole range of hacks and demos, especially in the gaming industry.
Less well known are the fundamental changes that RGB-D cameras are bringing to the broad research field of visual perception: using cameras as generic sensing devices to perceive many more facets of the world than just body gestures. Liefeng Bo and Anthony LaMarca from Intel Labs provided me with the background information on their research for this post. Working with the University of Washington, Carnegie Mellon University, Stanford University, Cornell University, UC Irvine, UC Berkeley, Saarland University, and others, Liefeng and Anthony have carried out a series of research projects demonstrating that, by providing synchronized color and depth data at high frame rates, RGB-D cameras enable breakthroughs in visual perception tasks such as 3D modeling, object recognition, and activity tracking. These advances are enabling novel perception-based applications that could change how we live our daily lives in the near future.
Large-Scale 3D Mapping and Modeling with RGB-D:
The world is 3D, and building digital 3D models of it has long been a dream of researchers and developers in fields such as medicine, computer-aided design, and computer graphics. But 3D modeling is a challenging problem. While 3D scanners exist for tabletop objects, modeling a large environment at the scale of rooms and buildings is much harder, and it has been actively researched using either expensive laser rangefinders, such as the Velodyne HDL-64E, or elaborate vision techniques, such as those used in Photo Tourism.
With an RGB-D camera, 3D modeling becomes much easier and far more accessible to developers and consumers. At the Intel Science and Technology Center on Pervasive Computing, hosted at the University of Washington, we have built prototype systems that let a user freely hold and move an RGB-D camera through a large indoor environment, such as a multi-room floor 40 meters long, and build a 3D model that is accurate in both geometry and color. The system runs in near real time, merges a continuous stream of RGB-D data into a consistent 3D model, and lets the user interact on the spot, for example checking partial results or rewinding to recover from mapping errors. Our RGB-D mapping work demonstrates that a portable scanning device for large-scale 3D modeling and mapping is feasible in the near future.
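To make “merging a continuous stream of RGB-D data into a consistent 3D model” a bit more concrete, here is a minimal Python sketch of the geometric core: each depth pixel is back-projected into a 3D point with a pinhole camera model, then moved into a shared world frame using the camera's pose for that frame. The intrinsics (FX, FY, CX, CY) are assumed values roughly typical of a 640x480 sensor, and the pose is simply given as input. Estimating that pose by aligning frames is the hard part of RGB-D mapping, and this sketch does not attempt it; it is an illustration, not the code behind the prototype described above.

```python
from typing import List, Tuple

# Assumed pinhole intrinsics, roughly typical of a 640x480 Kinect-class sensor.
FX, FY = 525.0, 525.0   # focal lengths in pixels (assumption)
CX, CY = 319.5, 239.5   # principal point in pixels (assumption)

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]

def backproject(u: int, v: int, depth_m: float) -> Point:
    """Convert pixel (u, v) with depth in meters into a camera-frame XYZ point."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)

def transform(pose, p: Point) -> Point:
    """Apply a rigid pose (3x3 rotation R, translation t) to a point: R @ p + t."""
    R, t = pose
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def fuse_frame(depth, rgb, pose, world_map: List[Tuple[Point, Color]]) -> None:
    """Append every valid pixel of one RGB-D frame to a global colored point list."""
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # zero depth means the sensor returned no measurement
                continue
            world_map.append((transform(pose, backproject(u, v, d)), rgb[v][u]))

# Usage: fuse a tiny 2x2 frame captured at the world origin (identity pose).
identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
cloud: List[Tuple[Point, Color]] = []
fuse_frame([[1.5, 0.0], [2.0, 2.5]],
           [[(255, 0, 0), (0, 0, 0)], [(0, 255, 0), (0, 0, 255)]],
           identity, cloud)
print(len(cloud))  # 3 (the zero-depth pixel is skipped)
```

Keeping color alongside each point is what lets the resulting model be accurate in both geometry and color; a real system would also fuse or downsample overlapping points rather than simply appending them.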
What could such a system be used for? Once 3D modeling is easy to access, the list of potential applications is long. One example is home remodeling. People have long wanted a visualization tool that shows the effects of remodeling, such as moving walls, changing paint colors and lighting, and rearranging furniture, so they can avoid costly mistakes. A related example is virtual furniture shopping: instead of going to a furniture store, people download 3D models of furniture and “try them out” in their actual homes. There are also plenty of opportunities in virtual reality, where accurate 3D modeling and 3D localization can deliver convincing experiences. Just as today we take GPS coordinates and 2D maps for granted outdoors, in the foreseeable future applications could make indoor 3D maps and locations just as commonplace.
Video: Interactive 3D modeling of indoor environments (UbiComp 2011):
Research papers on RGB-D mapping:
Robust Recognition of Everyday Objects with RGB-D:
For any system to operate intelligently in the world, it needs to understand the world's semantics: its objects, people, and activities. Object recognition is a fundamental problem that has long been at the center of computer vision research. While frontal face detection and recognition are quickly becoming practical and are being deployed in cameras and laptops, generic object recognition remains challenging. Recognizing a coffee mug may seem easy, but it is difficult to build a robust application that handles every possible mug under changing viewpoints and lighting, especially since a mug, unlike a face, has no distinctive appearance pattern.
Here, too, RGB-D cameras make a fundamental difference, making object recognition both more robust and more efficient. In our research, we have developed discriminative features for both the color and the depth data from an RGB-D camera and used them to go well beyond the previous state of the art in recognition. We evaluated our algorithms on a large-scale RGB-D dataset covering 300 household objects viewed from different angles and showed that they perform much better than previous approaches, achieving ~90% accuracy for both category recognition (i.e., is this a mug?) and instance recognition (i.e., is this Kevin’s mug?). In addition to classifying objects, RGB-D data also makes it much easier to extract multiple objects from complex scenes.
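Why does depth make it easier to pull objects out of a cluttered scene? Objects physically stick out from the surface they rest on, whatever their color or texture. The Python sketch below illustrates this under simplifying assumptions: the camera looks straight down at a flat table whose distance (`table_depth_m`) is known, pixels at least `min_height_m` closer than the table count as object pixels, and touching object pixels are grouped with a flood fill. The function name and thresholds are hypothetical; a practical pipeline would typically estimate the table plane from the data rather than assume it, and this is not the recognition algorithm from our papers.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]

def extract_objects(depth: List[List[float]], table_depth_m: float,
                    min_height_m: float = 0.01, min_pixels: int = 2) -> List[List[Pixel]]:
    """Group pixels that rise above a flat table into connected object blobs.

    Hypothetical helper. Assumes an overhead camera looking straight down, so a
    pixel belongs to an object when its depth is at least min_height_m smaller
    than the table depth. Zero depth (no sensor reading) is ignored.
    """
    rows, cols = len(depth), len(depth[0])
    above = [[0.0 < depth[r][c] <= table_depth_m - min_height_m for c in range(cols)]
             for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    objects: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not above[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbors.
            blob: List[Pixel] = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and above[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_pixels:  # drop tiny blobs that are likely sensor noise
                objects.append(blob)
    return objects

# Usage: a 4x5 depth image (meters) of a table 1.0 m below the camera.
depth = [
    [1.00, 1.00, 1.00, 1.00, 1.00],
    [1.00, 0.90, 0.90, 1.00, 0.80],
    [1.00, 0.90, 1.00, 1.00, 0.80],
    [1.00, 1.00, 1.00, 0.95, 1.00],
]
objects = extract_objects(depth, table_depth_m=1.0)
print([len(o) for o in objects])  # [3, 2]; the single 0.95 m pixel is dropped
```

Nothing in this step looks at color, which is the point: segmentation driven by geometry works even when an object and its background share the same color, leaving color free to help with the recognition that follows.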
What can object recognition be used for? In collaboration with Human-Computer Interaction (HCI) researchers, we have demonstrated one interesting scenario with OASIS (object-aware situated interactive systems): a system that “brings Lego toys to life” by identifying objects, e.g., a dragon and a house, and their orientations, then using a projector to overlay animations associated with them, e.g., a dragon breathing fire. With our robust RGB-D algorithm as the underlying recognition engine, the Lego OASIS system has been successfully demonstrated on many occasions, including the Consumer Electronics Show (CES) 2011. At the Intel Science and Technology Center on Embedded Computing, hosted at Carnegie Mellon University, we have developed a robot that scans the shelves of a retail store, identifies misplaced merchandise, and builds a planogram (a map of where products are placed) of the shop. We believe this is only the tip of the iceberg. Once we can reliably recognize generic objects, developers can create many more applications, such as monitoring elder care activities and assisting with cooking in smart kitchens.
Lego OASIS Video:
Research papers on RGB-D object recognition:
Fine-Grained Activity Recognition with RGB-D:
Most recently, at the Intel Science and Technology Center on Pervasive Computing, we have started studying fine-grained activity recognition: using an RGB-D camera to understand every step of a human activity. Taking cooking as an example, we want to track hand locations and actions, the use of utensils, and the transfer of ingredients throughout a recipe. While previous approaches have relied on instrumentation, such as RFID tags on objects and accelerometers on hands, we have found that fine-grained activity recognition is feasible using only an overhead RGB-D camera, as the following video demonstrates:
Using mainly the depth data, our system reliably tracks the moving hands and the active objects they hold, as well as the inactive objects on the table. Objects on the table are identified by their appearance using both color and depth. Actions such as scooping and mixing are identified mainly from the hand trajectories. The recipe places a high-level constraint on which actions are plausible and in what order. Altogether, with enough training data, everything that occurs during cooking may be recognized in real time, including all the objects used, all the actions performed on them, and the resulting state changes, e.g., as things are mixed or chopped.
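One simple way to picture how a recipe constrains plausible actions and their order is a left-to-right Viterbi decode. Each observed segment of activity gets a score for every action (here, hypothetical log-likelihoods from some per-segment classifier), and the decoder picks the highest-scoring labeling that walks through the recipe steps in order, either staying on the current step or advancing to the next one. This Python sketch, including the function name and the scores, is a hypothetical illustration of the idea, not the model used in our system.

```python
import math
from typing import Dict, List

def decode_with_recipe(recipe: List[str], scores: List[Dict[str, float]]) -> List[str]:
    """Label each observed segment with a recipe step, keeping steps in recipe order.

    Hypothetical helper. scores[t][action] is a log-likelihood for segment t.
    Between consecutive segments the decoder may stay on the current step or
    advance to the next one (a left-to-right Viterbi pass over the recipe).
    """
    n, T = len(recipe), len(scores)
    NEG = -math.inf
    best = [[NEG] * n for _ in range(T)]  # best[t][k]: best total score ending at step k
    back = [[0] * n for _ in range(T)]    # back[t][k]: step index chosen at time t-1
    best[0][0] = scores[0].get(recipe[0], NEG)  # every recipe starts at its first step
    for t in range(1, T):
        for k in range(n):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else NEG
            back[t][k] = k if stay >= advance else k - 1
            best[t][k] = max(stay, advance) + scores[t].get(recipe[k], NEG)
    # Trace back from the best-scoring final step.
    k = max(range(n), key=lambda j: best[T - 1][j])
    path = [k]
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    return [recipe[j] for j in reversed(path)]

# Usage: the raw per-segment favorite for segment 2 is "mix", but the recipe
# says scooping must come first.
recipe = ["pour", "scoop", "mix"]
scores = [
    {"pour": -0.1, "scoop": -2.0, "mix": -3.0},
    {"pour": -2.0, "scoop": -1.0, "mix": -0.2},
    {"pour": -3.0, "scoop": -2.5, "mix": -0.1},
]
print(decode_with_recipe(recipe, scores))  # ['pour', 'scoop', 'mix']
```

Picking the top-scoring action for each segment independently would give pour, mix, mix; the recipe-order constraint corrects the middle segment to scoop.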
Fine-grained activity recognition has great potential for many applications. Smart kitchens are one example: we envision a system that keeps track of the cooking process, counts the spoonfuls of sugar added, warns you if something is overcooking, and offers suggestions when you are working with a new recipe. Assembling IKEA furniture is a related example, where a smart system could “read” the instructions and offer assistance; assembling Lego models is another. In general, understanding human actions and the objects involved is key to enabling seamless interaction between humans and automated systems.
If you want to know more, here is a good research paper on fine-grained activity recognition using RGB-D cameras: http://istc-pc-test-media.cs.washington.edu/papers/ubicomp2012.pdf.
That’s it. I hope you found this as fascinating as I did. Join me here each month, and let me know how I am doing. I will try to keep it interesting.