From the Labs - 3D scanning with a camera

By Sean Koehl (Intel) (1 posts) on November 6, 2009 at 2:19 pm

Hi -- I'm joining the ISN blog community to share some research efforts from Intel labs that may be of interest to curious software developers. Today I wanted to give a brief summary of one visual computing application that we are investigating in a project we usually call "3D content creation for amateurs."

In this project we are researching an increasingly popular trend -- stitching photos from standard digital cameras into 3D models. There is a lot of interesting work going on in this area. One you may have heard about is Microsoft's Photosynth, which is capable of searching the whole internet to find different people's photos of, say, the Roman Coliseum, and create a 3D model of it. Recent advances in computer vision algorithms that register and align 2D photos in a shared 3D space have made this possible, and more and more companies are creating tools based on this concept.
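
To make "aligning 2D photos into a 3D space" a little more concrete, here is a minimal sketch of one textbook building block, linear triangulation of a single point seen in two photos. It is not the Intel Labs China method or Photosynth's. The camera matrix `K`, the two camera poses, and all function names are assumptions made up for the illustration, and a real pipeline would also have to estimate the camera poses from the photos themselves.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build a 3x4 camera matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X (length 3) to pixel coordinates (length 2)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point observed in two views."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector belonging to the
    # smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical setup: two identical cameras, the second shifted sideways.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-0.5, 0.0, 0.0]))

point = np.array([0.2, -0.1, 4.0])  # ground-truth 3D point
recovered = triangulate(P1, P2, project(P1, point), project(P2, point))
```

With noise-free observations like these, `recovered` matches `point` up to floating-point error. Repeating the step for many matched features across many photos is what yields a point cloud.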

Our researchers at Intel Labs China are working to further advance this field. Why is Intel doing this? Two reasons, primarily. First, we need to understand emerging software algorithms in order to tune and adapt future Intel architectures to run them more efficiently and in a way that can be readily programmed. Second, this lab is a center of expertise in computer vision research, and its researchers have some ideas on how to make these algorithms run faster and with better visual results. Here's a demo of the work.

Why do we think this is an important capability? For one, in the labs we see the increasing popularity of online games like World of Warcraft and virtual worlds such as Second Life as the foreshadowing of a transition to a full 3D Internet. However, many barriers remain to making these experiences as appealing as, say, YouTube is today -- they need better graphics, more natural interfaces, and ways for people to create user-generated 3D content with the same facility that blogs are written and videos are edited.

Photo-based 3D content creation tools such as the one from Intel Labs China are one way to simplify content creation. Just snap some photos from different angles and you will be able to scan a real world object to put it into a virtual world. There are still some hurdles to overcome, such as reducing the polygon count (if you view the video you'll see the result is a very detailed mesh), animating the model, and neutralizing the real-world lighting present in the source photos. However, these appear to be tractable problems and a lot of institutions are working on them.
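
As a rough illustration of the polygon-count hurdle, the sketch below shows vertex clustering, one of the simplest mesh simplification schemes. It is only an example of the general idea, not the approach used in this project. The function name, the `cell_size` parameter, and the tiny test mesh are all hypothetical.

```python
import numpy as np

def decimate_by_clustering(vertices, triangles, cell_size):
    """Merge all vertices that fall into the same cubic grid cell.

    vertices:  (N, 3) float array of positions
    triangles: (M, 3) int array of vertex indices
    Returns (new_vertices, new_triangles).
    """
    # Assign each vertex to an integer grid cell.
    cells = np.floor(vertices / cell_size).astype(np.int64)
    # `inverse` maps each old vertex to the index of its cell.
    unique_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_cells = len(unique_cells)
    # Each new vertex is the mean of the old vertices in its cell.
    counts = np.bincount(inverse, minlength=n_cells)
    new_vertices = np.zeros((n_cells, 3))
    for axis in range(3):
        sums = np.bincount(inverse, weights=vertices[:, axis], minlength=n_cells)
        new_vertices[:, axis] = sums / counts
    # Re-index the triangles and drop any that collapsed to a line or point.
    remapped = inverse[triangles]
    keep = ((remapped[:, 0] != remapped[:, 1]) &
            (remapped[:, 1] != remapped[:, 2]) &
            (remapped[:, 0] != remapped[:, 2]))
    return new_vertices, remapped[keep]

# Hypothetical mesh: one tiny triangle and one large triangle.
vertices = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
                     [1.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.0, 0.0]])
triangles = np.array([[0, 1, 2], [0, 3, 5]])
new_vertices, new_triangles = decimate_by_clustering(vertices, triangles, 0.5)
```

A larger `cell_size` removes more detail. This sketch does not remove duplicate triangles or preserve sharp features, which is part of why production tools use more careful methods.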

The result will be that, just as almost anyone can be a video editor today, almost anyone could become a 3D designer/animator in the future, using real world objects as a starting point and morphing them into cool avatars or simply pieces of 3D art. These tools are already becoming available for professional animators -- as they advance they will give these artists a new way of creating interesting, realistic visuals.

This is only one of the cool things happening in Intel Labs. I’ll blog about them regularly to share more new developments and how we think they could help to shape the future.

Categories: Art, Music, & Animation, Visual Computing

Comments (10)

November 9, 2009 10:17 AM PST

sulman
Total Points:
20
Registered User
I think this is a fantastic idea. If you look at the next iteration of C++ currently being worked on (C++0x), one of the things creator Bjarne Stroustrup found critical was to make the language easier to program for beginners because, as the technical docs state: "Attention to beginners is important, because they will always comprise the majority of computer programmers,"... (Furthermore, a new coding paradigm can make even experienced users relative beginners!)

This move also just makes sense given the way "high end" technology is filtering down to the lowest segments of the computing market. Even the cheapest Netbooks today have at least 1GB of memory and use some form of dedicated graphics processing (and if it's integrated, it is at least 128MB). 6-8 megapixel cameras are also "bargain bin" finds at most stores today. Therefore, the amount of computing power available today to the average user is stunning.

I think bringing 3D content generation to the masses will facilitate amazing amounts of innovation and richer experiences for more people. Being able to import the models generated from stitched photos into more advanced programs like 3ds Max and Maya will also greatly increase the speed at which experienced artists can create game assets (exactly like the way digital cameras helped greatly improve the range of textures game designers/artists could achieve... see, e.g., the making of Far Cry 2!). So, for both beginners AND experts, this is a change that would take things to the next level while using the ubiquitous tech that everyone has access to today. Great concept!

Offset:Sulman
November 9, 2009 1:59 PM PST

Arti Gupta (Intel)
Total Points:
4,463
Status Points:
4,463
Community Manager
Great post Sean! With all the computing power available to end users this project can bring a richer experience to end users with 3D internet.
November 9, 2009 11:28 PM PST


Lana Bachynski
90% of this post sounds really appealing to me. I'm currently a 3D animation student, and I cannot begin to express exactly how exasperated I get when it comes to modeling. I always seem to get lost in the mesh somehow and SOMETHING always goes wrong, and I never know WHERE it went wrong so I can never fix it, and then it conflicts with my rigging, and so on and so forth...

However, being both a fan of photography AND modeling (for dummies), this is some technology I'd like to get my hands on. The entire process is incredibly intriguing, and brings a sort of reverse 3D printer to mind. Rather than putting down one thin layer upon another, it's taking them in reverse. Now that's my kind of modeling!

What didn't appeal to me so much was the notion of a fully 3 dimensional internet interface. I agree with Sulman that it opens up a whole new realm of innovation, but... I LIKE the internet how it is. I think too much 3D ANYTHING gets tiring and well overdone. Sometimes all I want to see is some good ol' minimalistic graphic design.

Offset:Latienie
November 10, 2009 2:35 AM PST


Nicolas A
I had already seen similar experiments but not with results this impressive. Making 3D modelling easier is always good, and this technology will end up in our phones in the not too distant future, making each one of us a 3D modeller on the go.
Imagine the possibilities with image recognition, or even the power of several thousands of pictures taken in the same city found on Flickr. We could automatically and accurately model cities in 3D.
Of course, it's not for today but meanwhile we can already easily model buildings in 3D and texture them inside a browser with Google Earth building maker for example.

offset:nicolas
November 10, 2009 5:24 AM PST


Jonathan
Great post! I've never really read or heard about this before but now that you mention it, I think this will be a useful tool for me and my friend. We're in the midst of creating or developing a new video game. We're starting small just to get the feel of how things roll and to get a better idea how much time we'll need to devote to developing if we decide to go bigger. This technique of 3d styling using a digital camera could come in handy for some graphics and such throughout our game development.

Offset:JonB
November 12, 2009 9:53 PM PST

spyderfreek
Total Points:
25
Registered User
Having had the joy of using OpenCV for some research projects, I have to say, none of the other options I've tried were anywhere near as fast (even without using IPP), so I'm happy to see it's still being maintained. At the same time, I've also had enough experience to know that computer vision results are almost never clean without substantial tweaking, and even then there are usually cases which simply cannot be handled in an elegant fashion. So I have some questions about the results posted here:

- Are the models shown in the video the direct output of your program, or did they receive touch-ups (such as filling in holes)?

- The marble statue and relief sculpture were both relatively smooth, with large sections of homogeneous materials. How far can you push the complexity of materials and topology until the system breaks down (assuming you don't need the same level of fidelity as with human faces)?

- On a related note, what kind of solutions do you use for lack-of-data problems, such as those caused by occlusion?

I believe that in order to see this kind of application come into widespread use, it will not only have to be easy to use, it will also have to be extremely robust; there are only so many Greek sculptures that are waiting to have their holo-photos taken. New techniques, such as the use of Hidden Markov Models in image segmentation, have put us huge strides ahead of where we were even 10 years ago, but I still see this video as a sign of how far we have yet to go before these dreams can be realized.

Offset:spyderfreek
November 12, 2009 10:25 PM PST

styromaniac
Total Points:
0
Registered User
This is the kind of technology I've been waiting for! Imagine the possibilities. Indie developers on a low budget will be able to model structures just by snapping pictures of them and then loading their card onto the computer. I don't see this as just a lazy man's way out. Not everyone can afford the time spent in modeling and... who knows? It could become the true standard for synthesizing our world.

The other part of this is that for the first time we'll be able to build our own levels to play in PC and console games once this technology is widely adopted. Even better, you can feature yourself in them! There's no way this technology won't sell like crazy. People like to customize their characters to look like themselves. I, for instance, try to be sure that my Xbox Live avatar looks like me with the nose, chin, mouth, eyes and ears. I also try to update changes in hair style and what I wear. You receive compliments if you make your avatar look like you, and so I have done. The most detailed avatar system (from what I've seen) has to be in Saints Row 1 and 2. It's amazing how close you can get a 3D model to look like you, down to every last facial feature with the only exception being the nose bridge! Now this technology will most certainly do even more with extreme accuracy! I can see Facebook taking advantage of this...

Maybe this technology will help find people who have gone missing. We could imprison criminals and bring back home more innocents faster than ever before. You get a multi-axis view of people's faces. It will become harder for someone to not recognize a criminal in a WANTED poster or a child who may have been kidnapped.

The future is 3D. Every web page, online media album and video game will one day have the full ability to be personalized by users in full 3D. This technology is a step toward that dream, if not the movement of 3D innovation itself.

offset:styromaniac
November 13, 2009 6:42 AM PST

Abhay
Total Points:
80
Status Points:
30
Green Belt
Great post, Sean! Since the MeshLab tool can be used for a similar purpose, how would the new technology differ from the existing one (MeshLab)?

Cheers
November 17, 2009 8:24 AM PST

Sean Koehl (Intel)
Hi - I wanted to reply to a few of the questions above.

Lana -- I agree that not everything will or should transition to 3D. Perhaps it is better to say a transition "to an internet that fully supports 3D." Text and 2D videos/animations will always have their place. But we expect that 3D will give rise to new usages that can't even be predicted now, just as the social media revolution has led to new usages we didn't expect.

spyderfreek -- to answer your questions. 1) They were not touched up. However, there is one part of the process that we sometimes must do manually, which is masking the object from the background. This is not necessary on a flat background, but "greenscreening" an object from a complex background is still a challenge. A tractable one, however, that we are working on. 2) You are correct -- more complex shapes are much more challenging. The deeper the holes or pockets in the object, the more challenging it is to get accurate depth info or see behind occlusions. This relates to your question 3), and the answer is that the primary way to address this is to take more source photos. Currently we take about 2 dozen photos from different angles. For a more complex object with many occlusions -- say, a tree -- you have to take many more, with an associated processing cost. But as you can see in the video of one of the statues (I believe it is a Kirin), we can readily deal with some occlusions, such as the legs. So, for most character/avatar models it shouldn't be a problem.
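
For readers wondering why a flat background is the easy case, the sketch below separates an object from a single-colour backdrop with a plain colour-distance threshold. It only illustrates the idea and is not the masking step used in this research. The function name, the `tolerance` value, and the tiny test image are assumptions.

```python
import numpy as np

def flat_background_mask(image, background_color, tolerance=30.0):
    """Return a boolean mask that is True where a pixel belongs to the object.

    image:            (H, W, 3) array of RGB values
    background_color: length-3 RGB value of the flat backdrop (assumed known)
    tolerance:        largest colour distance still counted as background
    """
    diff = image.astype(np.float64) - np.asarray(background_color, dtype=np.float64)
    distance = np.linalg.norm(diff, axis=-1)
    return distance > tolerance

# Hypothetical 4x4 test image: green backdrop with a 2x2 grey "object".
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[...] = (0, 255, 0)
image[1:3, 1:3] = (128, 128, 128)
mask = flat_background_mask(image, (0, 255, 0))
```

A threshold like this breaks down as soon as the background contains colours that also appear on the object, which is why complex backgrounds need something more capable.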

Abhay - We actually use MeshLab as well to deal with the point cloud and view the model. Our research is mainly focused on the earlier phase of getting a point cloud from a bunch of 2D photos without needing to position the camera precisely or use special camera equipment.
November 21, 2009 1:42 PM PST

spyderfreek
Total Points:
25
Registered User
Thanks for the response, Sean. Considering all the work that's being done with graph cut algorithms and subject classification (I saw a great presentation by a Microsoft researcher, http://research.microsoft.com/vision/, on the subject), it seems like it'll only be a matter of time before segmentation, too, can be automated. I suppose once common video recording devices yield an acceptable level of quality (including problems with motion blur), you'll have all the redundancy you need. My last post was kind of a downer, but you guys are really doing some great work! Keep it up!
