September 23 at 23:59 GMT was the deadline for submission of Intel Perceptual Computing Challenge projects. We squeaked in at the last minute, and we can’t wait to know the results. Instead of going crazy waiting, I decided to write about our project and how we prepared for this contest.
Fair warning: this isn’t necessarily a success story. Nobody can predict if our project will succeed or not. Many things could happen that we aren’t able to anticipate. Whether we win or lose, we hope that our story will be of some use to other developers.
After the last Hackathon in Nizhny Novgorod (Russia), we heard there would be another contest based on the Intel Perceptual Computing SDK, and we knew we wanted to participate. The contest was set to begin on June 6, and project descriptions had to be submitted by June 26. Contest moderators would select 750 ideas to move on to the next phase.
We had been working on an idea long before we knew about this contest. How often have you chatted with someone via video while there was something in the background you didn’t particularly want the person on the other end to see? For example — people in a cafe, busy co-workers at your crowded office, or a mess that you didn’t have time to clean up. We decided to make an app that would solve this problem: Virtualens.
The framework for this app came together quite intuitively. The new-generation gesture cameras that work with the PerC SDK - i.e., the Creative* Interactive Gesture Camera - produce both RGB imagery and a depth map. Using a combination of these two outputs, we predicted that we could emulate a shallow depth-of-field effect: only objects at a certain distance would stay in focus, while everything else would be blurred. We’re all familiar with this optical effect; moviemakers call it “shallow focus”, and optically it is achieved by opening the camera lens to a wide aperture.
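The core of the effect can be sketched in a few lines. This is an illustrative simplification, not the actual Virtualens code: map each pixel's depth to a blur radius that grows with its distance from the focal plane. The names focusMm, blurRate, and maxRadius are assumptions for the sketch, not PerC SDK parameters.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: convert a pixel's depth (millimetres) into a blur
// radius. Pixels near the chosen focus distance stay sharp; the radius grows
// with distance from the focal plane and is capped for performance.
int blurRadiusForDepth(float depthMm, float focusMm,
                       float blurRate, int maxRadius) {
    float distance = std::fabs(depthMm - focusMm); // offset from focal plane
    int radius = static_cast<int>(distance * blurRate / 100.0f);
    return std::min(radius, maxRadius);            // clamp the kernel size
}
```

A real implementation would then apply a blur kernel of that radius per pixel (or blend between a few pre-blurred layers, which is much faster).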
We had a working prototype before the contest. We’d coded it at the Hackathon, but there was still a lot of work ahead. I took a month-long vacation - I work in the Russian office of an American software company - and I spent the whole of it on algorithm optimization.
Many issues kept popping up. For instance, we knew we had to do something about the holes in the depth map once it was projected onto the RGB image (the green pixels in the image below). These are pixels for which the sensor returns no depth data.
There are several reasons this problem kept occurring:
- The camera's depth sensor cannot reach some objects.
- Objects are too close, lit by bright sunlight, or made of a shiny material; in all of these cases we receive no valid depth data.
- In the image above, you can see a shadow of my hand falling on a curtain. This happens because the depth sensor and the IR emitter sit in different parts of the camera. You get the same effect when you take photos on a mobile phone with a built-in flash.
Here’s the really tough one: artifacts appear after stretching the depth map over an RGB image. The depth and image sensors have different fields of view, so combining their data is like trying to wrap a ball with a flat sheet of paper – you have to cut it and glue the pieces. You can certainly do it more carefully using interpolation and other tricks, but you also have to do it fast. Even if the PerC SDK had an algorithm that produced a perfect picture, it would be no use to us if it consumed the whole CPU while delivering only 10 fps.
We studied a lot of research on the topic and examined classic inpainting algorithms. In the end, we devised a heuristic algorithm that met most of our needs.
In the midst of all this, we got a nice surprise when the Challenge’s submission deadline was shifted to July 1, giving us another extra week to work. We made good use of this extra time by polishing our algorithms. We even managed to implement a tech demo and make a short demonstration video.
In the video below, you can see our initial implementation; objects placed both near and far from the camera were blurred.
The long-awaited results came on July 12, twelve days after our application submission. The finalists were to receive Creative* Gesture cameras. We already had one that we’d received at the Hackathon, but we could certainly use another: there were two developers on our team, and we needed to compare the calibration of the cameras. Our first camera showed a serious shift when projecting the depth map onto the RGB image. (You can see it in the image to the right; the green pixels represent the depth data of my hand.) We couldn’t figure out whether all the cameras had the same shift or each one was different. This was critical information for us.
We made good use of our time while waiting for the camera and solved two more tasks on our agenda for Virtualens:
- We gave our app the ability to blur the image only where blurring was needed. We also had to think about how best to transmit the rendered video stream to apps such as Skype; basically, our app needed to pretend to be a web camera. After some experiments, we decided to do it through a DirectShow filter. Unfortunately, DirectShow is not supported in Windows Metro-style apps; they use Media Foundation instead, and Media Foundation doesn’t let your app present itself as a webcam to third-party apps. In Metro you can achieve this only by writing a full-blown driver, which is far more difficult and time-consuming. To ship a workable app in time for the contest, we had to abandon Metro support.
- We lacked a polished user interface, a logo, icons, a decent video, and other non-programmer stuff. We needed someone creative, and so our team expanded. Now there are three of us: Nadya (GUI programmer, C# specialist), Alex (graphic designer and video director), and me - Alex (DSP, C++, systems programmer, and team leader).
We were glad to have someone on the team who could handle a graphics editor. We started creating different variants of the app’s logo. (Yeah, with only a month left and tons of work ahead of us, it seemed like the right time. Back then we thought the app was almost ready. We were wrong.) We spent about a week just on icons. Then we put one in the system tray – the place where the app lives. Clicking on it launches a small window where you can change settings in case you don’t want to do it with gestures. The tray icon alone had been remade several times at this point.
After that we got down to the user interface. It was obvious that users wouldn’t see it very often. It was more likely that they would adjust the app settings with gestures, usually somewhere in Skype or Google Hangouts. The settings dialog would still be needed when a user wished to turn gestures on/off, adjust focus distance and blur rate, especially when he or she did it for the first time. We thought we could use the settings dialog to teach users how to use gestures to control this app. The gestures we chose are simple, but people don’t use them very often. So we thought that it would be a good idea to let users see the gestures, and learn how to do them right.
We also needed to show users the difference between focus distance and blur rate. We implemented a dynamic GUI component that illustrated the difference and showed how the picture would look with the selected settings. Unfortunately, different people interpreted these illustrations in different ways; I think it looked too mathematical and complicated. In the end, we cut it out.
We finally decided to simplify the interface and make it in the native Windows style. WinForms didn’t look native by default, so we had to mess about with them a little. To make our settings window look more like an interactive tutorial rather than a boring set of checkboxes and sliders, we borrowed a simple idea from Trackpad’s settings dialog, which plays you a short video demo each time you hover over a certain item in the menu.
Here’s what we made:
The play buttons were added for touchscreens. Tapping or clicking on them would trigger short video tutorials for the four gestures. By default, the video box on the right shows you a preview from the camera.
The second camera finally arrived. To our great disappointment, it showed virtually no shift of the depth map against the RGB image. We didn’t know whether that applied only to this particular camera or to all the new cameras. We knew only one thing for sure: we couldn’t predict whether the cameras used by the Challenge judges would have any shift. So we had to create a calibration tool, one that was simple and user-friendly. This turned out to be quite a challenge, in fact; it was difficult enough to threaten the whole project. The difficulty was not in creating such a tool, but in making it simple enough for the average non-technical user.
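To make the shift problem concrete, here is the kind of correction a calibration tool could produce, as a hedged sketch: a per-camera scale (for the differing fields of view) plus a pixel offset applied when projecting a depth-map coordinate into RGB image space. The struct and names are illustrative; the real PerC SDK mapping is a full per-pixel UV map, not a simple affine shift.

```cpp
// Hypothetical per-camera calibration: scale accounts for the different
// fields of view of the two sensors; the offset is the shift measured for
// this particular unit.
struct DepthToRgbCalibration {
    float scaleX, scaleY;
    int   offsetX, offsetY;
};

// Project a depth-map pixel (dx, dy) into RGB image coordinates (rx, ry).
void projectDepthPixel(const DepthToRgbCalibration& c,
                       int dx, int dy, int& rx, int& ry) {
    rx = static_cast<int>(dx * c.scaleX) + c.offsetX;
    ry = static_cast<int>(dy * c.scaleY) + c.offsetY;
}
```

A calibration tool's job would then be to estimate offsetX/offsetY for the user's camera, e.g. by having them hold a hand at a known position.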
Fortunately for us, Intel held a Q&A session on August 9. We asked the Intel developers about calibration, and they answered that they knew the problem existed. It turned out that the issue was caused by depth sensors that were imperfectly attached to the cameras and could shift during shipping. They promised to release a calibration tool soon. Whew! What a relief! Thanks, Intel!
The deadline was coming fast and we had no video yet. We found a guy with a camera and rented a small studio that was ready to put up with us one night for a reasonable amount of money.
I never thought it would be so hard to speak in front of a camera. I have done interviews before, but this was nothing like that. It takes a lot of practice and experience to tell a story while looking directly into the lens, especially when there are two of you in the shot, because one of you will surely glance away at the moment the other is speaking. We had to reshoot again and again until it was quite late. We ended up throwing our initial footage away and decided to give it another try in September.
On August 10 (sixteen days before the finish line) the deadline was postponed again. This time it was set for September 23. We were given a bonus month. However, I can’t say that we took advantage of it. Instead, we relaxed. Then came September bringing cold weather, the flu, and more work at my day job. The project work stagnated.
We still managed to accomplish something useful during this time, less on programming and more on market research. We knew there were similar solutions out there and there had to be something we could learn from them. For example, I found out that Google Hangouts can blur and replace your background.
Then we found a more formidable competitor – ”Personify for Skype” – a project supported by Intel. These guys are already where we can only dream of being. Personify works as a plugin for Skype. When I discovered this, it became much clearer to me why the Intel PerC SDK guys assumed Virtualens was a plugin too, and why they knew so much about Skype plugins. To be frank, I was not impressed – it seemed buggy. When Personify starts alongside Skype, it grabs the camera and consumes 70% of the CPU even with no outgoing calls. On top of that, with this app running I couldn’t launch any other PerC SDK application that uses depth and RGB data at once.
The results of our small research filled us with optimism. We had learned one very important thing — we could do better than others out there in this space.
One week before the deadline
It always happens: the biggest part of the job is left for right before the deadline. Suddenly there was only one week left and still a lot to do. The main functionality of the app worked well, but we had to update the quick-settings window, debug communication between application components in different address spaces, build an installer, and do hundreds of other small but important things.
During the last week we worked like hell. Twenty hours a day if not more. We improved and fine-tuned gestures. Initially, we had planned that users would enter the settings mode by clapping their hands in front of the camera. Then we thought that this could give the person you're calling the wrong impression. So we replaced it with a combination of two gestures – the V sign and waving. First, you address Virtualens with a V sign, and then you greet it by waving your hand. It feels quite logical and intuitive, actually. To exit settings mode, you show a V sign only.
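The two-step activation described above boils down to a tiny state machine. This is an illustrative sketch of the logic, not the shipped Virtualens code (the real recognizer also has to cope with noisy, per-frame gesture events): a V sign arms the app, a wave then opens settings, and a V sign alone exits settings.

```cpp
// Illustrative state machine for the gesture flow: V sign, then wave, to
// enter settings mode; a V sign alone to leave it.
enum class Gesture { None, VSign, Wave };
enum class Mode { Normal, Armed, Settings };

Mode nextMode(Mode current, Gesture g) {
    switch (current) {
    case Mode::Normal:
        return g == Gesture::VSign ? Mode::Armed : Mode::Normal;
    case Mode::Armed:
        if (g == Gesture::Wave)  return Mode::Settings; // V sign, then wave
        if (g == Gesture::VSign) return Mode::Armed;    // still armed
        return Mode::Normal;                            // anything else resets
    case Mode::Settings:
        return g == Gesture::VSign ? Mode::Normal : Mode::Settings;
    }
    return current;
}
```

Requiring two distinct gestures in sequence is what keeps everyday hand movements from accidentally opening the settings overlay mid-call.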
Next, we implemented an emergency camera shutdown. Instead of searching for a Skype button, you could simply cover the camera with your hand. You can turn video back on with the same gesture.
We reshot the video, this time on the street. We fixed a lot of bugs that we ourselves had introduced by coding at such speed. We shot short video tutorials on how to control our app with gestures. Nadya caught the flu. The day before the deadline she said, “I will finish the project. The cough won’t let me sleep anyway.” And so she did. That night she solved two serious issues, and the next day she was taken to hospital. (Don’t worry. She’s much better now.)
My sister helped me with the installer. She did most of the job, leaving only minor things for me to correct.
Two hours before the deadline we were all set...almost. We only needed Alex to complete the final cut of the video with voice work in English. One hour left – I sit waiting. Thirty minutes left – I call Alex to ask where the hell the video is. No answer. I’m becoming anxious and start frantically putting English subtitles over our YouTube video in Russian. Fifteen minutes left – I call him again. No answer... then he picks up and I understand by his voice that he is sleeping! It turned out that he had just dozed off while the video was rendering. Yeah, everyone has a breaking point. But we managed to submit the whole project work in time – four minutes before the deadline.
We made it!
After the deadline
Of course, it was impossible to remember everything in this rush. I soon realized that I had forgotten to mention in Readme.txt that you have to restart Skype after installing Virtualens to make it work. After that, I started having nightmares in which the judges, unable to find Virtualens among the available webcams in Skype, uninstalled our app and moved on to the next project. We added a notice everywhere we could: to our main video and our tutorial annotations, and I even emailed the Intel PerC SDK guys. They replied that they would add the notice to our Readme.txt. Whew! What a load off. Thanks, guys!
We don’t yet know the results of the contest. We only know that it’s been a great experience for the whole team and a good benchmark of what we can do. I couldn’t have imagined that I could work so hard for so long; none of us could. But the project work is not over. We found some people, armed them with gesture cameras, and asked them to test our app. Virtualens is not yet perfect, but we intend to make it so.
Our main video:
We also made a video on how to install Virtualens:
A video on how to use this app with Skype (this is a more interesting video that shows me adjusting settings with my hands):
Here’s a video about how to use Virtualens if your client has no Webcam Options button:
Here you can see all the other contest participants. If you apply the “this month” filter, you will get only the finalists’ videos. Another contest participant searched YouTube and collected all the projects on one page: software.intel.com/en-us/forums/topic/474069
Here are projects that I liked most:
This one is pure magic – literally. These guys use the PerC SDK to let you cast spells in their game, so you can feel like a real wizard. Instead of upgrading abstract numbers, you train your agility and improve the smoothness of your real movements:
That’s it! Thanks for your attention and we look forward to receiving the results of the Challenge. Good luck to all who entered!