My son is two and a half years old, and he likes all sorts of stories and animations. So when it came to writing a proposal for the Intel Perceptual Computing Challenge, I thought, why not do something my son would like?
I have been an admirer of different art forms (though there is little of the artist in me). I use shadow puppets a lot to tell stories to Rupansh, my kid. There are two things to it: firstly, it is the cheapest form of puppetry, and I can do many things, make up stories, and create characters without really worrying about perfection. Secondly, loadshedding [rolling blackouts -ed.] has been a major problem at our place back in India, so I like to be creative when there is no electricity. I put on an emergency lamp and form characters in front of it to cast shadows on the wall. But what really got me mad about the world of shadow art is a video by one of the most renowned shadow artists (there are fewer than twenty professional shadow artists living across the world), Mr. Amar Sen, called "Let Kolkata Surprise You".
But as I am no good with such art forms, I wondered, "Would it be wise to try anything like this?" Then I said, "OK."
Initially my idea was simple: I would capture black hand cutouts against a white background and add audio recording, so anybody could create shadow puppets and record them.
This was really what I wanted to achieve.
I wanted to keep it simple because I was working on one of my dream projects, called "PONTA". It was supposed to be a full-fledged humanoid built around an Ultrabook and a PerC camera. I thought it would be the only product I would be working on, and I had gotten most of the modalities working before the app finalists were declared.
However, to my utter surprise, it wasn't selected. I was very disappointed indeed. But SASM was on the list of six apps that had been selected.
Start of Coding
I started the coding on July 12.
Mine was supposed to be a one-man team, with a helping hand from my wife, who incidentally is an embedded engineer specializing in Arduino. My first task was to get the hand shadow in. Initially I thought that would be easy and would take no time at all. But reality bites sooner rather than later: there were all sorts of stray pixels in the frame at the very low resolution, and when I scaled the resolution higher, the shadows looked far worse. I was convinced I was not going to get anywhere with this app, so I took the plunge and incorporated EmguCV. Initially I was developing with Windows Forms in C#, so GDI+ was my main tool for some of the image processing. But I soon realized that if I was to bring even the slightest smile to my kid's face, I needed to dig very deep to get good shadows. So I started experimenting with several techniques: median filtering, Gaussian smoothing, erosion, dilation, and what not.
Finally I got the result I wanted. I first used erosion to remove noisy 1–3 pixel specks. Then I used a median filter to smooth the silhouette (not dilation, as that would change the edge profile). Then I combined edge detection and Gaussian smoothing. Some grilling coding sessions that stretched for more than 20 hours finally got me the shadows. If I recollect correctly, by July 20 the shadows were looking really nice.
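To illustrate why erosion-then-median works in that order, here is a minimal sketch on a tiny binary grid. This is illustrative Python, not the app's actual C#/EmguCV code: erosion strips isolated noise pixels, and the median pass then smooths the silhouette edge without growing it the way dilation would.

```python
def erode(img):
    """Remove speck noise: keep a pixel only if all 4 neighbours are set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if img[y][x] and img[y-1][x] and img[y+1][x] and img[y][x-1] and img[y][x+1]:
                out[y][x] = 1
    return out

def median3(img):
    """3x3 median filter: smooths the silhouette edge without dilating it."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y+dy][x+dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out
```

On a real depth frame the same two passes (EmguCV's erode and median blur) remove the stray pixels first, so the median never has to vote against a field of noise.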
Now I wanted to control the amount of shadow displayed, like showing a different profile as the distance of the hand from the camera varied. The raw depth image alone would not give me that. So I analyzed the depth image and realized that pixels nearer to the camera were closer to 255, and as distance increased they approached zero.
So I built a class that could simulate a depth range, and thus I could simulate distance from the camera without moving the hand. This range controller was so much fun that it convinced me the app deserved much more than what I was doing. I knew it would be big, as I was probably coding at my best efficiency ever.
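The range controller boils down to band-passing the depth values. A minimal sketch, assuming (as described above) that larger depth-image values mean closer to the camera; the function name and 8-bit values are illustrative, not from the actual app:

```python
def range_mask(depth, near, far):
    """Keep only pixels whose depth value lies in [far, near].
    Larger values mean closer to the camera, so sliding the [far, near]
    band simulates moving the hand nearer or farther without moving it."""
    return [[255 if far <= d <= near else 0 for d in row] for row in depth]
```

Sliding the band toward 255 selects only the closest parts of the hand; widening it brings more of the arm into the shadow.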
Paradigm Shift in UI
Once the initial hand shadow and range controller were in place, I had to take a call on the UI. I spent nearly three days determining what the app would look like and decided to go with a layout similar to most software like Visual Studio and NetBeans, along with a similar color chrome. As it was shadow art, I wanted the GUI to be a combination of black and white. Windows Forms has never given me the power to build complicated UI logic, so I decided to loop back to my favourite, WPF, and zeroed in on a Metro-style framework called MahApps.
But the problem with WPF is that it uses an entirely different imaging format, System.Windows.Media.Imaging, while PerC at best gives you a System.Drawing.Bitmap. So I needed to write routines to convert from WPF image to Bitmap and back again. Once I ported the concept into this whole new UI, I kept getting more ideas, and by now I was certain I would not be able to work on any other apps by the submission deadline (then August 24).
New Idea: Color Characters
Once I recorded my first movie, I thought it would be rather nice if I could paint my hands and bring color shadows into action. For that, I had to work with projection. In the first phase of the contest I had worked on an app called GesModello, but it was patchy and not inspiring to me. This time I wanted the whole color image cut out by my improved depth image (I will refer to it as the shadow image from here on).
It was still R3 of the SDK and there was no segmentation_viewer.cs sample. So I started working from the C++ uvmap example and converting it to C#. It took more than two days to get the projection image. Now my UI could show both the color image and the shadow image. But the projection was not up to the mark and the result was very disappointing, so I wrote yet another filter routine to finally get it working. With so much filtering and image conversion in place, though, my frame rate was down to a good 3 fps. That was very frustrating: it was neither fun nor intuitive. But it gave me an idea of what was going on.
It was then that I realized I could make a nice fusion of classic shadow puppets with real puppets. So it was quite clear that SASM would be a fusion app, allowing shadow puppets and real puppets' projections to go hand in hand. Time was running out for the early profile form submission, so I thought it would be good to nail down the concepts and make a simple movie as a demo for the profile form. But when I sat down to record the movie, I thought, "OK, well, I need more characters. I need many characters to communicate in a movie, but I can use only two hands." It was then that the idea for the best part of the app came to mind: why not make an animation object and actually control it by hand? A user would make a puppet and animate it for some time; those frames would be saved in a library, and characters could be brought into the scene at run time to make good animations.
Building Animation Library and Code Optimization
I could easily write a class called AnimationObject and achieve this part. For the library, I used XML. Interestingly, I also recorded my first video blog on the work I had done so far, to complement the early profile form submission.
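An animation library like this round-trips through XML in a few lines. The following is a hypothetical Python sketch of the idea, not the app's actual C# AnimationObject schema; the element and attribute names are invented for illustration:

```python
import xml.etree.ElementTree as ET

def save_animation(name, frames):
    """Serialize one puppet's recorded frames (image path + position) to XML."""
    root = ET.Element("AnimationObject", name=name)
    for f in frames:
        ET.SubElement(root, "Frame", image=f["image"], x=str(f["x"]), y=str(f["y"]))
    return ET.tostring(root, encoding="unicode")

def load_animation(xml_text):
    """Rebuild the frame list so the character can be replayed at run time."""
    root = ET.fromstring(xml_text)
    frames = [{"image": e.get("image"), "x": int(e.get("x")), "y": int(e.get("y"))}
              for e in root.findall("Frame")]
    return root.get("name"), frames
```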
The video was detailed, and I tried to create a sort of tutorial that would remind me of the initial work I had put into the app. I also gathered the courage to share the video with some of the prolific developers who had been my mentors over the years, knowing full well that some of them might also be finalists! That decision was critical and changed my whole perspective on the app. One renowned developer said, "Wow! You have a great product lining up. If you can use your app to produce the Calcutta movie, then you have a great chance!"
Those words, "chance" and "wow", were the catalysts. I thought it was the perfect time to take the app to the next level. Unfortunately, I had never worked with animation or augmented reality before. So I started reading about the making of animation movies: the culture, frames, backgrounds, still animation, and everything else I could find. Those three days of reading created the fundamentals, and now I had a model I could build on.
I started writing animation routines: path animation, inverting the characters, removing them from the scene. But with every added feature, the frame rate came down further. I decided it would make no sense at all if I couldn't get a realistic frame rate, so it was time for optimization. I used a Stopwatch object to trace the most resource-consuming routines and started optimizing them. Things like AcquireAccess and ReleaseAccess had to be closed promptly, and I should not call into any PerC buffer more often than required, so fetching the width and height of the plane inside the loop was removed. I eliminated many functions by copying their bodies directly into the loop. I changed the MVVM pattern to more C-style code with Boolean semaphores. for loops replaced foreach loops, many objects were replaced by pointers, a++ was replaced by ++a, and I removed parameters from function calls, setting them in global variables instead. Garbage collection was tuned so that most of the null variables could be collected in a single call. With these optimizations, I pushed the frame rate up to around 20 fps. The hours spent staring at stopwatch logs and Windows Task Manager had paid dividends.
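The Stopwatch-driven hunt for hot routines can be sketched generically. Here is a minimal Python analogue of that profiling setup (the app itself used C#'s System.Diagnostics.Stopwatch); the decorator accumulates wall-clock time per routine so the worst offenders stand out in the log:

```python
import time
from collections import defaultdict

timings = defaultdict(float)  # routine name -> total seconds spent

def stopwatch(fn):
    """Accumulate wall-clock time per routine, like the Stopwatch logs
    used to find the most expensive parts of the render loop."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[fn.__name__] += time.perf_counter() - t0
    return wrapper
```

Decorating each frame-pipeline routine and dumping `timings` after a run immediately shows which filter or conversion is eating the frame budget.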
Input and Control
By now, what I was doing was injecting the hand positions into the mouse code: hand open and close were linked to mouse click and release. But I must say, it was not at all intuitive; watching the cursor move the characters was a pathetic experience. One day I thought: what if I converted the projected image into a thumbnail and moved that thumbnail along with the hand position, but only over the working pane? A user would find it interesting to see his real hand cutout. That, to me, was another milestone in the app. I got it working in about an hour of coding, and needless to say, it was amazing. Most developers would rely on controlling a 3D hand model, but as PerC can't really recognize every finger and part of the hand accurately, that would, at the moment, be less accurate. Imagine your own hand in place of the mouse, doing the same things you are doing! I am convinced I will use this idea in many of the PerC apps I release.
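Under the hood, the thumbnail trick is just a coordinate mapping with clamping. A small sketch, assuming the camera reports a normalized 0..1 hand position; the function name and dimensions are illustrative:

```python
def hand_to_pane(nx, ny, pane_w, pane_h, thumb_w, thumb_h):
    """Map a normalized hand position (0..1) to the top-left corner of the
    hand-cutout thumbnail, clamped so it never leaves the working pane."""
    x = nx * pane_w - thumb_w / 2   # centre the thumbnail on the hand
    y = ny * pane_h - thumb_h / 2
    x = max(0, min(x, pane_w - thumb_w))
    y = max(0, min(y, pane_h - thumb_h))
    return x, y
```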
But one thing was frustrating: much of the time, my head would also show up as a shadow, and that irritated me. So I incorporated face tracking and cut out the head using the face coordinates. Also, by now my shoulders had taken a toll and were in pain. I realized that keeping a hand in the air for a long time was not going to help me; I also needed to provide mouse support. So it was physical pain, more than anything else, that led me to implement all the functionality with the mouse as well. It was more work than I thought: mouse coordinates were different, and I wanted simultaneous control. I thought it would be good to control one character with the hand and another with the mouse. So it became two different pipelines altogether.
Then something bad happened: my Sony VAIO (Win 7 Home Premium, 8 GB), my main development machine, crashed. Thankfully I had everything backed up; I have made a habit of backing my code up to external storage every three hours when I develop large apps. The next step was to continue developing on the Ultrabook Intel had provided during AIC 2012. But the problem with this machine was that its left mouse button was not working. So I thought, OK, let me get touch working too. So touch came into action; another class was written for it. Now SASM characters could be controlled by hand, touch, and mouse. Though this came about accidentally, the outcome was wonderful. It gave more meaning to the app: it felt like nothing was forced on the user, and he could use whichever modality he preferred to get the job done.
It was already August 10 and I was running out of time. I thought my app would be incomplete without voice support, but honestly I could not get voice recognition working properly. I incorporated the MS Speech SDK, but it needed external training, which I wanted to avoid. So I just kept trying, and finally got voice integration working. SASM would be a path-breaking app among voice-enabled systems, mainly because of one unique feature: I wrote the voice library such that whatever the user sees as a label is also a voice command. I also found that compound words are recognized better than single words. So I built a huge voice command set, about 200 commands associated with every single operation in the app, yet the user never has to remember any of them.
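The "labels are commands" idea amounts to a dispatch table built from the UI text itself. A hypothetical Python sketch, not the actual C# voice library; the label strings are invented for illustration:

```python
class VoiceCommands:
    """Every on-screen label doubles as its voice command, so the user never
    has to memorise anything. Compound phrases ("record movie") were found
    to be recognised more reliably than single words."""
    def __init__(self):
        self.commands = {}

    def register(self, label, action):
        # the UI registers each button/menu label as it is created
        self.commands[label.lower()] = action

    def dispatch(self, heard):
        action = self.commands.get(heard.lower().strip())
        if action:
            action()
            return True
        return False   # unrecognised phrase: ignore
```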
But when voice tested great, touch just refused to respond. I realized the voice module was polling continuously, which was draining resources. Voice also misdetects quite often, so the GUI was frequently interrupted. So I devised a strategy: if something was recognized falsely, the module should wait at least 5 seconds before it started polling again, and the same after a successful recognition, since a success triggers an action that takes some time to perform. I also introduced suitable delays elsewhere.
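That cooldown strategy is essentially a debounce gate in front of the recognizer. A minimal sketch with an injectable clock so the 5-second wait can be simulated; the class name and interface are invented:

```python
import time

class VoiceGate:
    """After any recognition event (good or bad), ignore the recognizer
    for a cooldown period so the GUI and touch input are not starved."""
    def __init__(self, cooldown=5.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.blocked_until = 0.0

    def try_accept(self, command):
        now = self.clock()
        if now < self.blocked_until:
            return None                      # still cooling down: drop it
        self.blocked_until = now + self.cooldown
        return command
```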
Speech and Tutorial
By now the app had turned into full-fledged software, and the complexity had increased beyond imagination. I thought it would be nice to see how a user reacted to it, so I invited one of my colleagues to test the app. I told him nothing, gave no instructions, and asked him to work with it while I watched his patterns. It took him two days to decode the functionality, and he still struggled. So I wrote an in-app tutorial module based on my observations; it would guide the user through what he needed to do to work with the app. Then it flashed on me: wow, if I could bring speech into play along with these text tutorials, the user could learn faster. So speech was incorporated: every tip displayed would also be spoken out.
But the introduction of speech led to another issue: the voice module was now picking up the spoken tips as commands and performing unexpectedly. By then the deadline had been extended to September 23, but early submission was still August 20, and that was my main target. So I started getting anxious, and the grown complexity of the code made it impossible to solve the issues quickly. I wrote several semaphores to prevent the voice module from taking up words while speech was active. This was done by setting a global static variable before and after the speech render.
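The "global static variable around the speech render" is a mutual-exclusion flag between text-to-speech output and voice input. A small Python sketch of that guard; all names are illustrative:

```python
speaking = False  # global flag: True while a tutorial tip is being spoken

def speak(text, tts):
    """Mute the recognizer while text-to-speech plays, so the app does
    not hear its own tutorial tips as commands."""
    global speaking
    speaking = True
    try:
        tts(text)          # render the speech (blocking in this sketch)
    finally:
        speaking = False   # always re-enable voice input afterwards

def on_voice_command(cmd, handler):
    if speaking:
        return False       # drop anything heard during speech output
    handler(cmd)
    return True
```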
Finally, these were working together. But my movie was still a silent movie; there was no audio recording. I thought, it's OK, the user can always add a voice-over. But a voice-over would need Windows Live Movie Maker or something similar, and I wanted the user to complete the whole production inside the app. So I started working on audio recording again.
The problem with PerC's audio recording is that it consumes about 100 MB of memory for 2 minutes of recording, and my movies would be lengthy, so I could not buy that idea. I started looking at Win32-based native recording. Then, in one session, I thought: why not start a new audio file whenever the current one crosses 20 MB, and finally merge all these audio files? That worked out. The app started saving the audio files, and in the end I used the rock star ffmpeg to combine the audio and the video frames to produce the movie.
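The rollover idea can be sketched as a recorder that switches output files at a size limit. A hypothetical Python sketch tracking only byte counts and file names, not real audio I/O; names and the 20 MB figure are taken from the text:

```python
class ChunkedRecorder:
    """Roll over to a new audio file whenever the current one crosses a
    size limit; the pieces are concatenated afterwards (the app used
    ffmpeg for the final merge)."""
    def __init__(self, basename, limit=20 * 1024 * 1024):
        self.basename, self.limit = basename, limit
        self.index, self.written = 0, 0
        self.files = [f"{basename}_000.wav"]

    def write(self, nbytes):
        self.written += nbytes
        if self.written >= self.limit:      # current chunk is full
            self.index += 1
            self.written = 0
            self.files.append(f"{self.basename}_{self.index:03d}.wav")
```

The chunks can then be joined losslessly with something like ffmpeg's concat demuxer (`ffmpeg -f concat -safe 0 -i list.txt -c copy out.wav`) before muxing with the video frames.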
But I still had two days, so I decided to write my own in-app media center. Any recorded video is stored there, and the user can organize all his videos and play them from the media center. So I wrote another XML-based library for managing media.
By the end I was just too exhausted and had completely lost interest in the early submission. I intentionally left town for two days. God, that was a good decision! When I started testing the app fresh, I found new glitches and problems. It took nearly another week to make the app a real beast: it would run for hours and days without losing track. That was quite fascinating.
By the time I was getting ready to submit, Intel had released R4. In the first phase, I had put a lot of blood into an app called Gestop; it was really a lot of work. Once I had submitted it, R2 was released and Gestop did not work at all. That experience was still fresh in my mind, and I thought it should not happen again. So I installed the new SDK and tested the app.
All was fine except that voice did not work at all. I was literally crying my heart out; this did not have to happen to me. I started digging into the issues again and in 7 days got everything working. Thankfully, the app also works great with R5, without any glitches.
I had realized that creating hand shadows is not easy, and users would not find it easy either. So we created several puppets and animations, and the whole installer had grown to 600 MB! With pathetic internet speed and unpredictable power failures, I doubted I would be able to submit. But the game agency helped me beyond their limits. After two days of effort I got the installer built, and they uploaded it on my behalf. I downloaded the installer from Amazon's services, installed it on a clean system, checked the app, and finally pressed the submit button.
Here is the detailed video of the app:
From the word go, my main concern was to satisfy my son and make a good animation movie for him, rather than to build a demo for the judges. I am satisfied that a boy who can't yet speak properly can close his fist and move the characters, and he just loves them. With such intriguing work you tend to expect good results, but I already have my reward. It was a great experience, and I am sure I will never again be able to produce anything as complicated and huge as SASM.