Live From The USA
I write my penultimate blog of the Ultimate Coder Challenge II from the comfort and isolation of my GDC hotel room, where I spent the Saturday coding away on my Perceptucam app.
To recap, my app attempts to create a virtual teleconferencing call between two users, tapping into the Gesture Camera to re-create the user as a 3D avatar in a virtual boardroom. This experience is augmented with a touch-based sketch pad, which the called user can view and contribute towards.
http://youtu.be/Si5XrctxoHU
I can report that the GDC event was amazing, and it was really great to meet the other challenge contestants in person. They are as fantastically talented a bunch of guys and gals as you could ever hope to meet, and I look forward to meeting them again at the next developer hang-out.
The App So Far
I have just finished my coding for the night, and I have some observations and progress to report for the current state of the app and where it will be going in the next seven days. At GDC I was able to demonstrate the app to visitors at the Intel booth and discovered a few truths that will change my final deliverable.
VOIP and Network Data Syncing
After much trial and error, it seems the sound lagged behind the visuals because the audio system takes time to fill its buffers (at both ends) before actually playing the audio. It stores up a section of what you say before sending it and playing it back for the person you are communicating with. This differs from the visual data, which plays back instantly.
The solution is to buffer the visual information and then play it back in perfect sync with the sound when it finally arrives. This means the app will consume a sizable chunk of memory to store depth and color information and it also means that time stamping will be needed to sync the two types of media. Only then will the app deliver a predictable conferencing experience.
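As a rough illustration of that idea (not the code in Perceptucam itself), here is a minimal Python sketch of a time-stamped frame buffer. The names `VideoFrame`, `SyncBuffer` and `pop_due` are hypothetical, the payload is a stand-in for the depth and color data, and it assumes frames arrive in timestamp order and that the audio player can report the timestamp of the sound it is currently playing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class VideoFrame:
    timestamp_ms: int  # capture time stamped by the sender (assumed field)
    payload: bytes     # stand-in for the depth and color data


class SyncBuffer:
    """Holds video frames until the audio clock catches up with them."""

    def __init__(self, max_frames: int = 90) -> None:
        # Bounded so memory use stays predictable; when full, the oldest
        # frame is silently dropped to make room for the newest one.
        self._frames: Deque[VideoFrame] = deque(maxlen=max_frames)

    def push(self, frame: VideoFrame) -> None:
        # Assumes frames are pushed in increasing timestamp order.
        self._frames.append(frame)

    def pop_due(self, audio_clock_ms: int) -> List[VideoFrame]:
        """Return every buffered frame stamped at or before the audio clock."""
        due: List[VideoFrame] = []
        while self._frames and self._frames[0].timestamp_ms <= audio_clock_ms:
            due.append(self._frames.popleft())
        return due

    def __len__(self) -> int:
        return len(self._frames)


# Usage: four frames arrive, but the buffer only has room for three.
buf = SyncBuffer(max_frames=3)
for t in (0, 33, 66, 99):
    buf.push(VideoFrame(t, b""))

# The audio player reports it is now playing sound captured at 70 ms.
due = buf.pop_due(70)
due_stamps = [f.timestamp_ms for f in due]
```

The `max_frames` cap is the trade-off mentioned above: a larger buffer tolerates more audio delay but holds more depth and color data in memory.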
Voice Recognition In A Crowded Room
One of the most striking failures of the technology at GDC was the voice recognition system, which struggled to detect the words spoken among the noise of a GDC hall. It may be true that most calls will be held in a quiet office, but equally they could be made in an airport or a call center.
http://youtu.be/gtnuqVzGfHQ
As you can hear, the background noise at GDC was quite meaty. The lesson here is that the Perceptual SDK needs to include effective noise cancellation technology, and that any voice-controlled app should offer a non-voice way to control it as well. Fortunately my app provides touch buttons as the primary input method, and voice control as a secondary feature, so I dodged a bullet there!
I must have swiped across the Ultrabook thousands of times to clear the screen of the current drawing. Alas, the percentage of successful detections was about 75%. That is enough to demonstrate the technology, but not enough for an end user who will accept nothing less than 100%. The only perceptual input that performed at 100% was my head tracker, which passively got on with the task of controlling the view of the virtual conference room.
In discussions with other developers, it was clear that gestures will be under intense scrutiny from end users when incorporated into apps. Unless a gesture provides something a button or touch cannot do, or provides an improved facility, it will be consigned to the novelty bin. It was also mutually agreed that gestures should be supported by a visual indicator to maintain a constant understanding between the computer and the user. Fail to provide this and your user could be waving their hands about to no avail, with growing frustration.
With my remaining days, aside from the essential polishing work, I hope to experiment more with foolproof gestures, perhaps using camera-to-user calibration and visual feedback as a way to achieve the holy grail of 100% predictability.
My final 'missing piece' of the app is the 'Contact List' screen which will allow new contacts to be created and used to make connections between registered users.
At the moment the app only communicates between known local IP addresses, which is good enough to test the technology at good network speeds, but not for a call to another part of the world.
GDC Developer Tips
TIP 1 : Apparently, you are not restricted to 30fps when obtaining the color and depth stream data from the camera. You can use the SetProfile command to change the maximum fps allowed during the streaming activity. For those who want to use the depth data for high speed fluid input, this should definitely be checked out!
TIP 2 : Be aware that you might get choppy audio capture from the Gesture Camera when you have both color and depth streams running at high resolution and top speed. Running any sound capture on the camera's recording device while it streams the visual information produces an intermittent, choppy output. It might be a driver issue, or a USB bandwidth issue. Worth knowing, especially if you are using this data for voice recognition.
I think attending GDC with my app was invaluable. It allowed me to conduct a field test of what might become a commercial app, and quickly highlighted the areas that need work. It is work worth doing too: the general feeling from everyone in the Perceptual Computing space was that the potential is huge. All we need is a few pioneers to develop methods of interaction that transcend keyboard, mouse and touch. It will be these developers who ultimately claim the Perceptual Computing prize and produce apps that blow the mind.