With the realisation that I couldn’t top my week three video, I decided the smart thing was to get my head down and code the necessaries to turn my prototype into a functioning app. This meant adding a front end to the app and getting the guts of the conferencing functionality coded.
I also vowed not to bore the judges and fellow combatants to tears this week, so I will stick mainly to videos and pictures.
This main video covers the new additions to the app, and attempts to demonstrate conferencing in action. With only one head and two cameras, the biggest challenge was where to look.
I also made a smaller video with a mobile camera so you can see both PCs in the same shot. It also gives you a glimpse of what the Perceptual Camera data looks like once it’s been chewed up by network packets.
Top priority in week five will be to reduce the network packet size and improve the rendered visuals so we don’t see the disconcerting transition effects in the current version.
How My Perceptual 3D Avatar Works
One of the Ultimate Coder judges asked how the 3D avatar in the week three demo was constructed, and it occurs to me that this information may be useful to other coders, so here is a short summary of the technique.
The Gesture Camera provides a 16-bit plane of depth data streaming in at 30 frames per second, giving very accurate measurements of the distance from the camera to each point on the subject in front of it. The depth data also provides a reference offset that lets you look up the colour at that point too.
Once the camera is set up and actively sending this information, I create a grid of polygons, 320 by 240, evenly spaced on the X and Y axes. The Z coordinate of the vertex at each corner is controlled by the depth data, so the point furthest from the camera has the greatest Z value. Looking at this 3D construct front on, you would see the polygons with lower Z values nearer to the render viewpoint. I then take the camera colour for that point and set the ‘Diffuse’ element of the respective vertex to match it. The 3D model is not textured: the vertices are so densely packed together that their colours alone produce a highly detailed representation of the original colour stream.
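To make the grid construction concrete, here is a minimal Python sketch of the idea, not the author’s Dark Basic Pro code. It assumes the depth plane arrives as a flat, row-major list of 16-bit values, and it stands in for the camera’s depth-to-colour lookup with a hypothetical `colour_at(col, row)` callback. The `spacing` and `depth_scale` parameters are illustrative assumptions, not values from the app.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Vertex:
    x: float
    y: float
    z: float
    diffuse: Tuple[int, int, int, int]  # (r, g, b, a), each 0-255


def build_vertex_grid(
    depth: Sequence[int],
    colour_at: Callable[[int, int], Tuple[int, int, int]],
    w: int = 320,
    h: int = 240,
    spacing: float = 1.0,
    depth_scale: float = 0.01,  # hypothetical depth-to-world-units factor
) -> List[Vertex]:
    """Turn a row-major w*h plane of 16-bit depth samples into grid vertices.

    X and Y are evenly spaced; Z comes straight from the depth sample, so
    points further from the camera get larger Z. Colour goes into the
    per-vertex diffuse instead of a texture.
    """
    vertices = []
    for row in range(h):
        for col in range(w):
            r, g, b = colour_at(col, row)  # hypothetical depth-to-colour lookup
            vertices.append(Vertex(
                x=col * spacing,
                y=-row * spacing,  # negate so image row 0 ends up at the top
                z=depth[row * w + col] * depth_scale,
                diffuse=(r, g, b, 255),  # fully opaque to start with
            ))
    return vertices


def build_indices(w: int = 320, h: int = 240) -> List[int]:
    """Two triangles per grid cell, indexing into the row-major vertex list."""
    indices = []
    for row in range(h - 1):
        for col in range(w - 1):
            i = row * w + col  # top-left corner of this cell
            indices += [i, i + 1, i + w,          # upper-left triangle
                        i + 1, i + w + 1, i + w]  # lower-right triangle
    return indices
```

For a full 320 × 240 frame this gives 76,800 vertices and 152,482 triangles (319 × 239 cells, two triangles each). Only the Z values and diffuse colours change from frame to frame, so the X/Y layout and the index list can be built once and reused.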
Believe it or not, the screen shot above is rendering the main 3D object in wireframe mode. It gives you some idea of how much data needs to be reduced before it can be communicated effectively over a network.
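A quick back-of-the-envelope calculation shows the scale of the problem. The per-vertex sizes below are assumptions for illustration, not measurements from the app: either 2 bytes of depth plus 4 bytes of RGBA colour per vertex (X and Y implied by grid position), or a full vertex of three 4-byte floats plus a 4-byte colour, sent uncompressed for every vertex of every frame.

```python
def raw_bytes_per_second(bytes_per_vertex: int, w: int = 320, h: int = 240,
                         fps: int = 30) -> int:
    """Uncompressed bytes/s needed to send every vertex of every frame."""
    return w * h * bytes_per_vertex * fps


# Assumed encodings (not measured from the app):
#  - depth + colour: 2-byte depth + 4-byte RGBA = 6 bytes per vertex
#  - full vertex: three 4-byte floats (x, y, z) + 4-byte colour = 16 bytes
for label, size in [("depth + colour", 6), ("full vertex", 16)]:
    bps = raw_bytes_per_second(size)
    print(f"{label}: {bps:,} bytes/s = {bps * 8 / 1e6:.1f} Mbit/s")
```

Under these assumptions the leaner encoding still needs 13,824,000 bytes/s (about 110.6 Mbit/s), and the full-vertex encoding 36,864,000 bytes/s (about 294.9 Mbit/s), far more than a typical home connection can carry.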
This process is repeated 30 times per second, in sync with the rate at which the video stream outputs each frame, providing a high-fidelity render. Points that are too far in the distance have the alpha component of their diffuse colour set to zero, making them invisible to the user. This removes the backdrop from the rendered mesh and leaves a clean contour around the subject.
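The background removal amounts to a per-vertex depth threshold on the alpha channel. This Python sketch assumes depth samples in millimetres and uses a hypothetical cut-off of 1000; the real value would depend on how far the user sits from the camera.

```python
from typing import List, Sequence, Tuple

RGBA = Tuple[int, int, int, int]

# Hypothetical cut-off; assumes depth samples are in millimetres.
MAX_DEPTH = 1000


def cull_background(depth: Sequence[int], diffuse: Sequence[RGBA],
                    max_depth: int = MAX_DEPTH) -> List[RGBA]:
    """Zero the alpha of any vertex whose depth is beyond max_depth.

    The vertex stays in the mesh; it just becomes invisible, which removes
    the backdrop and leaves only the subject's contour.
    """
    culled = []
    for d, (r, g, b, a) in zip(depth, diffuse):
        culled.append((r, g, b, 0 if d > max_depth else a))
    return culled
```

Because the check runs on every vertex each frame, the visible contour follows the subject as they move.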
The advantage of converting camera stream data into vertex data is that you have direct access to a 3D representation of any object in front of the camera, and the possibility exists to apply reduction and optimisation algorithms from the 3D world that could never have been used on a 2D stream.
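As one example of such a reduction, here is uniform grid decimation, about the simplest level-of-detail technique for a regular mesh. It is a swapped-in illustration of the kind of algorithm meant here, not the author’s chosen method. The sketch assumes the per-vertex data is held in a flat, row-major list and keeps every `step`-th sample along X and Y.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def decimate_grid(values: Sequence[T], w: int, h: int,
                  step: int) -> Tuple[List[T], int, int]:
    """Keep every `step`-th sample along X and Y of a row-major w*h grid.

    Returns the reduced samples plus the new grid width and height.
    """
    kept = [values[row * w + col]
            for row in range(0, h, step)
            for col in range(0, w, step)]
    new_w = len(range(0, w, step))  # number of columns kept
    new_h = len(range(0, h, step))  # number of rows kept
    return kept, new_w, new_h
```

With `step=2` the 320 × 240 grid becomes 160 × 120 (19,200 vertices, a quarter of the original), and `step=4` gives 80 × 60 (4,800). The reduced grid is still regular, so it can be triangulated exactly like the full one, trading surface detail for bandwidth.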
Voice Over In Pain
Here is a summary of my attempt to get voice networking into my app. I tried to compile the Linphone SDK in Visual Studio: no joy. An old VS2008 project I found on Google Code: no joy. LibJingle, to see if that would help: no joy. I checked out MyBoghe and attempted to compile it: many dependency errors, and no joy. After about six hours of fruitless toil, I found myself looking closer to home. It turns out Dark Basic Pro released a module many moons back called DarkNET, which provides full TCP/UDP networking commands and, yes, you guessed it, built-in VOIP commands! A world of pain has been reduced to about six commands that are fully compatible with the language I am using. Once I discovered this, my conferencing app came on in leaps and bounds.
As promised, I have kept my blog shorter this week. I hope you liked the app and video; please let me know if you would like blog five to include lots of source code. Next week is my last week for finishing app functionality, so expect to see VOIP (so you can hear and speak in the conference call), optimisations, and compatibility testing so you can run the app in a variety of scenarios. Given the time constraints, I am aiming to limit the first version of the app to two users in order to cram as much fidelity into the visuals as possible. This will also give me time to refine and augment the Perceptual Computing elements of the app and show off more of what the Gesture Camera can do.
P.S. Hope Sascha is okay after sitting on all those wires. Ouch!