We definitely lost one of our nine lives this week integrating face tracking into our game, but we still have our cat’s eyes, and we still feel very confident that we will be able to show a stellar game at GDC. On the face tracking end of things we had some big wins. We are finally happy with the speed of the algorithms, and the way things are being tracked will work perfectly for Kiwi Catapult Revenge. We completed some complex math to create very realistic perspective shifting in Unity. Read below for those details, as well as for some C# code to get it working yourself. As we just mentioned, getting a DLL that properly calls update() from Unity and passes in the tracking values isn’t quite there yet. We did get some initial integration with head tracking coming into Unity, but full integration with our game is going to have to wait for this week.

On the C++ side of things, we have successfully found the 3D position of a face in the tracking space. This is huge! By tracking space, we mean the actual (x, y, z) position of the face from the camera in meters. Why do we want the 3D position of the face in tracking space? So that we can determine the perspective projection of the 3D scene (in game) from the player’s location. Two things made this task interesting: 1) the aligned depth data for a given (x, y) from the RGB image is full of holes, and 2) the camera specs only include the diagonal field of view (FOV) and no sensor dimensions.

We got around the holes in the aligned depth data by first checking for a usable value at the exact (x, y) location, and if the depth value was not valid (0 or the upper positive limit), we would walk through the pixels in a rectangle of increasing size until we encountered a usable value. It’s not that difficult to implement, but annoying when you have the weight of other tasks on your back. Another way to put it: It’s a Long Way to the Top on this project.
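The search described above can be sketched roughly as follows. This is a minimal illustration, not our production code: `DepthFrame`, `isUsable`, `nearestUsableDepth`, the `maxRadius` cutoff and the `invalidHigh` sentinel are all hypothetical names and parameters, and the frame is assumed to be a row-major buffer of 16-bit millimeter values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical container for an aligned depth frame (not the camera SDK's type).
struct DepthFrame {
    int width;
    int height;
    std::vector<std::uint16_t> mm;  // row-major depth values in millimeters

    std::uint16_t at(int x, int y) const {
        return mm[static_cast<std::size_t>(y) * width + x];
    }
};

// Assumption: 0 and anything at or above `invalidHigh` mean "no depth reading".
inline bool isUsable(std::uint16_t d, std::uint16_t invalidHigh) {
    return d != 0 && d < invalidHigh;
}

// Checks (x, y) first, then the border of squares of growing radius around it.
// Writes the first usable depth to outMm and returns true, or returns false
// (leaving outMm untouched) if nothing usable lies within maxRadius.
bool nearestUsableDepth(const DepthFrame& frame, int x, int y, int maxRadius,
                        std::uint16_t invalidHigh, std::uint16_t& outMm) {
    assert(frame.width > 0 && frame.height > 0);
    assert(x >= 0 && x < frame.width && y >= 0 && y < frame.height);
    for (int r = 0; r <= maxRadius; ++r) {
        for (int dy = -r; dy <= r; ++dy) {
            for (int dx = -r; dx <= r; ++dx) {
                // Visit only the border of the current square; the interior
                // was already covered by the smaller radii.
                if (std::max(std::abs(dx), std::abs(dy)) != r) continue;
                const int px = x + dx;
                const int py = y + dy;
                if (px < 0 || px >= frame.width || py < 0 || py >= frame.height) continue;
                const std::uint16_t d = frame.at(px, py);
                if (isUsable(d, invalidHigh)) {
                    outMm = d;
                    return true;
                }
            }
        }
    }
    return false;
}
```

Capping the search with `maxRadius` keeps a face near a large hole from borrowing a depth value from a distant background pixel; the right cutoff depends on the camera and is left as a parameter here.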

The z-depth of the face comes back in millimeters right from the depth data. The next trick was to convert the (x, y) position from pixels on the RGB frame to meters in the tracking space. There is a great illustration here of how to break the view pyramid up to derive formulas for x and y in the tracking space. The end result is:

TrackingSpaceX = TrackingSpaceZ * tan(horizontalFOV / 2) * 2 * (RGBSpaceX - RGBWidth / 2) / RGBWidth

TrackingSpaceY = TrackingSpaceZ * tan(verticalFOV / 2) * 2 * (RGBSpaceY - RGBHeight / 2) / RGBHeight

Here TrackingSpaceZ is the lookup from the depth data, and horizontalFOV and verticalFOV are derived from the diagonal FOV in the Creative Gesture Camera Specs (here). Now we have the face position in tracking space! We verified the results using a nice metric tape measure (also difficult to find at the local hardware store - get with the metric program, USA!)
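For readers who want to try the conversion, here is a small sketch of both steps: splitting a diagonal FOV into horizontal and vertical FOVs, then applying the two formulas above. It assumes an ideal rectilinear lens with square pixels, which is an assumption on our part and not something taken from the camera specs. The names `Fov`, `Point3`, `splitDiagonalFov` and `pixelToTrackingSpace` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

struct Fov {
    double horizontal;  // radians
    double vertical;    // radians
};

struct Point3 {
    double x, y, z;  // meters
};

// Splits a diagonal FOV across the image's width and height.
// Assumes an ideal rectilinear lens and square pixels (not a camera spec).
Fov splitDiagonalFov(double diagonalFovRad, int width, int height) {
    assert(width > 0 && height > 0);
    const double w = static_cast<double>(width);
    const double h = static_cast<double>(height);
    const double diag = std::sqrt(w * w + h * h);
    const double tanHalfDiag = std::tan(diagonalFovRad / 2.0);
    return Fov{2.0 * std::atan(tanHalfDiag * w / diag),
               2.0 * std::atan(tanHalfDiag * h / diag)};
}

// Applies the TrackingSpaceX / TrackingSpaceY formulas from the text.
// depthMm is the depth lookup in millimeters; the result is in meters.
// With a top-left image origin, positive y points down the image.
Point3 pixelToTrackingSpace(double px, double py, double depthMm,
                            int width, int height, const Fov& fov) {
    assert(width > 0 && height > 0);
    const double z = depthMm / 1000.0;
    const double x = z * std::tan(fov.horizontal / 2.0) * 2.0 *
                     (px - width / 2.0) / width;
    const double y = z * std::tan(fov.vertical / 2.0) * 2.0 *
                     (py - height / 2.0) / height;
    return Point3{x, y, z};
}
```

A quick sanity check is that the center pixel maps to x = y = 0 at any depth, and that a pixel on the right edge maps to x = z * tan(horizontalFOV / 2).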

From here, we can determine the perspective projection so the player will feel like they are looking through a window into our game. Our first pass at this effect involved just changing the rotation and position of the 3D camera in our Unity scene, but it just didn’t look realistic. We were leaving out the adjustment of the projection matrix that compensates for the off-center view of the display. For example, consider two equally sized (in screen pixels) objects at either side of the screen. When the viewer is positioned nearer to one side of the screen, the object at the closer edge appears larger to the viewer than the one at the far edge, and the display outline becomes trapezoidal. To compensate, the projection should be transformed with a shear to maintain the apparent size of the two objects, just like looking out a window! To change up our methods and achieve this effect, we went straight to the ultimate paper on the subject: Robert Kooima’s Generalized Perspective Projection. Our port of his algorithm into C#/Unity is below.

    using UnityEngine;
    using System.Collections;

    public class MouseFollow : MonoBehaviour {

        void Start () {
        }

        void LateUpdate () {
            float n = Camera.main.nearClipPlane;
            //float n = 0.01f;
            float f = Camera.main.farClipPlane;
            //float f = 1000f;
            //Resolution curRes = Screen.currentResolution;

            // all below in world space
            // screen's bottom left corner
            Vector3 pa = new Vector3(-0.5f, -0.5f, n);
            // screen's bottom right corner
            Vector3 pb = new Vector3(0.5f, -0.5f, n);
            // screen's top left corner
            Vector3 pc = new Vector3(-0.5f, 0.5f, n);

            // head position (use mouse cursor for now)
            // TODO temp translate it to a percentage of a 1x1 screen where 0,0 is the center
            Vector3 pe = Camera.main.ScreenToWorldPoint(
                new Vector3(Input.mousePosition.x, Input.mousePosition.y, n));
            //Debug.Log("pe: " + pe);
            pe.z = 0.75f;

            Camera.main.projectionMatrix = generalizedPerspectiveProjection(pa, pb, pc, pe, n, f);
        }

        Matrix4x4 generalizedPerspectiveProjection(Vector3 pa, Vector3 pb, Vector3 pc,
                                                   Vector3 pe, float n, float f) {
            // Compute an orthonormal basis for the screen.
            Vector3 vr = pb - pa;
            vr.Normalize();
            Vector3 vu = pc - pa;
            vu.Normalize();
            Vector3 vn = Vector3.Cross(vr, vu);
            vn.Normalize();

            // Compute the screen corner vectors.
            Vector3 va = pa - pe;
            Vector3 vb = pb - pe;
            Vector3 vc = pc - pe;

            // Find the distance from the eye to screen plane.
            float d = -Vector3.Dot(va, vn);

            // Find the extent of the perpendicular projection.
            float m = n / d;
            float l = Vector3.Dot(vr, va) * m;
            float r = Vector3.Dot(vr, vb) * m;
            float b = Vector3.Dot(vu, va) * m;
            float t = Vector3.Dot(vu, vc) * m;

            // Load the perpendicular projection.
            // glFrustum(l, r, b, t, n, f);
            // (Matrix4x4's single-index accessor is column-major.)
            Matrix4x4 mat = Matrix4x4.identity;
            mat[0] = 2.0f * n / (r - l);
            mat[1] = 0f;
            mat[2] = 0f;
            mat[3] = 0f;

            mat[4] = 0f;
            mat[5] = 2.0f * n / (t - b);
            mat[6] = 0f;
            mat[7] = 0f;

            mat[8] = (r + l) / (r - l);
            mat[9] = (t + b) / (t - b);
            mat[10] = (f + n) / (n - f);
            mat[11] = -1f;

            mat[12] = 0f;
            mat[13] = 0f;
            mat[14] = 2.0f * f * n / (n - f);
            mat[15] = 0f;

            // Rotate the projection to be non-perpendicular.
            Matrix4x4 M = Matrix4x4.identity;
            M[0] = vr[0]; M[4] = vr[1]; M[ 8] = vr[2];
            M[1] = vu[0]; M[5] = vu[1]; M[ 9] = vu[2];
            M[2] = vn[0]; M[6] = vn[1]; M[10] = vn[2];
            mat *= M;

            // Move the apex of the frustum to the origin.
            M = Matrix4x4.identity;
            M[0] = 1f; M[4] = 0f; M[ 8] = 0f; M[12] = -pe[0];
            M[1] = 0f; M[5] = 1f; M[ 9] = 0f; M[13] = -pe[1];
            M[2] = 0f; M[6] = 0f; M[10] = 1f; M[14] = -pe[2];
            M[3] = 0f; M[7] = 0f; M[11] = 0f; M[15] = 1f;
            mat *= M;

            return mat;
        }
    }

The code follows the mouse pointer to change perspective (not a tracked face) and does not change depth (the way a face would). We are currently in the midst of wrapping our C++ libs into a DLL for Unity to consume. That will let us grab the 3D position of the face and then compute the camera projection matrix from the face position and the position of the computer screen in relation to the camera.

Last but not least, we leave you with this week’s demo of the game. Some final art for UI elements is in, levels of increasing difficulty have been implemented, and some initial sound effects are in the game. As always, please ask if you have any questions about what we are doing, or if you just have something to say, we would love to hear from you. Leave us a comment! In the meantime we will be coding All Night Long!