Infrared5 Ultimate Coder Update 4: Flamethrowers, Wingsuits, Rearview Mirrors and Face Tracking!

Week three seemed to go much better for the Infrared5 team. We are back on our feet with head tracking, and despite the judges' lack of confidence in our ability to track eyes, we still believe we've got a decent chance of pulling it off. Yes, it's true, as Nicole has said in her post this week, that the Intel Perceptual Computing (IPC) SDK isn't yet up to the task. She had an interview with the Perceptual Computing team, and they told her "that eye tracking was going to be implemented later". What's funny about the lack of eye tracking, and even decent gaze tracking, in the IPC SDK is that the contest is showing this:

Yes we know it’s just marketing, but it is a pretty misleading image. They have a 3D mesh over a guy’s face giving the impression that the SDK can do AAM and POSIT. That would be so cool!  Look out FaceAPI! Unfortunately it totally doesn't do that. At least not yet.

This isn’t to say that Intel is taking a bad approach with the IPC SDK beta either. They are trying out a lot of things at once and not getting lost in the specifics of just a few features. This allows developers to tell them what they want to do with it without spending tremendous effort on features that wouldn't even be used.

The lack of decent head, gaze and eye tracking is what’s inspired us on to eventually release our tracking code as open source. Our hope is that future developers can leverage our work on these features and not have to go through the pain we did in this contest. Maybe Intel will just merge our code into the IPC SDK and we can continue to make the product better together.

Another reason we are sticking with our plan on gaze and eye tracking is that we feel strongly, as do the judges, that these features are some of the most exciting aspects of the perceptual computing camera. A convertible ultrabook has people’s hands busy with typing, touch gestures, etc. and having an interface that works using your face is such a natural fit for this kind of setup.

Latest Demo of Kiwi Catapult Revenge

Check out the latest developments with the Unity Web Player version. We’ve added a new fireball/flamethrower style effect, updated skybox, sheep and more. Note that this is still far from final art and behavior for the game, but we want to continue showing the process we are going through by providing these snapshots of the game in progress. This build requires the free Brass Monkey app for iOS or Android.

A Polished Experience

In addition to being thoroughly entertained by the judges’ video blooper this week, one thing we heard consistently from them is that they were expecting more polished apps from the non-individual teams. We couldn’t agree more! One advantage that we have in the contest is that we have a fantastic art and game design team. That’s not to say our tech skills are lacking either. We are at our core a very technically focused company, but we tend not to compartmentalize the design process and the technology implementation in projects we take on. Design and technology have to work together in harmony to create an amazing user experience, and that’s exactly what we’re doing in this challenge.

Game design is a funny, flexible and agile process. What you set out to do in the beginning rarely ends up being what you make in the end. Our initial idea started as a sort of Mad Max road warrior style driving and shooting game (thus Sascha thinking ours was a racing game early on), but after having read some bizarre news articles on eradicating cats in New Zealand we decided the story of Cats vs. Kiwis should be the theme. Plus Rebecca and Aaron really wanted to try out this 2D paper, pop-up book style, and the Kiwi story really lends itself to that look.

Moving to this new theme kept most of the core game mechanics of the driving game. Tracking with the head and eyes to shoot and using the phone as a virtual steering wheel are exactly the same as in the road warrior idea. Since our main character Karl Kiwi has magical powers and can fly, we made it so he would be off the ground (unlike a car that's fixed to the ground). Another part of the story is that Karl can breathe fire like a dragon, so we thought that's an excellent way to use the perceptual computing camera: the player opens their mouth to shoot fire. Shooting regular bullets didn't work with the new character either, so we took some inspiration from SNL's Laser Cats sketches and the funny laser cat memes they spawned, and decided that he should be able to shoot lasers from his eyes. Believe it or not, we have been wanting to build a game involving animals and lasers for a while now. "Invasion of the Killer Cuties" was a game we concepted over two years ago where you fly a fighter plane in space against cute rodents that shoot lasers from their eyes (initial concept art shown below).
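To give a flavor of how the mouth-open trigger could work, here is a minimal sketch. The function name, landmark inputs, and 0.08 threshold are our own placeholders, not the IPC SDK's API; the idea is just to compare the lip gap against the face height:

```cpp
#include <cassert>

// Hypothetical mouth-open check: the gap between the upper and lower lip
// landmarks (in pixels) is compared against a fraction of the face height.
// The 0.08 threshold is a made-up starting point that would need tuning.
bool isMouthOpen(float upperLipY, float lowerLipY, float faceHeight,
                 float threshold = 0.08f) {
    return (lowerLipY - upperLipY) / faceHeight > threshold;
}
```

Normalizing by the face height keeps the check stable whether the player sits close to the camera or leans back.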

Since Chris wrote up the initial game design document (GDD) for Kiwi Catapult Revenge there have been plenty of other changes we’ve made throughout the contest. One example: our initial pass at fire breathing (a spherical projectile) wasn’t really getting the effect we wanted. In the GDD it was described as a fireball so this was a natural choice. What we found though is that it was hard to hit the cats, and the ball didn’t look that good either. We explored how dragon fire breathing is depicted in movies, and the effect is much more like how a flamethrower works. The new fire breathing effect that John implemented this week is awesome! And we believe it adds to the overall polish of our entry for the contest.

(image credit MT Falldog)

Another aspect of the game that wasn’t really working so far was that the main character was never shown. We chose a first person point of view so that the effect of moving your head and peering around items would feel incredibly immersive, giving the feeling that you are really right in this 3D world. However, this meant that you would never see Karl, our protagonist.

Enter the rearview mirror effect. We took a bit of inspiration from the super cool puppets that Sixense showed last week, and from this video of an insane wingsuit base jump, and came up with a way to show off our main character. Karl Kiwi will be fitted with a rearview mirror so that he can see what's behind him, and you as the player can see the character move the same way you do. When you tilt your head, Karl will tilt his; when you look right, so will Karl; and when you open your mouth, Karl's beak will open. This will all happen in real time, and the effect will really show the power of the perceptual computing platform that Intel has provided.
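The pose mapping itself is tiny. A minimal sketch (the struct and field names here are ours, not the tracker's actual output types): in a mirror, left-right motion appears reversed, so yaw and roll flip sign, while pitch and the mouth state pass straight through:

```cpp
#include <cassert>

// Head pose from the tracker, in degrees; mouthOpen from the mouth detector.
struct Pose { float yaw; float pitch; float roll; bool mouthOpen; };

// What the player sees in Karl's rearview mirror: left-right motion is
// reversed, so yaw and roll flip sign; nodding (pitch) and the beak/mouth
// state map through unchanged.
Pose mirrorPose(const Pose &head) {
    return { -head.yaw, head.pitch, -head.roll, head.mouthOpen };
}
```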

Head Tracking Progress Plus Code and Videos

It wouldn't be a proper Ultimate Coder post without some video and some code, so we have provided some snippets for your perusal. Steff did a great job of documenting his progress this week, and we want to show you step by step where we are heading by sharing a bit of code and some video for each of these face detection examples. Steff is working from this plan, knocking off each of the individual algorithms step by step. Note that this week's example requires the OpenCV library and a C++ compiler for Windows.

This last week of Steff's programming was all about two things: 1) switching from working entirely in Unity (with C#) to a C++ workflow in Visual Studio, and 2) refining our face tracking algorithm. As noted in last week's post, we hit a roadblock trying to write everything in C# in Unity with DLLs for the Intel SDK and OpenCV. There were limits to the C# port of OpenCV that we needed to shed. So, we spent some quality time setting up in VS 2012 Express and enjoying the sharp sting of pointers, references, and those types of lovely things that we had avoided by working in C#. There is good news, however: we did get back the amount of lower-level control needed to refine face detection!

Our main refinement this week was to break through the limitations we encountered when implementing the Viola-Jones detection method using Haar cascades. This is a great way to find a face, but it's not the best for tracking a face from frame to frame. It has limitations in orientation; e.g., if the face is tilted to one side, the Haar cascade no longer detects a face. Another drawback is that while looking for a face, the algorithm churns through the image one block of pixels at a time, which can really slow things down. To break through this limitation, we took inspiration from a team that has done a nice job putting face tracking together using Python, OpenCV, and an RGB camera + Kinect. Following their example, we have implemented feature detection with GoodFeaturesToTrack and then tracked each feature from frame to frame using Optical Flow. The video below shows the difference between the two methods and also includes a first pass at creating a blue screen from the depth data.
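Stripped of the OpenCV calls, the control flow we landed on is a two-state loop: run the expensive Haar detection only until a face is found, then let cheap frame-to-frame optical flow take over until too many features are lost. A simplified sketch (the booleans and counts stand in for the actual detection and optical-flow results):

```cpp
#include <cassert>

enum class TrackerState { Detecting, Tracking };

// Two-state face tracker: Haar-cascade detection runs only until a face is
// found; optical flow takes over after that, and we fall back to detection
// when too few tracked features survive the pruning step.
struct FaceTracker {
    TrackerState state = TrackerState::Detecting;
    int numFeatures = 0;
    static const int MIN_FEATURES_TO_RESET = 6;

    // faceFound / survivingFeatures stand in for the real OpenCV results.
    TrackerState processFrame(bool faceFound, int survivingFeatures) {
        if (state == TrackerState::Detecting) {
            if (faceFound) {
                numFeatures = survivingFeatures; // seed from GoodFeaturesToTrack
                state = TrackerState::Tracking;
            }
        } else {
            numFeatures = survivingFeatures;     // after Lucas-Kanade pruning
            if (numFeatures < MIN_FEATURES_TO_RESET) {
                numFeatures = 0;                 // lost the face, start over
                state = TrackerState::Detecting;
            }
        }
        return state;
    }
};
```

The full listing below follows this same shape, with the pruning thresholds (`MIN_FEATURES_TO_RESET` and friends) doing the work of deciding when to fall back to detection.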

This week, we will be adding depth data into this tracking algorithm. With depth, we will be able to refine our region of interest to include a good estimate of face size, and we will also be able to knock out the background to speed up face detection with the Haar cascades. Another critical step is integrating our face detection algorithms into the Unity game. We look forward to seeing how all this goes and filling you in in next week's post!

We are also really excited about all the other teams’ progress so far, and in particular we want to congratulate Lee on making a super cool video last week!  We had some plans to do a more intricate video based on Lee’s, but a huge snowstorm in Boston put a bit of a wrench in those plans. Stay tuned for next week’s post though, as we’ve got some exciting (and hopefully funny) stuff to show you!

For you code junkies out there, here is a code snippet showing how we implemented GoodFeaturesToTrack and Lucas-Kanade Optical Flow:

#include "stdafx.h"

#include "cv.h"
#include "highgui.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <math.h>
#include <float.h>
#include <limits.h>
#include <time.h>
#include <ctype.h>
#include <vector> 

#include "CaptureFrame.h"
#include "FaceDetection.h"

using namespace cv;
using namespace std;

static void help()
{
    // print a welcome message, and the OpenCV version
    cout << "\nThis is a demo of robust face tracking using Lucas-Kanade Optical Flow,\n"
            "using OpenCV version " << CV_VERSION << "\n" << endl;

    cout << "\nHot keys: \n"
            "\tESC - quit the program\n"
            "\tr - restart face tracking\n" << endl;
}

// function declaration for drawing the region of interest around the face
void drawFaceROIFromRect(IplImage *src, CvRect *rect);

// function declaration for finding good features to track in a region
int findFeatures(IplImage *src, CvPoint2D32f *features, CvBox2D roi);

// function declaration for finding a trackbox around an array of points
CvBox2D findTrackBox(CvPoint2D32f *features, int numPoints);

// function declaration for finding the distance a point is from a given cluster of points
int findDistanceToCluster(CvPoint2D32f point, CvPoint2D32f *cluster, int numClusterPoints);

// Storage for the previous gray image
IplImage *prevGray = 0;
// Storage for the previous pyramid image
IplImage *prevPyramid = 0;
// for working with the current frame in grayscale
IplImage *gray = 0;
// for working with the current frame in grayscale2 (for L-K OF)
IplImage *pyramid = 0;

// max features to track in the face region
int const MAX_FEATURES_TO_TRACK = 300;
// max features to add when we search on top of an existing pool of tracked points
int const MAX_FEATURES_TO_ADD = 300;
// min features that we can track in a face region before we fail back to face detection
int const MIN_FEATURES_TO_RESET = 6;
// the threshold for the x,y mean squared error indicating that we need to scrap our current track and start over
float const MSE_XY_MAX = 10000;
// threshold for the standard error on x,y points we're tracking
float const STANDARD_ERROR_XY_MAX = 3;
// threshold for the standard error on x,y points we're tracking
double const EXPAND_ROI_INIT = 1.02;
// max distance from a cluster a new tracking can be
int const ADD_FEATURE_MAX_DIST = 20;

int main(int argc, char **argv)
 // Init some vars and const
 // name the window
 const char *windowName = "Robust Face Detection v0.1a";
 // box for defining the region where a face was detected
 CvRect *faceDetectRect = NULL;
 // Object faceDetection of the class "FaceDetection"
 FaceDetection faceDetection;
 // Object captureFrame of the class "CaptureFrame"
 CaptureFrame captureFrame;
 // for working with the current frame
 IplImage *currentFrame;
 // for testing if the stream is finished
 bool finished = false;
 // for storing the features
 CvPoint2D32f features[MAX_FEATURES_TO_TRACK] = {0};
 // for storing the number of current features that we're tracking
 int numFeatures = 0;
 // box for defining the region where a features are being tracked
 CvBox2D featureTrackBox;
 // multiplier for expanding the trackBox
 float expandROIMult = 1.02;
 // threshold number for adding more features to the region
 int minFeaturesToNewSearch = 50;

 // Start doing stuff ------------------>
 // Create a new window 
 cvNamedWindow(windowName, 1);

 // Capture from the camera

  // initialize the face tracker

 // capture a frame just to get the sizes so the scratch images can be initialized
 finished = captureFrame.CaptureNextFrame();
 if (finished) 
 return 0;
 currentFrame = captureFrame.getFrameCopy();

 // init the images
 prevGray = cvCreateImage(cvGetSize(currentFrame), IPL_DEPTH_8U, 1);
 prevPyramid = cvCreateImage(cvGetSize(currentFrame), IPL_DEPTH_8U, 1);
 gray = cvCreateImage(cvGetSize(currentFrame), IPL_DEPTH_8U, 1);
 pyramid = cvCreateImage(cvGetSize(currentFrame), IPL_DEPTH_8U, 1);

 // iterate through each frame
 // check if the video is finished (kind of silly since we're only working on live streams)
 finished = captureFrame.CaptureNextFrame();
 if (finished) 
 return 0;
 // save a reference to the current frame
 currentFrame = captureFrame.getFrameCopy();

 // check if we have a face rect
 if (faceDetectRect)
 // Create a grey version of the current frame
 cvCvtColor(currentFrame, gray, CV_RGB2GRAY);
 // Equalize the histogram to reduce lighting effects
 cvEqualizeHist(gray, gray);

 // check if we have features to track in our faceROI
 if (numFeatures > 0)
 bool died = false;
 //cout << "nnumFeatures: " << numFeatures;
 // track them using L-K Optical Flow
 char featureStatus[MAX_FEATURES_TO_TRACK];
 float featureErrors[MAX_FEATURES_TO_TRACK];
 CvSize pyramidSize = cvSize(gray->width + 8, gray->height / 3);
 CvPoint2D32f *featuresB = new CvPoint2D32f[MAX_FEATURES_TO_TRACK];
 CvPoint2D32f *tempFeatures = new CvPoint2D32f[MAX_FEATURES_TO_TRACK];

 cvCalcOpticalFlowPyrLK(prevGray, gray, prevPyramid, pyramid, features, featuresB, numFeatures, cvSize(10,10), 5, featureStatus, featureErrors, cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, -3), 0);

 numFeatures = 0;
 float sumX = 0;
 float sumY = 0;
 float meanX = 0;
 float meanY = 0;
 // copy back to features, but keep only high status points
 // and count the number using numFeatures
 for (int i = 0; i < MAX_FEATURES_TO_TRACK; i++)
 {
  if (featureStatus[i])
  {
   // quick prune just by checking if the point is outside the image bounds
   if (featuresB[i].x < 0 || featuresB[i].y < 0 || featuresB[i].x > gray->width || featuresB[i].y > gray->height)
   {
    // out of bounds -- do nothing
   }
   else
   {
    // count the good values
    tempFeatures[numFeatures] = featuresB[i];
    numFeatures++;
    // sum up to later calc the mean for x and y
    sumX += featuresB[i].x;
    sumY += featuresB[i].y;
   }
  }
 }

 // calc the means
 meanX = sumX / numFeatures;
 meanY = sumY / numFeatures;

 // prune points using mean squared error
 // caclulate the squaredError for x, y (square of the distance from the mean)
 float squaredErrorXY = 0;
 for (int i = 0; i < numFeatures; i++)
 squaredErrorXY += (tempFeatures[i].x - meanX) * (tempFeatures[i].x - meanX) + (tempFeatures[i].y  - meanY) * (tempFeatures[i].y - meanY);
 //cout << "squaredErrorXY: " << squaredErrorXY << endl;

 // calculate mean squared error for x,y
 float meanSquaredErrorXY = squaredErrorXY / numFeatures;
 //cout << "meanSquaredErrorXY: " << meanSquaredErrorXY << endl;

 // mean squared error must be greater than 0 but less than our threshold (a big number would indicate our points are insanely spread out)
 if (meanSquaredErrorXY == 0 || meanSquaredErrorXY > MSE_XY_MAX)
 {
  numFeatures = 0;
  died = true;
 }
 else
 {
  // throw away the outliers based on the x-y variance and
  // store the good values in the features array
  int cnt = 0;
  for (int i = 0; i < numFeatures; i++)
  {
   float standardErrorXY = ((tempFeatures[i].x - meanX) * (tempFeatures[i].x - meanX) + (tempFeatures[i].y - meanY) * (tempFeatures[i].y - meanY)) / meanSquaredErrorXY;
   if (standardErrorXY < STANDARD_ERROR_XY_MAX)
   {
    // we want to keep this point
    features[cnt] = tempFeatures[i];
    cnt++;
   }
  }
  numFeatures = cnt;
 }

 // only bother with fixing the tail of the features array if we still have points to track
 if (numFeatures > 0)
 // set everything past numFeatures to -10,-10 in our updated features array
 for (int i = numFeatures; i < MAX_FEATURES_TO_TRACK; i++)
 features[i] = cvPoint2D32f(-10,-10);

 // check if we're below the threshold min points to track before adding new ones
 if (numFeatures < minFeaturesToNewSearch)
 // add new features
 // up the multiplier for expanding the region
 expandROIMult *= EXPAND_ROI_INIT;

 // expand the trackBox
 float newWidth = featureTrackBox.size.width * expandROIMult;
 float newHeight = featureTrackBox.size.height * expandROIMult;
 CvSize2D32f newSize = cvSize2D32f(newWidth, newHeight);
 CvBox2D newRoiBox = {, newSize, featureTrackBox.angle};

 // find new points
 CvPoint2D32f additionalFeatures[MAX_FEATURES_TO_ADD] = {0};
 int numAdditionalFeatures = findFeatures(gray, additionalFeatures, newRoiBox);
 int endLoop = MAX_FEATURES_TO_ADD;
 if (MAX_FEATURES_TO_TRACK < endLoop + numFeatures)
 endLoop -= numFeatures + endLoop - MAX_FEATURES_TO_TRACK;
 // copy new stuff to features, but be mindful of the array max
 for (int i = 0; i < endLoop; i++)
 {
  // only add points that land reasonably close to the existing cluster
  int dist = findDistanceToCluster(additionalFeatures[i], features, numFeatures);
  if (dist < ADD_FEATURE_MAX_DIST)
  {
   features[numFeatures] = additionalFeatures[i];
   numFeatures++;
  }
 }

 // TODO check for duplicates???

 // check if we're below the reset min
 if (numFeatures < MIN_FEATURES_TO_RESET)
 {
  // if so, set numFeatures to 0, null out the detect rect and do face detection on the next frame
  numFeatures = 0;
  faceDetectRect = NULL;
  died = true;
 }
 // reset the expand roi mult back to the init
 // since this frame didn't need an expansion
 expandROIMult = EXPAND_ROI_INIT;

 // find the new track box
 if (!died)
 featureTrackBox = findTrackBox(features, numFeatures);
 // convert the faceDetectRect to a CvBox2D
 CvPoint2D32f center = cvPoint2D32f(faceDetectRect->x + faceDetectRect->width * 0.5, faceDetectRect->y + faceDetectRect->height * 0.5);
 CvSize2D32f size = cvSize2D32f(faceDetectRect->width, faceDetectRect->height);
 CvBox2D roiBox = {center, size, 0};
 // get features to track
 numFeatures = findFeatures(gray, features, roiBox);
 // verify that we found features to track on this frame
 if (numFeatures > 0)
 // find the corner subPix
 cvFindCornerSubPix(gray, features, numFeatures, cvSize(10, 10), cvSize(-1,-1), cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, 0.03));

 // define the featureTrackBox around our new points
 featureTrackBox = findTrackBox(features, numFeatures);
 // calculate the minFeaturesToNewSearch from our detected face values
 minFeaturesToNewSearch = 0.9 * numFeatures;

 // wait for the next frame to start tracking using optical flow
 // try for a new face detect rect for the next frame
 faceDetectRect = faceDetection.detectFace(currentFrame);
 // reset the current features
 numFeatures = 0;
 // try for a new face detect rect for the next frame
 faceDetectRect = faceDetection.detectFace(currentFrame);

 // save gray and pyramid frames for next frame
 cvCopy(gray, prevGray, 0);
 cvCopy(pyramid, prevPyramid, 0);

 // draw some stuff into the frame to show results
 if (numFeatures > 0)
 // show the features as little dots
 for(int i = 0; i < numFeatures; i++)
 CvPoint myPoint = cvPointFrom32f(features[i]);
 cvCircle(currentFrame, cvPointFrom32f(features[i]), 2, CV_RGB(0, 255, 0), CV_FILLED);
 // show the tracking box as an ellipse
 cvEllipseBox(currentFrame, featureTrackBox, CV_RGB(0, 0, 255), 3);

 // show the current frame in the window
 cvShowImage(windowName, currentFrame);

 // wait for the next frame or a keypress
 char c = (char)cvWaitKey(30);
 switch (c)
 {
 case 27: // ESC
  finished = true;
  break;
 case 'r':
  // restart tracking: try for a new face detect rect on the next frame
  numFeatures = 0;
  faceDetectRect = faceDetection.detectFace(currentFrame);
  break;
 }

 // Release the scratch images
 cvReleaseImage(&prevGray);
 cvReleaseImage(&prevPyramid);
 cvReleaseImage(&gray);
 cvReleaseImage(&pyramid);

 // Destroy the window previously created
 cvDestroyWindow(windowName);
 return 0;

// draws a region of interest in the src frame based on the given rect
void drawFaceROIFromRect(IplImage *src, CvRect *rect)
{
 // Points to draw the face rectangle
 CvPoint pt1 = cvPoint(0, 0);
 CvPoint pt2 = cvPoint(0, 0);

 // setup the points for drawing the rectangle
 pt1.x = rect->x;
 pt1.y = rect->y;
 pt2.x = pt1.x + rect->width;
 pt2.y = pt1.y + rect->height;

 // Draw face rectangle
 cvRectangle(src, pt1, pt2, CV_RGB(255, 0, 0), 2, 8, 0);
}

// finds features and stores them in the given array
// TODO move this method into a Class
int findFeatures(IplImage *src, CvPoint2D32f *features, CvBox2D roi)
{
 int featureCount = 0;
 double minDistance = 5;
 double quality = 0.01;
 int blockSize = 3;
 int useHarris = 0;
 double k = 0.04;

 // Create a mask image to be used to select the tracked points
 IplImage *mask = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);

 // Begin with all black pixels
 cvZero(mask);

 // Create a filled white ellipse within the box to define the ROI in the mask.
 cvEllipseBox(mask, roi, CV_RGB(255, 255, 255), CV_FILLED);

 // Create the temporary scratchpad images (cvGoodFeaturesToTrack needs 32-bit float scratch space)
 IplImage *eig = cvCreateImage(cvGetSize(src), IPL_DEPTH_32F, 1);
 IplImage *temp = cvCreateImage(cvGetSize(src), IPL_DEPTH_32F, 1);

 // init the corner count int
 int cornerCount = MAX_FEATURES_TO_TRACK;

 // Find keypoints to track using Good Features to Track
 cvGoodFeaturesToTrack(src, eig, temp, features, &cornerCount, quality, minDistance, mask, blockSize, useHarris, k);

 // compact the array, keeping only the points that fall inside the image
 for (int i = 0; i < cornerCount; i++)
 {
  if ((features[i].x == 0 && features[i].y == 0) || features[i].x > src->width || features[i].y > src->height)
  {
   // skip the bad point
  }
  else
  {
   features[featureCount] = features[i];
   featureCount++;
  }
 }

 // clean up the scratch images
 cvReleaseImage(&mask);
 cvReleaseImage(&eig);
 cvReleaseImage(&temp);

 // return the feature count
 return featureCount;
}

// finds the track box for a given array of 2d points
// TODO move this method into a Class
CvBox2D findTrackBox(CvPoint2D32f *points, int numPoints)
{
 CvBox2D box;
 // matrix for helping calculate the track box
 CvMat *featureMatrix = cvCreateMat(1, numPoints, CV_32SC2);
 // collect the feature points in the feature matrix
 for (int i = 0; i < numPoints; i++)
  cvSet2D(featureMatrix, 0, i, cvScalar(points[i].x, points[i].y));
 // fit an ellipse to the collected points
 box = cvFitEllipse2(featureMatrix);
 // release the matrix (cause we're done with it)
 cvReleaseMat(&featureMatrix);
 // return the box
 return box;
}

int findDistanceToCluster(CvPoint2D32f point, CvPoint2D32f *cluster, int numClusterPoints)
{
 int minDistance = 10000;
 for (int i = 0; i < numClusterPoints; i++)
 {
  int distance = (int)(fabs(point.x - cluster[i].x) + fabs(point.y - cluster[i].y));
  if (distance < minDistance)
   minDistance = distance;
 }
 return minDistance;
}



Infrared5:

@rsa - the header files you referenced were custom written for this game. There are plans to create an open source version of the code. Stay tuned for the release of the code on the Infrared5 GitHub page.

Infrared5:

@Walesa D, since you're talking about opencvsharp, I am assuming you're doing things within Unity (yes/no?). We found that workflow didn't work out for us, partially because you're depending so much on ports of OpenCV to C#. Not all builds of OpenCV are created equal. It's better to stay in C++, where you have total control of the build. That's what we found, anyway. We abandoned OpenCvSharp in Unity after about 48 hours.

As far as the Intel IPC in Unity, the depth map comes in as an array of shorts. If you want it to be something different, it's on you to convert.

rsa:

Hi! Are these header files user-defined? "CaptureFrame.h", "FaceDetection.h" ....
Where shall I get them?

Walesa D.:

Hey, I want to use opencvsharp for the IPC,
but I have a problem: how can I access the depth stream from the camera directly as an IplImage?
I have to convert it pixel by pixel, but the performance is so bad :(

Peter O'Hanlon:

Wow guys. This looks fantastic, and you win the longest blog post award this week - by far.

