Where's Your Head At? Ultimate Coder Week 6

Last week most of the other contestants and I ran around GDC showing stuff off, so not much got done in terms of coding, but I for one learned one thing: people wanted to know how my head tracker works. So this week I will do my best to describe the algorithm and the process I used to come up with it.

Image courtesy of Flickr user bob_duffy

The first problem is obvious: find a head! While it is the first thing my algorithm does, it was one of the last things I implemented, because it turns out that it's something that is rarely used. If you are tracking a head, the very word "tracking" indicates that you want to follow a head, so once you have found one you want to hold on to it. Therefore the "head finder" is only really needed on the first frame, or whenever the algorithm concludes that it has lost the head.

To make this quick, I pick a pixel, read out its depth value, and then check whether the pixels one head size to the left, to the right and above are all at least 200 mm further away. I do this on every 100th pixel. Usually I end up with a bunch of positives, so I pick the one closest to the camera.
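As a rough illustration of that coarse search, here is a minimal Python sketch. It is not my actual implementation: the depth map is assumed to be a list of rows holding millimetre values, "every 100th pixel" is read as a grid with a step of 10 pixels in each direction, and the names `find_head`, `head_size`, `step` and `margin_mm` are made up for the example.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[int]]  # depth in millimetres, indexed [row][column]


def find_head(depth: DepthMap, head_size: int, step: int = 10,
              margin_mm: int = 200) -> Optional[Tuple[int, int]]:
    """Coarse search: return (x, y) of the closest sample that looks like a head."""
    height, width = len(depth), len(depth[0])
    best = None  # (depth_mm, x, y) of the closest candidate found so far
    # Only visit samples that have room for all three neighbours.
    for y in range(head_size, height, step):
        for x in range(head_size, width - head_size, step):
            d = depth[y][x]
            neighbours = (
                depth[y][x - head_size],  # one head size to the left
                depth[y][x + head_size],  # one head size to the right
                depth[y - head_size][x],  # one head size above
            )
            # A head candidate sticks out from everything around it.
            if all(n >= d + margin_mm for n in neighbours):
                if best is None or d < best[0]:
                    best = (d, x, y)
    return None if best is None else (best[1], best[2])
```

Calling `find_head(depth, head_size=20)` returns one sample position on the nearest head-like blob, or `None` if nothing in view sticks out far enough.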

We have now tracked the head to within 20 pixels.

To figure out more precisely where the head is, I scan the pixels to the left, to the right and above until I "fall off the head." I can now re-center my head position more precisely.
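The scan described above could look something like the following sketch. The "fall off" test (a pixel at least `margin_mm` behind the starting pixel) and the returned values are assumptions made for the example, and `refine_head` is a hypothetical name.

```python
from typing import List, Tuple

DepthMap = List[List[int]]  # depth in millimetres, indexed [row][column]


def refine_head(depth: DepthMap, x: int, y: int,
                margin_mm: int = 200) -> Tuple[float, int, int]:
    """Scan left, right and up from (x, y) until we fall off the head.

    Returns (centre_x, top_y, width_px).
    """
    height, width = len(depth), len(depth[0])
    limit = depth[y][x] + margin_mm  # anything this far back is not head

    def on_head(px: int, py: int) -> bool:
        return 0 <= px < width and 0 <= py < height and depth[py][px] < limit

    left = x
    while on_head(left - 1, y):
        left -= 1
    right = x
    while on_head(right + 1, y):
        right += 1
    top = y
    while on_head(x, top - 1):
        top -= 1
    # Re-centre horizontally between the two edges that were found.
    return (left + right) / 2.0, top, right - left + 1
```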

We have now tracked the head to within 4-5 pixels.

Wait, only four or five pixels? Yes, now we run into some problems. First of all, heads are round and not square, so if we happen to pick a pixel towards the side of the head, the height value will be much lower, as the vertical scan will fall off the edge where the head rounds off. Secondly, the edges of your head are where you have hair, and hair is a very bad depth reflector as it diffuses the IR pulse. So to make this better we redo the last step, but this time we send many vertical rays and many horizontal rays and then average them. To do this we need to scale our head size to account for the distance, to make sure we don't scan too much.
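Here is one way the multi-ray version might be sketched. Everything specific in it is an assumption for illustration: the pinhole scaling with a made-up `focal_px` and `head_mm`, the choice to spread the rays over the central half of the head, and the function names.

```python
from typing import List, Tuple

DepthMap = List[List[int]]  # depth in millimetres, indexed [row][column]


def head_size_px(depth_mm: float, head_mm: float = 200.0,
                 focal_px: float = 300.0) -> int:
    """Pinhole estimate of how many pixels a head spans at this distance."""
    return max(1, round(focal_px * head_mm / depth_mm))


def run_length(depth: DepthMap, x: int, y: int, dx: int, dy: int,
               limit_mm: int, max_steps: int) -> int:
    """Count steps from (x, y) along (dx, dy) until we fall off the head."""
    height, width = len(depth), len(depth[0])
    steps = 0
    while steps < max_steps:
        nx, ny = x + dx * (steps + 1), y + dy * (steps + 1)
        if not (0 <= nx < width and 0 <= ny < height):
            break
        if depth[ny][nx] >= limit_mm:
            break
        steps += 1
    return steps


def refine_with_rays(depth: DepthMap, x: int, y: int, rays: int = 5,
                     margin_mm: int = 200) -> Tuple[float, float]:
    """Average many horizontal and vertical rays; returns (centre_x, top_y)."""
    height, width = len(depth), len(depth[0])
    d = depth[y][x]
    limit = d + margin_mm
    size = head_size_px(d)  # also caps how far a single ray may travel
    spread = size // 4      # keep the rays within the middle of the head
    steps = max(rays - 1, 1)
    offsets = sorted({0} | {round(spread * (2 * i / steps - 1))
                            for i in range(rays)})
    centres, tops = [], []
    for off in offsets:
        row, col = y + off, x + off
        if 0 <= row < height and depth[row][x] < limit:
            left = run_length(depth, x, row, -1, 0, limit, size)
            right = run_length(depth, x, row, 1, 0, limit, size)
            centres.append(x + (right - left) / 2.0)
        if 0 <= col < width and depth[y][col] < limit:
            up = run_length(depth, col, y, 0, -1, limit, size)
            tops.append(float(y - up))
    # Offset 0 is always included, so both lists hold at least one ray.
    return sum(centres) / len(centres), sum(tops) / len(tops)
```

Capping each ray at the scaled head size is what keeps a ray from wandering off along a shoulder or another nearby object.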

We have now tracked the head to within 1 pixel.

That's great, right? One pixel! That's the resolution of the data, so how can you do better? Well, we need to do better. Consider this: the camera resolution is 320 pixels wide, so if you want to move your head from side to side and still keep your entire head in view, you can move it about 240 pixels. Now let's imagine we want to draw an infinite corridor on the screen. When our head is right in front of the screen, the vanishing point should be at the center of the screen. If we move our head so that it is straight in front of the edge of the screen, the vanishing point should be there.

If you have a glossy screen there is an easy way to test this: the vanishing point should always be right in the reflection of your eye, if everything is calibrated perfectly. (When is it ever...)

So this means that if you move your head across 240 pixels of camera tracking, the vanishing point should move 1920 pixels on an HD display. That's a resolution of 8 screen pixels per camera pixel! And this is if you are at the perfect range from the camera; if you move twice as far away, the resolution gets twice as bad! When drawing your scene you can hide some of this by flattening the scene to avoid the worst of the jerkiness, but you can also make it much worse by having something that protrudes out of the screen.
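The arithmetic is simple enough to write down. The function below is only a restatement of the numbers above, with a hypothetical `distance_factor` standing in for "how many times further away than the ideal range you are":

```python
def screen_px_per_camera_px(screen_width_px: float, usable_camera_px: float,
                            distance_factor: float = 1.0) -> float:
    """How far the vanishing point jumps for one camera pixel of head motion."""
    return screen_width_px / usable_camera_px * distance_factor


at_ideal_range = screen_px_per_camera_px(1920, 240)          # 8.0
twice_as_far = screen_px_per_camera_px(1920, 240, 2.0)       # 16.0
```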

So how do I make it better? Well, at this point I have a fairly accurate idea of where the head is, but the main problem is that I'm tracking the edges of the head, where all the hair is, and not the surface. So what I decided to do was to draw a box around the head and take an average position of all the pixels in it. I weight the pixels by how much they protrude out of the face and how bright the IR reflection is. This way I mainly get values from the much more stable face rather than the flimsy hair readings.
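A weighted average like that can be sketched as below. How the two weights are combined is not spelled out above, so multiplying them is an assumption, as are the `far_mm` reference plane (pixels at or behind it get zero weight) and the function name.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # indexed [row][column]


def weighted_head_position(depth: Grid, ir: Grid,
                           box: Tuple[int, int, int, int],
                           far_mm: float) -> Optional[Tuple[float, float]]:
    """Weighted centroid of the pixels inside box = (x0, y0, x1, y1).

    x1 and y1 are exclusive. Each pixel is weighted by how far it sticks
    out in front of far_mm, multiplied by its IR brightness.
    """
    x0, y0, x1, y1 = box
    sum_w = sum_x = sum_y = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            protrusion = far_mm - depth[y][x]
            if protrusion <= 0:
                continue  # behind the reference plane: background or hair
            w = protrusion * ir[y][x]
            sum_w += w
            sum_x += w * x
            sum_y += w * y
    if sum_w == 0:
        return None  # nothing usable in the box
    return sum_x / sum_w, sum_y / sum_w
```

Because every pixel in the box contributes a little, the result moves smoothly between pixel centres, which is where the sub-pixel precision comes from.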

Now I get down to about 1/4 - 1/6 of a pixel in accuracy.

Since the position value is a float I actually get more than this, but beyond this point it is pretty much noise. To get rid of the noise I simply keep the last frame's head position around, and if the head hasn't moved more than a threshold I just keep it where it is.
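In code, that hold-still check is only a few lines. This is a sketch, and the threshold value here is a placeholder rather than the one I use:

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def hold_if_still(previous: Optional[Point], current: Point,
                  threshold_px: float = 0.2) -> Point:
    """Keep the previous position unless the head moved past the threshold."""
    if previous is None:
        return current
    moved = math.hypot(current[0] - previous[0], current[1] - previous[1])
    return current if moved > threshold_px else previous
```

Unlike a smoothing filter, this adds no delay: as soon as the head moves further than the threshold, the new position is used as is.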

I decided early on not to do any frame-to-frame smoothing of the head position, for the simple reason that latency is very important, and I'm happy to report that running at 60 Hz it is very responsive.

The results are very good and I doubt that any significant gains in precision can be made with this hardware.


1 comment

Peter O'Hanlon

A stunning amount of detail Eskil. Thanks for the detailed description. I'm just disappointed I didn't get to meet up with you last week.
