Multithreading Perceptual Computing Applications in Unity3d

by Steff Kelsey, Sr Software Developer @ Infrared5

Introduction

It’s the eleventh hour and your application is going live tomorrow. But it’s sluggish! It’s jittery! It’s jarring! You wanted an immersive experience and you have already worked for weeks! You need to optimize!

Sound familiar? The above scenario is where the Infrared5 team found itself within 24 hours of showing our game, Kiwi Catapult Revenge, at GDC 2013. We had worked for weeks in different sandboxes, one team working on the experience in Unity and another working on the camera input and face tracking in C++. Putting it all together revealed some performance problems. Creating controllers with perceptual computing algorithms is difficult because input-ready controllers need to be as flawless and low latency as possible, while computer vision algorithms can max out the CPUs of smaller notebooks and tablets. If we had more time, we could have looked closer at our C++ code and found a way to optimize. But refactoring often takes longer than 24 hours and is fraught with peril. We decided to leave the C++ code alone and move the face tracking algorithms into a different thread in Unity so we could show off our game with pride at GDC.

In this article, we will demonstrate how we recognized the performance problems, how we diagnosed that multithreading our application was a possible fix, and how we iterated on different implementations to arrive at a good solution using C# in Unity. 

The Approach

  1. Analyze the performance of the application
  2. Use a profiler to quickly find bottlenecks in the code
  3. Evaluate if threading will fix the problem
  4. Thread the method in a safe way using locks
  5. Make sure locks are not creating bottlenecks
  6. Repeat steps 1 through 5 until performance is acceptable
  7. Safely kill the thread

Analyzing Performance

Plenty of great tools are out there for analyzing the performance of your application. Since we were working on a Unity game, we saw problems right away just by playing the game. Motion was jittery and the lag between the controls and the screen update was unacceptable. At this point, we were hoping that the cause of the poor performance was limited to one or two isolated pieces of code. We needed to find where the bottleneck was and unblock it as quickly as possible. We used the Unity Profiler to find our problem code, but first let’s look at some other options.

Our fix ended up on the C# side of the application, but we could just as easily have analyzed and fixed things in C++. You can do very simple things in C++ to diagnose problems, from timing methods and writing the results to log files to using proprietary and open source profilers. I was using the Express version of Visual Studio* (no profiler), but the Pro version has great tools for profiling. Intel VTune™ analyzer is another trusted proprietary tool. For analyzing graphics-specific applications, Very Simple Profile Library (VSProfileLib) is an open source profiler with a proven track record. Also in this space, Intel offers the Intel Graphics Performance Analyzer (Intel® GPA). So, there were plenty of ways that the C++ code could be profiled and optimized to remove bottlenecks in the face tracking pipeline. At some point in the near future, we will revisit our OpenCV face tracking algorithms and optimize them to work on leaner computers.
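
The "time a method and log the result" idea works the same way in C#. The sketch below is a minimal illustration and not code from our project: MethodTimer, TimeMs, and Report are hypothetical names, and the budget value is whatever frame time you are aiming for.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper for illustration; not part of the original project.
public static class MethodTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    // Builds a log line that flags calls exceeding the given budget.
    // In Unity, the returned string could be handed to Debug.Log.
    public static string Report(string label, double elapsedMs, double budgetMs)
    {
        string verdict = elapsedMs > budgetMs ? "OVER BUDGET" : "ok";
        string formatted = elapsedMs.ToString("0.00", CultureInfo.InvariantCulture);
        return label + ": " + formatted + " ms (" + verdict + ")";
    }
}
```

Wrapping a suspect call, for example MethodTimer.TimeMs(() => faceTracker.AdvanceFrame()), and comparing the result against a 16.7 ms budget (one frame at 60 fps) is a quick way to confirm a bottleneck before reaching for a full profiler.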

Using the Unity Profiler

Here we were with jittery playback of our Unity application, but where was the bottleneck? This application featured a first person camera flying through the air, shooting targets on the ground with lasers, listening for voice commands to activate a flamethrower (“FIIIIIRRRRRRRE!!”), and changing the perspective projection of the game based on the position of the player’s face. In short, there were plenty of places where code bottlenecks might be lurking. For quick diagnosis, we used the Unity Profiler.


The non-threaded profile output (above) shows usage graphs for the CPU, GPU, rendering, memory, and audio. Clearly visible is a repeating spike in the CPU usage with a process taking upwards of 70 ms. Looking at the Overview, we can see that the offending method is the Update() call in the IntelPercCompControllerNotThreaded class. Relevant lines are below:

void Update()
{
  if (faceTrackingIsWorking)
  {
    updateFacePosition();
  }
}

private void updateFacePosition()
{
  // always check if we have successfully advanced frames
  gotUpdateFromCamera = false;
  gotUpdateFromCamera = faceTracker.AdvanceFrame();

  if (gotUpdateFromCamera)
  {
    float newX = 0f;
    float newY = 0f;
    float newZ = 0.5f;
    if (faceTracker.GetIsTrackingFace())
    {
      // update the position of the camera
      // get the values for the new face position
      newX = faceTracker.GetFaceX();
      newY = faceTracker.GetFaceY();
      newZ = faceTracker.GetFaceZ();
      // save the new value in facepos for the projection calc and to smooth (by lerp) on the next frame update
      targetFacePosition.Set(newX, newY, newZ);
      // reset the time since last update
      ticksSinceLastUpdate = 0;
      lastUpdateStamp = DateTime.Now;
    }
    else
    {
      // if we have waited long enough, return to neutral position
      ticksSinceLastUpdate = DateTime.Now.Ticks - lastUpdateStamp.Ticks;
      if (ticksSinceLastUpdate > maxUpdateTicks)
      {
        // return to neutral position
        targetFacePosition.Set(newX, newY, newZ);
      }
    }
    //Debug.Log("targetFacePosition = ("+ targetFacePosition.x.ToString("0.00") +", "+targetFacePosition.y.ToString("0.00") +", "+ targetFacePosition.z.ToString("0.00") +")");
  }
}

Since this class is small and contains small methods that do only a few things each, it is easy to find the offending method call inside updateFacePosition(): faceTracker.AdvanceFrame(). This method is called directly on our C++ DLL and kicks off various tracking algorithms depending on where the player’s face is in relation to the camera. The DLL is either searching for a face, finding feature points on the face to track, or tracking the points from frame to frame and calculating the face size and position while determining if our track has gone haywire (Is the tracked region the same size as a normal face? If not, adjust the region or restart detection). All three pathways in the pipeline can use substantial processor time, especially when combined with everything the Unity Engine needs to be doing at the same time. As we said before, there is plenty of opportunity to optimize the C++ starting at the pathways called in the AdvanceFrame() method, but we have a deadline approaching, and we have a nice method that we can easily put in a thread: the updateFacePosition() method.

A good candidate for threading when refactoring code has the following traits.

  • is a small method that does only one thing
  • has a clearly defined shared resource with the main thread (or other threads if you have something that can be broken into many pathways)

When threading, the threads should share as few resources as possible. Threading breaks up a bottleneck, but it also introduces new complexity by creating conditions that have to be synchronized.
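
One way to keep the shared surface small is to put everything the two threads exchange into a single small class with its own lock. The sketch below is an assumption-laden illustration, not our shipped code: Position3 is a hypothetical stand-in for UnityEngine.Vector3 so the example compiles outside Unity, and SharedFaceState, Publish, and Read are invented names.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Position3
{
    public float X, Y, Z;
    public Position3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical holder: the only state the two threads share lives here.
public sealed class SharedFaceState
{
    private readonly object stateLock = new object();
    private Position3 target = new Position3(0f, 0f, 0.5f); // neutral position
    private long lastUpdateTicks;

    // Called by the tracking thread once the expensive work has finished.
    public void Publish(Position3 newTarget, long nowTicks)
    {
        lock (stateLock)
        {
            target = newTarget;
            lastUpdateTicks = nowTicks;
        }
    }

    // Called by the main thread; copies both values out in one short critical section.
    public void Read(out Position3 currentTarget, out long ticksAtLastUpdate)
    {
        lock (stateLock)
        {
            currentTarget = target;
            ticksAtLastUpdate = lastUpdateTicks;
        }
    }
}
```

Because the fields are private, every access has to go through the lock, and the critical sections contain nothing but a couple of copies.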

Threading in C#

Let’s thread! With the all-important deadline looming, I don’t properly plan and instead quickly create a new thread, add a thread lock object (so I feel like I am being safe), and dump my entire updateFacePosition() method under the locked brackets. Code is below.

void Start() 
{
  // start the face tracker in a new thread
  faceTrackingThread = new Thread(StartAndRunFaceTracker) {Name = "FaceTrackingThread"};
  faceTrackingThread.Start();
}

private void StartAndRunFaceTracker()
{
  while (!killThread)
  {
    if (!triedToInitFaceTracker)
    {
      triedToInitFaceTracker = true;
      faceTracker = new FaceTrackerWrapper();
      // location of the haar cascade file for openCV to load in the DLL
      string haarPath = @"./Assets/haarcascade_frontalface_alt.xml";
      int a = faceTracker.InitTracking(haarPath);
      // save if we initialized successfully
      faceTrackingIsWorking = (a == 0);
    }
    
    // output result to log
    Debug.Log("faceTrackingIsWorking = " + faceTrackingIsWorking);

    while (faceTrackingIsWorking)
    {
      Thread.Sleep(33);
      lock(threadLock) // lock to be thread safe
      {
        updateFacePosition();
      }
    }
  }
  Debug.Log("Camera ThreadLoop end========================================");
}

private void updateFacePosition()
{
  // always check if we have successfully advanced frames
  gotUpdateFromCamera = false;
  gotUpdateFromCamera = faceTracker.AdvanceFrame();

  if (gotUpdateFromCamera)
  {
    float newX = 0f;
    float newY = 0f;
    float newZ = 0.5f;
    if (faceTracker.GetIsTrackingFace())
    {
      // update the position of the camera
      // get the values for the new face position
      newX = faceTracker.GetFaceX();
      newY = faceTracker.GetFaceY();
      newZ = faceTracker.GetFaceZ();
      // save the new value in facepos for the projection calc and to smooth (by lerp) on the next frame update
      targetFacePosition.Set(newX, newY, newZ);
      // reset the time since last update
      ticksSinceLastUpdate = 0;
      lastUpdateStamp = DateTime.Now;
    }
    else
    {
      // if we have waited long enough, return to neutral position
      ticksSinceLastUpdate = DateTime.Now.Ticks - lastUpdateStamp.Ticks;
      if (ticksSinceLastUpdate > maxUpdateTicks)
      {
        // return to neutral position
        targetFacePosition.Set(newX, newY, newZ);
      }
    }
    //Debug.Log("targetFacePosition = ("+ targetFacePosition.x.ToString("0.00") +", "+targetFacePosition.y.ToString("0.00") +", "+ targetFacePosition.z.ToString("0.00") +")");
  }
}

void Update()
{
  if (faceTrackingIsWorking)
  {
    lock(threadLock) // lock to access the shared resource
    {
      // check on a possible change in sensitivity
      if (ticksSinceLastUpdate > maxUpdateTicks)
      {
        // face has been lost, slowly ease back to neutral
        facePosition = Vector3.Lerp(facePosition, targetFacePosition, Time.deltaTime * 1.0f);
      }
      else
      {
        // track quickly when the user is moving around
        Vector3 smoothed = DataSmoothingUtil.ExponentialSmoothing3(targetFacePosition, facePosition, 0.3f); 
        facePosition = smoothed;
      }
    }
  }
  // ... do a bunch of other stuff....
}

Walking through the code, we can see the creation of a new thread that runs the StartAndRunFaceTracker() method. The thread is started and immediately goes into a while loop that runs until the killThread flag is set. Since we’re doing a repeated action in our thread (checking for a changing face position), we needed a way to repeat the loop until it was no longer needed. We have another loop further down that checks if face tracking is even working (it might not have initialized properly if no camera was found or some other condition), and here is where we put in some timing. We tell the thread to sleep for 33 ms before each updateFacePosition() check, and we lock everything around that call (remember this lazy lock). The main thread does not touch the shared resource again until the Update() method. There, we take the same lock while pulling data from the shared resource, the targetFacePosition property.

Back to the Profiler and Voila!

Is it Fixed?

Looking at our new profile graphs, we can see that the consistent peak of CPU usage around 33 ms in height has been completely wiped out, but we still have repeated spikes of usage around 70 ms in height. And the game is still jittery. Your programming career flashes before your eyes. You thought threading was the answer and the application is still just as slow. What happened? Locking happened.

A thread lock can be nicely visualized as a traffic light. You have two intersecting roads and the light is green for only one direction at a time. The other directions have to stop for red lights and wait until they get the green light. Just like stop lights can create bottlenecks, thread locks can create bottlenecks. Of course, looking at our code, the quickest way out is to just remove one of the locks, right? A common sense analysis of what we are doing can lead to the conclusion that it actually doesn’t matter if one thread is reading the targetFacePosition while the other is writing it. They don’t “need” to be in sync in this case. If one misses an update call, it will pick it up again on the next frame at 60fps in Unity. So, you take the lock off and look at the profiler.

// lock removed for performance sake (please don't do this ever)
void Update()
{
  if (faceTrackingIsWorking)
  {
    // check on a possible change in sensitivity
    if (ticksSinceLastUpdate > maxUpdateTicks)
    {
      // face has been lost, slowly ease back to neutral
      // TODO ACCESSING SHARED VARIABLE targetFacePosition WITH NO LOCK! MUST FIX!
      facePosition = Vector3.Lerp(facePosition, targetFacePosition, Time.deltaTime * 1.0f);
    }
    else
    {
      // track quickly when the user is moving around
      // TODO ACCESSING SHARED VARIABLE targetFacePosition WITH NO LOCK (TOTALLY DOING IT TWICE, TOO)! MUST FIX!  
      Vector3 smoothed = DataSmoothingUtil.ExponentialSmoothing3(targetFacePosition, facePosition, 0.3f); 
      facePosition = smoothed;
    }
  }
  // ... do a bunch of other stuff....
}


What a nice looking graph! Problem solved! Did we go to GDC with this code to demo our game? If we did, I would be too ashamed to admit it. Only the git logs know the truth.

Is it Safe?

Back to our example of the locks as a traffic light. What I essentially did was make a light that only controls one direction of traffic. It changes from green to red while the perpendicular road just sends cars right through whenever, with no light. Collisions are inevitable. We are very unsafe. Does it really matter given the way that the shared resource is being accessed? No, it doesn’t matter at the moment. At the moment is the qualifying statement to look out for. We don’t know how many versions of this game are in our future. It could go viral and we could add hundreds more features! There could be more cameras added in a large installation version to really nail that target face position! We just don’t know. Better to be safe now and be ready to handle changes in the future. The crowd screams, “but performance was terrible when we were completely safe!” Before giving in to a life of danger, let’s look at why performance suffered with the previous implementation of thread locks.

Lock Smarter

Looking at the code snippet where we locked the thread, we placed it around the entire updateFacePosition() method. I think that’s being a little bit overzealous. The method actually has a few calls to the DLL in there and the majority of the code does not get or set the shared resource. We only need to lock up points where the shared resource is accessed. Everything else is just too much. In this case, by locking up the faceTracker.AdvanceFrame() call on the DLL (along with lots of other stuff), we are forcing every thread that uses the same lock to wait for that call to complete before the lock is released. So we created a bottleneck of exactly the same size as the one we were trying to fix! And the property name threadLock has to be one of the worst names ever for a maintenance engineer to deal with. Sure, it locks the thread, but what are we protecting? I like to name the lock objects after the shared resource. Enter the targetFacePositionLock object.

void Start() 
{ 
  // start the face tracker in a new thread
  faceTrackingThread = new Thread(StartAndRunFaceTracker) {Name = "FaceTrackingThread"};
  faceTrackingThread.Start();
}

private void StartAndRunFaceTracker()
{
  while (!killThread)
  {
    if (!triedToInitFaceTracker)
    {
      triedToInitFaceTracker = true;
      faceTracker = new FaceTrackerWrapper();
      // location of the haar cascade file for openCV to load in the DLL
      string haarPath = @"./Assets/haarcascade_frontalface_alt.xml";
      int a = faceTracker.InitTracking(haarPath);
      // save if we initialized successfully
      faceTrackingIsWorking = (a == 0);
    }
    // output result to log
    Debug.Log("faceTrackingIsWorking = " + faceTrackingIsWorking);

    while (faceTrackingIsWorking)
    {
      Thread.Sleep(33);
      updateFacePosition();
    }
  }
  Debug.Log("Camera ThreadLoop end========================================");
}

private void updateFacePosition()
{
  // always check if we have successfully advanced frames
  gotUpdateFromCamera = false;
  gotUpdateFromCamera = faceTracker.AdvanceFrame();
  
  if (gotUpdateFromCamera)
  {
    float newX = 0f;
    float newY = 0f;
    float newZ = 0.5f;
    if (faceTracker.GetIsTrackingFace())
    {
      // update the position of the camera
      // get the values for the new face position
      newX = faceTracker.GetFaceX();
      newY = faceTracker.GetFaceY();
      newZ = faceTracker.GetFaceZ();
      lock(targetFacePositionLock) // lock when accessing this property
      {
        // save the new value in facepos for the projection calc and to smooth (by lerp) on the next frame update
        targetFacePosition.Set(newX, newY, newZ);
      }
      // reset the time since last update
      ticksSinceLastUpdate = 0;
      lastUpdateStamp = DateTime.Now;
    }
    else
    {
      // if we have waited long enough, return to neutral position
      ticksSinceLastUpdate = DateTime.Now.Ticks - lastUpdateStamp.Ticks;
      if (ticksSinceLastUpdate > maxUpdateTicks)
      {
        lock(targetFacePositionLock) // lock when accessing this property
        {
          // return to neutral position
          targetFacePosition.Set(newX, newY, newZ);
        }
      }
    }
    //Debug.Log("targetFacePosition = ("+ targetFacePosition.x.ToString("0.00") +", "+targetFacePosition.y.ToString("0.00") +", "+ targetFacePosition.z.ToString("0.00") +")");
  }
}

void Update()
{
  if (faceTrackingIsWorking)
  {
    lock(targetFacePositionLock) // lock to access targetFacePosition
    {
      // check on a possible change in sensitivity
      if (ticksSinceLastUpdate > maxUpdateTicks)
      {
        // face has been lost, slowly ease back to neutral
        facePosition = Vector3.Lerp(facePosition, targetFacePosition, Time.deltaTime * 1.0f);
      }
      else
      {
        // track quickly when the user is moving around
        Vector3 smoothed = DataSmoothingUtil.ExponentialSmoothing3(targetFacePosition, facePosition, 0.3f); 
        facePosition = smoothed;
      }
    }
  }
  //.....do some other stuff ......
}

Now, the lock has been moved to only protect the targetFacePosition property. The expensive function calls are safely running in their own thread and are not part of the thread lock.
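
To see the effect of lock scope in isolation, here is a small stand-alone sketch. It is not the game code: LockScopeDemo and MeasureReaderWaitMs are hypothetical names, and Thread.Sleep stands in for the expensive AdvanceFrame() call. It measures how long a reader waits for the lock while a worker performs one slow step, either inside the lock or outside it.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical demo class; Thread.Sleep is a stand-in for the slow DLL call.
public static class LockScopeDemo
{
    // Returns how long (ms) the calling thread waited to acquire the lock
    // while a worker thread performed one slow step.
    public static double MeasureReaderWaitMs(bool lockAroundSlowCall, int slowCallMs, out float observed)
    {
        object positionLock = new object();
        float sharedValue = 0f;
        ManualResetEventSlim workerStarted = new ManualResetEventSlim(false);

        Thread worker = new Thread(() =>
        {
            if (lockAroundSlowCall)
            {
                // Wide lock: the slow call runs while the lock is held.
                lock (positionLock)
                {
                    workerStarted.Set();
                    Thread.Sleep(slowCallMs);
                    sharedValue = 1f;
                }
            }
            else
            {
                // Narrow lock: do the slow work first, lock only to publish.
                workerStarted.Set();
                Thread.Sleep(slowCallMs);
                lock (positionLock) { sharedValue = 1f; }
            }
        });

        worker.Start();
        workerStarted.Wait(); // make sure the worker is already busy

        Stopwatch watch = Stopwatch.StartNew();
        lock (positionLock) { observed = sharedValue; }
        watch.Stop();

        worker.Join();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

With the wide lock, the reader should wait for roughly the full duration of the slow call; with the narrow lock, it should get through almost immediately. That difference is the bottleneck we moved out of the way.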

Is it Fixed? (Part II)

You bet it is fixed.


A party has started. Have we thought of everything?

Threads Gone Wild

What a beautiful set of graphs! You test the app a few times, shut it down for high fives, and prepare to package up your game, when you notice the Creative* Gesture Camera still has its lights on while the game is no longer running.


You test it a few times and notice that the light stays on until you restart the game, and then it shuts down for a second while the camera is initialized. But how can this be happening? We were so safe! Why is the safe thread running forever? Why has your thread gone wild?

It’s simple. Don’t forget to kill your threads. They won’t always get the message that the job is done.

void OnDestroy()
{
  //Debug.Log("OnDestroy");
  // shut down the camera
  faceTrackingIsWorking = false;
  killThread = true; // the thread will exit its big while loop once this is set true
  faceTrackingThread.Join(); // join with the main thread to be sure it is done
}
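
The same start-and-stop pattern can be sketched outside Unity. This is an illustration under assumptions rather than our shipped code: PollingWorker and its members are hypothetical names. Compared with the snippet above, it marks the stop flag volatile so the worker is guaranteed to observe the write, makes the thread a background thread so it cannot keep a stand-alone process alive, and uses a timed Join so shutdown cannot hang forever if a step gets stuck.

```csharp
using System;
using System.Threading;

// Hypothetical worker class illustrating start/stop; names are not from the original project.
public sealed class PollingWorker
{
    private volatile bool stopRequested; // volatile: the loop must see the main thread's write
    private readonly Thread thread;
    private readonly Action step;
    private readonly int sleepMs;
    private int iterations;

    public PollingWorker(Action step, int sleepMs)
    {
        this.step = step;
        this.sleepMs = sleepMs;
        thread = new Thread(Run) { Name = "PollingWorker", IsBackground = true };
    }

    public int Iterations { get { return Volatile.Read(ref iterations); } }
    public bool IsAlive { get { return thread.IsAlive; } }

    public void Start() { thread.Start(); }

    private void Run()
    {
        // Repeat the step until someone asks us to stop.
        while (!stopRequested)
        {
            step();
            Interlocked.Increment(ref iterations);
            Thread.Sleep(sleepMs);
        }
    }

    // Asks the loop to exit and waits up to timeoutMs for it to finish.
    // Returns true if the thread exited in time. Assumes Start() was called.
    public bool Stop(int timeoutMs)
    {
        stopRequested = true;
        return thread.Join(timeoutMs);
    }
}
```

In a MonoBehaviour, Stop would be called from OnDestroy(), and a false return value is worth logging because it means the thread is still running.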

Key Do's and Don'ts

  • Use small methods in small classes to leave more opportunities to thread later. Spaghetti code requires a large refactor to implement threading. (Save your noodles for dinner and leave them out of your code.)
  • Don’t lock up too large a code block when putting in thread locks. You’re just creating a new bottleneck instead of fixing one.
  • Lock up all lanes of traffic, not just one. It is tempting to drop a lock for the sake of performance, and you might be able to get away with “half-locking” now, but you never know what changes will be made in the future.
  • Make sure you kill your threads when they need to be destroyed. They can hang around in the background and cause trouble down the line. With great power (of threading), comes great responsibility (of managing your threads).

Summary

In this paper, we looked at:

  • identifying performance problems and quickly finding bottlenecks in code
  • proprietary and open source profiling tools for C++ and C#
  • recognizing when an algorithm can be threaded
  • thread safety with locks
  • best practices when using locks to optimize performance and safety
  • killing threads

Additional Resources

Infrared5 is an interactive studio that focuses on emerging technology. We like to build site-specific applications, games, and second-screen experiences. If it is cutting-edge, challenging, and fun then we say 'yeah, we can build that!'

http://www.infrared5.com
