Architecture Pattern: Compute On Demand

There are so many examples of applications using a pre-processing strategy that the idea seems trivial. A webcam's device driver, for example, often applies software adjustments and corrections such as white balancing. Too often we find devices relying on software features. Other examples are pipelines and user interfaces. In UI work we have already learned to fill a list only when the user clicks the drop-down, so we "pay" for the data only when the user really wants to use the list. Given that, it is surprising to find that very expensive, resource-intensive algorithms sometimes completely ignore what is actually required. We can find our system spending well over 50% CPU on data that nobody really wants, or wasting 7% CPU out of a total 16% CPU load on pointless data.

Here is a simplified scenario: Our system needs to identify when the user is looking away from the screen:
1. First of all we start by assuming that there is no user so every 500ms we check to see whether or not there was any movement.
2. If there was movement and it has the features of a head then we search for the eyes.
3. When we find the eyes we lock on to them and keep tracking them.
4. When the distance between the eyes is reduced to 50% we say that the user is looking away.

We know how to identify movement so Item 1 is taken care of internally. We are using a third party library for Item 2: detection of head features. We paid a freelancer to write the eye detection algorithm for us (Item 3) and we completed Item 4 internally.

Our pipeline looks like this:

Here are the details:

Camera Source: has a new frame every 30ms
Contrast Stretching: Because both 'Head Detection' and 'Movement' require this image correction we use the same processing for both.
Movement: Every 500ms takes the latest frame and compares it with the previous frame (500ms before)
Head Detection: This is an off-the-shelf component which we bought. It performs 10 measurements per second (one every 100ms)
Clipping: Once we have heads detected we cut off the edges so that we look for eyes only within the bounding rectangle of each head
Eye Detection: The custom algorithm detects the location of eyes and adds a marker: left / right
Eye Distance: We measure the distance between a pair of eyes and when this goes below a level we say that the person is not looking at the monitor


Over time we found that the user should get a 'grace period' of 5 seconds before we trigger the flag, because users sometimes look at the mouse or at a fly moving around them. So eventually we decided to measure the distance once every second, over two consecutive frames (200ms of head detection).

Can you now see the waste? Here it is:

1. The camera outputs roughly 30 frames per second, so Contrast Stretching is executed on every one of them. When 'Movement' is on we only use 2 of these frames. When 'Head Detection' is on we only use 10 of these frames.
2. 'Head Detection' outputs 10 frames per second, so Clipping is also applied to 10 frames every second and the expensive 'Eye Detection' algorithm is executed for every one of these frames, but 'Eye Distance' only uses 2 frames per second.
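The two items above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the rounded frame rates quoted in this article; the function name `wasted_fraction` is made up for illustration. It counts unused frames, not CPU time, so it is only a rough indicator of the load being wasted.

```python
# Frames per second, using the article's own rounded figures.
CAMERA_FPS = 30          # frames reaching Contrast Stretching
MOVEMENT_FPS = 2         # 'Movement' reads one frame every 500ms
HEAD_DETECTION_FPS = 10  # 'Head Detection' measures every 100ms
EYE_DISTANCE_FPS = 2     # 'Eye Distance' uses two frames each second


def wasted_fraction(produced: int, consumed: int) -> float:
    """Share of processed frames that no downstream stage ever reads."""
    return (produced - consumed) / produced


waste = {
    "Contrast Stretching ('Movement' on)": wasted_fraction(CAMERA_FPS, MOVEMENT_FPS),
    "Contrast Stretching ('Head Detection' on)": wasted_fraction(CAMERA_FPS, HEAD_DETECTION_FPS),
    "Clipping and Eye Detection": wasted_fraction(HEAD_DETECTION_FPS, EYE_DISTANCE_FPS),
}

for stage, fraction in waste.items():
    print(f"{stage}: {fraction:.0%} of processed frames are never used")
```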

The trivial solution is to "tailor" the system for optimized execution. The problem is that a tailored system breaks as soon as a specification changes: for example, we might find out that the version of 'Head Detection' for Windows 12 outputs a frame every 87ms, or we might decide that we want 'Eye Distance' to work for 500ms and then pause for 800ms.

Here is a better solution:

We take the diagram above and move it to code. All the components are there and we have the flow working. What's missing are the inputs and outputs. Since this is real-time / live data processing we only want the latest and most up-to-date input. Here is the input list:

Contrast Stretching: 1 Frame input
Movement: 2 Frames input
Head Detection: 1 Frame input
Clipping: 1 Frame input
Eye Detection: 1 Frame input
Eye Distance: 2 Frames input (because we make the decision only after the second frame)

Here is how it works:

Every 30ms the Camera sends a frame to 'Contrast Stretching'. Instead of processing the data, 'Contrast Stretching' will only save the frame in its input slot (global to the pipeline object).
When 'Movement' is on (no user yet): Every 500ms we ask 'Contrast Stretching' for a frame. 'Contrast Stretching' checks whether there is a valid output in its output slot. If there is, it returns that output; if not, it processes the saved input to create the output. This is somewhat similar to a lazily initialized Singleton. When a new input arrives from the Camera, the output becomes invalid.
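A minimal sketch of such a stage, assuming a single-threaded pipeline. The class name `LazyStage` and the `push` / `pull` methods are hypothetical names chosen for this illustration, and the lambda merely stands in for the real contrast stretching.

```python
from typing import Any, Callable

_EMPTY = object()  # marker for "no frame received yet"


class LazyStage:
    """Stores the newest input and processes it only when output is requested."""

    def __init__(self, process: Callable[[Any], Any]) -> None:
        self._process = process
        self._input: Any = _EMPTY
        self._output: Any = None
        self._valid = False  # True while _output matches _input
        self.runs = 0        # how many times the expensive step actually ran

    def push(self, frame: Any) -> None:
        """Producer side: keep the frame and invalidate the cached output."""
        self._input = frame
        self._valid = False

    def pull(self) -> Any:
        """Consumer side: compute the output only if it is missing or stale."""
        if self._input is _EMPTY:
            raise LookupError("no input frame yet")
        if not self._valid:
            self._output = self._process(self._input)
            self._valid = True
            self.runs += 1
        return self._output


# One simulated second: 30 camera frames, 'Movement' asks twice.
stage = LazyStage(lambda frame: f"stretched({frame})")
for i in range(30):
    stage.push(f"frame{i}")
    if i % 15 == 14:
        latest = stage.pull()
repeat = stage.pull()  # no new input since the last pull: served from the slot

print(stage.runs)
```

In this run the processing step executes twice even though 30 frames were pushed, and the extra `pull()` at the end is answered from the output slot.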

When a user is detected the system becomes active. In that state, every second we ask 'Eye Distance' whether the user is looking away, and we count 5 'yes' answers in a row. 'Eye Distance' returns the result of the previous test and initiates a new test by asking 'Eye Detection' for its output and then asking again after 100ms. After the second result has arrived, 'Eye Distance' evaluates it and saves the answer for the next time it is asked.
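The "5 'yes' in a row" rule is a small consecutive-answer counter. The sketch below is one way to write it; the class name `LookAwayFlag` and its `update` method are hypothetical. With one check per second, five consecutive answers correspond to the 5-second grace period.

```python
class LookAwayFlag:
    """Raises the flag only after `required` consecutive 'looking away' answers."""

    def __init__(self, required: int = 5) -> None:
        self.required = required
        self.streak = 0

    def update(self, looking_away: bool) -> bool:
        # Any 'no' answer resets the streak, which is what gives the grace period.
        self.streak = self.streak + 1 if looking_away else 0
        return self.streak >= self.required


flag = LookAwayFlag()
# One answer per second: a brief glance away, a look back, then a long look away.
answers = [True, True, False, True, True, True, True, True]
history = [flag.update(answer) for answer in answers]
print(history)
```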

When 'Eye Distance' asks 'Eye Detection' for data, 'Eye Detection' will ask 'Clipping' for data. 'Clipping' gets a new input frame from 'Head Detection' every 100ms. It saves the input frame and when it is asked for output it will take the latest frame and process it (unless it was already processed before - Singleton style).
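The chained request can be sketched with a version counter, so that a downstream stage knows whether its cached output is still based on the newest upstream input. Everything here (the `Stage` class, `push`, `pull`, `current_version`) is an assumption made for illustration rather than the code used in the project, and the lambdas stand in for the real Clipping and Eye Detection.

```python
from typing import Any, Callable, Optional


class Stage:
    """Pull-based stage: work happens only when a downstream stage asks."""

    def __init__(self, process: Callable[[Any], Any],
                 upstream: Optional["Stage"] = None) -> None:
        self._process = process
        self._upstream = upstream
        self._input: Any = None
        self._version = 0        # bumped on every pushed input
        self._done_version = -1  # input version the cached output was built from
        self._output: Any = None
        self.runs = 0

    def push(self, data: Any) -> None:
        """Save the newest input without processing it."""
        self._input = data
        self._version += 1

    def current_version(self) -> int:
        """Version of the newest input at the head of the chain."""
        if self._upstream is not None:
            return self._upstream.current_version()
        return self._version

    def pull(self) -> Any:
        version = self.current_version()
        if version == 0:
            raise LookupError("no input yet")
        if version != self._done_version:  # cached output is stale
            source = self._upstream.pull() if self._upstream else self._input
            self._output = self._process(source)
            self._done_version = version
            self.runs += 1
        return self._output


clipping = Stage(lambda head: f"clipped({head})")
eye_detection = Stage(lambda img: f"eyes({img})", upstream=clipping)

results = []
for i in range(10):                 # 'Head Detection' delivers 10 frames
    clipping.push(f"head{i}")
    if i >= 8:                      # 'Eye Distance' asks for two consecutive frames
        results.append(eye_detection.pull())
again = eye_detection.pull()        # nothing new arrived: cached result

print(clipping.runs, eye_detection.runs)
```

In this run both stages execute twice for ten pushed frames, and the final request is served from the cache.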

This way even if block specifications change the system is still optimized for performance. We actually saved over 90% CPU load BY DESIGN and not by post-coding optimizations.

Note that you will often want to add an expiration tag to the pipeline data (for example, in case the previous frame was captured a week ago, before hibernation).
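One possible shape for such an expiration tag is a timestamp stored next to the data. In the sketch below the names `ExpiringSlot` and `max_age_s` are hypothetical. The default clock is the wall clock (`time.time`), on the assumption that the age should include time spent suspended; a monotonic clock may not advance during suspend on some platforms.

```python
import time
from typing import Any, Callable, Optional


class ExpiringSlot:
    """Holds one value together with the time it was stored."""

    def __init__(self, max_age_s: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._max_age_s = max_age_s
        self._clock = clock
        self._value: Optional[Any] = None
        self._stamp: Optional[float] = None

    def put(self, value: Any) -> None:
        self._value = value
        self._stamp = self._clock()

    def get(self) -> Optional[Any]:
        """Return the stored value, or None if the slot is empty or expired."""
        if self._stamp is None:
            return None
        if self._clock() - self._stamp > self._max_age_s:
            return None
        return self._value


# A fake clock makes the behaviour easy to see without actually waiting.
now = [0.0]
slot = ExpiringSlot(max_age_s=1.0, clock=lambda: now[0])
slot.put("frame")
fresh = slot.get()         # still within one second
now[0] = 7 * 24 * 3600.0   # "one week later, after hibernation"
stale = slot.get()         # expired, so the caller must wait for a new frame
```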

This is a very good system model and we should expect more and more systems designed according to these concepts as parallel computing matures.



Lorenzo C.:

Great article Asaf. Thanks for taking the time to share it and also thank you for including some of the commercial details of your dev effort. That added perspective.

Asaf Shelly:

No. We don't have a single resource shared between a writer (Producer) and a reader (Consumer).
Implementing this with the Producer/Consumer pattern would mean that the Consumer invokes the Producer, asking only for the fragment of the buffer that is required.

anonymous:

Is this not just the Producer/Consumer pattern, rediscovered?

Abhishek 81:

I was just wondering how we can implement this on an Ultrabook. As I am discovering the site I am liking it more; good resource.

Dmitry Oganezov (Intel):

I think it's a very valid approach, Asaf.
>> We can find our system spending well over 50% CPU over data that nobody really wants.
...and sometimes understanding what is going on in the system takes an inappropriately long time.

I actually wanted to ask about the 3rd party library that you use to detect head features and about the eye detection algorithm.

Have you heard about our brand-new Intel® Perceptual Computing SDK? I suppose that it has similar functionality. Check it out and tell us what you think:
