General Acyclic Graphs of Tasks

The task graphs considered so far have a tree structure where each task has a single successor task::parent() waiting for it to complete. To accommodate more complex graphs where a task has multiple successors, Intel® Threading Building Blocks (Intel® TBB) 2.2 and later has methods that allow direct manipulation of a task's reference count.

For example, consider a MxN array of tasks where each task depends on the tasks to the left and above it. The following figure shows such an example:

Task graph where some tasks have more than one successor.

The following code evaluates such a task graph, where each task computes a sum of inputs from its neighbors to the left and above it.

const int M=3, N=4;
class DagTask: public tbb::task {
    const int i, j;
    // input[0] = sum from above, input[1] = sum from left
    double input[2];
    double sum;
    // successor[0] = successor below, successor[1] = successor to right
    DagTask* successor[2];
    DagTask( int i_, int j_ ) : i(i_), j(j_) {
        input[0] = input[1] = 0;
    task* execute() {
        __TBB_ASSERT( ref_count()==0, NULL );
        sum = i==0 && j==0 ? 1 : input[0]+input[1];
        for( int k=0; k<2; ++k )
            if( DagTask* t = successor[k] ) {
                t->input[k] = sum;
                if( t->decrement_ref_count()==0 )
                    spawn( *t );
        return NULL;
double BuildAndEvaluateDAG() {
    DagTask* x[M][N];
    for( int i=M; --i>=0; )
        for( int j=N; --j>=0; ) {
            x[i][j] = new( tbb::task::allocate_root() ) DagTask(i,j);
            x[i][j]->successor[0] = i+1<M ? x[i+1][j] : NULL;
            x[i][j]->successor[1] = j+1<N ? x[i][j+1] : NULL;
    // Add extra reference to last task, because it is waited on
    // by spawn_and_wait_for_all.
    // Wait for all but last task to complete.
    // Last task is not executed implicitly, so execute it explicitly.
    double result = x[M-1][N-1]->sum;
    // Destroy last task.
    return result;

Function BuildAndEvaluateDAG first builds an array of DagTask. It allocates each task as a root task because task::parent() is not used to record successor relationships. The reference count of each task is initialized to the number of its predecessors. It evaluates the graph by spawning the initial task x[0][0] and waiting for x[M-1][N-1] to complete. As each task completes, it decrements the reference count of its successors, and spawns any successor whose count becomes zero. Given a sufficient number of processors, execution sweeps diagonally over the graph like a wave front from top left to bottom right.

The last task x[M-1][N-1] requires special handling because of its special interaction with BuildAndEvaluateDAG:

  • The last task is used to wait explicitly for other tasks to complete. Method wait_for_all requires that the last task's reference count be set to one more than the number of its predecessors. Thus the last task is not implicitly executed when its predecessors complete.

  • The value sum must be extracted from the last task before it is destroyed.

Hence the example explicitly executes the last task, extracts its sum, and then destroys it.

For more complete information about compiler optimizations, see our Optimization Notice.