Continuation Passing

Method spawn_and_wait_for_all enables an executing parent task to wait until its child tasks complete, but can incur some inefficiency. When a thread calls spawn_and_wait_for_all, it keeps busy until all of the children complete by working on other tasks. Sometimes the parent task becomes ready to continue, but cannot do so immediately because its thread is still executing one of the other tasks. The solution is for the parent to not wait on its children, and instead spawn both children and return. The children are allocated not as children of the parent, but as children of the parent’s continuation task. Any idle thread can steal and run the continuation task when its children complete.

The "continuation-passing" variant of FibTask described in Simple Example is shown below.

  • Insertions are shown in bold font

  • Deletions are commented out.

struct FibContinuation: public task {
    long* const sum;
    long x, y;
    FibContinuation( long* sum_ ) : sum(sum_) {}
    task* execute() {
        *sum = x+y;
        return NULL;
struct FibTask: public task {
    const long n;
    long* const sum;
    FibTask( long n_, long* sum_ ) :
        n(n_), sum(sum_)
    task* execute() {
        if( n<CutOff ) {
            *sum = SerialFib(n);
            return NULL;
        } else {
            // long x, y; This line removed 
            FibContinuation& c = 
                *new( allocate_continuation() ) FibContinuation(sum);
            FibTask& a = *new( c.allocate_child() ) FibTask(n-2,&c.x);
            FibTask& b = *new( c.allocate_child() ) FibTask(n-1,&c.y);
            // Set ref_count to "two children plus one for the wait".
            spawn( b );
            spawn( a );
            // *sum = x+y; This line removed
            return NULL;

The following differences between the original version and the continuation version need to be understood:

The big difference is that in the original version x and y were local variables in method execute. In the continuation-passing version, they cannot be local variables, because the parent returns before its children complete. Instead, they are fields of the continuation task FibContinuation.

The allocation logic is changed. The continuation is allocated with allocate_continuation. It is similar to allocate_child, except that it forwards the successor of this to c, and sets the successor of this to NULL. The following figure summarizes the transformation:

Action of allocate_continuation

A property of the transformation is that it does not change the reference count of the successor, and thus avoids interfering with reference-counting logic.

The reference count is set to 2, the number of children. In the original version, it was set to 3 because spawn_and_wait_for_all required the augmented count. Furthermore, the code sets the reference count of the continuation instead of the parent, because it is the execution of the continuation that waits on the children.

The pointer sum is passed to the continuation by the constructor, because it is now FibContinuation that stores into *sum. The children are still allocated with allocate_child, but notice that now they are allocated as children of the continuation c, not the parent. This is so that c, and not this, becomes the successor of the children that is automatically spawned when both children complete. If you accidentally used this.allocate_child(), then the parent task would run again after both children completed.

If you remember how the original top-level code, ParallelFib, was written, you might be worried now that continuation-passing style breaks the code, because now the root FibTask completes before the children are done, and the top-level code used spawn_root_and_wait to wait on the root FibTask. This is not a problem, because spawn_root_and_wait is designed to work correctly with continuation-passing style. An invocation spawn_root_and_wait(x) does not actually wait for x to complete. Instead, it constructs a dummy successor of x, and waits for the successors’s reference count to be decremented. Because allocate_continuation forwards this dummy successor to the continuation, the dummy successor’s reference count is not decremented until the continuation completes.

For more complete information about compiler optimizations, see our Optimization Notice.