Cancellation and Nested Parallelism

The discussion so far was simplified by assuming non-nested parallelism and skipping details of task_group_context. This topic explains both.

An Intel® Threading Building Blocks (Intel® TBB) algorithm executes by creating task objects that run the snippets of code that you supply to the algorithm template. By default, these task objects are associated with a task_group_context created by the algorithm. Nested Intel TBB algorithms create a tree of these task_group_context objects. Cancelling a task_group_context cancels all of its child task_group_context objects, and transitively all of their descendants. Hence an algorithm and all the algorithms it calls can be cancelled with a single request.

Exceptions propagate upwards; cancellation propagates downwards. These opposing directions work together to cleanly stop a nested computation when an exception occurs. For example, consider the tree in the following figure. Imagine that each node represents an algorithm and its task_group_context.

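Figure: Tree of task_group_context. A is the root; B is a child of A and has children C and D; E, F, and G are the other descendants of A.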

Suppose that the algorithm in C throws an exception and no node catches the exception. Intel TBB propagates the exception upwards, cancelling related subtrees downwards, as follows:

  1. Handle exception in C:

    1. Capture exception in C.

    2. Cancel tasks in C.

    3. Throw exception from C to B.

  2. Handle exception in B:

    1. Capture exception in B.

    2. Cancel tasks in B and, by downwards propagation, in D.

    3. Throw an exception out of B to A.

  3. Handle exception in A:

    1. Capture exception in A.

    2. Cancel tasks in A and, by downwards propagation, in E, F, and G.

    3. Throw an exception upwards out of A.
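The steps above can be traced with a toy model written in standard C++. This is a sketch of the propagation rules only, not how Intel TBB implements them: ContextNode, cancel, and propagate_uncaught_exception are hypothetical names, and attaching E, F, and G directly to A is an assumption, since the text only implies that they are descendants of A outside B's subtree.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one algorithm and its context; not the TBB class.
struct ContextNode {
    std::string name;
    ContextNode* parent = nullptr;
    std::vector<ContextNode*> children;
    bool cancelled = false;
    explicit ContextNode(std::string n) : name(std::move(n)) {}
};

void add_child(ContextNode& parent, ContextNode& child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Downwards propagation: cancel a node and every descendant that is not
// already cancelled, logging names in the order they are cancelled.
void cancel(ContextNode& node, std::vector<std::string>& log) {
    if (node.cancelled) return;   // its subtree was cancelled earlier
    node.cancelled = true;
    log.push_back(node.name);
    for (ContextNode* child : node.children) cancel(*child, log);
}

// Upwards propagation: an exception that nobody catches travels from the
// origin to the root, cancelling the subtree at each level on the way.
std::vector<std::string> propagate_uncaught_exception(ContextNode& origin) {
    std::vector<std::string> log;
    for (ContextNode* n = &origin; n != nullptr; n = n->parent) cancel(*n, log);
    return log;
}

// Builds the example tree (shape partly assumed) and throws from C.
std::vector<std::string> cancellation_order_for_exception_in_c() {
    ContextNode a("A"), b("B"), c("C"), d("D"), e("E"), f("F"), g("G");
    add_child(a, b); add_child(b, c); add_child(b, d);
    add_child(a, e); add_child(a, f); add_child(a, g);
    return propagate_uncaught_exception(c);
}
```

In this model, cancellation_order_for_exception_in_c() returns C, B, D, A, E, F, G, which matches the order of the numbered steps.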

If your code catches the exception at any level, then Intel TBB does not propagate it any further. For example, an exception that is caught inside the body of a parallel_for, and so never escapes the body, does not cause cancellation of other iterations.

To prevent downwards propagation of cancellation into an algorithm, construct an isolated task_group_context on the stack and pass it to the algorithm explicitly. The following example shows how, using a context named root. The example uses C++11 lambda expressions for brevity.

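#include "tbb/tbb.h"
using namespace tbb;

bool Data[1000][1000];

int main() {
    try {
        parallel_for( 0, 1000, 1,
            []( int i ) {
                // Isolated context: not a child of the outer loop's context.
                task_group_context root(task_group_context::isolated);
                parallel_for( 0, 1000, 1,
                    [i]( int j ) {
                        Data[i][j] = true;
                    },
                    root );   // run the inner loop under the isolated context
                throw "oops";
            });
    } catch(...) {
        // The exception rethrown by the outer loop is swallowed here.
    }
    return 0;
}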

The example performs two parallel loops: an outer loop over i and an inner loop over j. The isolated task_group_context root protects the inner loop from downwards propagation of cancellation from the i loop. When the exception propagates to the outer loop, any pending outer iterations are cancelled, but the inner iterations of an outer iteration that has already started are not. Hence when the program completes, the rows of Data may differ from one another, depending on whether iteration i ran at all, but within a row the elements are homogeneously false or true, not a mixture.

Removing the isolated context root, so that the inner loop uses a default context bound to the outer loop's context, would permit cancellation to propagate down into the inner loop. In that case, a row of Data might end up with both true and false values.

