Cook Until Done: parallel_do

For some loops, the end of the iteration space is not known in advance, or the loop body may add more iterations to do before the loop exits. You can deal with both situations using the template class tbb::parallel_do.

A linked list is an example of an iteration space that is not known in advance. In parallel programming, it is usually better to use dynamic arrays instead of linked lists, because accessing items in a linked list is inherently serial. But if you are limited to linked lists, the items can be safely processed in parallel, and processing each item takes at least a few thousand instructions, you can use parallel_do to gain some parallelism.

For example, consider the following serial code:

void SerialApplyFooToList( const std::list<Item>& list ) {
    for( std::list<Item>::const_iterator i=list.begin() i!=list.end(); ++i ) 

If Foo takes at least a few thousand instructions to run, you can get parallel speedup by converting the loop to use parallel_do. To do so, define an object with a const qualified operator(). This is similar to a C++ function object from the C++ standard header <functional>, except that operator() must be const.

class ApplyFoo {
    void operator()( Item& item ) const {

The parallel form of SerialApplyFooToList is as follows:

void ParallelApplyFooToList( const std::list<Item>& list ) {
    parallel_do( list.begin(), list.end(), ApplyFoo() ); 

An invocation of parallel_do never causes two threads to act on an input iterator concurrently. Thus typical definitions of input iterators for sequential programs work correctly. This convenience makes parallel_do unscalable, because the fetching of work is serial. But in many situations, you still get useful speedup over doing things sequentially.

There are two ways that parallel_do can acquire work scalably.

  • The iterators can be random-access iterators.

  • The body argument to parallel_do, if it takes a second argument feeder of type parallel_do<Item>&, can add more work by calling feeder.add(item). For example, suppose processing a node in a tree is a prerequisite to processing its descendants. With parallel_do, after processing a node, you could use feeder.add to add the descendant nodes. The instance of parallel_do does not terminate until all items have been processed.

Code Sample

The directory examples/parallel_do/parallel_preorder contains a small application that uses parallel_do to perform parallel preorder traversal of an acyclic directed graph. The example shows how parallel_do_feeder can be used to add more work.

For more complete information about compiler optimizations, see our Optimization Notice.