parallel_do Template Function

Summary

Template function that processes work items in parallel.

Header

 #include "tbb/parallel_do.h"

Syntax

template<typename InputIterator, typename Body> 
void parallel_do( InputIterator first, InputIterator last,
                 Body body[, task_group_context& group] );

template<typename Container, typename Body>
void parallel_do( Container c, Body body[, task_group_context& group] );

Description

The parallel_do template has two forms.

The sequence form parallel_do(first,last,body) applies a function object body over a sequence [first,last). Items may be processed in parallel. The body can add more work items if its operator() takes a second argument of type parallel_do_feeder<T>&. The function returns once body(x) has returned for every item x that was in the input sequence or was added to it by the method parallel_do_feeder::add.

The container form parallel_do(c,body) is equivalent to parallel_do(std::begin(c),std::end(c),body).

The requirements for input iterators are specified in Section 24.1 of the ISO C++ standard. The table below shows the requirements on type Body.

parallel_do Requirements for Body B and its Argument Type T

Pseudo-Signature / Semantics

B::operator()( cv-qualifiers T& item, parallel_do_feeder<T>& feeder ) const
OR
B::operator()( cv-qualifiers T& item ) const

    Process item. Template parallel_do may concurrently invoke operator() on the same body object (the same this) for different items.

    The signature with feeder permits additional work items to be added.

    Caution: Defining both the one-argument and two-argument forms of operator() is not permitted.

T( const T& )

    Copy a work item.

T::~T()

    Destroy a work item.

For example, a unary function object, as defined in Section 20.3 of the C++ standard, models the requirements for B.

Tip

For good performance, the grainsize of B::operator(), that is, the work done per call, should be on the order of 100,000 clock cycles. If it is less, the overhead of parallel_do may outweigh the performance benefits.

The parallelism in parallel_do is not scalable if all of the items come from an input stream that does not have random access. To achieve scaling, do one of the following:

  • Use random access iterators to specify the input stream. Also, consider using parallel_for in this case.

  • Design your algorithm such that the body often adds more than one piece of work.

The algorithm can be passed a task_group_context object, in which case its tasks are executed in that group. By default, the algorithm is executed in a bound group of its own.

Example

The following code sketches a body with the two-argument form of operator().

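struct MyBody {
    // operator() must be const: parallel_do may invoke it concurrently
    // on the same body object for different items.
    void operator()( item_t item,
                     tbb::parallel_do_feeder<item_t>& feeder ) const {
        for each new piece of work implied by item do {
            item_t new_item = initializer;
            feeder.add(new_item);
        }
    }
};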