parallel_do Template Function


Template function that processes work items in parallel.


 #include "tbb/parallel_do.h"


template<typename InputIterator, typename Body> 
void parallel_do( InputIterator first, InputIterator last,
                 Body body[, task_group_context& group] );

template<typename Container, typename Body>
void parallel_do( Container c, Body body[, task_group_context& group] );


The parallel_do template has two forms.

The sequence form parallel_do(first,last,body) applies a function object body to each item in the sequence [first,last). Items may be processed in parallel. The body can add further work items if its operator() takes a second argument of type parallel_do_feeder. The function returns once body(x) has returned for every item x that was in the input sequence or was added to it by method parallel_do_feeder::add.

The container form parallel_do(c,body) is equivalent to parallel_do(std::begin(c),std::end(c),body).
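The termination rule of the sequence form can be pictured with a purely sequential model. The sketch below is not how the library is implemented and does not use TBB at all: feeder_model, parallel_do_model, CountdownBody, and run_countdown_example are hypothetical names introduced only for illustration. It shows that the call returns only after the body has returned for every original item and for every item added through add.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>

// Hypothetical stand-in for parallel_do_feeder<T>; it models only add().
template<typename T>
class feeder_model {
public:
    explicit feeder_model(std::deque<T>& pending) : pending_(pending) {}
    void add(const T& item) { pending_.push_back(item); }
private:
    std::deque<T>& pending_;
};

// Sequential model of parallel_do(first,last,body) for a two-argument body.
// It returns once body has returned for every original and added item.
template<typename InputIterator, typename Body>
void parallel_do_model(InputIterator first, InputIterator last, const Body& body) {
    typedef typename std::iterator_traits<InputIterator>::value_type T;
    std::deque<T> pending(first, last);   // copies the work items
    feeder_model<T> feeder(pending);
    while (!pending.empty()) {
        T item = pending.front();
        pending.pop_front();
        body(item, feeder);               // may append to pending via feeder.add
    }
}

// Example body: item n implies one further item n-1 until 0 is reached.
struct CountdownBody {
    std::size_t* calls;   // number of times operator() has run
    void operator()(int& n, feeder_model<int>& feeder) const {
        ++*calls;
        if (n > 0) feeder.add(n - 1);
    }
};

// Processes {3, 1}: the chains 3,2,1,0 and 1,0, so the body runs six times.
std::size_t run_countdown_example() {
    const int initial[] = {3, 1};
    std::size_t calls = 0;
    CountdownBody body = {&calls};
    parallel_do_model(initial, initial + 2, body);
    return calls;
}
```

The real parallel_do may run the body invocations concurrently and in any order; the model fixes one order only to make the termination condition visible.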

The requirements for input iterators are specified in Chapter 24 of the ISO C++ standard. The table below shows the requirements on type Body.

parallel_do Requirements for Body B and its Argument Type T

Each pseudo-signature below is followed by its semantics.

 B::operator()(
   cv-qualifiers T& item,
   parallel_do_feeder<T>& feeder
 ) const

 or

 B::operator()(
   cv-qualifiers T& item
 ) const

Process an item. parallel_do may concurrently invoke operator() on the same body object, but with different items.

The signature with feeder permits additional work items to be added.

Caution: Defining both the one-argument and two-argument forms of operator() is not permitted.

 T( const T& )

Copy a work item.

 T( T&& )

Supported since C++11; optional. Move a work item.

 ~T()

Destroy a work item.

For good performance, each execution of B::operator() should take at least ~100,000 clock cycles (a few tens of microseconds on a core running at 2 to 3 GHz). If it takes less, the overhead of parallel_do may outweigh the performance benefits.

The parallelism in parallel_do is not scalable if all of the items come from an input stream that does not have random access. To achieve scaling, do one of the following:

  • Use random access iterators to specify the input stream. Also, consider using parallel_for in this case.

  • Design your algorithm such that the body often adds more than one piece of work.

The algorithm can be passed a task_group_context object so that its tasks are executed in this group. By default the algorithm is executed in a bound group of its own.


The following code sketches a body with the two-argument form of operator().

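struct MyBody {
    // Sketch only: item_t and initializer are placeholders.
    void operator()( item_t item,
                     parallel_do_feeder<item_t>& feeder ) const {
        for each new piece of work implied by item do {
            item_t new_item = initializer;
            feeder.add(new_item);   // hand the new item to parallel_do
        }
    }
};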