Running your code in parallel with tbb::parallel_invoke

By Alexey Murashov, Published: 04/02/2009, Last Updated: 04/02/2009

Greetings everyone! I would like to introduce you to a new template function recently added to TBB – tbb::parallel_invoke. It provides TBB users a simple way to run several functions in parallel. So, for example, if you have three functions that do some work and you would like to run them simultaneously, you may write the following TBB code (I skipped some things like scheduler initialization):

void Function1();
void Function2();
void Function3();

void RunFunctions() {
    tbb::parallel_invoke(Function1, Function2, Function3);
}

Looks simple, doesn’t it :-)? You do not have to define any specific classes or write extra code to use parallel_invoke. It is possible to pass function pointers or functor objects to the template function using the same syntax:

void (*FuncPtr1)(void), (*FuncPtr2)(void);
void RunFuncPtrs() {
    tbb::parallel_invoke(FuncPtr1, FuncPtr2);
}

class FunctorClass {
public:
    void operator() () const {}
} Functor1, Functor2;
void RunFunctors() {
    tbb::parallel_invoke(Functor1, Functor2);
}

It also supports lambda functions available in C++0x:

tbb::parallel_invoke(
    []() { std::cout << "Hello!"; },
    []() { std::cout << "Greetings!"; }
);

Up to ten functions can be run by parallel_invoke:

tbb::parallel_invoke(Func1, Func2, Func3, Func4, Func5, Func6, Func7, Func8, Func9, Func10);

Obviously, you could write your own code to run the functions in parallel, but when you use parallel_invoke you get all the usual benefits of TBB. Because parallel_invoke is task-based rather than thread-based, the same code runs on any platform TBB supports and adapts to the number of available cores without changes.
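For contrast, here is a minimal sketch of what "writing your own code" might look like with plain C++11 std::thread (which postdates this post). The names Work1, Work2, Work3 and g_completed are hypothetical stand-ins, not part of TBB. Notice that the thread count is hard-wired to the number of functions, and that an exception escaping Work1 or Work2 would terminate the program rather than reach the caller.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the article's Function1..Function3.
std::atomic<int> g_completed{0};
void Work1() { ++g_completed; }
void Work2() { ++g_completed; }
void Work3() { ++g_completed; }

// Hand-rolled counterpart of tbb::parallel_invoke(Work1, Work2, Work3).
void RunFunctionsByHand() {
    g_completed = 0;
    std::thread t1(Work1);  // always one OS thread per function,
    std::thread t2(Work2);  // no matter how many cores are available
    Work3();                // the last function runs on the calling thread
    t1.join();              // block until both helper threads finish
    t2.join();
    assert(g_completed == 3);  // all three functions have completed
}
```

With parallel_invoke the same intent fits in one line, and the scheduler decides how many worker threads actually participate.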

However, to be run by parallel_invoke, the functions must take no arguments and should return nothing. The second restriction is not strict: you can pass a non-void function, but its return value will be ignored, so doing this is not good design.

tbb::parallel_invoke also includes exception handling and cancellation support. It behaves like other TBB template algorithms:

try {
    tbb::parallel_invoke(Function1, Function2, Function3);
} catch (tbb::captured_exception &exc) {
    // Processing exc
}

And now a little bit about implementation details. As I mentioned above, TBB tasks are used, so each user-defined function is run by a separate task. The tasks form a tree in which each leaf runs up to three user functions. For example, a five-function version looks like this (each box represents a task):

[Figure: task tree for five user functions]


Note that each sub-root task also runs a user-defined function in its own body, which reduces the total number of tasks. The most complicated case, with ten user functions, looks like this:

[Figure: task tree for ten user functions]


The tasks aren’t blocked at the inner levels of the tree. Sub-root tasks use continuation-passing style instead of waiting for their children; wait_for_all is called only at the top level.

Well, it seems at this point I have nothing more to say about tbb::parallel_invoke. But it’s only because it is really a simple and useful construct! Have a nice day :-)
