Running your code in parallel with tbb::parallel_invoke

Greetings everyone! I would like to introduce you to a new template function recently added to TBB – tbb::parallel_invoke. It provides TBB users a simple way to run several functions in parallel. So, for example, if you have three functions that do some work and you would like to run them simultaneously, you may write the following TBB code (I skipped some things like scheduler initialization):

#include "tbb/parallel_invoke.h"

void Function1();
void Function2();
void Function3();

void RunFunctions() {
    tbb::parallel_invoke(Function1, Function2, Function3);
}

Looks simple, doesn’t it :-)? You do not have to define any specific classes or write extra code to use parallel_invoke. It is possible to pass function pointers or functor objects to the template function using the same syntax:

void (*FuncPtr1)(void), (*FuncPtr2)(void);
void RunFuncPtrs() {
    tbb::parallel_invoke(FuncPtr1, FuncPtr2);
}

class FunctorClass {
public:
    void operator() () const {}
} Functor1, Functor2;
void RunFunctors() {
    tbb::parallel_invoke(Functor1, Functor2);
}

It also supports lambda functions available in C++0x:

tbb::parallel_invoke(
    []() { std::cout << "Hello!"; },
    []() { std::cout << "Greetings!"; }
);

Up to ten functions can be run by parallel_invoke:

tbb::parallel_invoke(Func1, Func2, Func3, Func4, Func5, Func6, Func7, Func8, Func9, Func10);

Obviously, you could write your own code to run the functions in parallel, but when you use parallel_invoke you get all usual benefits from TBB. Since parallel_invoke uses a task-based approach, the code will run on any platform and on different numbers of cores.

However, to be run by parallel_invoke, the functions must take no arguments and should return nothing. The second restriction is not strict: you can actually pass a non-void function, but its return value will be ignored, so doing this is not good design.
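One common way to live with this restriction is to wrap each call in a zero-argument lambda that captures its inputs and writes its result into a variable owned by the caller. The sketch below illustrates the pattern only. So that it compiles without TBB installed, it uses a hypothetical stand-in, sketch::invoke_all, built on std::thread; this is not how TBB implements parallel_invoke. SumRange and SumInParallel are illustrative names as well.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

namespace sketch {
// Hypothetical stand-in for tbb::parallel_invoke (NOT TBB's implementation):
// runs f1 on a helper thread and f2 on the calling thread, then joins.
template <typename F1, typename F2>
void invoke_all(F1 f1, F2 f2) {
    std::thread helper(f1);
    f2();
    helper.join();
}
}  // namespace sketch

// An ordinary function with arguments and a return value, so it cannot be
// handed to parallel_invoke directly.
long SumRange(const std::vector<int>& v, std::size_t first, std::size_t last) {
    assert(first <= last && last <= v.size());
    return std::accumulate(v.begin() + first, v.begin() + last, 0L);
}

long SumInParallel(const std::vector<int>& v) {
    const std::size_t mid = v.size() / 2;
    long left = 0, right = 0;  // each lambda writes only its own variable
    sketch::invoke_all(
        [&]() { left = SumRange(v, 0, mid); },          // arguments captured
        [&]() { right = SumRange(v, mid, v.size()); }); // result stored outside
    return left + right;
}
```

With TBB available, the same two lambdas can be passed to tbb::parallel_invoke unchanged. Because each lambda writes to a different variable, no locking is needed.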

tbb::parallel_invoke also includes exception handling and cancellation support. It behaves like other TBB template algorithms:

try {
    tbb::parallel_invoke(Function1, Function2, Function3);
} catch (tbb::captured_exception &exc) {
    // Processing exc
}
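To make the idea concrete: an exception thrown inside one of the functions is not lost on a worker thread; it surfaces at the call site, where a try block can catch it. The sketch below imitates that with a hypothetical stand-in, sketch::invoke_all, built on std::thread and std::exception_ptr. It is not TBB's implementation, it does not model cancellation, and it rethrows the original exception object, so the handler catches std::exception rather than tbb::captured_exception. Succeeds, Fails and RunAndReport are illustrative names.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

namespace sketch {
// Hypothetical stand-in for tbb::parallel_invoke (NOT TBB's implementation).
// Runs f1 on a helper thread and f2 on the calling thread; an exception
// captured from either one is rethrown only after both have finished.
template <typename F1, typename F2>
void invoke_all(F1 f1, F2 f2) {
    std::exception_ptr e1, e2;
    std::thread helper([&]() {
        try { f1(); } catch (...) { e1 = std::current_exception(); }
    });
    try { f2(); } catch (...) { e2 = std::current_exception(); }
    helper.join();
    if (e1) std::rethrow_exception(e1);
    if (e2) std::rethrow_exception(e2);
}
}  // namespace sketch

void Succeeds() {}
void Fails() { throw std::runtime_error("Function2 failed"); }

// Returns the message of the exception seen at the call site, or "" if none.
std::string RunAndReport() {
    try {
        sketch::invoke_all(Succeeds, Fails);
    } catch (const std::exception& exc) {
        return exc.what();  // processing exc
    }
    return "";
}
```

The design point is that the exception is stored where it occurs and rethrown only once both functions have finished, so the caller sees a single failure at the point of the call.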

And now a little bit about implementation details. As I mentioned above, TBB tasks are used, so each user-defined function is run by a separate task. The tasks form a tree, and each leaf runs up to three user functions. For example, the five-function version looks like this (each box represents a task):


Note that each sub-root task also runs a user-defined function in its own body, which reduces the number of tasks. The most complicated case, with ten user functions, looks like this:


The tasks aren’t blocked at the inner levels. Sub-root tasks use continuation-passing style to avoid blocking; wait_for_all is called only at the top level.

Well, it seems at this point I have nothing more to say about tbb::parallel_invoke. But it’s only because it is really a simple and useful construct! Have a nice day :-)




I tried to implement a quick sort using parallel_invoke, following the example in the design patterns manual of TBB (section 7). However, the program doesn't scale with the size of the sorted array. Is this because of the blocking nature of parallel_invoke? My speculation is that the number of blocked functions blows up the stack size; however, I can't confirm that the stack size is exceeded. (I'm using Ubuntu and gcc: ulimit -s prints a maximum stack size of 8192 kByte.) Can you explain the reason? In the design patterns manual I noticed the remark: "If ultimate efficiency and scalability is important, use tbb::task and continuation passing style." I implemented the quick sort with task_groups, where each task is spawned as a member of one single group, since a join of subtasks is not necessary. This scales perfectly, and it seems like a more comfortable solution than using tbb::task.
thanks, sebastian

Hi, I've got a question. I can't get parallel_invoke working with lambda expressions. (I tried it as shown in this blog, but it seems that I am doing something wrong.) Any tips or examples?


Great point, I totally forgot about boost::bind and the exponential arity burst :)

MAD\akukanov:

Adding function arguments would cause an exponential burst of the number of function overloads we would have to provide. Assume we would support passing just up to 2 arguments; it would add two more overloads for each of "up to 10" functions even if all arguments are the same. And then someone would ask for more :)

As far as I know, argument binding is the common practice to solve this kind of problem; C++98 provides limited means for that, and Boost.Bind is there to suit more sophisticated needs.

So this is kind of a TBB equivalent to OMP's "section" functionality. I remember when I began migrating from OMP to TBB, I missed "section" quite a bit, but finally refactored it out. Anyway, as functionality it's quite nice to have it again, as it makes some patterns cleaner to implement. But it would be nice to be able to pass the same arguments to all calls. Of course it can be done with functors, but not with simple function pointers.

MAD\akukanov:

With lambda function support, launching other TBB algorithms should be easy, I think; in the above lambda example, just replace the string output with calls to parallel_for (and maybe use [&] instead of []). Without it, well, one has to do the same manually: write function objects capturing enough context to start parallel_for, and feed those to parallel_invoke.

I tried to come up with some trick to make this work, especially with the existing TBB algorithm templates such as parallel_for and parallel_reduce. I notice that your solution also doesn't really handle that case, since those functions take arguments. I have not yet looked at the code, but perhaps the algorithm templates could be adjusted to be classes instead, so that they could be passed directly to parallel_invoke?
