Generic Parallel Algorithms for Intel ® TBB - "They're Already in There" Part 1

In the last blog, I gave a crash course on how to create an Intel® parallel_for with lambda expressions, which is another way of letting the compiler do the tedious work of creating a function object. I’d like to give a few high-level descriptions of some Generic Parallel Algorithms included in Intel® TBB to let you know what’s possible. In teaching new ways to parallel program, sometimes it’s good to first understand what you could do in a given situation rather than diving into detailed steps on how to do it exactly right. First of all, you’ll probably forget all those detailed steps by the next morning, but second of all, you’ll remember that you heard “something like such and such was in the API already.” Then you can look into the User’s Guide, Reference Manual, and/or Tutorial and probably save yourself a half day of work by not reinventing the wheel.


Reductions are huge with parallel programming and can be considered a ‘for’ loop style computation. A reduction can be considered a functional – it takes as input a collection of elements and distills those values down to one or a few values. You can either use a pre-prepared reduction which can be found in any of the three Intel® PBB models, or in many cases, create your own custom reduction. It’s kind of like defining your own vector space in linear algebra – you declare the rules, give your rules the input, and out should come your result. A good example of this can be found right in the TBB tutorial.


The parallel_do within Intel® TBB processes different “work items” in parallel. Let me explain further. The inputs are (first, last, body), with body being a function object. Its invocation applies body over the half-open interval [first,last). In this fashion, there is freedom for the items (see code below) to be processed in parallel. You can add more work items with parallel_do_feeder. The function terminates when body(x) returns for all items x that were in the input sequence or added to it by method parallel_do_feeder::add().

This code implements a body with the two-argument form of operator().
struct MyBody {
void operator()(item_t item,
parallel_do_feeder& feeder ) {
for each new piece of work implied by item do {
item_t new_item = initializer;

If you’ve ever used for_each from the Standard Template Library, here is a nice parallel version you can use. It’s basically the same thing as parallel_do, but doesn’t allow the feeder. Calling parallel_for_each(first,last,f) applies f to the result of dereferencing every iterator in the range [first,last), and do this in parallel if possible.

I’ll talk about parallel_invoke, parallel_pipeline (one of my favorites, definitely not easy to parallelize yourself without using this API call), parallel_sort, and parallel_scan in the next blog.

Para obter informações mais completas sobre otimizações do compilador, consulte nosso aviso de otimização.