Easier Intel® TBB parallel_for with C++0x Lambda Expressions

In the last blog, I explained how to “build” a parallelized for loop out of templatized components. Today I’m going to show you an easier way to implement the Intel® Threading Building Blocks (Intel® TBB) parallel_for.

The Final Draft International Standard (FDIS) for C++11, also known as C++0x, came out in March of this year. (It is interesting to note that we actually have someone on the Array Building Blocks team who is on the C++ Standards Committee.) One of the most significant additions in this standard is lambda expressions. These are supported in the following compilers:

- GNU g++ Compiler V4.5+
- Intel® C++ Compiler V12
- Microsoft* C++ Compiler V16 (2010)

This different style of Intel® TBB parallel_for uses these lambda expressions, so you need to use one of these updated compiler versions for it to work.

With C++0x lambda expression support, the parallel_for I used in the previous blog changes to:

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h“
using namespace tbb;
void ChangeArrayParallel (int* a, int n )
{
parallel_for (0, n, 1,
[=](int i) {
Foo (a[i]);
});
}
int main (){
int A[N];
// initialize array here…
ChangeArrayParallel (A, N);
return 0;
}
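One practical note: g++ and the Intel compiler do not enable C++0x features by default, so you have to ask for them, while Visual C++ 2010 turns lambda support on automatically. Here is a sketch of a build line on Linux; the source file name is my own:

g++ -std=c++0x ChangeArrayParallel.cpp -ltbb -o ChangeArrayParallel
icpc -std=c++0x ChangeArrayParallel.cpp -ltbb -o ChangeArrayParallel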


What’s happening here?

First of all, the parallel_for is now overloaded. It takes start, stop, and step arguments, and the whole blocked_range construct is created behind the scenes. You do give up some of the control you had the other way, such as choosing the grain size for the range, but I’d say most people are fine with giving up blocked_range and not having to worry about it.
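For comparison, here is the same loop written against the explicit blocked_range overload from the previous blog, but with a lambda as the body; this sketch assumes the default grain size and partitioner:

void ChangeArrayParallel (int* a, int n)
{
    // Here you create and control the blocked_range yourself
    parallel_for (blocked_range<int>(0, n),
        [=](const blocked_range<int>& r) {
            for (int i = r.begin(); i != r.end(); ++i)
                Foo (a[i]);
        });
}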

Next, the [=] in the lambda introducer tells the compiler to capture variables from the surrounding scope by value; here it captures the pointer a, so the body can reach the array. The (int i) is the parameter list: parallel_for calls the lambda once per index, letting you touch each individual value a[i] when calling Foo. Finally, the lambda expression implements the operator() right inside the call to parallel_for. All in all, it’s a much easier way to call parallel_for.
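To see that the lambda is really just shorthand, here is roughly the function object the compiler generates for you; the class and member names are my own, but it mirrors the hand-written body class from the previous blog:

class ChangeArrayBody {            // hypothetical name for the compiler-generated closure
    int* my_a;                     // "a", captured by value because of [=]
public:
    ChangeArrayBody (int* a) : my_a(a) {}
    void operator() (int i) const {
        Foo (my_a[i]);             // the lambda body
    }
};

// ChangeArrayParallel could then be written as:
parallel_for (0, n, 1, ChangeArrayBody(a));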

Note that all of the different variants of parallel_for fall within what the Intel® TBB User’s Guide refers to as Generic Parallel Algorithms. These are different from the Structured Parallel Patterns that Intel® Array Building Blocks uses. The Generic Parallel Algorithms answer the question of which programming construct to use to implement a parallel algorithm, rather than which type of parallel algorithm it actually is.

Generic Parallel Algorithms

- parallel_for(range)
- parallel_reduce
- parallel_for_each(begin, end)
- parallel_do
- parallel_invoke
- pipeline, parallel_pipeline
- parallel_sort
- parallel_scan

All of these fall within a “for loop style” of computation but accomplish different objectives. My next entry will go into these in a small amount of detail. The threadingbuildingblocks.org website covers them in far more detail.
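As a taste of how the lambda style carries over to the other algorithms, here is a minimal sketch of parallel_reduce summing an array using its functional (lambda-friendly) form; the function name and the choice of an int sum are my own example:

#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

int SumParallel (const int* a, int n)
{
    return tbb::parallel_reduce (
        tbb::blocked_range<int>(0, n),
        0,                                     // identity value for the sum
        [=](const tbb::blocked_range<int>& r, int local_sum) {
            for (int i = r.begin(); i != r.end(); ++i)
                local_sum += a[i];             // reduce one subrange
            return local_sum;
        },
        [](int x, int y) { return x + y; });   // combine partial sums
}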

For more complete information about compiler optimizations, see our Optimization Notice.