parallel_for is easier with lambdas, Intel® Threading Building Blocks

Lambdas are an exciting new addition to C++ in the current draft for C++0x (see my prior post, "Hello Lambda", for my introduction to lambdas). The Intel compilers support them now, and Microsoft has support in its beta for Visual Studio 2010. I think we can expect to see support for lambdas added quickly, and a great deal of interest in using them in C++ code.

Lambdas quite simply allow code to be specified inline, in ways that I find particularly useful for parallel programming constructs, notably for Intel® Threading Building Blocks (Intel® TBB).

Intel TBB supports its algorithms both in the old style (without lambdas) and in the new style (with lambdas).

To get an idea of why I expect lambdas to be very popular with Intel TBB, we can look at the "with" and "without" syntax.

The "old" syntax, from before lambdas were available, involved coding your "work to be done in parallel" into an operator() within a class. It was the toughest thing to teach about using Intel TBB, and it is THE reason C programmers complained about "C++ syntax" being needed with Intel TBB.

parallel_for(range, body, optional partitioner) without lambdas


#include "tbb/tbb.h"

using namespace tbb;

// Foo() is the per-element work; it is assumed to be defined elsewhere.
class ApplyFoo {
  float *const my_a;
public:
  // Called by parallel_for on each sub-range it hands out.
  void operator()( const blocked_range<size_t>& r ) const {
    float *a = my_a;
    for( size_t i=r.begin(); i!=r.end(); ++i )
      Foo(a[i]);
  }
  ApplyFoo( float a[] ) :
    my_a(a) {}
};

void ParallelApplyFoo( float a[], size_t n ) {
  parallel_for(blocked_range<size_t>(0,n), ApplyFoo(a));
}

Writing the same program with lambdas is actually quite readable:

parallel_for(range, body, optional partitioner) with lambdas


#include "tbb/tbb.h"

using namespace tbb;

void ParallelApplyFoo( float* a, size_t n ) {
  parallel_for( blocked_range<size_t>(0,n),
    [=](const blocked_range<size_t>& r) {
      for(size_t i=r.begin(); i!=r.end(); ++i)
        Foo(a[i]);
    }
  );
}
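
One way to see why the class version and the lambda version do the same job: a lambda is shorthand for a compiler-generated function object, and [=] copies the captured pointer a much as ApplyFoo stores my_a. The sketch below is plain C++ with no TBB dependency, so it can be tried anywhere. IndexRange, serial_for, and the body of Foo are hypothetical stand-ins made up for illustration. They are not part of Intel TBB, and serial_for runs serially; it only mimics the way a parallel algorithm hands sub-ranges to a body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the article's Foo(): doubles the element in place.
inline void Foo(float& x) { x *= 2.0f; }

// Hypothetical half-open index range [first, last).
// NOT tbb::blocked_range; it only mimics begin()/end().
struct IndexRange {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Hypothetical serial driver: splits the range into two halves and calls
// the body once per half, one after the other (no threads involved).
template <typename Body>
void serial_for(const IndexRange& r, const Body& body) {
    assert(r.begin() <= r.end());
    const std::size_t mid = r.begin() + (r.end() - r.begin()) / 2;
    body(IndexRange{r.begin(), mid});
    body(IndexRange{mid, r.end()});
}

// Old style: a hand-written function object that stores the pointer.
class ApplyFoo {
    float* const my_a;
public:
    explicit ApplyFoo(float* a) : my_a(a) {}
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            Foo(my_a[i]);
    }
};

void ApplyFooWithFunctor(float* a, std::size_t n) {
    serial_for(IndexRange{0, n}, ApplyFoo(a));
}

// New style: [=] captures the pointer a by value, just like my_a above.
void ApplyFooWithLambda(float* a, std::size_t n) {
    serial_for(IndexRange{0, n}, [=](const IndexRange& r) {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            Foo(a[i]);
    });
}
```

Calling either function on the same input array should leave it with the same contents; the only difference is who writes the class, you or the compiler.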

Starting with Intel TBB 2.2, another form of parallel_for is available too, and some consider it a little easier to read:

parallel_for(first, last, step, function) with lambdas


#include "tbb/tbb.h"

using namespace tbb;

void ParallelApplyFoo(float a[], size_t n) {
  parallel_for(size_t(0), n, size_t(1), [=](size_t i) {Foo(a[i]);});
}
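
To make the meaning of the index form concrete, here is a serial sketch of the loop it stands for, as I understand it: call the function once for each index first, first+step, and so on, stopping before last. The names serial_index_for and ApplyFooToEveryOther, and the body of Foo, are hypothetical and not part of Intel TBB. The sketch also hints at why the size_t(0) and size_t(1) casts appear above: with a single Index template parameter, a plain 0 (an int) and n (a size_t) would not deduce to the same type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the article's Foo(): doubles the element in place.
inline void Foo(float& x) { x *= 2.0f; }

// Hypothetical serial equivalent of the index form: one call of f per index.
// All three index arguments share one type, so mixed int/size_t arguments
// would fail template deduction.
template <typename Index, typename Function>
void serial_index_for(Index first, Index last, Index step, const Function& f) {
    assert(step > 0);  // a non-positive step would never reach last
    for (Index i = first; i < last; i += step)
        f(i);
}

// Applies Foo to elements 0, 2, 4, ... by using a step of 2.
void ApplyFooToEveryOther(float* a, std::size_t n) {
    serial_index_for(std::size_t(0), n, std::size_t(2),
                     [=](std::size_t i) { Foo(a[i]); });
}
```

With a step of 1 this visits every element, which is what the ParallelApplyFoo example does, minus the parallelism.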

Final note: ridding yourself of compiler warnings.

The code above currently produces a warning with the Intel compiler, which you can simply ignore or eliminate using:


// This pragma turns off the compiler warning about "use of a local type to declare a function".
#pragma warning( disable: 588)
