To see why I expect lambdas to be very popular with Intel TBB, compare the syntax with and without them.
The "old" syntax, before lambdas were available, involved coding your "work to be done in parallel" into an operator() within a class. It was the toughest thing to teach about using Intel TBB, and it is THE reason C programmers complained about "C++ syntax" being needed with Intel TBB.
parallel_for(range, body, optional partitioner) w/out lambdas
#include "tbb/tbb.h"
using namespace tbb;

class ApplyFoo {
    float *const my_a;
public:
    void operator()( const blocked_range<size_t>& r ) const {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            Foo(a[i]);
    }
    ApplyFoo( float a[] ) :
        my_a(a)
    {}
};

void ParallelApplyFoo( float a[], size_t n ) {
    parallel_for(blocked_range<size_t>(0,n), ApplyFoo(a));
}
Writing the same program using lambdas is actually quite readable:
parallel_for(range, body, optional partitioner) with lambdas
#include "tbb/tbb.h"
using namespace tbb;

void ParallelApplyFoo( float* a, size_t n ) {
    parallel_for( blocked_range<size_t>(0,n),
        [=](const blocked_range<size_t>& r) {
            for(size_t i=r.begin(); i!=r.end(); ++i)
                Foo(a[i]);
        }
    );
}
Starting with Intel TBB 2.2, another form of parallel_for is also allowed, and some consider it a little easier to read:
parallel_for(first, last, step, function) with lambdas
#include "tbb/tbb.h"
using namespace tbb;
void ParallelApplyFoo(float a[], size_t n) {
    parallel_for(size_t(0), n, size_t(1), [=](size_t i) { Foo(a[i]); });
}
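One way to read the compact form is as a thin wrapper over the range form: the per-index function gets wrapped in a range body that supplies the inner loop. The sketch below illustrates that idea only. It is serial, it does not use TBB, and it is not a claim about how the library implements this overload. The names IndexRange, range_for, index_for and double_even_slots are all hypothetical, invented for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t>: a half-open range [begin, end).
struct IndexRange {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical serial stand-in for the range form of parallel_for.
// A parallel runtime would split the range into chunks and hand each
// chunk to a worker; here the body is simply called once on the whole range.
template <typename RangeBody>
void range_for(IndexRange r, const RangeBody& body) {
    body(r);
}

// Index form layered on the range form: the per-index function is
// wrapped in a range body that supplies the inner loop.
template <typename Function>
void index_for(std::size_t first, std::size_t last, std::size_t step,
               const Function& f) {
    if (step == 0 || first >= last) return;  // nothing to do
    // Number of iterations, rounded up.
    const std::size_t count = (last - first + step - 1) / step;
    range_for(IndexRange{0, count}, [=](const IndexRange& r) {
        for (std::size_t k = r.begin; k != r.end; ++k)
            f(first + k * step);
    });
}

// Example use: double every second element (indices 0, 2, 4, ...).
std::vector<float> double_even_slots(std::vector<float> v) {
    float* a = v.data();
    // [=] copies the pointer a, not the array, so elements are changed in place.
    index_for(0, v.size(), 2, [=](std::size_t i) { a[i] *= 2.0f; });
    return v;
}
```

Calling double_even_slots on a five-element vector touches indices 0, 2 and 4 and leaves the others alone. The point of the layering is that the user-visible lambda shrinks to the loop body, while the range (and anything that partitions it) stays available underneath.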
Final note: ridding yourself of compiler warnings.
The Intel compiler currently produces a warning for the code above. You can simply ignore it, or eliminate it using:
//The pragma turns off warnings from the compiler about "use of a local type to declare a function".
#pragma warning( disable: 588)
August 5, 2009 9:19 AM PDT
James Reinders (Intel)
No, lambda functions capture at the exact moment they are defined, in the context of the definition. That means the capture by reference grabs a pointer, and capture by value grabs the value at that instant.
August 5, 2009 9:26 AM PDT
James Reinders (Intel)
Here is an example of using lambdas that emphasizes that the capture happens when the lambda is defined:

    #include <cstdio>
    #include <tchar.h>   // _tmain and _TCHAR (Microsoft-specific)

    template<typename F>
    void Eval( const F& f ) {
        int i = 77;
        f();
    }

    void foo() {
        int i = 22;
        Eval( [=]{ printf("Hello, Lambdas %d\n",i); } );
    }

    void bar() {
        int i = 99;
        auto f = [=]{ printf("Hello, Lambdas %d\n",i); };
        f();
        i = 88;
        { int i = 66; f(); }
        f();
    }

    void bar2() {
        int i = 99;
        auto f = [&]{ printf("Hello, Lambdas %d\n",i); };
        f();
        i = 88;
        { int i = 66; f(); }
        f();
    }

    void bar3() {
        int i = 99;
        auto f = [=]() mutable { printf("Hello, Lambdas %d\n",i); };
        f();
        i = 88;
        { int i = 66; f(); }
        f();
    }

    int _tmain(int argc, _TCHAR* argv[]) {
        foo();
        bar();
        bar2();
        bar3();
        return 0;
    }

This prints values of 22, 99, 99, 99, 99, 88, 88, 99, 99, 99.

The first 22 is printed inside Eval(), ignoring the local value of 77 and sticking with the value of 22 captured in foo().

Next, in bar(), the capture of 99 is done and holds while i is 99, then 88, then a new local i is 66, and then back to 88. We see 99, 99, 99 from the lambda.

Next, in bar2(), we see the same code as bar() but with capture by reference. Here changes to the variable i are tracked, but only for the precise i that was in scope when the lambda was created. So we see 99, 88, 88. Note that the i=66 has no effect because it is not the i pointed to by the reference when the lambda is created.

Finally, bar3() shows that the mutable keyword has nothing to do with whether a capture by value tracks the variable captured. It only affects whether the compiler allows changes inside the lambda body to the captured value.
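To make the last point concrete, here is a minimal sketch of what mutable does allow: changing the closure's own copy of a by-value capture. The function name mutable_capture_demo is hypothetical and not part of the example above. It assumes only a C++11 compiler.

```cpp
#include <utility>

// Returns {value produced inside the lambda on its second call,
//          value of the outer variable afterwards}.
std::pair<int, int> mutable_capture_demo() {
    int i = 99;
    // Without `mutable`, `++i` on a by-value capture would not compile,
    // because the closure's call operator is const by default.
    auto f = [=]() mutable { return ++i; };  // modifies the closure's own copy
    f();                 // closure copy becomes 100
    int inside = f();    // closure copy becomes 101
    return {inside, i};  // the outer i is untouched
}
```

The copy inside the closure persists between calls, so it keeps counting up, while the i in the enclosing function never changes.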
September 7, 2009 8:08 PM PDT
sm345
The new parallel_for syntax (considered easier to read by some!) is in line with the Microsoft syntax for their parallel for. That's good, very good. But please don't get rid of the range concept in future releases. It's useful when I need to fine-tune the partitioning. I assume either block or auto is the default partitioner with the new syntax, and it's just syntactic sugar for the REAL classic parallel_for.

Rafael