To judge how many clock ticks a task should have at minimum so that it is worthwhile to run it in parallel, I'm running a simple experiment:
Run a loop doing a lengthy calculation sequentially (and measure the time using a high-precision timer).
Then I run the same loop divided into two tbb tasks (there is no memory interaction between the tasks).
Of course, I repeat this many times to get some statistics.
Eventually, I subtract the (average) parallel run time from the (average) sequential run time (= overhead).
Finally, I run the same experiment with many different loop lengths.
For very small tasks (<3k clock cycles on my Nehalem) the sequential execution is slightly faster, as is to be expected.
Actually, the parallel run time is very good for small tasks (compared to my own task scheduler); apparently tbb measures the task size and does not run a task in parallel if it is too small. For larger task sizes the parallel execution is faster than sequential, and for task sizes > 200k cycles the speedup is about 1.9.
I would have expected the overhead to be pretty much independent of the actual task size (at least beyond a certain size, where you mostly see scheduler noise).
As for my question:
I measure that the average spawn/join overhead is linearly dependent on the loop size up to a task size of about 300k clocks. Beyond that, the overhead is constant (plus some fluctuation).
How does tbb decide whether to run a spawned task in parallel or not, and might this mechanism cause the linear dependency of the overhead on the loop size? (I guess tbb might run the task sequentially the first time it is spawned and measure the task size, i.e. execution time, to decide this?)