Question about loop parallelization

I am trying to parallelize a section of my application that consists of two consecutive for-loops. These loops are independent of each other, so, ideally, I would like to follow a two-level parallelization approach: spawn each loop as a separate task at the higher level, and then parallelize each one with tbb::parallel_for. Would it be possible to compose different TBB parallel constructs in this way, i.e. use a parallel_for inside a task or task_group? And if so, would it work well, i.e., as if the two loops were handled as a single parallel for loop twice as large? My intuition says that, once the two loops are scheduled simultaneously onto some worker queue and their recursive splitting starts, execution from that point on would not be much different from the case of a unified, larger loop.


Recursive parallelism is exploited extensively by TBB, and you are encouraged to use it yourself, for example by nesting parallel_for inside parallel_invoke. It doesn't really matter whether the tasks are homogeneous, as they are under the hood of a single parallel_for loop, or heterogeneous, as they are between the different levels of the nesting you propose. Just don't overdo it: nesting parallel_for inside parallel_for is likely to perform worse than a serial loop inside parallel_for, because of parallel overhead.

Thanks, Raf! Nesting parallel_for inside parallel_invoke seems to do the job. And it does not incur additional overhead at all (i.e., it performs exactly like a single parallel loop working over the unified range of both loops). I agree that nesting parallel_for calls more aggressively may make things worse.
