I had assumed that grainsize referred to the smallest chunk of work into which a range would ever be divided. The TBB book says "The grainsize effectively sets a minimum threshold for parallelization" (p36). This would make it similar to the chunk size in OpenMP guided scheduling. However, looking at the code, it actually keeps subdividing as long as the current size exceeds the grainsize. This means you can end up with subranges as small as roughly half the grainsize you specified.

I understand the grainsize does not need to be set exactly, and we normally end up picking a value by experimentation anyway, but the current behaviour is a little unexpected. Not asking for any change, just maybe the behaviour could be clarified a little in documentation.

And a minor detail: if the size is exactly equal to the grainsize, then parallel_sort will divide the range, but parallel_reduce (or any operation using a blocked_range) will not.


Hi, Martin.

It appears that the TBB reference manual gets it right (this from page 10 of my copy):

A blocked_range is splittable into two subranges if the size of the range exceeds grain size.

And in that vein, the TBB book quote is not really wrong, however misleading it might seem. Setting grain size does effectively set a minimum threshold. It's just not equal to the value of the smallest chunk, but roughly twice that size, as you point out.

And yes, parallel_sort does compare greater-or-equal in its is_divisible function. However, that's probably not a variance that anyone would notice, since the grain size in parallel_sort is not currently a user-settable parameter; it's hard-wired to 500 in the quick_sort_range class.
