Basic Questions about TBB

Basic Questions about TBB

I have many questions about TBB, but I'll start with a really basic one

When talking tasking, how small is too small? If I am particularly interested in TBB because I deal with tree structures and the recursive/nested nature of the threading built into TBB maps really well to trees. I have tree's of say 150,000 elements or so. But if I'm recursing through my entire tree and doing something really simple on each element (e.g. setting a bool) and I want to make use of tasks, does TBB agglomeratefor me or would I still need to do something myself? It's not clear to me whether TBB manages that or whether I have to myself (i.e. can TBB see that certain tasks are small and just clump them together into one task?). 

3 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

I see you have discovered the Intel Threading Building Blocks "Design Patterns" document, and in particular the section regarding agglomeration.  As the tree example there points out, the usual means for agglomeration within a tree does require some assistance in the user code.  The typical method, as described there, is to track tree depth within the recursive processing and and at some depth threshold stop subdividing the work (e.g., switch from parallel_invoke(process_subtree(left_node),process_subtree(right_node)) to just process_subtree(left_node); process_subtree(right_node)), that is, switch from subdividing to serializing.  Whether a fixed-depth threshold will work or you'll need to go to an adaptive threshold will depend on the shape and balance of the trees that need to be processed.  Intel TBB knows nothing about task size and has no inherent way to agglomerate, but that also means the scheduler has lower overhead.  It's better to leave code so embued with policy in the user code, and keep the library code down to just mechanism.

Ok, thanks for the clarification. I'll implement an adaptive threshold. It's examples on the website like this which throw me:

// serial loop
for (int i = 0; i < 10000; ++i) a[i] = f(i) + g(i);
// parallel loop
tbb::parallel_for( 0, 10000, [&](int i) { a[i] = f(i) + g(i); } );

Just one more question - I'm currently comparing technologies, we currently use the Qt toolset which comes with a QtConcurrent library providing map reduce and other high level multithreading API's. Can you point me to an article on how tbb tasking differs from pooled threads + an API to launch asynchronous tasks ala QtConcurrent::run

I read the article on Using Tasks Instead of Threads already - cache locality due to work stealing is a positive. The QtConcurrent algorithm just uses the next available thread to run the work as opposed to the deque for each thread as per tbb. Are there any other benefits?

Leave a Comment

Please sign in to add a comment. Not a member? Join today