I'm new to Threading Building Blocks and need a design pattern to parallelize an existing divide and conquer loop. Superficially, it works like this:
foreach element in container if element can be pruned continue else if element is divisible divide element into 4 smaller elements else conquer element
So far, no problem for TBB. Just use a parallel_for over a std::vector and a tbb::parallel_invoke (or something) for the subdivision.
Now for the twist. Note that conquer is an expensive step (in terms of CPU and memory) and pruning elements is a huge win. Whether an element can be pruned or not depends upon the other elements that have already been conquered. We can increase the chances of pruning by processing elements in a sorted order. This is one of those algorithmic optimizations that outweighs the benefits of parallelization.
My first hunch was parallel_sort followed by parallel_for. This is exactly wrong because the range will be recursively split so that the last thread is working on the lowest priority end of the vector.
My other notion was a parallel_do over a concurrent_priority_queue, with the subdivide step adding smaller elements back into the queue. However, parallel_do requires begin and end iterators. concurrent_priority_queue only provides try_pop.
How do you process elements in a prioritized order with TBB? I understand that only a serial algorithm will guarantee maximum pruning by conquering the first element before even considering the second, but there must be a way to focus n threads on the n highest priority elements.