I have a vector of objects on which I need to run a function in parallel. Each task involves a recursive octree traversal, and I'm wondering whether there is a better way to organize my TBB usage to get a more depth-first parallel traversal.
Right now each tree node uses a parallel_for over a blocked_range to recurse further into its tree. The function calls that kick off the tree traversals are themselves issued from a parallel_for over a blocked_range.
Running everything with parallel_for appears to complete a batch of tree traversals, roughly as many as there are software threads, at around the same time. I would rather finish fewer octree traversals sooner than have many of them in progress at once.
Please correct me if I'm wrong, but this is my current thinking:
Will I get the behavior I want if I create each of the recursive function calls as a TBB task instead of using parallel_for? At the top level I would spawn() a task for each of my octrees to kick off the parallel recursive traversals. Inside the recursive functions I would use spawn_and_wait_for_all() because of the tree dependencies.
Would this give more depth-first behavior, or is there another design pattern I should use?