I want to implement a multi-threaded version of a branch-and-bound search. The search space is a 16 level 128-ary tree.
So far, the intermediate result is stored in an array. this does not contain only the 16 levels, but also other entries, which are deduced from those values and needed to check the constrains for the bound condition.
Cilk Plus using work stealing seems well suited to implement this in a convenient way, especially as the framework itself decides whether to split a task or not.
The problem is, that every thread needs its own copy of the array, as it is part of the current state. Not knowing where this split might happen, I assume I have to copy the data for every working packet I use.
In an single-threaded implementation with recursive calls and copying the array for every single call (a DFS on the search tree) the performance drops dramatically compared to a single-threaded do-try-undo-approach which uses only one array.
Is there anything I can do about this using Cilk+ besides adjusting the size of working packets? (like testing 10 or 20 values in one recursive call instead of just 1)