I am trying to interface Intel TBB with a distributed parallel runtime library. As a first step I want to do the simplest thing: bypass our current task queue and spawn all of the tasks through Intel TBB. The two task queues are similar, but I expect Intel TBB to give us significantly better thread scaling. The problem is that our current task queue supports a task queue fence, and I do not see equivalent support in Intel TBB. Our fence keeps the main thread from continuing with the main program until the queued tasks have finished, while still allowing it to execute computational tasks in the meantime. Is there a way to implement something like this with Intel TBB? I am fairly new to Intel TBB, so I would appreciate any advice.
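To make the fence semantics concrete, here is a minimal sketch in plain C++ of the behavior I mean. It is not our runtime's actual code and it does not use Intel TBB; `TaskQueue`, `spawn`, `fence`, and `count_completed_at_fence` are hypothetical names made up for illustration, and the single mutex and condition variable are chosen only to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for our task queue (illustration only).
class TaskQueue {
public:
    explicit TaskQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }

    ~TaskQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Enqueue a task; it may run on a worker or on a thread inside fence().
    void spawn(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
            ++pending_;
        }
        cv_.notify_all();
    }

    // Fence: the caller does not return until every spawned task has
    // finished, and it executes queued tasks itself instead of idling.
    void fence() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (pending_ != 0) {
            if (!tasks_.empty()) {
                run_one(lock);
            } else {
                // The remaining tasks are running on workers; wait for
                // them to finish or for new work to appear.
                cv_.wait(lock, [this] { return pending_ == 0 || !tasks_.empty(); });
            }
        }
    }

private:
    // Pops one task and runs it with the lock released.
    void run_one(std::unique_lock<std::mutex>& lock) {
        std::function<void()> task = std::move(tasks_.front());
        tasks_.pop_front();
        lock.unlock();
        task();
        lock.lock();
        if (--pending_ == 0) cv_.notify_all();
    }

    void worker_loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // woken only because we are stopping
            run_one(lock);
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    std::size_t pending_ = 0;  // tasks queued or currently running
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Spawns n tasks, fences, and returns how many had completed at the fence.
int count_completed_at_fence(int n) {
    TaskQueue queue(2);
    std::atomic<int> completed{0};
    for (int i = 0; i < n; ++i)
        queue.spawn([&completed] { ++completed; });
    queue.fence();  // main thread helps run tasks, then continues
    assert(completed.load() == n);
    return completed.load();
}
```

The point of the sketch is the `fence()` call: the main thread is held there, but it drains the queue alongside the workers rather than sleeping. That is the property I would like to keep when the tasks are spawned through Intel TBB instead.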