I am currently tuning an application with VTune. What exactly does it mean if a program spends considerable time in "TBB Dispatch Loop"? I assume this is overhead introduced by the scheduler spawning tasks?
In VTune, the scheduler's *wait_for_all methods are displayed as "TBB Dispatch Loop".
I doubt it is the top of the stack for the hotspot you see. On top of it there should be either a parallel algorithm (with your functor) or receive_or_steal_task(). If you want more advice, please post a snapshot of the stack where you see it.
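To illustrate why a dispatch-loop frame rarely is the real hotspot, here is a minimal sketch of a toy single-threaded dispatch loop. It is not TBB's implementation; `toy_dispatch_loop`, `LoopStats`, and `Task` are hypothetical names invented for this example. It only shows the general idea that the time charged to such a loop inclusively contains the task bodies it runs, so the loop's own (self) time is what is left after subtracting them.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <utility>

// Toy model only: all names here are hypothetical and unrelated to TBB internals.
using Clock = std::chrono::steady_clock;
using Task = std::function<void()>;

struct LoopStats {
    Clock::duration inclusive{0};  // wall time from entering to leaving the loop
    Clock::duration in_tasks{0};   // portion of that time spent inside task bodies
    int tasks_run = 0;

    // Time spent in the loop itself (queue handling), excluding task bodies.
    Clock::duration self() const { return inclusive - in_tasks; }
};

// Pops tasks one by one and runs them, the way a scheduler loop would,
// while recording how much of the loop's time belongs to the tasks.
LoopStats toy_dispatch_loop(std::deque<Task> queue) {
    LoopStats stats;
    const auto loop_start = Clock::now();
    while (!queue.empty()) {
        Task task = std::move(queue.front());
        queue.pop_front();
        assert(task && "queue must not contain empty tasks");

        const auto task_start = Clock::now();
        task();  // user work runs *inside* the dispatch loop frame
        stats.in_tasks += Clock::now() - task_start;
        ++stats.tasks_run;
    }
    stats.inclusive = Clock::now() - loop_start;
    return stats;
}
```

If the tasks do real work, `inclusive` is dominated by `in_tasks` and `self()` stays small, which is the situation where the frames on top of the loop are the ones worth looking at. A large `self()` would be the analogue of a loop that is mostly waiting or looking for work.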