Just wondering academically: is it more efficient to use an atomic (relaxed, of course), task cancellation, or both together, to stop a simple parallel_for loop?
With an atomic, cancellation inside the loop Bodies can be nearly immediate (unless the code prefers to load the atomic only once every x iterations), but parallel_for will still start executing a Body for every remaining chunk, even if each one returns right away.
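To make the atomic variant concrete, here is a minimal sketch of what I have in mind. It is not TBB code: plain std::thread chunks stand in for the parallel_for so that it builds with the standard library alone, and the names is_sorted_with_stop_flag, num_chunks and poll_interval are made up for illustration. The point is only the shape of the Body: a relaxed load every poll_interval iterations, a relaxed store when an out-of-order pair is found, and every chunk still being started even after the flag is set.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a sortedness pretest. Plain std::thread chunks
// replace parallel_for here (an assumption, so the sketch needs no TBB).
bool is_sorted_with_stop_flag(const std::vector<int>& data,
                              std::size_t num_chunks = 4,
                              std::size_t poll_interval = 64) {
    assert(num_chunks > 0 && poll_interval > 0);
    if (data.size() < 2) return true;  // nothing to compare

    std::atomic<bool> stop{false};
    const std::size_t pairs = data.size() - 1;  // adjacent pairs to check

    // The "Body": checks pairs (k-1, k) for k in [begin, end).
    auto body = [&data, &stop, poll_interval](std::size_t begin, std::size_t end) {
        for (std::size_t k = begin; k < end; ++k) {
            // Poll the flag on the first iteration and then every
            // poll_interval iterations; relaxed ordering is enough
            // because the flag carries no other data with it.
            if ((k - begin) % poll_interval == 0 &&
                stop.load(std::memory_order_relaxed)) {
                return;
            }
            if (data[k] < data[k - 1]) {
                stop.store(true, std::memory_order_relaxed);
                return;
            }
        }
    };

    // Every chunk is started regardless of the flag; a late chunk simply
    // returns on its first poll. This is the cost the question is about.
    std::vector<std::thread> workers;
    workers.reserve(num_chunks);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = 1 + c * pairs / num_chunks;
        const std::size_t end = 1 + (c + 1) * pairs / num_chunks;
        workers.emplace_back(body, begin, end);
    }
    for (std::thread& w : workers) w.join();  // join orders the final read

    return !stop.load(std::memory_order_relaxed);
}
```

With a small poll_interval the Bodies react almost at once; with a larger one each Body may run up to poll_interval extra iterations before it notices the flag.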
With task cancellation, the parallel_for is stopped before spawning more tasks, but cancellation propagation takes longer than coherent-cache chatter, and has to run its course once started.
Assume that the parallel_for is simple (not a lot of code to touch for an atomic, no recursive parallelism), and has coarse granularity and/or uses auto_partitioner (not that many chunks). Think quick_sort_pretest_body in include/tbb/parallel_sort.h, or similar with a lambda (even less coding for that atomic).
My intuition says that task cancellation brings little benefit in such a situation, if any, and might even make things worse.