The task scheduler works most efficiently for fork-join parallelism with many forks, so that task stealing produces enough breadth-first behavior to keep the threads occupied. Each thread then works in a depth-first manner until it needs to steal more work.
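The interplay of depth-first execution and breadth-first stealing can be pictured with a toy model. The sketch below is not the scheduler's implementation: ToyTaskPool, fork, take_local, and steal are hypothetical names, and tasks are reduced to integer ids. It assumes each thread keeps its forked tasks in a deque, takes its own newest task first, and loses its oldest task to a thief.

```cpp
#include <cassert>
#include <deque>

// Toy model only: ToyTaskPool and its members are hypothetical names,
// not classes or methods of the real task scheduler.
struct ToyTaskPool {
    std::deque<int> tasks;  // oldest task at the front, newest at the back

    // The owning thread forks work by pushing a task id onto the back.
    void fork(int task_id) {
        assert(task_id >= 0);  // ids are non-negative in this toy model
        tasks.push_back(task_id);
    }

    // The owner takes the most recently forked task (LIFO order),
    // which corresponds to depth-first execution.
    bool take_local(int& out) {
        if (tasks.empty()) return false;
        out = tasks.back();
        tasks.pop_back();
        return true;
    }

    // A thief takes the oldest task (FIFO order), which corresponds
    // to breadth-first distribution of work across threads.
    bool steal(int& out) {
        if (tasks.empty()) return false;
        out = tasks.front();
        tasks.pop_front();
        return true;
    }
};
```

With tasks 1, 2, and 3 forked in that order, a thief in this model receives task 1 while the owner continues with task 3. The more forks there are, the more old tasks remain available for idle threads to steal.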
The task scheduler is not the simplest possible scheduler because it is designed for speed. If you need to use it directly, it may be best to hide it behind a higher-level interface, as the templates parallel_for, parallel_reduce, etc. do. Some of the details to remember are:
Always use new(allocation_method) T to allocate a task, where allocation_method is one of the allocation methods of class task. Do not create local or file-scope instances of a task.
All siblings should be allocated before any start running, unless you are using allocate_additional_child_of.
Exploit continuation passing, scheduler bypass, and task recycling to squeeze out maximum performance.
When a task completes without having been marked for re-execution, it is automatically destroyed. Its successor’s reference count is also decremented, and if that count reaches zero, the successor is automatically spawned.
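The completion rule in the last item can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not the library's code: ToyTask and complete are hypothetical names, the reference count is a plain int rather than an atomic, and "spawning" is modeled by appending the successor to a vector.

```cpp
#include <cassert>
#include <vector>

// Toy model only: ToyTask and complete() are hypothetical and do not
// correspond to the real class task or to any scheduler internals.
struct ToyTask {
    ToyTask* successor;  // task waiting on this one, or nullptr
    int ref_count;       // number of predecessors still outstanding
    bool recycle;        // true if marked for re-execution

    ToyTask(ToyTask* s, int rc) : successor(s), ref_count(rc), recycle(false) {}
};

// Models what happens after a task's body has finished running.
// Any successor that becomes ready is appended to `spawned`.
void complete(ToyTask* t, std::vector<ToyTask*>& spawned) {
    assert(t != nullptr);
    if (t->recycle) {
        // Marked for re-execution: the task stays alive and the
        // successor's count is left untouched.
        t->recycle = false;
        return;
    }
    ToyTask* succ = t->successor;
    delete t;  // automatic destruction of the completed task
    if (succ != nullptr && --succ->ref_count == 0) {
        spawned.push_back(succ);  // last predecessor done: successor is spawned
    }
}
```

In this model a successor created with a reference count of 2 becomes runnable only after both of its predecessors have passed through complete. A real scheduler performs the decrement atomically, because predecessors may finish on different threads.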