tbb scheduler idle


I use tbb::pipeline to run stats processing in parallel.

The code looks like this:

pipeline.add_filter(st1); // pick task day to process

pipeline.add_filter(st2); // actual processing, parallel

pipeline.add_filter(st3); // reduce, seq processing


st1 - a very fast stage

st2 - really slow; it can take about 1 sec to run on my hardware

st3 - takes 0.2...0.3 seconds to run

I added debug points to every filter, like:






to collect timestamps while each task is executing.

The test run had 3 days to process,

with at most 4 tokens in flight in the pipeline.

pentium4 D HT (2 threads)

Linux 2.6.18 kernel, Debian etch

The stats show:



cals - my 2nd stage

and reduce - 3rd

A value of 0.2 means the stage was performed by 2 threads; 0.1 means only one thread.

So I wonder why TBB does not run reduce for the 2nd day immediately after reduce for the 1st day is done.

It waits until all filters are done, and only then picks a task to execute.

That seems strange to me.

Is there any way to run a stage as soon as the previous stage is done, to maximize CPU usage?


I would guess that at the second stage day 3 was processed before day 2. But the third stage, which is ordered, can't start "reducing" day 3 before day 2. I think you could easily check this guess by extending the information collected at the debug points with some data-specific info (e.g. the number of the day being processed).

As far as I understand, hyperthreads aren't quite "even"; one of the threads only gets a chance to execute when some processor units aren't used by the other one. So I would not be surprised if in your case the main thread started to process day 1, the TBB worker thread took day 2 but made slow progress (due to HT), then the main thread completed day 1, took day 3, and, having a kind of priority on the processor resources, completed day 3 before the worker thread finished day 2.

If that's the case, I wonder if adding a pause/yield point right before taking a new token from the pipeline would help the second thread take priority on processor resources and complete its job earlier.

yes, logging shows that "3rd-day" can really pass stage2 before "2nd-day".


I believe that adding yield or pause operations to the main thread won't help. The OS does not distinguish logical CPUs in HT systems (at least it was so some time ago). Therefore, when the main thread relinquishes its time quantum, the system will see that another thread is already working, and so it will resume the main thread. During all this time the processing will be happening in the same (main) pipeline of the CPU, and so the second thread will remain in the secondary (low priority) CPU pipeline.

I think the problem could be solved by increasing the maximal number of tokens in flight. E.g. if the hyper-thread works at 15% of the main one's speed, then 7 or 8 tokens will ensure an acceptable balance. You could play with the number of tokens in the range 6-15 and find the value resulting in the maximal throughput.


-Andrey Marochko

yes, sched_yield() in stage1 doesn't help

On a real 4-way Xeon box, tasks are mostly processed as planned.

(In reality, the "1st day" has a little more data than the following ones.)
