I evaluated my cilk application using "taskset -c 0-(x-1) MYPROGRAM) to analyze scaling behavior.
I was very suprised to see, that the performances increases up to a number of cores but decreases afterwards.
for 2 Cores, I gain a speedup of 1,85. for 4, I gain 3.15. for 8 4.34 - but with 12 cores the performance drops down
to a speedup close to the speedup gained by 2 cores (1.99).
16 cores performe slightly better (2.11)
How is such an behaviour possible? either an idle thread can steal work or it cant?! - or may the working packets be too coarse grained and the stealing overhead destroys the performance with too many cores in use?!