I have a simple parallel_for loop that runs over blocked_range(0, TASKS, 1); in other words, it creates exactly TASKS tasks. To measure scalability, I varied the number of tasks from 2 to 8 on an 8-core machine (two quad-core CPUs). I always initialize the scheduler without specifying the number of threads, so I assume 8 worker threads are created.
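For reference, here is a small TBB-free C++ model of how a range with grain size 1 ends up as exactly TASKS pieces. It is a sketch that assumes simple recursive bisection: split at the midpoint while the size exceeds the grain size, which is the divisibility rule tbb::blocked_range uses. The names Range, split_into_chunks, chunks and count_chunks are hypothetical and not part of the TBB API, and the partitioner parallel_for actually uses may split differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<int>: the half-open range [begin, end).
struct Range {
    int begin;
    int end;
};

// Split r recursively at its midpoint while its size exceeds the grain size
// (the same divisibility test blocked_range uses), collecting the leaf chunks.
void split_into_chunks(Range r, int grainsize, std::vector<Range>& out) {
    if (r.end - r.begin > grainsize) {
        int middle = r.begin + (r.end - r.begin) / 2;
        split_into_chunks(Range{r.begin, middle}, grainsize, out);
        split_into_chunks(Range{middle, r.end}, grainsize, out);
    } else {
        out.push_back(r);  // not divisible: this piece becomes one body invocation
    }
}

// Leaf chunks produced for [0, tasks) with the given grain size.
std::vector<Range> chunks(int tasks, int grainsize) {
    assert(grainsize >= 1);  // a grain size of 0 would never stop splitting
    std::vector<Range> out;
    split_into_chunks(Range{0, tasks}, grainsize, out);
    return out;
}

// Number of chunks, i.e. how many times the loop body would be invoked.
std::size_t count_chunks(int tasks, int grainsize) {
    return chunks(tasks, grainsize).size();
}
```

With grain size 1, count_chunks(TASKS, 1) equals TASKS for every value from 2 to 8, which matches what the printfs showed. The model says nothing about how many threads exist or what idle threads do while they wait for work.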
While running this experiment, I noticed in top that nearly all of the CPUs are at or close to 100% utilization. This holds regardless of the value of TASKS, even though I verified (with printfs) that the correct number of tasks is created.
Has anybody else observed this behavior, or can anyone explain what the cores that have no task to run are so busy doing?