We have an issue with the benchmark service:
In most benchmark reports our code was apparently run on only one core, even though it is parallelized (and does run in parallel on our own workstations).
We are using TBB's parallel_for with a grainsize that divides the input range into exactly as many subranges as there are worker threads. These reports come back with 99% CPU usage. (The result is the same when we increase the thread:task ratio to 1:50, i.e. 50 tasks per thread.)
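
To make the setup concrete, here is a rough sketch of the chunking arithmetic described above. It uses only the standard library and is not our actual TBB code; the helper names (grain_for, split_range, visible_threads) are hypothetical. Note that in TBB the grainsize is a splitting threshold, so the real number of chunks can differ from this idealized split.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: grain size that yields at most `workers` chunks
// over `n` items (ceiling division).
std::size_t grain_for(std::size_t n, std::size_t workers) {
    if (workers == 0) workers = 1;  // guard against division by zero
    return (n + workers - 1) / workers;
}

// Hypothetical helper: split [0, n) into consecutive half-open chunks
// of at most `grain` items each.
std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t n, std::size_t grain) {
    assert(grain > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < n; begin += grain) {
        const std::size_t end = (n - begin < grain) ? n : begin + grain;
        chunks.emplace_back(begin, end);
    }
    return chunks;
}

// Number of hardware threads the standard library reports.
// The standard allows 0 for "unknown"; that case is mapped to 1 here.
unsigned visible_threads() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Logging visible_threads() at the start of a benchmark run would show whether the process sees fewer cores on the service than on our workstations. For the 1:50 ratio, the same arithmetic applies with grain_for(n, 50 * workers).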
However, I also submitted the same code this morning, and that run came back with 1000% CPU usage.
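
For context on how we read these figures: we assume the report computes CPU usage the way the `time` utility does, as CPU time divided by wall-clock time, so 99% would mean roughly one busy core and 1000% roughly ten. This is an assumption about the service, not something we have confirmed. A minimal sketch of that calculation, with hypothetical function names, assuming a POSIX system where std::clock counts CPU time across all threads of the process:

```cpp
#include <chrono>
#include <ctime>

// Assumed definition: CPU seconds divided by wall-clock seconds, times 100.
double cpu_usage_percent(double cpu_seconds, double wall_seconds) {
    return wall_seconds > 0.0 ? 100.0 * cpu_seconds / wall_seconds : 0.0;
}

// Runs `work` once and returns the CPU usage percentage for that call.
// On POSIX, std::clock reports processor time used by the whole process.
template <typename F>
double measure_cpu_usage(F&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    const double wall =
        std::chrono::duration<double>(wall_end - wall_start).count();
    return cpu_usage_percent(cpu, wall);
}
```

Wrapping the parallel section in measure_cpu_usage and printing the result would let us compare our own measurement against the figure in the report.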
Are multiple benchmarks run at the same time? And if so, why are we losing out against the others?