To experiment with Threading Building Blocks task stealing, I further modified the sub_string_finder_extended.cpp program that I've used for my recent parallel_for testing. I had previously specified a grain size of 2000, which resulted in TBB applying an effective grain size of about 1107 and dividing the parallel_for loop into 16 subtasks. Running on my AMD dual-core system, each core processed 8 of the subtasks.
Assume you have N tasks to complete, all identically structured in terms of input and output data. If your data is stored in arrays, the simplest programming structure is to perform the calculations in a loop that iterates through the input arrays and writes results into the corresponding output array locations. A multithreaded program that performs these tasks could divide the tasks into equal groups and assign each group to a single thread.
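The equal-groups approach described above can be sketched in plain C++ with std::thread. This is a minimal illustration under stated assumptions, not code from the program discussed in these posts: process and run_static are hypothetical names, and squaring a number stands in for whatever the real per-element task would be.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-element task: any function of one input element would do.
inline double process(double x) { return x * x; }

// Split the index range [0, n) into num_threads nearly equal contiguous
// groups and hand each group to its own thread. Every thread writes only to
// its own slice of the output array, so no locking is needed.
std::vector<double> run_static(const std::vector<double>& in,
                               std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t n = in.size();
    std::vector<double> out(n);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        // Integer arithmetic spreads any remainder across the groups.
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        workers.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = process(in[i]);
            }
        });
    }
    // Wait for every group to finish before returning the results.
    for (auto& w : workers) {
        w.join();
    }
    return out;
}
```

Calling run_static(data, 2) on a dual-core machine gives each core one fixed half of the array. The split is decided once, up front, so a thread that finishes early has nothing more to do; that is the situation task stealing is meant to improve.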
I closed my "Grain Size Experiments" post with some thoughts about "a little mystery" -- the fact that when I set my parallel_for grain size to any value above 50% but below 100% of my total range, the grain size that is actually used is one that evenly divides the work between the two processors (working on my dual-core Gentoo system).
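One way to picture why this might happen is a simplified model of range splitting. The sketch below is not TBB source code. It assumes a range is cut in half whenever it is larger than the grain size and is run as a single task otherwise, which is my understanding of how a blocked range behaves under simple partitioning. The name count_chunks is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (an assumption, not TBB's implementation): a range is
// split at its midpoint while it is larger than the grain size. Returns the
// number of leaf chunks that result.
std::size_t count_chunks(std::size_t size, std::size_t grain) {
    assert(grain > 0);
    if (size <= grain) {
        return 1;  // small enough: runs as one task
    }
    const std::size_t left = size / 2;  // otherwise split in half
    return count_chunks(left, grain) + count_chunks(size - left, grain);
}
```

Under this model, a range of 10000 with any grain size from 5000 up to 9999 is larger than the grain size, so it is split once; each half of 5000 is then no larger than the grain size, so splitting stops. The result is two equal chunks, one per core, whatever grain size in that band was requested.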