English | 中文 | Русский | Français
2,856 Posts served
8,606 Conversations started
My original Threading Building Blocks task-stealing demo was performed on my AMD Athlon-64 dual-core machine (running Gentoo Linux). Today I moved the application over to my Intel quad-core machine (running Windows).
As you may recall, in my experiment I modified the sub_string_finder_extended.cpp program in the example from the TBB Getting Started guide. For the runs on the dual-core machine, I changed the code so that the tasks originally assigned to one core would take 10 times longer to complete than the tasks originally assigned to the other core. This resulted in the core with the shorter task list completing all of its tasks, while the other core still had a lot of assigned work remaining. The testing showed that TBB reassigned tasks to the idle processing core, automatically adjusting in real time to the changing tasking status of the cores.
To make the task stealing trackable on the quad-core machine, I modified the SubStringFinder() class such that three of the cores will initially be assigned task lists that will complete quickly, while the fourth core will be assigned a task list that will take about 10 times as long to complete as the task lists assigned to the other three cores. Here's the revised code (with the new changes in bold):
Here's the modified code in the SubStringFinder() class:
class SubStringFinder {
const string str;
size_t *max_array;
size_t *pos_array;
public:
void operator() ( const blocked_range<size_t>& r ) const {
tick_count t0 = tick_count::now();
int n;
int n1;
if (r.begin() >= 13284) {
n1 = 40;
} else {
n1 = 4;
}
for (n=0; n<n1; n++) {
for ( size_t i = r.begin(); i != r.end(); ++i ) {
size_t max_size = 0, max_pos = 0;
for (size_t j = 0; j < str.size(); ++j)
if (j != i) {
size_t limit = str.size()-( i > j ? i : j );
for (size_t k = 0; k < limit; ++k) {
if (str[i + k] != str[j + k]) break;
if (k > max_size) {
max_size = k;
max_pos = j;
}
}
}
max_array[i] = max_size;
pos_array[i] = max_pos;
}
}
tick_count t1 = tick_count::now();
time_t tNow = time(NULL);
cout << r.begin() << " - " << r.end() <<
" tNow " << tNow <<
" dT " << (t1 - t0).seconds() << endl;
}
SubStringFinder(string &s, size_t *m, size_t *p) :
str(s), max_array(m), pos_array(p) { }
};
In other words, all I had to do was to assign the last 25% of the parallel_for index range (which includes 17711 elements) to the larger tasks.
Here are the results (ordered by the time when each task completed):
| Index Range | Start Time (sec) |
Finish Time (sec) |
dT (sec) |
| 4427 - 5533 | 896 | 901 | 4.6 |
| 8855 - 9961 | 896 | 901 | 4.7 |
| 0 - 1105 | 896 | 901 | 4.8 |
| 5534 - 6640 | 901 | 905 | 4.5 |
| 9962 - 11068 | 901 | 906 | 4.7 |
| 1106 - 2212 | 901 | 906 | 4.7 |
| 6641 - 7747 | 905 | 910 | 4.7 |
| 11069 - 12175 | 906 | 910 | 4.8 |
| 2213 - 3319 | 906 | 910 | 4.7 |
| 7748 - 8854 | 910 | 915 | 4.7 |
| 3320 - 4426 | 910 | 915 | 4.5 |
| 12176 - 13282 | 910 | 915 | 4.6 |
| 13283 - 14389 | 896 | 942 | 45.5 |
| 16604 - 17710 | 915 | 951 | 36.2 |
| 15497 - 16603 | 915 | 957 | 42.3 |
| 14390 - 15496 | 915 | 959 | 44.5 |
Looking at the table, you see the short tasks being performed in batches of three. The three processing cores each work on a short task, then all three move on to the next task in their own list. At time 915 seconds, all of the short tasks have been completed, and three processing cores are idle, while the fourth core (which was assigned long tasks) is still chugging away, working on its first task.
TBB notices that there are three unstarted tasks in the fourth core's list, and assigns the unstarted tasks to the three idle cores. The result is that about 222 seconds of total work is completed in 63 seconds. Again we see the ability of Threading Building Blocks to dynamically adjust tasking in real time based on the current state of the available processing cores.
Kevin Farnham
Community Manager,
TBB Open Source Community
