TBB tasks return_list

I'm writing a benchmark that uses TBB tasks spawned mostly via "allocate_additional_child_of(...)", because I don't know how many tasks to spawn until runtime. Currently I perform set_ref_count(1), then in a loop perform as many allocates and spawns as necessary, then call wait_for_all. This seems to run fine for small numbers of tasks, but once I get into larger values (thousands), I begin to get random assertion failures, the most common of which is "another thread emptied the return_list."

Because I get different assertion failures for the same command-line arguments, I assumed it was a memory corruption issue and tried tracing it down with both Microsoft's debug heap and OS X's Guard Malloc. Neither found anything; on the contrary, the benchmark runs without problems under the malloc debug tools, possibly because the slowdown hides the race conditions.

Is there any information on what this return_list is? The assertion fails when I try to spawn another task using allocate_additional_child_of(*this), but I'm not sure what it means. Other assertions that pop up tend to be "small_task_count corrupted" and "attempt to spawn task that is not in allocated state", even though I only spawn tasks immediately after allocating them. Perhaps the way I'm spawning tasks isn't correct? Hopefully understanding this return_list will help me track down the issue.

I have composed a simple test for this kind of spawning. It runs from 100 to 1,000,000 tasks with different workloads (from 10 to 10K flops). It works fine on a dual-processor, 8-core machine. Could you check whether it works for you, and whether it correctly represents what you tried to do in your app?

#include <cstdio>
#include "tbb/atomic.h"
#include "tbb/task.h"

static volatile double g_vol = 0;
// Assumption: the declaration of g_tasks_started was not shown in the post.
static tbb::atomic<size_t> g_tasks_started;

class leaf_task : public tbb::task
{
    tbb::task* execute () {
        ++g_tasks_started;
        // Busy work; the result is irrelevant, only the load matters.
        for ( size_t i = 0; i < 10000; ++i )
            g_vol += i/g_vol;
        return NULL;
    }
}; // class leaf_task

class simple_root_task : public tbb::task
{
    const size_t m_task_count;
    const size_t m_task_load;

    tbb::task* execute () {
        // One reference for the wait; each additional child adds its own.
        set_ref_count(1);
        for ( size_t i = 0; i < m_task_count; ++i ) {
            spawn( *new( allocate_additional_child_of(*this) ) leaf_task );
        }
        wait_for_all();
        return NULL;
    }
public:
    simple_root_task ( size_t tasks, size_t taskLoad )
        : m_task_count(tasks)
        , m_task_load(taskLoad)
    {}
}; // class simple_root_task

void Test ()
{
    for ( size_t i=100; i<=1000000; i*=10 ) {
        for ( size_t l=10; l<=1000000000/i; l*=10 ) {
            tbb::task &r = *new( tbb::task::allocate_root() ) simple_root_task(i, l);
            try {
                tbb::task::spawn_root_and_wait(r);
            } catch ( ... ) {
                printf ("TestX: exception caught\n");
            }
            printf ("Done: num tasks %lu, task load %lu\n",
                    (unsigned long)i, (unsigned long)l);
        }
    }
} // Test

All right, now I see where the problem is. Functions like allocate_additional_child_of are tricky beasts, because they implicitly use another argument: "this". It is easy to overlook that "this" plays a critical role here: it establishes thread ownership of the new task, and it is required to be owned by the current thread. In your example you use the parent as the owner, but the parent can be (and, as the corrupted memory testifies, is) owned by (running on) another thread at the moment some of its descendants are allocated.

Here is the fixed piece of code:

class leaf_task : public tbb::task
{
    tbb::task* execute () {
        ++g_tasks_started;
        for ( size_t i = 0; i < 10000; ++i )
            g_vol += i/g_vol;
        return NULL;
    }
public:
    static tbb::task* create ( tbb::task *owner, tbb::task *parent ) {
        return new( owner->allocate_additional_child_of(*parent) ) leaf_task;
    }
}; // class leaf_task

class task_launcher : public tbb::task {
    tbb::task* execute(){
        spawn( *leaf_task::create(this, parent()) );
        return NULL;
    }
}; // class task_launcher

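The important changes are the explicit "owner" argument of leaf_task::create, on which allocate_additional_child_of is now called, and the call create(this, parent()) in task_launcher.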

Note also that I turned the instance method "creator" into a static one. This avoids creating fake local objects just to call the instance method, and makes the code cleaner. My heart nearly stopped when I saw that leaf_task object allocated on the stack.

To Andrey Marochko: "volatile" doesn't work in C++, as Arch Robison will tell you, so you have a race on g_vol; the self-assignment operation is another reason to have a mutex.

To "nzea": Don't call allocate_additional_child_of() on "parent", because "this" may have been stolen and then there are ownership issues, see reference below.

To TBB: I see "8.3.2.4 new( this.task::allocate_additional_child_of( parent ))" in reference manual revision 1.9 (may not be most recent, but I have to run now), which should be either "this->" or maybe something like "t.".

(Hey, we had a race on the thread and Andrey won... First and third paragraphs are still relevant, though.)

To Raf: I know the limits and issues of volatile pretty well. In the test above I do not care about the correctness of the accumulated value. "volatile" here is just a simple way to prevent the optimizer from eliminating the loop in release mode (I checked whether the sample works with different loads in leaf tasks). From a methodological standpoint you are right, of course: spreading bad habits is not good, so tbb::atomic would be a better choice.
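
The point about race-free counters can be illustrated without TBB. The following is only a minimal sketch, assuming a C++11 compiler and using std::atomic as a stand-in for tbb::atomic; the function name count_racefree and its parameters are hypothetical and do not come from this thread. Several threads increment one shared counter, and because every increment is atomic the final count is exact.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread): each of `threads` workers
// increments a shared atomic counter `per_thread` times.
// Returns the final value of the counter.
std::size_t count_racefree(std::size_t threads, std::size_t per_thread) {
    std::atomic<std::size_t> started(0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&started, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i)
                started.fetch_add(1, std::memory_order_relaxed);
        });
    }
    // join() synchronizes with each worker, so the load below sees all increments.
    for (auto& th : pool) th.join();
    return started.load();
}
```

Calling count_racefree(4, 10000) should always yield 40000. A plain or volatile counter updated the same way carries no such guarantee, which is the distinction Raf raised.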

This discussion inspired me to upgrade my implementation of task_group on http://softwareblogs.intel.com/2008/07/02/implementing-task_group-interface-in-tbb to allow multiple threads to create tasks within the same task_group. I used task::self() for the "owner". It depends on an undocumented implementation feature: even when there is no running task on a master thread, task::self() still returns a reference to a dummy task owned by that thread.
