Spawn and count children before setting reference count - possible?

Spawn and count children before setting reference count - possible?

Dear collegues.

According to TBB manual:

Make sure to call set_reference_count(3) before spawning any children. Failure to do so results in undefined behavior.

I've tried. Really fails.
In the loop within the task 'A' on some condition I need to spawn task 'B' in sake of performance. I do not know beforehand how many will be spawned, but I need to wait for them in the end of task 'A'.
Is that possible?
Of course I can postpone spawning the tasks, queue them into some local memory and spawn after the loop when the numbers are already counted. I suspect that this will significantly slow down my algorithm, so I have to check somehow.

Thank you,
Ovcharenko Egor. 

9 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

You have various options, including task_list, allocate_additional_child_of(), recursive parallelism, ..., roughly in order of decreasing latency and increasing programming effort.

Dear Raf.

In my code I have two types of tasks - DispTask and PropTask, and two concurrent queues - basketQ and collectionQ.
PropTask pops 1 basket from basketQ and either pushes one collection into collectionQ or just finishes with nothing, which means the leaf of the task tree.

1) Initially I had a schema like this:

DispTask pops a group of collections from collectionQ and process them one after another in the loop. At some point of this processing a new basket may be produced. In such a case I push this basket into the basketQ and just increment the counter of pushed baskets 'ninjected'. At the very end of DispTask I do the following.

 

if (ninjected) {
task_list tlist;
empty_task& cont = *new(allocate_continuation()) empty_task();
cont.set_ref_count (ninjected);
for (int i = 0; i < ninjected; i++) {
PropTask& pt = *new(cont.allocate_child()) PropTask();
tlist.push_back (pt);
}
if (!tlist.empty()) spawn(tlist_normal);
}
return NULL;
 

Here cont.set_ref_count (ninjected) seems to be necessary  and also necessary before spawning any children.

2) As I want to improve a little bit this algorithm, I try to push the basket and allocate and spawn the corresponding PropTask immediately when the basket is ready.

For this first I try to do the following. Instead of cont.allocate_child() I write

PropTask& pt = *new(allocate_additional_child_of(cont)) PropTask();

This allows not to specify the number of childer beforehand and seem to work fine without this line

cont.set_ref_count (ninjected);

Is that right from the theory point of view?

3) As the next I get rid of task_lists and instead of filling the list and spawning it I spawn tasks one by one in the loop.

4) Here comes the most interesting. As I move the task spawning code into the innermost methods the code stops working.

So, somewhere deep inside the method, called from DispTask there is

DispTask& thisTask = (DispTask&) tbb::task::self();
 if (thisTask.cont == NULL) thisTask.cont = new(thisTask.allocate_continuation()) empty_task();
 PropTask& pt = *new(tbb::task::allocate_additional_child_of(*thisTask.cont)) PropTask(priority);
 tbb::task::spawn(pt);

'cont' is the continuation task stored as a DispTask class data member.

Any ideas?

Thank you,
Ovcharenko Egor. 

In fact as soon as I substitude using task_list to immediate spawning of tasks the code brask with the following message.

pure virtual method called
terminate called without an active exception

Using a container of elements with an equal number of tasks seems a bit suspicious. Perhaps you could use a built-in algorithm like pipeline() or parallel_do() instead?

I don't see anything obviously wrong about what you've described, let alone how you could be calling a pure virtual method, but you should be able to let the debugger pinpoint the exact call location as a significant clue?

One important note.

DispTask -> n1 * PropTasks -> DispTask -> n2 * PropTasks -> ...
And there are some conditions in both tasks which may lead to termination.

I think, the problem (regarding your original point number 4) could be this: the cont task is spawned as soon as it's reference count reaches zero. Imagine this scenario:

  1. cont is created with ref count 0
  2. you create t1 using allocate_additional_child_of(cont). cont's ref count is set to 1.
  3. you spawn t1
  4. another thread steals t1 and executes it. Note that the original thread is still executing the DispTask task that spawns the PropTask tasks.
  5. once t1 finishes, it decrements cont's ref count, which reaches 0. cont is spawned as a result of this and starts executing, finishes and is destroyed
  6. the original thread continues with the loop and tries to create t2 using allocate_additional_child_of(cont), but at this point, cont no longer exists.

Ooops, I forgot to also include the solution, not just explain the problem. There is a way to make this immediate spawning work. The issue is, that we cannot allow cont's ref count to reach zero untill all of the PropTask tasks are spawned. We can do this by setting the ref count of cont after it is created to 1. This way, it will only drop to 1 even if all spawned tasks finish before the spawner does.

On the other hand, we want the cont to start eventually, so the ref count needs to be decremented after all PropTask tasks have been spawned. I can think of two options:

First, you decrement the ref count manually, simply calling cont->decrement_ref_count(). But, in that case, you need to check whether this dropped the ref count to zero. Fortunately, the decrement_ref_count works atomically and returns the new count, so all you need to do is "if (cont->decrement_ref_count()==0) spawn(*cont);"

Second, you could make the task scheduler do this for you. You set the cont task as the parent of the DispTask. This means that once DispTask finishes, it decrements the reference count of it's parent (the cont task) and if that count reaches zero, it is automatically spawned.

How could I have missed that? Good catch!

Anyway, I suppose the second option is about using recycle_as_child_of() (a common pattern with use of continuations)? I suppose you can do that, with a switch to let it immediately return from execute() in that role... but then I would go with recycle_as_safe_continuation(), instead (same switch, less awkward because the task stays at the same level, and it can perhaps retain some state information instead of switching that over to a separate continuation task). Or, if you know in time when a child task will be the last, you can just allocate it normally and let it consume the single preset reference, but I would still prefer the recycle_as_safe_continuation().

Leave a Comment

Please sign in to add a comment. Not a member? Join today