Reusing the Root task.

Hi,
considering this trivial code:

while( i < MAX ) {
  Root& r = *new( allocate_root() ) Root(i);
  spawn_root_and_wait( r );
  ++i;
}
is it possible to reuse the root task in the while loop? Something like:
Root& r = *new( allocate_root() ) Root();

while( i < MAX ) {
  r.set_i( i );
  spawn_root_and_wait( r );
  ++i;
}
Thanks.

With such a question, instead of just aiding and abetting, it's probably better to first ask what you want to achieve?

Hi,

What I want is to create the root task only once before the loop. When the loop begins, I want to pass to the root task some values that differ in each iteration, and create the child tasks.

So I wanted to avoid creating a root task in each iteration.

Thanks.

I still don't see any valid reason to do this. The task will just execute on the current thread (except occasionally, when you're unlucky enough to have it stolen), which means there's no significant saving over just allocating each root with the scalable allocator (unless the task does so little that all of it is overhead). But why would you even need that intermediate level anyway, or why would you reuse the task inside a loop instead of moving the loop inside the task?

Hi,
do you mean...?

class Root : public task {
  public:
    task* execute() {
      while( i < MAX ) {
        Child& c = ....;
        spawn( c );
        // more things.
      }
      return NULL;
    }
};

I'm sorry, there's just not enough useful information here for me to offer any sensible advice, other than to forget all about reusing tasks for now, because at this time it sounds like premature optimisation (or worse). Just make sure to actually create parallelism (creating a task and immediately waiting for it to finish is counterproductive), and to get the reference count correct before spawning the first task (number of expected child tasks that weren't created with create_additional_child plus 1 for the wait). When you have something that works by consulting the tutorial and following its examples, there's still ample time to get creative later on if the need arises.
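To make the reference-count rule mentioned above concrete, here is a minimal plain-C++ sketch of the counting protocol (this is not TBB code; `Parent` and `run_children` are invented names, and `std::thread` stands in for spawned tasks): the parent's count must be the number of pre-allocated children plus 1 for the wait itself.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Plain-C++ illustration of TBB's reference-count rule (not TBB code):
// before spawning, the parent's count must be
//   (number of pre-allocated children) + 1 for the wait itself.
struct Parent {
    std::atomic<int> ref_count{0};

    int run_children(int n_children) {
        // Analogue of set_ref_count(n + 1): n children plus 1 for the wait.
        ref_count = n_children + 1;

        std::vector<std::thread> workers;
        for (int i = 0; i < n_children; ++i)
            workers.emplace_back([this] {
                // ... child work would go here ...
                --ref_count;         // each finishing child decrements
            });
        for (auto& w : workers) w.join();

        // The extra 1 is what keeps the parent alive while it waits;
        // after all children finish, exactly that 1 remains.
        return --ref_count;          // the "wait" consumes the last reference
    }
};
```

If the count is set to `n` instead of `n + 1`, the real scheduler may destroy or re-dispatch the parent while it is still waiting, which is exactly the bug the advice above warns about.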

Hi,

the idea is: while the condition to exit the loop has not been reached,

while( i < MAX ) ...
So, if I put the loop inside the root task, how could I use continuation passing style in each iteration of the loop?
Something like:
if( !is_continuation ) {

  // precompute some things

  // create children

  is_continuation = true;

  recycle_as_continuation();

} else {

  // check things

}

return NULL;

Or can I still use a continuation task while having the loop inside the root task?
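For illustration, the recycle-as-continuation control flow asked about here can be sketched as a two-phase state machine in plain C++ (not TBB API; `RootTask` and its members are invented names, and the return value stands in for the scheduler re-dispatching a recycled task after its children complete):

```cpp
// Two-phase execute() pattern, sketched without TBB.
// Phase 1 does the precomputation and "spawns" children, then recycles
// itself as its own continuation; phase 2 runs after the children finish.
struct RootTask {
    bool is_continuation = false;
    int  result = 0;

    // Returns true if the task recycled itself and must be run again,
    // mirroring what the scheduler does after recycle_as_continuation().
    bool execute() {
        if (!is_continuation) {
            // precompute some things, create children (elided)
            result += 1;              // stand-in for the children's work
            is_continuation = true;   // recycle_as_continuation() analogue
            return true;              // re-dispatch me once children finish
        }
        // continuation phase: check things with the children's results
        result *= 10;
        return false;                 // done; no further re-dispatch
    }
};
```

Driving it with `while (t.execute()) {}` mimics the scheduler running the first pass and then the continuation pass on the same task object, which is the state-flag pattern in the snippet above.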

It is almost always a bad idea to consider the actual number of available hardware threads. Tasks are not threads, and you should not attempt to make them behave as if they were. If you think you can eliminate parallel overhead by creating only the tasks you really need, think again, because you almost certainly won't be able to do accurate predictive scheduling of the tasks that you keep, and the eliminated overhead will probably come back with a vengeance in the form of idle time at the end, where most cores are standing by for the last task(s) to finish. Take advantage of the lightness of tasks to create more of them ("parallel slack"), and let the scheduler spread them over the available threads.

Maybe the documentation should be more explicit in marking this as an anti-pattern, because new adopters keep making the same mistake.

When you're making the assumption that all cores are available to execute those child tasks, you're also saying that there's no advantage in using a continuation, so there's even a contradiction in this setup. On the other hand, continuations are a more natural match for a task scheduler, so I wouldn't really discourage their use, but then I would also recommend bypassing the scheduler for one of the child tasks. Still, at your current level of experience it looks more like premature optimisation.

Finally, are you quite certain that you shouldn't be using one of the provided algorithms instead?

Hi,

I'm using a data structure (which I didn't develop; it is the way it is) that returns the ids of the articles that should be treated in each iteration of the while loop. If there is one core, it returns one id. If there are 4 cores, it returns 4 ids.
That's why I'm creating as many child tasks as there are cores in the computer.

In each child, computations for each article are done. Those computations are article specific; they differ for each type of article. Then, in the continuation, some more computations are done, but this time with all the articles as a whole.

"Take advantage of the lightness of tasks to create more of them
("parallel slack"), and let the scheduler spread them over the available
threads."

The only way to be able to create more tasks is setting the number of cores by hand and then passing it to the structure. For example, on my laptop with 2 cores I could say that there are 4 and create 4 tasks. Can this be useful at all? I have my reservations...

"Finally, are you quite certain that you shouldn't be using one of the provided algorithms instead?"
First I thought of using parallel_for, but since the computation to be done for each article depends on the type of the article, I switched to the task scheduler.

Thanks.

I don't see how that would require you to stripe them across the same number of tasks? Or is that not what you mean?
I thought that would be the best way. What would you suggest? Pass all of the ids to a single task and work with them inside?

I don't see the connection?

Seems I misunderstood something.

And how do I remove the post? I only see how to edit it.

Thanks!

Hi,

it seems that I edited Raf's post with my answer instead of replying to it.

Sorry.
