How to reschedule tasks but not as continuation?

Posted by klaim

I have a context where I first set up a bunch of custom tasks, each of which has a member object indicating whether the task should be rescheduled or not.
Basically, my custom task type's execute() function looks like this:

tbb::task* execute() override
{
    // ... here the code that executes some task-specific work
    switch( task_result )
    {
    case( RESCHEDULE ):
        {
            // Make sure this task is re-executed as required.
            // See: http://stackoverflow.com/questions/11847689/how-to-reschedule-a-concurrent-task/11849373#11849373
            this->increment_ref_count();
            this->recycle_as_safe_continuation();
            break;
        }
    case( END ):
        {
            // No need for rescheduling.
            break;
        }
    default:
        {
            UCX_ASSERT_IMPOSSIBLE( "Unmanaged task result : '" << to_string(task_result) << "' Task: " << to_string(m_task.id()) );
            break;
        }
    }
    return nullptr;
}

All my task instances do this. To instantiate them, I'm using the trick described in the documentation to have the main thread wait for all tasks to end: I have an empty root task, and each time I create a task I allocate it as a child of that root task, then spawn it via the root task.

Currently I'm seeing a behaviour where a task that is set up to reschedule itself until some specific time has passed is "blocking" the execution of other tasks.
Once this task stops rescheduling itself, it ends and the other tasks are executed.
However, I need the other tasks to be executed in parallel. The current load isn't high, but I suspect my way of rescheduling tasks is wrong. I don't have much data, and I'm not a multithreading specialist yet, so maybe I'm mistaken.

Anyway, I'm looking for a way to make sure that when I reschedule a task, it is put at the end of the worker thread's task queue, so that it can easily be stolen by other threads.
These tasks, when rescheduled, shouldn't be continuations of themselves as the current code makes them;
they should just be rescheduled to be executed later, not immediately.

Unfortunately, I tweaked the rescheduling code but haven't found a way to do this so far: any variant of the present code triggers an assertion inside TBB.

So I'm asking: is there a safe way to reschedule a task not as a continuation, but as if it had just been spawned and put in some worker thread's queue?

Posted by Raf Schietekat

Instead of (atomically) incrementing the reference count before calling recycle_as_safe_continuation(), simply set it to 1.

The meaning of continuation is not about when the task will be executed next, but about it having the same position in the hierarchy of dependencies (an unordered tree). Its timing will not be any different from that of a newly allocated and spawned task: tasks are always added to the side of the deque where the worker thread takes the next task to execute (other threads steal from the other side). If you want near-FIFO scheduling, use enqueue() instead of spawn().

Maybe it's also useful to explain what you want to achieve, for a second opinion about this arrangement.

Posted by klaim

Quote:


Instead of (atomically) incrementing the reference count before calling recycle_as_safe_continuation(), simply set it to 1.

The meaning of continuation is not about when the task will be executed next, but about it having the same position in the hierarchy of dependencies (an unordered tree). Its timing will not be any different from that of a newly allocated and spawned task: tasks are always added to the side of the deque where the worker thread takes the next task to execute (other threads steal from the other side). If you want near-FIFO scheduling, use enqueue() instead of spawn().

I tried both suggestions, with no apparent change in behaviour. I suppose setting the ref count to 1 and using enqueue() instead of spawn() is indeed what I originally wanted, so I'll keep the code this way for now.

I still have a task blocking other tasks anyway.

Quote:

Maybe it's also useful to explain what you want to achieve, for a second opinion about this arrangement.

Other than just having tasks that are guaranteed not to be interdependent in their execution threads (which I supposed TBB would avoid as much as possible), I'm not sure what more to add.

One important thing is that the rescheduling system (in my custom task class) is also used to "delay" the execution of a task. What I mean is that the execution code first checks whether it is "time" (based on a virtual time, which can be slower or paused, or on real time) to really execute the task. If not, it just reschedules itself (after checking a potential other condition for not rescheduling).

Basically I want my tasks to be able to
1. delay their execution
2. be rescheduled an indefinite number of times

For 1), I don't know how to tell TBB that it can forget about the task for some time (is there a way?), so I used the simpler approach for now: reschedule until it's time to execute.
For 2), the current code would be fine if it weren't apparently blocking other tasks while the task is just rescheduling until its real execution.

Strangely, I have other tasks that reschedule the same way as the blocking task, but it's the only one that blocks some of the others. Maybe it's in my code; I'm actively trying to find the source of this, but it's not obvious so far.

Any suggestions on reorganizing this code to achieve this efficiently are welcome.

Posted by klaim

Sorry, I totally forgot one important thing: the first tasks I want my system to execute are not spawned/enqueued immediately.

I first accumulate them into a tbb::task_list and spawn them only when I reach the call to a run_and_wait() function; when that function finishes, the whole application terminates.

Here is my original run_and_wait() implementation:


void run_and_wait()
{
    m_root_task.spawn( m_task_list );
    m_root_task.wait_for_all();
}

When I schedule tasks, either the system is not running, in which case the task is added to the task list; or the system is running, in which case the task used to be spawned, but I modified that to enqueue as you suggested.

Now, this run_and_wait() function still spawned instead of enqueuing; I totally forgot about that, so I changed it to this:


void run_and_wait()
{
    while( !m_task_list.empty() )
    {
        m_root_task.enqueue( m_task_list.pop_front() );
    }
    m_root_task.wait_for_all();
}

This apparently works, with no blocking. So my problem is solved. However, a confirmation that this is the right way to do it would reassure me. :)

If you have any idea how to tell TBB to delay the execution of a task, other than having the task reschedule itself until it's time to execute, I'm interested.

Posted by Raf Schietekat

Quote:

One important thing is that the rescheduling system (in my custom task class) is also used to "delay" the execution of a task. What I mean is that the execution code first checks whether it is "time" (based on a virtual time, which can be slower or paused, or on real time) to really execute the task. If not, it just reschedules itself (after checking a potential other condition for not rescheduling).

This means that you have many tasks constantly polling, creating a lot of overhead.

Even worse, as I neglected to point out before: when you recycle as a safe continuation with a reference count of one, at the end of execute() the scheduler has to assume that this is a task whose children/predecessors were spawned, stolen, and have all finished executing. So I would expect it to immediately spawn the task (putting it at the head of the queue, where it would be picked up next), or even bypass that step as a possible or actual (?) internal optimisation when execute() returns nullptr. In either case, the worker thread effectively busy-waits for the right "time".

But then I don't see how changing to enqueue() in run_and_wait() could possibly make any difference; maybe someone else could point out something I have overlooked?

(Added) Is a previously enqueued task that is rescheduled as a continuation always enqueued again instead of spawned, perhaps?

Posted by klaim

I think it works better because in my case I had tasks scheduled in this order:

- A
- B
- C

Then a call to run_and_wait().
When using spawn(), C ran before A and B, even though it was pushed last. When I changed to enqueue(), they were executed in the order they were pushed.
The strange thing is that both A and C are indeed polling, and they both seem to work correctly now, while before, A wasn't executed at all (not even for polling, I mean) until C was done (i.e. until it stopped polling and didn't reschedule).

I'm aware of the polling problem, but I don't know how to do better for now. I think I could set up some additional mechanism that would not reschedule the task immediately, but that would require some juggling with the task instances.

Posted by klaim

I think it's because A takes more time than C. A isn't really polling like C does: C does a quick comparison of the current time vs. the time to execute, then reschedules (until it doesn't). A locks a mutex, then goes into a loop to execute a list of function pointers (currently only 2), then reschedules. That's more work. I'm not sure what the exact impact is here.

Posted by klaim

Quote:

(Added) Is a previously enqueued task that is rescheduled as a continuation always enqueued again instead of spawned, perhaps?

I'm not sure, but I tried to enqueue explicitly, and it can't work before returning from execute(), so I guess it gets re-enqueued as you say. We really need a specialist to clarify this, because I don't understand all the TBB code...

Posted by Raf Schietekat

Maybe it's purely by accident that enqueue() appears to cure the problem... Perhaps the problem will reoccur if the enqueue()'ing happens in a different order and/or the program is allowed to run for a longer time? (I was sort of grasping at straws about automatic re-enqueue.)

Posted by klaim

I just tried moving the A and B scheduling after the C scheduling.
On the first cold execution (launched automatically just after the build) I got the same bug as originally.
All the following executions (without recompiling/relinking anything) executed the tasks correctly.

So I agree that it is certainly an accident. Or I'm missing something about what is really happening.

Posted by Raf Schietekat

OK, at least that red herring is out of the way.

Another part of the explanation would be that you have only two hardware threads that are kept busy polling, leaving no resources available to steal and execute a third task. You're not trying this on a single core, are you? That would be even more problematic. TBB encourages optional parallelism without mandatory concurrency (of any degree), and this is quite the opposite of that.

If such is the case (and even if it's not), the solution is to get rid of the busy-waiting.

At least at first you should probably try without recycling. As an experiment, just enqueue() (not spawn()) a copy of the new task with the same polling logic (which can be allocated as an additional child of the dummy root task), and let the current task be automatically deleted at the end of the execute().

Next, instead of immediately enqueue()'ing the new task, give it to an independent thread that will enqueue() or perhaps spawn() it at the appropriate time, and stop polling inside the task.

Once that works, you can start thinking about recycling again. If you're worried about the cost of not recycling in the interim, consider embedding a pointer to an allocated structure that you hand over from one task to the next. Otherwise, you'll have to have some way to ensure that a task is not executed again while it is still executing from the previous run, which could happen owing to unfortunate timing related to preemption. That requires either having an external thread poll for the reference count to drop, or using a dummy child that is spawned at the appropriate time (which implies an automatic spawn() of its parent), which may defeat any gains from recycling anyway. It would be nice to have some more hooks into the scheduler, but these are the workarounds familiar to me.

That's only if you really really want to roll your own task-based solution, of course, because I still don't know why you wouldn't just use one of the existing algorithms instead (a pipeline comes to mind, where the input stage could pick the next item from a time-sorted heap of future events after consuming them from a concurrent queue).

Posted by klaim

Quote:

Raf Schietekat wrote:

Another part of the explanation would be that you have only two hardware threads that are kept busy polling, leaving no resources available to steal and execute a third task. You're not trying this on a single core, are you? That would be even more problematic. TBB encourages optional parallelism without mandatory concurrency (of any degree), and this is quite the opposite of that.

I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems to be deactivated).

Quote:

If such is the case (and even if it's not), the solution is to get rid of the busy-waiting.

At least at first you should probably try without recycling. As an experiment, just enqueue() (not spawn()) a copy of the new task with the same polling logic (which can be allocated as an additional child of the dummy root task), and let the current task be automatically deleted at the end of the execute().

Next, instead of immediately enqueue()'ing the new task, give it to an independent thread that will enqueue() or perhaps spawn() it at the appropriate time, and stop polling inside the task.

Once that works, you can start thinking about recycling again. If you're worried about the cost of not recycling in the interim, consider embedding a pointer to an allocated structure that you hand over from one task to the next. Otherwise, you'll have to have some way to ensure that a task is not executed again while it is still executing from the previous run, which could happen owing to unfortunate timing related to preemption. That requires either having an external thread poll for the reference count to drop, or using a dummy child that is spawned at the appropriate time (which implies an automatic spawn() of its parent), which may defeat any gains from recycling anyway. It would be nice to have some more hooks into the scheduler, but these are the workarounds familiar to me.

I will try a variant: I have a task that updates virtual clocks. I could make the clock update check whether there are tasks to enqueue.
I believe that would be equivalent.

Transferring the task body from one task instance to another seems possible; I'll check exactly what it implies.

Quote:

That's only if you really really want to roll your own task-based solution, of course, because I still don't know why you wouldn't just use one of the existing algorithms instead (a pipeline comes to mind, where the input stage could pick the next item from a time-sorted heap of future events after consuming them from a concurrent queue).

This is not a pipeline I'm setting up. Several different systems have update tasks that need to be executed repeatedly. It's OK if they are busy; what's not OK is if one task blocks all the others. All the tasks are designed to be non-blocking.

I first tried the Flow Graph, as it indeed looks like what I was trying to do (but it's a bit less flexible).
I hit a problem at the time, after many experiments. I don't remember exactly what I found, but it made me think it was not the appropriate construct for my case.

Posted by Raf Schietekat

Quote:

I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems to be deactivated).

Then it remains a mystery to me how one out of three tasks (as the exact grand total?) could possibly be blocked.

Quote:

This is not a pipeline I'm setting up. Several different systems have update tasks that need to be executed repeatedly.

It all depends on what you call a task: TBB tasks are more of an underlying mechanism for implementing higher-level algorithms, and recycling is more of an optimisation thing than a semantics thing, so you could definitely consider another mapping than one-to-one between concepts that merely happen to share the name "task". I initially thought of parallel_do/while(), but managing the order in a separate input stage made me consider a pipeline; if the fixed number of tokens in flight is no obstacle, why not (rhetorical)? But I don't know enough about the situation to say more.

Posted by klaim

Quote:

Raf Schietekat wrote:

Quote:

I'm using a Core i7 M 620; it shows 4 cores in the task manager (hyper-threading seems to be deactivated).

Then it remains a mystery to me how one out of three tasks (as the exact grand total?) could possibly be blocked.

That's not the grand total: there are more tasks, many of them executed only once, some of them rescheduling until the end of the application (which is a game, but not architected in a classical way).

Quote:

Quote:

This is not a pipeline I'm setting up. Several different systems have update tasks that need to be executed repeatedly.

It all depends on what you call a task: TBB tasks are more of an underlying mechanism for implementing higher-level algorithms, and recycling is more of an optimisation thing than a semantics thing, so you could definitely consider another mapping than one-to-one between concepts that merely happen to share the name "task". I initially thought of parallel_do/while(), but managing the order in a separate input stage made me consider a pipeline; if the fixed number of tokens in flight is no obstacle, why not (rhetorical)? But I don't know enough about the situation to say more.

Let me try to clarify then.
First, I will use algorithms from TBB on top of this system. I'm not avoiding high-level constructs; I just needed something different in addition to them.
Second, the system in question is just a layer over the Task Scheduler. Its goal is to abstract it, in case I want to spawn a task manually, because I know I will have to port the system to other platforms that TBB can't be compiled on.
Third, I'm in a context where the workloads differ widely from second to second.

However, I'm using a different task system outside my task scheduler abstraction. Forget about TBB for a minute; I'll get back to it.
My task types are described like this: Task<T>, the template parameter being the kind of input the task works with.
These Task objects can be put (moved, to be precise) into two kinds of systems:
- my task scheduler abstraction (which only accepts Task<void>): in this case it is assumed that the task will be executed by an unknown thread and that, if rescheduled, there is no guarantee it will always be the same thread. In practice, once the Task is moved into my task scheduler abstraction, I create a custom TBB-derived task instance, called AsyncTask_TBB, which holds the Task and knows how to execute it correctly (taking into account its info: whether it should be rescheduled, until when, whether it has an end condition, etc.).
- TaskChain<T>: basically a container of Task<T>. It's thread-safe (at least I designed it this way, but I don't have another pair of eyes to check... so far it seems correct). It has an execute( const T& ) function that goes through all the contained tasks in order, executes them (if enough time has passed since the last execute() call), and removes the tasks that don't need to be rescheduled. I almost used a pipeline for this, but a pipeline assumes that the tasks are interdependent and don't have the special behaviour I just described, so I had to set up my own system.

Now, the strategy I've set up is meant to be used this way:
- I have several unrelated systems that need to be constantly updated, so each has one Task<void> which is moved into the task scheduler. This Task<void> body just calls the execute() of a TaskChain<WhateverInputIsNeeded>. This is like setting up different parallel pipelines that loop.
- I have other systems that rely on the previously described ones. They need to be updated in sync with their parent system. To do this, the parent system exposes a way to insert other tasks into its TaskChain, which means all the tasks related to this parent system will be executed in sync, on the same thread per update, in a specific order. It is important to understand that this TaskChain, the more-or-less pipeline, will have a variety of tasks inserted and removed quite frequently, though not on every update, so it becomes the synchronization bottleneck.
- There is also a messaging system where needed, but most messaging happens through the network. Anyway, I use some message queues. Intuitively, the Flow Graph was the most obvious choice for setting this up, but it didn't cover all my needs.
- The case of the task that is not executed immediately but delayed is an exception, and I really think it should be one of the parent systems that keeps a queue of these to be executed later, as you described, in fact. I tried rescheduling because I don't have such a system in place yet and wanted to set up a delayed task to try.

Part of this strategy is to avoid having to expose a thread-safe interface for all my systems. Instead, interdependent systems just sync their tasks by using the same TaskChain, which means there is only a mutex lock when inserting a new task (and I believe it can be removed, but I'm not sure how yet).

The important thing I want to achieve is indeed that the workload between parent systems is automatically distributed by TBB. Which apparently sometimes fails because of the way I'm rescheduling?

Scheduling a clone instead of rescheduling a task might be a better solution. Semantically, it is. What I fear is the cost of instantiating this clone, which is why I did it the original way.

I wish I could open-source this system to show you and get more expert advice, but it's not possible at the moment.

I'm not sure this description is clearer than looking at the code, unfortunately.

Posted by Raf Schietekat

Quote:

That's not the grand total: there are more tasks, many of them executed only once, some of them rescheduling until the end of the application (which is a game, but not architected in a classical way).

OK… I thought you might have isolated a minimal reproducer with just 3 tasks. That's another relief! :-)

Quote:

I know I will have to port the system to other platforms that TBB can't be compiled on

Any hint which one? Maybe it's just "not yet"...

Quote:

Scheduling a clone instead of rescheduling a task might be a better solution. Semantically, it is. What I fear is the cost of instantiating this clone, which is why I did it the original way.

It doesn't have to be a full clone, though. It's like the new C++ move (vs. copy), because the old task's only remaining job is to get deleted, and it's cheap enough to move a pointer. With the scalable allocator, overhead is certainly very manageable, and negligible compared to polling. It would be nicer, though, to have more ways of recycling, because now you can't re-enqueue without polling for the end of the current execute (as far as I know?).

Quote:

I'm not sure this description is clearer than looking at the code, unfortunately.

Thanks for trying. Perhaps it will get clearer when I reread it tomorrow morning. :-)


Posted by klaim

Quote:

Raf Schietekat wrote:

OK… I thought you might have isolated a minimal reproducer with just 3 tasks. That's another relief! :-)

Indeed. ^__^;

Quote:

Quote:

I know I will have to port the system to other platforms that TBB can't be compiled on

Any hint which one? Maybe it's just "not yet"...

Well, Android, and potentially some console platforms. Frankly, it's the least important of the reasons, so you can ignore it.

Quote:

Quote:

Scheduling a clone instead of rescheduling a task might be a better solution. Semantically, it is. What I fear is the cost of instantiating this clone, which is why I did it the original way.

It doesn't have to be a full clone, though. It's like the new C++ move (vs. copy), because the old task's only remaining job is to get deleted, and it's cheap enough to move a pointer. With the scalable allocator, overhead is certainly very manageable, and negligible compared to polling. It would be nicer, though, to have more ways of recycling, because now you can't re-enqueue without polling for the end of the current execute (as far as I know?).

It doesn't look like a big change in my code, so I will first try this and see whether I get good results regardless of the order. Fortunately, I isolated the code well enough to try different TBB usages without changing the semantics of the program (the order of task execution is not important, as these are asynchronous tasks).

Quote:

Quote:

I'm not sure this description is clearer than looking at the code unfortunately.

Thanks for trying. Perhaps it will get clearer when I reread it tomorrow morning. :-)

Sorry about that. ;____;

Posted by klaim

So, as suggested, I changed this code:


tbb::task* execute() override // this is AsyncTask_TBB::execute(), my custom tbb::task-derived class
{
    // blah blah blah
    switch( task_result )
    {
    case( TaskEndRequest::RESCHEDULE ):
        {
            // Make sure this task is re-executed as required.
            // See: http://stackoverflow.com/questions/11847689/how-to-reschedule-a-concurrent-task/11849373#11849373
            this->set_ref_count( 1 );
            this->recycle_as_safe_continuation();
            break;
        }
        // blahblah
    }
    return nullptr;
}

To this:


tbb::task* execute() override // this is AsyncTask_TBB::execute(), my custom tbb::task-derived class
{
    // blah blah blah
    switch( task_result )
    {
    case( TaskEndRequest::RESCHEDULE ):
        {
            // Make sure this task is re-executed as required.
            m_task_scheduler.schedule( std::move( m_task ) );
            break;
        }
        // blahblah
    }
    return nullptr;
}

Here, m_task is my special Task-type object, which is movable-only (like std::thread, if you will), and m_task_scheduler is (a reference to) the unique task scheduler abstraction.
So basically I just take the task content and schedule it again, which creates another AsyncTask_TBB (my tbb::task-based task) and enqueues it. It is exactly as if I cloned the executing task, since all information about the task is in m_task.

This apparently works, whatever order of the different tasks I try. :D
Obviously I should wait until I have a more complete application (I have about 15% of it right now) to make sure, but it seems like the correct solution. I just hope reallocating tasks in a loop this way will not be a problem in the future.
Anyway, it seems to be closer to the semantics I needed.

Posted by Raf Schietekat

Quote:

Let me try to clarify then.

I hope I picked up a little more on a second reading.

It's true that a pipeline always runs items through to the end. I don't know how many applications would benefit from being able to have them dropped somewhere in the middle, perhaps quite a few. Now you have to simulate that (using a special pointer value, or perhaps something like "bool m_zombie"), which is not very elegant and also suboptimal. More seriously, pipelines are also quite unfair to each other, so you would have to run each from a separate application thread (giving them their own arenas to avoid starvation-causing entanglement), but this could lead to oversubscription instead. That limitation definitely warrants an overhaul to allow several pipelines to be run together as a group. Meanwhile, you should use your own solution (especially if you also want to insert additional "stages" while the "pipeline" is running).

TBB was originally conceived to optimise throughput, where it didn't matter that the scheduling was sometimes very unfair. In your system, take care to avoid spawn() where it could lead to high-level unfairness, and use enqueue() instead in the right places. Note that running an enqueued task takes priority over stealing a spawned task (last time I looked, anyway), in an effort to improve locality (good for performance). But that also means the scheduler still does not entirely prioritise minimising backlog: while starvation is now avoided, there's still some lingering unfairness that may be relevant in a real-time application, although I haven't seen any complaints about this yet, so either it's not very serious or I'm just seeing ghosts. Recycling as a continuation internally does the same as spawn()'ing the recycled task, and you may have to work around that, as you have already experienced.

Perhaps others can suggest more general patterns for architecting this kind of application?

Quote:

Well, Android, and potentially some console platforms. Frankly, it's the least important of the reasons, so you can ignore it.

Android is currently being targeted: check out (as in have a look at) the latest development release.

Quote:

m_task_scheduler.schedule( std::move( m_task ) );

I only used move/copy as an analogy. Maybe there are situations where perhaps locality considerations make move the winner, but don't forget about pointers.

Quote:

This apparently works, whatever order of the different tasks I try. :D

Great!

Posted by klaim

Quote:

Raf Schietekat wrote: Meanwhile, you should use your own solution (especially if you also want to insert additional "stages" while the "pipeline" is running).

Being able to add and remove work in a "pipeline" is actually the most important feature in my case, as the work changes with the context from second to second.

Quote:

Android is currently being targeted: check out (as in have a look at) the latest development release.

Good news!

Quote:

Quote:

m_task_scheduler.schedule( std::move( m_task ) );

I only used move/copy as an analogy. Maybe there are situations where perhaps locality considerations make move the winner, but don't forget about pointers.

I didn't move it because of your analogy, but because the object has move-only semantics. I agree pointers are supposed to be faster (this object is bigger than a pointer, as it holds all the extra information about the task), but it makes the code simpler to understand so far and hasn't been a bottleneck, so I'll keep it that way for now. It also saves me from having to use new, which makes things more predictable. I'll switch to unique pointers if it turns out to be a performance bottleneck.

Posted by Wooyoung Kim (Intel)

If you want to use recycle_as_safe_continuation() to reuse a task and reschedule it at a later time, you should set the ref count to at least 2. When the time comes, you explicitly decrement the count, and if the ref count reaches 0, you spawn the task again. If you set it to 1, then by the time the thread returns to the TBB scheduler, it decrements the ref count, sees it go down to 0, and immediately reschedules the task by pushing it onto the task queue. It is then picked up by the scheduler right away, creating the illusion that it blocks the execution of other tasks.

Explicitly manipulating the ref count is extremely error-prone; we don't usually recommend it unless users have a clear need for it.

Please consult the TBB reference manual for differences between recycle_as_continuation() and recycle_as_safe_continuation().

Posted by Raf Schietekat

Quote:

If you want to use recycle_as_safe_continuation() to reuse a task and reschedule it at a later time, you should set the ref count to at least 2. When the time comes, you explicitly decrement the count, and if the ref count reaches 0, you spawn the task again.

This was already considered, but the problem is that scheduling (re-)spawned tasks is done in the "wrong" order relative to "time" (by the same thread at least), while reliably re-enqueuing requires a polling loop instead of a simple conditional enqueue() (which would otherwise make the task re-spawned or re-enqueued based on execution timing issues): the reference count must be seen to have dropped to 1 already (to avoid re-spawn at the end of execution) before it is set to 0 (just to avoid any possible complaints during debugging, because it is not needed here) and the task then unconditionally enqueued. Because polling can easily lead to problems, I suggested to abandon the idea of recycling the TBB tasks altogether.

On the other hand, if a central thread does the timed spawning, the tasks will always be stolen by the workers, so they don't need to be enqueued for scheduling in the right order, and the problem described above goes away. But routine stealing sounds like a bad lifestyle, so then I would wonder how costly it is relative to taking from the enqueue() queue (is it more expensive or not?), and also whether it wouldn't be better to consider affinity (and what that would mean for relative timing?).

Wooyoung Kim (Intel)

If what klaim wants to do is akin to 'discrete event simulation', I would suggest that klaim look at tbb::concurrent_priority_queue and see if it can be used.

Using enqueue does not seem viable here because if a thread finds that it is premature to execute a task, it has to recycle it and re-enqueue it at the back of the queue.

As an alternative to the solution you suggested, you could distribute the responsibility of 'timed spawning' among worker threads (including the main thread).

This probably requires a shared container which contains unfinished tasks. Then, at the end of task execution, each thread could access the container and spawn tasks that are ready.
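As a sketch of that distributed approach, the shared container could look something like the following. This is plain standard C++, with std::priority_queue plus a mutex standing in for tbb::concurrent_priority_queue, and `TimedPool`/`drain_ready` are made-up names for illustration:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// A pending piece of work with the earliest time it may run.
struct Timed {
    Clock::time_point due;
    std::function<void()> work;
};
struct Later {  // min-heap ordering: earliest due time on top
    bool operator()(const Timed& a, const Timed& b) const { return a.due > b.due; }
};

// Shared container of not-yet-ready work items.
struct TimedPool {
    std::mutex m;
    std::priority_queue<Timed, std::vector<Timed>, Later> heap;

    void add(Clock::duration delay, std::function<void()> w) {
        std::lock_guard<std::mutex> lk(m);
        heap.push({Clock::now() + delay, std::move(w)});
    }

    // Called by each worker at the end of a task's execution:
    // run everything whose time has come, leave the rest queued.
    void drain_ready() {
        for (;;) {
            std::function<void()> w;
            {
                std::lock_guard<std::mutex> lk(m);
                if (heap.empty() || heap.top().due > Clock::now()) return;
                w = heap.top().work;  // copy; top() is const
                heap.pop();
            }
            w();  // in real code this would spawn/enqueue a fresh tbb::task
        }
    }
};
```

The point of `drain_ready()` is that no single thread owns the timer: whichever worker happens to finish a task next gets the ready items running.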

Raf Schietekat

tbb::concurrent_priority_queue does look better than doing the sorting in a pipeline input stage as I suggested above. It would just require a bit of glue to use it in a work heap pattern with parallel_while/do() (whose feeder would not be used).

I believe we were discussing enqueue() for use with a timer thread where recycling had its own legitimate purpose ("Several different system have updating tasks that need to be executed in repetition."), even if we ended up concluding that the "task" contents should be moved between separate tbb::task instances. BTW, it doesn't seem right to have potentially unbounded spawning/enqueuing if a pipeline or parallel_while/do() could be used instead, but not continuing the conversation along that path might have prevented the solution that was ultimately found.

klaim

Wooyoung Kim, thanks for your comments; the priority queue would indeed be of great use to optimize this system (as the global logic of re-trying to execute a task's work later would still be necessary anyway). I will look into it in more detail and report here as soon as I meet my bottlenecks.

Quote:

BTW, it doesn't seem right to have potentially unbounded spawning/enqueuing, if a pipeline or parallel_while/do() could be used instead, but not continuing the conversation along that path might have prevented the solution that was ultimately found.

As I said before, pipeline is not appropriate in my case (it is not a pipeline pattern).
I'm not sure about parallel_while/do; I'll have to check, but it would be hard to keep the uniformity of my system for both "looping" and "one-shot" tasks.
I will think more about it.

Raf Schietekat

You can use pipeline even if it's not strictly a "pipeline pattern". In this case it was just to have sorting in the input stage, but a concurrent_priority_queue with a parallel_while/do() is probably a better equivalent. The pattern implemented directly by parallel_while/do() is "work heap" (if I remember well), but don't let that deter you from building on top of it to achieve what you want to get.

klaim

I'm getting back to this thread because I made some interesting improvement thanks to the discussion here.

In my current code I made the main thread, which so far was only waiting until the application ends, manage a concurrent priority queue of tasks that have time intervals (rescheduling or not). This change, which waits until it's time to execute a task instead of repeatedly rescheduling it, did indeed make things faster. Now I have a fluid game system.
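For reference, a minimal sketch of what such a main-thread dispatcher can look like. This uses simulated integer ticks instead of real time, and a plain std::priority_queue instead of the concurrent one, so it is only an illustration of the control flow; `MainDispatcher` and all other names are made up:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Simulated time in ticks to keep the sketch deterministic.
using Tick = long;

struct Scheduled {
    Tick due;
    Tick period;  // 0 = one-shot, >0 = reschedule every `period` ticks
    std::function<void()> work;
};
struct Later {  // min-heap: earliest due time on top
    bool operator()(const Scheduled& a, const Scheduled& b) const { return a.due > b.due; }
};

class MainDispatcher {
    std::priority_queue<Scheduled, std::vector<Scheduled>, Later> heap_;
public:
    void schedule(Tick due, Tick period, std::function<void()> w) {
        heap_.push({due, period, std::move(w)});
    }

    // One iteration of the main thread's loop: dispatch everything due at `now`.
    // Recurring entries are pushed back with their next due time instead of
    // being recycled as continuations of themselves.
    void dispatch_due(Tick now) {
        while (!heap_.empty() && heap_.top().due <= now) {
            Scheduled s = heap_.top();
            heap_.pop();
            s.work();  // real code would wrap this in a fresh tbb::task and enqueue it
            if (s.period > 0) {
                s.due = now + s.period;
                heap_.push(std::move(s));
            }
        }
    }
};
```

Because a re-armed entry gets a due time strictly in the future, it cannot starve the other entries in the queue, which is exactly the behaviour the original self-continuation code was missing.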

However, there is one thing I would like to improve: each time I schedule my Task instance, I must allocate a new custom tbb::task object with new. Once executed, the tbb::task instance can't be reused via tbb::task::enqueue(), which is the point of the whole discussion.

Returning a task as its own continuation poses the problems described before, because what I want is to enqueue it, not make it a continuation.
Can you think of another way to enqueue this used task that I could try?
Raf Schietekat

If you don't want a solution that has already been proposed above, simply contribute a patch. :-)

klaim

I did use the previous solutions (using enqueue() and moving my task object into a new tbb::task when I need rescheduling). I can't use pipelines because they don't match the setup I have, but I just wanted to ask if there is any way to remove the need for a new tbb::task on rescheduling. I don't see what patch I could make other than removing the debug assertion that the task has to be in the 'allocated' state, or adding a reschedule function which resets the state to 'allocated' and stops tbb from complaining in debug builds.

Another way would be to use new/delete to create the Task object, but then I would lose far more performance than with the current organisation based on moving it.

Anyway, I just did some profiling, and it's not the new() calls on tasks that take the most time, it's tbb's work-stealing between threads. I think this suggests that I'm not yet throwing enough work at it to make this optimization relevant, so I'll get back to concrete features now.

Raf Schietekat

I meant a new state (like reexecute, but for enqueue instead), which shouldn't be too difficult to add. Still, better to go where profiling leads you.
