Cancellation and uncertainty

Cancellation and uncertainty

Suppose you want to find out if a collection is ordered. You write a parallel_for Body that will reset an atomic<bool> as soon as it finds an anomaly, and that reads the atomic from time to time to see if it's still doing useful work (one anomaly is sufficient). When the parallel_for returns, you read the atomic and take action accordingly, right? Wrong!

Another part of the program may have cancelled what you're doing, and that cancellation may or may not have propagated to that parallel_for call. It may have prevented examination of just that part of the sequence where the anomaly occurs, and you won't know about it... unless you also check for cancellation. That's right, you'll have to go over all your code, see if any of it is vulnerable to cancellation, and add checks to prevent any unfortunate outcome, or decide that they will be harmless.

To make matters worse, the current implementation of propagation has a potential problem: cancellation propagation travels bottom-up in the task context group tree. A propagating thread cancels a descendent task group context, which takes no action by itself, but before propagation goes all the way up the chain to the ancestor that was actually cancelled, the propagating thread goes to sleep, another thread checks whether cancellation might cause a problem, but it does not detect the problem because it has not got the message yet, and it does something based on wrong information. The programmer's task has become even harder because of it, because it is not enough anymore to check the critical places, but the information has to be synthesised up the tree, e.g., in the form of ternary logic.

What is the solution?

Probably cancellation propagation has to be top-down instead of bottom-up. This is an easy fix.

But that still does not fully alleviate the burden on the programmer to check and check and check again, and be constantly mindful of potential cancellation, even if it is just exceptional. But isn't that what exceptions are supposed to take care of? I think by now C++ programmers have grown accustomed to writing exception-safe code, so that shouldn't be a problem anymore. The only thing to remember is to provide an exception handler for the target of cancellation, i.e., only the ancestor itself. (It also fixes the problem about bottom-up propagation, but that's probably not an excuse not to fix that problem independently.)

So maybe cancellation should always cause an exception to be raised, even from inside TBB code like parallel_for? What do you think? Is it a real problem, and are exceptions the solution?

11 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Another issue with cancellation: will parallel_invoke() do anything about it? The prime location for that would be in parallel_invoke_helper::run_and_finish(), but I don't see anything there. task::spawn_and_wait_for_all() perhaps? Well, perhaps that wouldn't be so nice, because some algorithms might want to run to completion even in the context (no pun intended) of cancellation. What should be the expectation? Only algorithms with task_group_context parameters? Only algorithms with variants with task_group_context parameters even if none is passed in the current call?

BTW, since parallel_invoke() is all about recursive parallelism, the Reference Manual should document that it is always the first argument that is executed locally, so that, if work among the descendants is divided unevenly, the caller has a way to avoid having a thief run off with the runt of the litter (loot should be substantial).

RE: cancelation

If a tree falls in the wood and nobody is around does it make a sound?

The cancellation has to be from inside out (bottom up). As you come up from the inner task levels the outer ones may have to take special actions. I think a proper way to deal with this is to not only provide for a catch() in the task, but also provide for a cancellation test in any dtor that needs to take special action. call it a try catch/isCanceled. The state of the atomic anomaly flag may be meaningless, excepting that, if say one inner level branch sets the flag, and a different branch exceeds a resource limit and cancels the search. In this case, the upper most level of the search can see not only task canceled (due to resource issue), but also see anomaly found. A cancellation with anomaly not found (and search not completed) may require some action such as a garbage collection and retry. Whereas, detecting cancellation .AND. anomaly, then you can take a quick return without garbage collection and avoid retry.

Good question. Implimentation will require good programming practices to cover all the bases (or holes).

RE: parallel_invoke

I agree, there should be an established and documented rule to follow with regards to task order vs functor order.

In the QuickThread parallel_invoke, the protocol is the initiating thread begins at the front of the list, all other team members begin at the back of the list. When the tasks are unequal, a good programmer should be able to organize the list such that the initiating thread is last to finish (but not by much).

Jim Dempsey

The question is about the implementation of cancellation propagation, so the answer is different than you might expect from a logical point of view where cancellation only spreads through direct ancestry lines (both up and down). If 000 cancels 0 (assuming telephone-like addresses in a curiously binary tree), this may now propagate to disparate context 111 before it reaches 11, which seems potentially problematic if a task in 11 wants to use that seductive parallel_for and then wants to know whether the returned information is fully accurate or its evaluation was cancelled.

About parallel_invoke(), TBB has taken care to make this a tree instead of a task_list (for better recursive parallelism), so there is no sense of stealing from the back of the list. It is more like a miniature parallel_for than like a task_list. Otherwise you should take care to prepare exponentially larger amounts of work and spawn them from big to small, or carefully study which way around that would be with a task_list, and I've already forgotten the details of that myself for the moment, see another thread about that, but if I remember correctly the right order has been reversed at the transition from depth-based scheduling to deque-based scheduling.

And I beg to differ about the aim of the initiating thread: I think that it should take the smallest amount of work for itself, for the dual purpose of bounded recursiveness (if parallel_invoke() is used for that) and making stealing worthwhile (I'm starting to dislike that term more and more because its twisted morality interferes with giving advice about good choices and because it just makes no sense for collaboration to take the form of stealing): you want to break out the ATM that's just been filled, not the one that's almost run out of cash. It would be sheer luck for the initiating thread to be successful in predicting how it should subdivide the work so that the right number of other threads are there just in time to steal the other piles (those other threads are targeting random victims... including each other) and finish just in time before the initiating thread, which is why we're using an adaptive scheduler instead of a predictive one in the first place. As with insurance, it's better to accept the cost of a limited amount of parallel overhead and buy more parallel slack for that, which can be used for reliable performance, than to think you can predict the future and often guess wrongly.

>>About parallel_invoke(), TBB has taken care to make this a tree instead of a task_list (for better recursive parallelism)

This statement doesn't make sense in the context of the parallel_invoke.


"throws" A,B,C,D,E into a context, as to if you call (implement) this a queue, a list or random collection, this has nothing to do with better recursive parallism.

The above invoke could have been issued at top level of program, and with at least 5 threads available, all A,B,C,D,E could begin at once, or potentially one of them starts before the others, say B, and B's task spawns tasts before the others (A,C,D,E) begin. Being a list, queue, random collection has no effect with respect to recursiveness.

The above invoke could have been issued at n'th level of program, with no threads available other than the initiating thread. While the initiating thread is chewing on the items (list) one or more of the other threads can become available. As to if the availability is due to waiting for other tasks in an enclosing task(collection), or because the thread becomes available due to it popping out of all tasks levels (at root), it really doesn't matter. One of the remaining (A,C,D,E) gets selected and begins. Granted you may approve or object to a deeper nest level based off B one thread of a task(collection) to begin on one of (A,C,D,E). But this is a programmers decsion.

Jim Dempsey

I haven't done any tests, but it seems plausible to me that work gets spread out quicker if it's a tree instead of a list. By how much I don't know. Let's keep this about cancellation, though.


What you are looking for is an extension of SEH to include Tasks Canceled. I am inexperienced at using SEH with TBB, maybe someone viewing this thread might offer their assistence.

Jim Dempsey

Jim Dempsey

I had to look up SEH (Structured Exception Handling), which seems to be Microsoft-specific, and adds termination handling even though it was intentionally left out of C++ because RAII is a decidedly better idiom (I'm fully convinced myself from having used Java's "finally"). I don't see what it is supposed to bring here: there's a defect in the implementation (to avoid that three-letter word) that can be corrected, and I suggested to use exception handling instead of explicit cancelling and repetitive probing (already supported by TBB). Well, the user may still opt to use additional probing, e.g., during sequential loops, but would be freed from having to explicitly detect whether a parallel construct returned normally or abruptly.

There is something that TBB should do there too, though, because it's not just a matter of using what already exists after all: when a construct like parallel_for receives the propagation of the cancellation of an ancestral task_group_context, TBB now just makes it stop spawning additional tasks, but it should also make the construct throw an exception when it is about to return. So, with a (corrected) telephone-number tree, 0000 throws exception my_solution_found, which is caught only by 0, and this automatically causes 011 to receive something like tbb::task_group_cancellation from the parallel_for with implicit context 0110, which 011 may choose to intercept for special handling or not, but the code in 011 may safely assume, right after the return of that parallel_for(), that all the work inside has fully completed, or that the result from the parallel_reduce with context 0111 is final and correct, because otherwise exception handling would have skipped that code.


Try looking at modifying the(a) reduction object for passing the cancelation notification as well as if any valid respons was found.

Jim Dempsey

It would be possible as a workaround to use reduction to find out whether all the subsidiary work was done, but that is not a workable suggestion in general. The current defect in cancellation propagation has to be fixed first, and then new exception handling may alleviate the burden of dealing with it.

It gets even trickier than I've considered so far. What if the affected code is currently moving data around between visible storage and temporary storage using a TBB parallel construct that responds to cancellation by abandoning its work partly finished? That forces you to consider what safety guarantee to provide, from the same categories as the already familiar exception safety guarantees. How to provide even basic safety in a sorting algorithm that uses temporary storage? It may still be precluded from using the original location during its operation, thus requiring 200% extra storage instead of just 100% extra storage. To complicate matters further, quite likely the input is just a pair of iterators and not a container than can be swapped in constant time, but a final "commit" will not be able to use parallel_for, because that can easily be cancelled by an external trigger!

I think this use case is at least another very strong motivator for top-down propagation, but it also casts doubt on the concept of always-transparent cancellation propagation, in any direction. Food for thought and discussion!

Leave a Comment

Please sign in to add a comment. Not a member? Join today