Why is the arena required to relinquish some threads during task stealing?

I am using a simulator (Sniper) to run the PARSEC benchmarks that have TBB implementations (blackscholes, bodytrack, fluidanimate, streamcluster and swaptions). I have run many simulations using the medium input sets and I noticed a behaviour that I can't understand and I hope you can help me with. 

In the receive_or_steal_task() method of the custom_scheduler class there is an if clause that states the following: 

if ( return_if_no_work && my_arena->my_num_workers_allotted < my_arena->num_workers_active() ) {
#if !__TBB_TASK_ARENA
    __TBB_ASSERT( is_worker(), NULL );
#endif
    if ( SchedulerTraits::itt_possible && failure_count != -1 )
        ITT_NOTIFY(sync_cancel, this);
    return NULL;
}

Because my simulator runs only 1 application at a time, I can't understand why the condition of this if statement sometimes becomes true and the stealing operation fails (returns NULL). To my understanding, the only situation when an arena is required to relinquish some threads is when a new master thread is initialized and the market needs to reassign the existing threads. Since I run only 1 application, there is only 1 master thread for the entire execution. Am I wrong in this assessment? Is there something else in the task scheduler that can make the stealing operation go down this if clause and fail?

Raf Schietekat:

I have no idea (yet), other than a sense of déjà vu (hey, where did the hyphen go?), but I do have some related questions:

While the superfluous workers are trying to escape like this, aren't hardware resources possibly being wasted by undersubscription? If so, couldn't extra workers already be brought in to start work in the new arenas, while the workers in the old arenas gradually unwind their mutual entanglement until the superfluous ones can be put to pasture, of course all with proper handoff to avoid oversubscription? What do measurements show? An interesting issue is how to handle the situation where more workers become unemployed waiting for predecessors than are currently superfluous: would such undersubscription be tolerated, would the non-superfluous threads go about stealing anyway, or would it be worthwhile, to avoid impeding the unwinding of the entanglement, to bring in other extra workers just for stealing?

I am not quite sure if you pose these questions related to my situation or about general use of TBB.

In the general case, I don't think that new arenas cause "unemployed threads". To my understanding, when a new master thread initializes a new arena, no worker threads are actually created. The new arena makes a request to the existing market for a number of slots (worker threads), and depending on the market's limitations it is assigned either the requested number or a lower one. Either way, workers from the existing pool are reassigned to handle the tasks in the new arena; no new workers are created. I base my statements primarily on this article: http://software.intel.com/en-us/blogs/2011/04/09/tbb-initialization-termination-and-resource-management-details-juicy-and-gory

In my situation, I run my benchmarks in a 1 thread / physical core scenario. The scheduler is initialized only once which means only 1 arena is created. All the worker threads should be assigned to this arena and I see no reason why at some point some of them should be relinquished.

Raf Schietekat:

Paragraph by paragraph:

My questions are related to the subject area touched by your question (workers leaving an arena).

"Unemployed" was only for my hypothetical/proposed solution (which I have not completely spelt out). Currently, new workers may indeed be created when a new master thread initialises a new arena, because of "Lazy thread creation" (described in the reference you've provided). Reassignment may take a while, and possibly improving that situation is the intent of my questions.

The déjà vu was about overshooting the target number of threads and having to correct, or something that sounds similar. If it's actually an accurate recollection, it might explain what you've observed, but somebody else should confirm that first (Alexey?).

Also a correction to my first question: no hardware resources are being wasted if stealing isn't impeded for a worker with a non-empty stack (they don't become "unemployed"), which I now believe is currently the case, but that's also why reassignment may take such a long time.

So, overshooting of the target number of threads is the only reason that could explain what I observed? How is it possible to overshoot the number of threads in a simulation with only 1 master thread and 1 arena?
