Very disappointing behavior (I would say bug). See example

#include <cstdio>
#include <functional>
#include <tbb/tbb.h>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		int & local_i = tls.local();

		local_i = 1;

		tbb::parallel_for(0, 1000, [&](int k) {
			dummy = true;
		} );

		res.local() += local_i;

		local_i = 0;
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Anybody would expect iRes to equal 1000, but it does not. I spent several days trying to find this bug in very complex software. I finally understood why it happens, but I still consider it a design bug in TBB.

This is very disappointing, because we can no longer rely on the consistency of enumerable_thread_specific.

PS: MSVS 2012 update 5. TBB 2017.2

Hi Maxim,

This is known behavior of the Intel TBB task scheduler. A thread executing an inner (nested) parallel loop is allowed to process tasks from the outer-level parallel loop while it waits for the inner loop to finish. Therefore, the TLS value can be overwritten by "local_i = 0" from another outer iteration while the thread is still executing the nested parallel loop. You may want to read the documentation article about task isolation.

By the way, why do you need two TLS structures in your application? Is it some sort of reduction? Couldn't it be solved with tbb::parallel_reduce?

Regards, Alex

Hi Alexei,

This is just an example to demonstrate and reproduce the bug. The real application is much more complex and completely different.

We often use TLS for memory buffers to optimize allocation. This behavior of the TBB task scheduler makes that very dangerous.

Alexei, I read your link. Thank you!

I see, it is well-known behavior. The problem is that nested parallelism can be hidden inside functions or even inside third-party libraries. Sometimes we don't know about it, or it can appear in a new version of the code or a library without notice.

As I understand it, the only way in such a case is to wrap the entire body of the outer loop in tbb::this_task_arena::isolate?

Am I right?

Should it work?

#define TBB_PREVIEW_TASK_ISOLATION 1
#include <tbb/tbb.h>
#include <cstdio>
#include <functional>

bool func()
{
	bool dummy = false;

	tbb::combinable<int> res( 0 );

	tbb::enumerable_thread_specific<int> tls( 0 );

	tbb::parallel_for(0, 1000, [&](int i) {
		tbb::this_task_arena::isolate( [&]{
			int & local_i = tls.local();

			local_i = 1;

			tbb::parallel_for(0, 1000, [&](int k) {
				dummy = true;
			} );

			res.local() += local_i;

			local_i = 0;
		} );
	} );

	const int iRes = res.combine( std::plus<int>() );

	printf("iRes == %d\n", iRes);

	return (iRes == 1000);
}

Unfortunately I can't test it. I don't have "Preview library" to link with.

Yes, it should work. If you want, you can reduce the scope of isolation to guard only the nested tbb::parallel_for (it makes no difference to the result).

By the way, which Intel TBB package do you use? Usually, the preview library is shipped together with the main library.

Regards,
Alex
