parallel_reduce getting stuck

Hi,

Is it allowed to call parallel_reduce in a method that is itself already called from parallel_reduce?

The test code given below usually runs in a few seconds, but sometimes gets stuck in a busy wait; see the stack trace further down for an example.

Is there a bug in this test code, is it misuse of parallel_reduce, or is it some other problem?

If it is misuse, what would be the right way to count the leaf nodes in an n-ary tree?

Thanks

=============================================================

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/task_scheduler_init.h"

#define LOOP 16
#define SIZE 1024
#define NEST 3
#define GRAIN 256

class thing {
public:
    void doit (int & nb, int nest);
};

// Reduction body: accumulates a leaf count, recursing via thing::doit.
class reduce_test {
private:
    thing * global_obj_;
    int nest_;
    int nb_;

public:
    // Splitting constructor required by parallel_reduce.
    reduce_test (reduce_test & x, tbb::split) :
        global_obj_ (x.global_obj_), nest_ (x.nest_), nb_ (0) {}

    reduce_test (thing * global_obj, int nest) :
        global_obj_ (global_obj), nest_ (nest), nb_ (0) {}

    // blocked_range is a class template; the <size_t> argument is required.
    void operator () (const tbb::blocked_range<size_t> & r)
    {
        for (size_t i = r.begin (); i != r.end (); ++i)
        {
            int nb = 0;
            global_obj_->doit (nb, nest_);
            nb_ += nb;
        }
    }

    void join (const reduce_test & x)
    {
        nb_ += x.nb_;
    }

    int nb () const {return nb_;}
};


void thing::doit (int & nb, int nest)
{
    ++nest;

    if (nest < NEST)
    {
        // Nested case: doit is itself called from reduce_test::operator(),
        // which runs inside an enclosing parallel_reduce.
        reduce_test rt (this, nest);
        tbb::parallel_reduce (tbb::blocked_range<size_t> (0, SIZE, GRAIN), rt);
        nb = rt.nb ();
    }
    else
    {
        // Leaf level: do a little busy work and count.
        for (int i = 0; i < LOOP; ++i)
        {
            for (int j = 0; j < LOOP; ++j)
            {
                if (i == j) ++nb;
            }
        }
    }
}


int main ()
{
    tbb::task_scheduler_init init;

    thing * global_obj = new thing;

    int nb = 0;

    global_obj->doit (nb, 0);

    delete global_obj;
    return 0;
}

=================================================================

Top of stack for both threads:

(gdb) thread 1
[Switching to thread 1 (Thread 46912498605664 (LWP 10654))]#0 0x0000003b7e9ae159 in sched_yield ()
 from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1077938528 (LWP 10655))]#0 0x0000003b7e9ae159 in sched_yield () from /lib64/tls/libc.so.6

=====================================================================

Full stack trace

[Switching to thread 1 (Thread 46912498605664 (LWP 10654))]#0 0x0000003b7e9ae159 in sched_yield ()
from /lib64/tls/libc.so.6
(gdb) where
#0 0x0000003b7e9ae159 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00002aaaaaac1b15 in tbb::internal::AtomicBackoff::pause (this=0x7fffffe50920) at tbb_machine.h:149
#2 0x00002aaaaaac8933 in __TBB_LockByte (flag=@0x7fffffe50d10) at tbb_machine.h:563
#3 0x00002aaaaaacbd4b in tbb::task_group_context::unbind (this=0x7fffffe509a0) at ../../src/tbb/task.cpp:2794
#4 0x00002aaaaaacbaa1 in ~task_group_context (this=0x7fffffe509a0) at ../../src/tbb/task.cpp:2744
#5 0x000000000040168f in tbb::internal::start_reduce<tbb::blocked_range<size_t>, reduce_test, tbb::simple_partitioner>::run (range=@0x7fffffe50a60, body=@0x7fffffe50a90, partitioner=@0x7fffffe50a8f) at parallel_reduce.h:136
#6 0x00000000004015a7 in tbb::parallel_reduce<tbb::blocked_range<size_t>, reduce_test> (range=@0x7fffffe50a60,
body=@0x7fffffe50a90, partitioner=@0x7fffffe50a8f) at parallel_reduce.h:301
#7 0x00000000004013ad in thing::doit (this=0x504ae0, nb=@0x7fffffe50ae4, nest=2) at main.cc:58
#8 0x0000000000401c66 in reduce_test::operator() (this=0x7fffffe50dd0, r=@0x505350) at main.cc:36
#9 0x00000000004019c6 in tbb::internal::start_reduce<tbb::blocked_range<size_t>, reduce_test, tbb::simple_partitioner>::execute (this=0x505340) at parallel_reduce.h:149
#10 0x00002aaaaaacf1da in tbb::internal::CustomScheduler::wait_for_all (
this=0x504680, parent=@0x504d40, child=0x504bc0) at ../../src/tbb/task.cpp:1993
#11 0x00002aaaaaaca294 in tbb::internal::GenericScheduler::spawn_root_and_wait (this=0x504680, first=@0x504bc0,
next=@0x504bb8) at ../../src/tbb/task.cpp:1776
#12 0x00000000004016d4 in tbb::task::spawn_root_and_wait (root=@0x504bc0) at task.h:644
#13 0x000000000040165a in tbb::internal::start_reduce<tbb::blocked_range<size_t>, reduce_test, tbb::simple_partitioner>::run (range=@0x7fffffe50da0, body=@0x7fffffe50dd0, partitioner=@0x7fffffe50dcf) at parallel_reduce.h:136
#14 0x00000000004015a7 in tbb::parallel_reduce<tbb::blocked_range<size_t>, reduce_test> (range=@0x7fffffe50da0,
body=@0x7fffffe50dd0, partitioner=@0x7fffffe50dcf) at parallel_reduce.h:301
#15 0x00000000004013ad in thing::doit (this=0x504ae0, nb=@0x7fffffe50e34, nest=1) at main.cc:58
#16 0x0000000000401443 in main () at main.cc:83
(gdb) thread 2
[Switching to thread 2 (Thread 1077938528 (LWP 10655))]#0 0x0000003b7e9ae159 in sched_yield () from /lib64/tls/libc.so.6
(gdb) where
#0 0x0000003b7e9ae159 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00002aaaaaac1b15 in tbb::internal::AtomicBackoff::pause (this=0x403ffe40) at tbb_machine.h:149
#2 0x00002aaaaaac8933 in __TBB_LockByte (flag=@0x7fffffe50d10) at tbb_machine.h:563
#3 0x00002aaaaaacbc54 in tbb::task_group_context::bind_to (this=0x403fff10, parent=@0x7fffffe50ce0)
at ../../src/tbb/task.cpp:2775
#4 0x00002aaaaaacb19b in tbb::internal::allocate_root_with_context_proxy::allocate (this=0x403fff08, size=48)
at ../../src/tbb/task.cpp:2542
#5 0x000000000040172b in operator new (bytes=48, p=@0x403fff08) at task.h:815
#6 0x0000000000401606 in tbb::internal::start_reduce<tbb::blocked_range<size_t>, reduce_test, tbb::simple_partitioner>::run (range=@0x403fffd0, body=@0x40400000, partitioner=@0x403fffff) at parallel_reduce.h:136
#7 0x00000000004015a7 in tbb::parallel_reduce<tbb::blocked_range<size_t>, reduce_test> (range=@0x403fffd0,
body=@0x40400000, partitioner=@0x403fffff) at parallel_reduce.h:301
#8 0x00000000004013ad in thing::doit (this=0x504ae0, nb=@0x40400054, nest=2) at main.cc:58
#9 0x0000000000401c66 in reduce_test::operator() (this=0x504ed8, r=@0x5063d0) at main.cc:36
#10 0x00000000004019c6 in tbb::internal::start_reduce<tbb::blocked_range<size_t>, reduce_test, tbb::simple_partitioner>::execute (this=0x5063c0) at parallel_reduce.h:149
#11 0x00002aaaaaacf1da in tbb::internal::CustomScheduler::wait_for_all (
this=0x505e00, parent=@0x506040, child=0x0) at ../../src/tbb/task.cpp:1993
#12 0x00002aaaaaacae93 in tbb::internal::GenericScheduler::worker_routine (arg=0x5045a0) at ../../src/tbb/task.cpp:2430
#13 0x0000003b7fa060aa in start_thread () from /lib64/tls/libpthread.so.0
#14 0x0000003b7e9c53d3 in clone () from /lib64/tls/libc.so.6
#15 0x0000000000000000 in ?? ()


Thank you for reporting the issue.

TBB parallel algorithms should be nestable, as in your code. The stacks suggest that we may have had a problem in the implementation of work cancellation, which is new functionality in TBB and keeps changing. Could you please tell us which TBB package you used?

Hi Alexey,

I'm using tbb20_20080408oss_src

Thanks for looking at it.

Thanks for the good test case! As Alexey said, we've recently
reimplemented most of the internal cancellation code. In particular, we've
removed all the locks from the normal execution control path (that is, no locks
are used when there is no exception or cancellation in flight). I've tested
your example with this new version and it works fine. You can find it in the
latest development release (tbb20_20080512oss).

I would also like to ask you (if you don't mind) to contribute your
example through our contribution page so that I can add it to our regression test suite.

Thanks for the quick turnaround. I've submitted the test case, per your request, for your regression test suite.
I'm curious how you deal with these regression tests, by the way. For instance, I needed to run this test case a number of times before it would get stuck; a single successful run doesn't guarantee much. I have the same problem, of course, with the application I'm porting.
Debugging is also a problem. I have already resorted to producing a core file first and then inspecting it with the debugger, because when run from scratch in the debugger the code would not fail. I wonder what your thoughts on these issues are.

Thank you for your contribution!

Honestly speaking, we do not have a special infrastructure for regression testing so far. Its absence is more or less compensated by the fact
that we run daily automated test sessions (using our unit tests and example apps) on a few dozen machines (several
compilers on each machine, plus debug and release modes). Thus, overall, each test case is
run about a hundred times daily, which is normally enough for even
sporadic bugs to manifest themselves (at least once in a few days). Besides, we are working on building a performance test suite that will also help catch correctness bugs.

Yet you are right that detecting bugs with poor reproducibility with
a higher degree of reliability requires multiple runs. A few recent cases
(one of which is yours) convinced us that we need to extend our test harness to
support multiple runs. This will also require separating regression test cases
with bad reproducibility into a group of their own, because repeating the whole test
session even a hundred times would take a few weeks.

As for the practical aspects of debugging, your technique is
probably the most universal one and is what we also often use: when I see
signs of a sporadic problem, I use the shell's "for" loop to run a test a
thousand times, and then either inspect the core dump or attach a debugger if the test
hangs.
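The repeated-run technique described above can be sketched as a small script. Everything in it is an illustrative placeholder, not something from the original posts: CMD defaults to `true` here and should be replaced with the actual test binary, and the run count and the 30-second `timeout` (GNU coreutils) are arbitrary choices.

```shell
#!/bin/sh
# Rerun a flaky test many times; stop on the first run that fails or hangs.
# CMD, RUNS, and the timeout value are placeholders -- adjust to taste.
CMD=${CMD:-true}     # substitute the real test binary, e.g. ./test_reduce
RUNS=${RUNS:-100}
i=1
while [ "$i" -le "$RUNS" ]; do
    # 'timeout' kills the run if it exceeds 30 s, turning a hang into a failure.
    if ! timeout 30 $CMD; then
        echo "run $i failed or hung"
        exit 1
    fi
    i=$((i + 1))
done
echo "all $RUNS runs passed"
```

On a hang, dropping the `timeout` and attaching gdb to the stuck process (or inspecting the core dump afterwards) recovers the stack traces shown earlier in this thread.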

Yet you have another good option for dealing with correctness
problems, one that is often unavailable to us (I'll explain why
below). There is a great tool called Intel Thread Checker. It is
specifically designed to find all sorts of correctness issues in
multithreaded applications, and, what is really invaluable, it does not require
the problem to actually occur in the test run in order to detect it. So you may want to
try it out; I hope it will help you.

By the way, support for Intel Thread Checker has been significantly
improved in the latest development release of TBB. Most of the false positives
we were aware of have been eradicated.

And if you are curious why we cannot use this great tool ourselves
(at least most of the time), it is because the TBB scheduler uses very specific
mechanisms for inter-thread communication. The only way to let Intel Thread
Checker know about them would be to insert a lot of special API calls, which
would obviously affect TBB's control flow significantly. You, as a TBB user,
are protected from the false positives that might be caused by the TBB
internals, but since those internals are what we usually need to debug, we are
not.
