Problem with Enumerable Thread Specific

Problem with Enumerable Thread Specific

Hello everyone.

I'm working in a project that simulates object collisions. We decided to use TBB to parallelize the app.

Simplyfing the problem, let's assume we're working with only one type of collision object(a particle) and a 2 core machine. One object collision always has, at least, one particle involved. The other object may be a particle or a plane, etc. In this case, as stated above, is always another particle.

So, first we get the next 2 particle collisions to be simulated. We create a root task and inside it, 2 child tasks are spawned. One per particle collision. Inside a child task, the particles involved are cloned to avoid changes in the "real" ones. We simulate locally the collision and store the results( the new collision objects scheduled for each particle ) in thread local storage.

Then, in a continuation task each thread local storage is merged into a vector and then we check if there are any interferences between the collisions simulated.

For example:
In one child task, the collision simulated involves particles 1 and 2 at
time 1 second. It generates a collision 1-3 at time 1.5; and collision
2-4 at time 3.

In the other child task, the collision 3-9 at time 2 generates: 3-10 at time 7 and 9-11 at time 5.

If we were to update the system with this results, the simulator would be corrupted, since there is an inconsistence between collisions 1-3(time 1.5) and one of the "original" collisions, 3-9(time 2). Collision 3-9 should not be done, since it is invalidated by the results of the collision 1-2.

I'll post some code now and then point the problem.

Global typedef public to every task.

typedef tbb::enumerable_thread_specific< std::vector > CollEventEts;

Child Task.
class ChildTask:public tbb::task {

private:

  ContTask* cont;

  CollEventEts* local_ets;

  /// more things.

public:

  ChildTask( ContTask* cont_ ) {

    cont = cont_;

    local_ets = cont->GetEts();

  }

  tbb::task* execute(void) {

    CollEventEts::reference local_vector = local_ets->local();

    /// simulate collisions

    local_vector.push_back( /* Scheduled collision object */ );

  }

};
Continuation Task.
class ContTask: public tbb::task {

private:

  CollEventEts global_coll_ets;

  std::vector< std::vector< CollEvent> > merged;

  /// more things

public:

  /// more things.

  CollEventEts* GetEts(void) { return &global_coll_ets; }

  tbb::task* execute(void) {

    CollEventEts::iterator it( global_coll_ets.begin() );

    const CollEventEts::iterator end( global_coll_ets.end() );
   for(;it!=end;++it) {

      merged.push_back( (*it) );

    }
    /// Check collisions and update.

  }

};
I have previously done some tests with local storage. I ended getting two different results:
a) The global ets had one vector per child task. I. e., two childs = two vectors:
global[0] = {1,2,3,4}
global[1] = {5,6,7,8} b) The global etshad only one vector that contained all the values:
global[0] = {1,2,3,4,5,6,7,8}
Both of them would be ok. However, here comes the problem: in the particle collisions ets, when getting the b option, the ets doesn't contain all of the collisions scheduled, only the local data of one of the childs. Using the previous example of collisions 1-2 and 3-9, instead of having:
global_ets[0] = { 1-3, 2-4, 3-10, 9-11 }it sometimes has
global_ets[0] = { 1-3, 2-4 }or
global_ets[0] = { 3-10, 9-11 }
By the way, before the destruction of the child tasks, I printed to the terminal the contents of the local_vector and each of them had the scheduled collisions as expected:
child task 1 LTS = { 1-3, 2-4 }
child task 2 LTS = { 3-10, 9-11 }
I would really appreciate any help with this, since we're out of ideas.

Thanks and best regards.

16 posts / novo 0
Último post
Para obter mais informações sobre otimizações de compiladores, consulte Aviso sobre otimizações.

I don't think you're supposed to enumerate over the elements in a thread_local_specific instance while it is being modified concurrently by other threads (the reference might be a bit more explicit about that), and you may be assuming too much about the thread relationship between parent and child tasks. Could any of this be involved (from a first impression only so far)?

(Added 2012-03-27) About "assuming too much about the thread relationship between parent and child tasks": there is no guarantee, only a likelihood that they are executed by the same thread.

Quoting isnork...Then, in a continuation task each thread local storage is merged into a vector and then we check if there are any interferences between the collisions simulated...

[SergeyK] You would like to re-construct some dependencies, right? In that case I would suggest to
use a matrix, instead of a vector.

Why not consider a solution based on a Markov Chain ( MC ), or a simplified form of a State Transition Matrix ( STM ).
It is not just a collisionbetweenparticles Aand B. It is a collision with some probabiity!

...
I would really appreciate any help with this, since we're out of ideas...

Let's say you have two motionlessparticles and a simulation has to be done for timest1, t2 and t3. An STM would look like:

t1 t2 t3

10 10 10
0 1 0 1 0 1

Nothing has changed because particles are motionless. In that case, an MC simulator I have implemented some time agowould output:

State Transition Matrix:

1.00 0.00
0.00 1.00

Number of States ( N ): 2
Length of State Transitions Sequence ( L ): 3
Max Number Of State Transition Permutations ( T = N^(L+1) ): 16

States Transition Sequence: 1111
t1t2 t3
1->1 1->1 1->1 +++ Valid
States Transition Sequence: 1112
1->1 1->1 1->2 ++- Invalid
States Transition Sequence: 1121
1->1 1->2 2->1 +-- Invalid
States Transition Sequence: 1122
1->1 1->2 2->2 +-+ Invalid
States Transition Sequence: 1211
1->2 2->1 1->1 --+ Invalid
States Transition Sequence: 1212
1->2 2->1 1->2 --- Invalid
States Transition Sequence: 1221
1->2 2->2 2->1 -+- Invalid
States Transition Sequence: 1222
1->2 2->2 2->2 -++ Invalid
States Transition Sequence: 2111
2->1 1->1 1->1 -++ Invalid
States Transition Sequence: 2112
2->1 1->1 1->2 -+- Invalid
States Transition Sequence: 2121
2->1 1->2 2->1 --- Invalid
States Transition Sequence: 2122
2->1 1->2 2->2 --+ Invalid
States Transition Sequence: 2211
2->2 2->1 1->1 +-+ Invalid
States Transition Sequence: 2212
2->2 2->1 1->2 +-- Invalid
States Transition Sequence: 2221
2->2 2->2 2->1 ++- Invalid
States Transition Sequence: 2222
2->2 2->2 2->2 +++ Valid

An STM for a motionless particles could be also initialized with all zeros:

t1 t2 t3

0 00 00 0
00 00 0 0

However, in terms of a control theory this is an invalid case:
...
State Transition Matrix is NOT initialized - All State Probabilities are Zeros
...

Here is a case when twoparticles have equal probabilities of colliding with each other:

t1 t2 t3

010101
10101 0

In that case, an MC simulatorwould output:

State Transition Matrix:

0.00 1.00
1.00 0.00

Number of States( N ): 2
Length of State Transitions Sequence ( L ): 3
Max Number Of State Transition Permutations ( T = N^(L+1) ): 16

States Transition Sequence: 1111
t1 t2 t3
1->1 1->1 1->1 --- Invalid
States Transition Sequence: 1112
1->1 1->1 1->2 --+ Invalid
States Transition Sequence: 1121
1->1 1->2 2->1 -++ Invalid
States Transition Sequence: 1122
1->1 1->2 2->2 -+- Invalid
States Transition Sequence: 1211
1->2 2->1 1->1 ++- Invalid
States Transition Sequence: 1212
1->2 2->1 1->2 +++ Valid
States Transition Sequence: 1221
1->2 2->2 2->1 +-+ Invalid
States Transition Sequence: 1222
1->2 2->2 2->2 +-- Invalid
States Transition Sequence: 2111
2->1 1->1 1->1 +-- Invalid
States Transition Sequence: 2112
2->1 1->1 1->2 +-+ Invalid
States Transition Sequence: 2121
2->1 1->2 2->1 +++ Valid
States Transition Sequence: 2122
2->1 1->2 2->2 ++- Invalid
States Transition Sequence: 2211
2->2 2->1 1->1 -+- Invalid
States Transition Sequence: 2212
2->2 2->1 1->2 -++ Invalid
States Transition Sequence: 2221
2->2 2->2 2->1 --+ Invalid
States Transition Sequence: 2222
2->2 2->2 2->2 --- Invalid

And if you add a Binary Space Partitioning (BSP ) a model for collisionsgets more complex.

I wouldn't blame the TBB in your case and I think you should look at an algorithm that simulates collisions.
If that algorithm has a problem than TBB won't fix it.

Best regards,
Sergey

Hi,

thanks for your answers!!

I will read them carefully later.

Hi!!

I the managers are currently discussing about Markov Chains and STM.

However, I have a doubt with the TLS. Sergey, you said that we shouldn't blame TBB.
But shouldn't appear inside the global ETS all of the computed collision objects that are pushed back in each of the child task?
I mean, like in the example in the Reference:

using namespace tbb;

typedef enumerable_thread_specific< std::pair > CounterType;

CounterType MyCounters (std::make_pair(0,0));
struct Body {

    void operator()(const tbb::blocked_range &r) const {

        CounterType::reference my_counter = MyCounters.local();

        ++my_counter.first;

        for (int i = r.begin(); i != r.end(); ++i) {

            ++my_counter.second;

        }

    }
};
int main() {

    tbb::task_scheduler_init init;

    parallel_for( blocked_range(0, 100000000), Body() );

    for (

            CounterType::const_iterator i = MyCounters.begin();

            i != MyCounters.end();

            ++i

    ) {

        printf("Thread stats:n");

        printf(" calls to operator(): %d", i->first);

        printf(" total # of iterations executed: %dnn",i->second);

    }

}

The changes made in the local ets are reflected in the global one. Or did I misunderstood?

Thanks and best regards!

"The changes made in the local ets are reflected in the global one. Or did I misunderstood?"
I know my economical reply above may not look very impressive, but have you considered it at all?

Hi!

Do you mean that what we wanted to do would only work if the continuation task and the child tasks are executed by the same thread?

If so, the TLS would be inadequated, wouldn't it? Could a workaround be a global vector protected by a mutex where to insert the computed next collisions?

Thanks and best regards.

PS: The project manager still didn't say anything about the MC. He must be still thinking about it.

Consider scalability when deciding at what level to lock any shared data structure. Ideally, information is inherited and synthesised through task member variables, if this can be applied at all.

When I was studying Markov Chains I used a book "Markov chains, theory and applications" by Isaacson, Dean L. and Madsen, Richard W.
Thebookis more theoretical than practical.I've madelots of records in my Engineering Notebook and it helped me with
implementations of some softwaresubsystems. There are lots of MC related links, articles and tutorials on the Internet.

Hi,

However, here comes the problem: in the particle collisions ets, when getting the b option, the ets doesn't contain all of the collisions scheduled, only the local data of one of the childs.

Sooo... this was a bug? isnork's code was wrong?

Bye!

It's somewhat hard to understand what you describe, because the names in the code do not match the names you mention later.Let me check whether what you do is what you want. In your code, each of child tasks collects collisions and puts those into a vector stored inside tbb::enumerable_thread_specific (aka ETS). Then the continuation task (that presumably runs after all child tasks) takes all these thread-local vectors (one or more, depending on how many threads executed the child tasks) and pushes the vectors (not collisions!) into another std::vector, i.e. populating a vector of vectors. Is that what you wanted, or perhaps you wanted a single vector of collisions?If a vector of vectors of collisions is what you need (which I doubt :)), then the next question is what you print/inspect? Do you inspect ETS, or the merged vector? And do you only inspect the element with index 0, or all elements (i.e. all collision vectors) stored either inside ETS or inside the "merged" vector?If you really need to merge all collisions into a single vector, then the code in the continuation task is incorrect. You then should have two nested loops, the outer iterating over ETS (as now) and the inner iterating over collisions in a ETS-stored vector. Needless to say that then individual collisions, and not vectors, should be copied into the "merged" vector of all collisions.Alternatively, you may use tbb::flattened2d adaptor to iterate over all the collisions stored in ETS. From the TBB Reference:A flattened2d provides a flattened view of a container of containers. Iterating from begin() to end()visits all of the elements in the inner containers. This can be useful when traversing a enumerable_thread_specific whose elements are containers.A code example for flattened2d is available in the Reference.

Hi!

In your code, each of child tasks collects collisions and puts those
into a vector stored inside tbb::enumerable_thread_specific (aka ETS).
Then the continuation task (that presumably runs after all child tasks)
takes all these thread-local vectors (one or more, depending on how many
threads executed the child tasks) and pushes the vectors (not
collisions!) into another std::vector, i.e. populating a vector of
vectors. Is that what you wanted, or perhaps you wanted a single vector
of collisions?

The idea was to work with a vector of vectors. For example, in the merged_vector[0][0] would be stored the collision that was locally simulated; and the rest would be the collisions scheduled due to the first one.

merged_vector={ // subvectors are stored in ascending order of the timestamp of the first collision.
{1-2,1-3,2-4}, // first vector: collision between particles 1 and 2, collision between particles 1 and 3, etc.
{9-3, 9-11, 10-12 }
}

This way it is easier to check any interferences between collisions, since most of the time it's only needed to check the scheduled collisions in the vector in position i-1 against the first collision in the vector in position i:
1-3 and 2-4 against 9-3. There's an interference between 1-3 and 9-3, so collision 9-3 and all of its scheduled collisions are removed. Since all of them are stored in the same subvector, we simply erase the suitable subvector from the merged one.

is what you print/inspect? Do you inspect ETS, or the merged vector?
And do you only inspect the element with index 0, or all elements (i.e.
all collision vectors) stored either inside ETS or inside the "merged"
vector?

We inspected all the elements inside the ETS in each child task, before leaving the execute method. In the continuation tasks, we also inspected the merged vector, to check wether all the elementes were inserted or not.

Thanks for explanations; I think I now understand. So you say that, while ETS contained the data from both child tasks in global_ets[0] (i.e. in a single collision vector, as both tasks were executed by the same thread), there is less data (only from one of the tasks) in merged[0]. Is that right?

Hi!

Not quite that. When both tasks were executed by the same thread, the global_ets didn't contained all of the elements, only those from one task. However, before destructing the tasks, we checked the elements inside the local_ets of each task and everything expected was there: the local collision event and the scheduled next collision events for the particles involved.

On the other hand, when the tasks are executed by different threads, then global_ets containing two vectors, all the elements are there. None is missing.

So, any idea? :)

Faça login para deixar um comentário.