Push back thread safety

Dear all,

I need to update a tbb::concurrent_vector concurrently, and right now I'm just pushing back items from different threads. Since operator[] isn't thread-safe for updates, I'm having doubts:

Is push_back thread-safe when inserting an element? My requirements don't impose any particular ordering, so I'm only concerned about the elements, not their order.

Thanks!

Best Reply

concurrent_vector "can be concurrently grown and accessed", and push_back() is listed under "Concurrent growth operations".

You can safely do any number of both kinds of operations at the same time from different threads.

Thanks! :)

>>since operator[] isn't thread-safe in updates

Not so. It is the programmer's responsibility to generate a proper index. Assuming the vector is properly sized...

vec[fill++] = x; // not thread-safe unless the ++ on fill is itself thread-safe (without an atomic ++ operator, it is not)

However, if you overloaded the ++ operator for the type of fill to return __sync_fetch_and_add(&value, 1), where fill.value is the internal member variable of fill representing its value, then vec[fill++] = x would be thread-safe.

*** The above holds for the multi-threaded ++fill operation, at least for the index generation. A consumer thread (one that empties the vector) still cannot be assured that the producer (the filling thread) has completed the store (the = x part). Therefore the consumer must take some care.

One technique is to place an invalid value into the cells of the vector to indicate "empty". For example, if the vector holds pointers, a NULL or (void*)-1 could be used; for floating point, one of the possible NaN encodings (one that is not in the set of processor-generated NaNs) could be used.

Jim Dempsey

www.quickthreadprogramming.com

I think the question was about concurrent use of push_back() and operator[]. That is not safe in std::vector, but it is safe in tbb::concurrent_vector, for elements at different indices, and as long as the code still synchronises between adding and subsequently accessing specific elements. Queueing from a producer to a concurrent consumer does not seem a very good use case for tbb::concurrent_vector, even less so than for std::vector, which at least has constant-time access at a specific index but still doesn't scale well and also has no built-in way to forget about elements after they have been consumed. Using special values probably happens to work for fundamental types, but strictly speaking those are data races, because the element type is not atomic.

Well, I am using a concurrent_vector in two points of my program.

The first uses a simple push_back, which I now understand is OK even when multiple threads push elements, since I don't care about the order.

The other is trickier:

void Builder::operator()(const tbb::blocked_range2d<std::size_t, std::size_t>& r) const
{
    // Some score
    int score;
    for (std::size_t source = r.rows().begin(); source != r.rows().end(); source++)
    {
        for (std::size_t target = r.cols().begin(); target != r.cols().end(); target++)
        {
            score = SomeFunction(source, target);
            
            // Would happily avoid this
            tbb::mutex::scoped_lock lock(mutex_);

            // The results variable is a tbb::concurrent_vector
            if (score > results->at(source))
            {
                results->at(source) = score;
            }
        }
    }
}

Here I would like to get rid of the global lock, as you can imagine! I don't know if I can somehow make this conditional update atomic.

Do you have any suggestions?

Thanks & Cheer

The lock can be avoided if:

 - there is a guarantee that you do not modify the same element of the vector concurrently, i.e. code like results[7] = 0 is not run concurrently for the same index.

The following guarantee should also be provided:

 - an element being constructed at that moment (in a different thread) is not accessed until its construction has ended, i.e. the fact that elements were added must somehow be transferred from the writer to the reader; a simple check for a size change is not enough.

Sensi,

I suggest you use or make yourself an atomic max object. Pseudo code:

... // your atomic max struct/calss

	atomic<int> value; // holds the max *** must be initialized to some min (INT_MIN)

	... // your function to test and apply candidate x to running max value

	int copyOfValue;

	while(x > (copyOfValue = value)) // snapshot copy of value

	   value.compare_and_swap(x, copyOfValue); // use above copy for compare

	

The above has no scoped lock (on every iteration)
It only has an atomic interlocked instruction whenever a new value (x) exceeds the observed value.
An attempt for update may occur multiple times as well as bail-out when other thread has inserted larger value.

Jim Dempsey

www.quickthreadprogramming.com

Let's just check the basics… why is results a tbb::concurrent_vector? I don't see it being grown concurrently in that little piece of code. You might as well use std::vector to do this, because you can safely access all elements of a std::vector concurrently (as if they were different variables) as long as its dimensions aren't also being modified concurrently (by resize(), push_back(), etc.). And std::vector has more efficient random access than tbb::concurrent_vector.

The lock can be avoided if you do a one-dimensional parallel loop over the sources, and use an inner sequential loop over the targets. If you have enough sources for parallelism ("parallel slack"), you don't need additional parallelism in the inner loop, and it might even be better to avoid it because it comes with overhead as well.

But even then you should avoid updating external storage in the innermost loop. You should keep the current maximum in a local variable, and only update the results vector at the end of that inner loop, and so get a significant reduction (no pun intended) in the number of times you acquire that lock (current 2-dimensional loop) or just write to external storage (nested loops). If you make this change, you might perhaps even keep the 2-dimensional loop (there's not always One Right Way to do things).

The reason why reduction might have been a pun is that computing the maximum is a reduction operation. If you have few sources and many targets, it might be beneficial to have an outer loop over the sources (to get rid of contention over the results vector), and an inner parallel_reduce() over the targets (to efficiently compute the maximum).

The first part of Anton's posting at #8 has been applied above by changing the 2-dimensional loop to a nested loop with sources the outermost loop; the second part confirms part of #5.

Jim's posting at #9 is right in the sense that atomics are often more efficient than locks, but in this case a simple refactoring of the loop seems a more appropriate solution.

I agree with Raf that maintaining a local max is more (indeed most) efficient...

However, the standard TBB partitioners are not amenable to exposing the iteration space to the task. To wit:

 - Is this the first iteration of this range, whereby I have to initialize the local max?
 - Is this the last iteration of this range, whereby I have to atomically reduce into the shared max?

You can do this by digging into the iterator object (or adding a member function).

Jim Dempsey

www.quickthreadprogramming.com

Isn't that the job of the algorithm rather than the Range object? parallel_reduce() either takes an Identity argument, or expects you to appropriately initialise the Body (also in the splitting constructor).
