Delusion of tbb::concurrent_vector’s size or 3 ways to traverse in parallel correctly

Like in the previous post about concurrent_vector, I’d like to start from a sentence buried in the Reference Manual, which describes the size() method:

“The result may include elements that are under construction by concurrent calls to any of the growth methods”.

It means that you can not safely traverse across a container instance in parallel to growth operations of other threads, neither using simple for-loop with “i < v.size()” condition nor using the iterators for [begin(), end()) range. But why TBB does not provide a safe version of the size() method? There are few reasons and simple answer: it is almost impossible.

The vector is designed to grow concurrently, it means that a few grow requests can be “in flight” simultaneously:

concurrent_vector
[On this picture the size() represents old, TBB < 2.2 behavior. New size() is equal capacity() here.]
But construction of all requested elements cannot be done at once as atomic operation; they are constructed one by one. And it leads to the picture that you can see above: there can be “holes” of non-constructed elements, and even some segments can still be not allocated.

[Start of TBB < 2.2 specific] The latter is really bad because opens possibilities of segmentation fault for concurrent accesses. In TBB < 2.2, the problem involved also grow_to_at_least() method because it could check that current size() is enough and return immediately keeping segments being allocated. So we decided to fix the container in order to assure that memory is allocated up to the size reported by size() method or specified as the argument to grow_to_at_least().

In TBB < 2.2, you can get some safety by using the minimum of capacity() and size() as the size for which allocation is guaranteed to have occurred. It allows you to use operator[] (instead of safe at() which protects you from wrong indexes) because all elements in that range are sure allocated. [End of TBB < 2.2 specific]

But anyway, nothing in the container itself can prove to you that a given element is constructed completely.  However, I can suggest to you three external ways to traverse whole vector concurrently.

If you have only one thread growing the vector, then there is a simple solution. Add an external atomic variable “vector_size” and after each grow operation just immediately update it:

atomic<int> vector_size;
//…
v.grow_by(10);
vector_size = v.size();

It will give you guaranteed size of the vector, i.e. number of already constructed items. It is a very fast and simple approach but it will not work if you have more than one thread adding data.

In that case, an alternative is to use an external counter of grow operations in flight. So, you can trust to size() (only) when the counter equals to zero:

atomic<size_t> ops_counter; // = 0
//…
ops_counter++;
v.grow_by(10);
ops_counter--;
//…
size_t tmp_size = v.size();
if( !ops_counter ) vector_size = tmp_size;

This solution involves overhead of two additional atomic operations but is still acceptable unless frequency and amount of growth operations keep ops_counter > 0 most of the time. In that case, the last chance is to use some universal but heavy machinery.

Universal solution

The main idea is to detect which item is constructed by checking the item itself. First of all, you need stable state of the memory allocated for new items. For that, you could use zero-filling allocator (tbb::zero_allocator<> since TBB 2.2) or write an adapter that cleans the memory for existing allocator yourself. Then, you need a data type that is able to work starting with zero-filled memory instead of depending upon its constructor to run. For example, you could add a flag variable that is set to 1 at the end of constructor (remember to define it as atomic<> to get right fences and operation order). Another way is to use a pointer to the real data. If the pointer equals to zero, the element is not constructed surely. But when construction is done, you can store the pointer to make it visible for monitoring threads:

// zero-filling allocator will clear the memory before size() can report it
tbb::concurrent_vector<char*, tbb::zero_allocator<char*> > v;

// multiple concurrent tasks call e.g.
v.push_back("string");

// a monitor thread/task traverse the vector
for (int i = 0; i < v.size(); i++) {
       char* s = v.at(i);
       if(s == NULL) continue;
       // process it further
}

 

For more complete information about compiler optimizations, see our Optimization Notice.