TBB containers vs. STL. Functionality rift

I've noticed that developers who start using TBB often ask questions "Whether I should replace STL containers (protected with locks where necessary) by their TBB counterparts in my application? Are TBB containers faster that those from STL?". Unfortunately questions formulated like this are too general to get a specific answer. Therefore I decided to make a concise overview of how TBB concurrent containers differ from the STL ones, why the particular design decisions were made, and what the efficiency of both approaches depends on (though I'm not a person who designed TBB containers I believe my understanding of the matter is basically sound, and in the worst case my colleagues will correct me if I misstate anything).

I'll try to keep this discussion general enough, without delving into the implementation details or analysis of specific use cases, so that after reading it you have an understanding, which criteria should be taken into account when choosing the solution for your particular application.

Let's have a look at how do the scopes of functionality of both container libraries correlate with each other.

First of all one have to understand that despite of similar names and a lot of commonality in interfaces of TBB and STL containers, the former are not simply thread-safe versions of the latter. Though we tried to make TBB interfaces as close as possible to those of STL, most of them have substantial differences in their usage models (and consequently in interfaces). To name the most prominent ones:
· concurrent vector does not support insert and erase operations (new items can only be pushed back), and cannot be shrunk;
· concurrent hash map does not support iterators, and uses accessors instead.

There are two main reasons why concurrent containers cannot be made thread-safe replicas of their STL counterparts. First, concurrent access invalidates some of the assumptions that the serial code relies on. Let's consider vector's method pop_back(). In the serial code it is often used as part of the following idiom:
T item = v.back();
v.pop_back();
But in concurrent case another thread may pop the same item while the first thread is in between of the two above operations.

Leaving such methods in the TBB interface would encourage TBB users to inadvertently write inherently flawed code. Such adverse scenarios would be quite probable since most of the programmers move to writing parallel code after getting substantial experience with the serial programming and its patterns. Another risk is that such unsafe methods would mask dangerous pieces of serial code when it is parallelized. Taking all this into account TBB designers decided to explicitly prohibit use cases that lose their original serial meaning or lead to indeterministic behavior in the multithreaded apps.

The fact that TBB culled STL interfaces still does not mean that what remains is a thread-safe subset of STL functionality. Some operations in TBB containers are not thread safe, like reserve() and clear() in concurrent vector. Other methods are extensions to what STL has and reflect the specifics of the parallel usage models, e.g. the family of grow_xxx() methods in the concurrent vector or pop_if_present() and push_if_not_full() of concurrent queue.

Besides, some of the concepts though supported by concurrent containers may change their meaning slightly in comparison to their serial usage. For example the intuitively simple "first in first out" notion becomes somewhat fuzzy when a value may be pushed by one thread and popped by another. So before starting incorporating concurrent containers into your application for the first time, please read the reference (or at least tutorial) carefully, to avoid unpleasant surprises.

The second reason why the usage models of TBB and STL containers differ is more mundane. Actually it is what the whole multicore shift is about, … right, it's performance. Supporting some of the serial usage models though technically possible would be so inefficient that their applicability area would become marginally small (e.g. iterating over the concurrent hash map).

Besides, from the methodology standpoint, providing such tremendously inefficient APIs would entice unaware developers into writing code that covertly ruins performance of their applications, and increase the probability of the situations when a serial code that apparently has been successfully parallelized, works slower than its serial ancestor even on a quad-core machine.

At last, some of the dropped functionality, if implemented, would have taken unbearably heavy toll on the performance of other operations, and thus render the whole concurrent container next to useless. For example, insert() in concurrent vector, if allowed, would have drastically burdened both iterating and growing operations, which constitute the backbone of the container's usage model.

Now that you have a general idea why TBB concurrent containers are quite unlike even their closest STL relatives, I'd like to attract your attention to the fact that TBB (at least so far) provides noticeably fewer containers than STL does. I believe most of you have already guessed why it is so (disregarding the considerations of TBB team's limited resources). Anyway I'd recommend you to have a look at the blog discussing one particular case of not supported containers: "Linked Lists - Incompatible with Parallel Programming?".

Though I’ve already lavishly used words “performance” and “efficiency”, all that was said before does not give any clue as to which choice should you do in your particular application (and I promised that you’ll learn it (well, at least will have an idea)). I’m not going back on my words, and we’ll talk about it in my next post that will get to you sooner rather than later (hopefully before the end of this week). Cheers.

For more complete information about compiler optimizations, see our Optimization Notice.

Comments

I think that it's not only Ok for concurrent containers to have different interfaces, but it's also the right way. They are subject to distinct usage patterns. The early approaches to concurrent containers were single_threaded_container+mutex, so people used to the fact that concurrent container has the same interface accidentally.

All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
http://www.1024cores.net