Traversing concurrent_hash_map concurrently

People keep asking how to traverse tbb::concurrent_hash_map concurrently with other operations. But it wouldn’t be worth a blog post if there were no problem here. The Reference clearly states:

Concurrent operations (count, find, insert, and erase) invalidate any existing iterators that point into the table. An exception to this rule is that count and find do not invalidate iterators if no insertions or erasures have occurred after the most recent call to method rehash.


So, officially, you cannot traverse the hash map concurrently with deletion, insertion, and, in some cases, even lookup operations. In other words, we don’t guarantee that it will work. But what will actually happen if you try? Basically, an iterator doesn’t synchronize its operations with any concurrent operation. It keeps moving from pointer to pointer, returning unprotected addresses of items. So, the bad news is that a concurrent erase operation can deallocate the item you are traversing through, and you’ll get garbage or an access violation exception. However, the good news is that in TBB 2.2 and later, no concurrent operation besides erase() can deallocate an already published address. That hints that concurrent traversal is nevertheless possible if it is synchronized with concurrent erasure. Let’s see what can happen when traversing while insert and lookup operations are also occurring.
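To make the hazard concrete, here is a minimal sketch of the pattern that is unsafe (the function name is illustrative, not from TBB):
[cpp:nocontrols]#include "tbb/concurrent_hash_map.h"

typedef tbb::concurrent_hash_map<int,int> table_t;

void unsafe_traversal( table_t &table ) {
    for( table_t::iterator it = table.begin(); it != table.end(); ++it ) {
        // if another thread calls table.erase() for this key right now,
        // the node may be deallocated and this dereference reads freed memory
        int value = it->second;
        (void)value;
    }
}[/cpp:nocontrols]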

The current concurrent_hash_map implementation uses the classic “separate chaining” data structure for the hash table: each bucket holds a pointer to a linked list of stored items. Concurrent resizing can happen when the bucket count is less than or close to the number of inserted items. Unlike in serial algorithms, it works without reallocating the whole table, which leads to a specific concurrent rehashing that moves some items from an old bucket with a lower index into a new bucket with a higher index. The following pseudo-code shows the sequence of operations used to move an item; it can be executed by any concurrent operation:
[cpp:nocontrols]*prev_pointer = node->next; // exclude item’s node from old bucket
node->next = new_bucket->node_list;
new_bucket->node_list = node; // add item’s node to new bucket[/cpp:nocontrols]
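To make the pseudo-code above concrete, here is a simplified sketch of the data structure it manipulates; the real implementation in tbb/concurrent_hash_map.h differs in details (per-bucket locks, allocators, hash policy), so the names below are illustrative only:
[cpp:nocontrols]struct node {           // one stored item
    node *next;         // chains items within one bucket
    // the key and mapped value live here as well
};

struct bucket {         // one slot of the table
    node *node_list;    // head of this bucket's chain
    // a per-bucket lock protects insert/erase/rehash, but iterators ignore it
};

// the table itself is a growable array of such buckets; concurrent rehashing
// moves nodes from a bucket with a lower index to one with a higher index[/cpp:nocontrols]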
The whole movement operation is executed under the locks of the corresponding buckets. However, as noted before, iterators don’t respect the locks, and the locks are not visible to iterators except perhaps through some memory ordering effects. Fortunately, modern processors read and write pointer locations atomically (assuming the location is properly aligned) because they are machine-word sized. It means a thread doing concurrent traversal will not read garbage but, on the other hand, can read an old, not-yet-updated value. So, despite the fact that the whole data structure is in a valid state at any given time, an iterator can miss some actually stored items or meet the same item twice. For example, consider an iterator pointing to an item that gets moved to a new bucket; the iterator holds pointers to a bucket and to the item’s node:

[cpp:nocontrols]// Traversal thread:
my_bucket = old_bucket;
my_node = my_bucket->node_list;
// ... here the rehashing thread executes: my_node->next = new_bucket->node_list;
my_node = my_node->next; // now points into new_bucket's chain
if( !my_node ) my_node = (++my_bucket)->node_list;[/cpp:nocontrols]



Here, the other items stored in old_bucket are skipped, and the items stored in new_bucket can come up twice. This significantly limits the traversal guarantees we can give for the current concurrent_hash_map implementation, and it is why we don’t support concurrent traversal officially. However, you might still want to traverse the map if you are fine with a chance of not visiting all your items in one pass, or you can apply the serial rehash() operation before traversal and so reserve enough buckets to prevent concurrent rehashing. For example, a use case described here needs just to erase outdated items from the table, so it is not sensitive to duplicates, and if it doesn’t meet some items in one pass, it will see them the next time.
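If an upper bound on the number of items is known in advance, the rehash() reservation mentioned above can look like the following sketch (the bound is an assumption for illustration):
[cpp:nocontrols]#include "tbb/concurrent_hash_map.h"

tbb::concurrent_hash_map<int,int> table;
const size_t expected_items = 1000000; // application-specific upper bound
// serial call before any traversal threads start: reserves buckets
// so no concurrent rehashing occurs during the traversal
void reserve_buckets() { table.rehash( 2*expected_items ); }[/cpp:nocontrols]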

The code of such a traversal that removes outdated items may look like the following pseudo-code:
[cpp:nocontrols]for( iterator = table.begin(); iterator != table.end(); ++iterator ) {
    // to prevent local reordering, as discussed in the comments to the blog
    _ReadWriteBarrier(); // or an acquire fence for weakly-ordered processors
    // safe w/o concurrent erase, but different words may be read inconsistently;
    // only a shallow copy is permitted, don't follow pointers
    if( is_old( iterator->second.timestamp ) ) { // not protected from concurrent updates
        accessor acc;                            // however, the key cannot be changed,
        table.find( acc, iterator->first );      // so get the lock
        if( is_old( acc->second.timestamp ) )    // and double-check
            table.erase( acc );                  // erase by accessor
    }
}[/cpp:nocontrols]
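The pseudo-code assumes a mapped type and an is_old() predicate along the following lines; making the timestamp a word-sized tbb::atomic<> is what keeps the unprotected read well-defined (the names and the threshold are illustrative):
[cpp:nocontrols]#include <time.h>
#include "tbb/atomic.h"
#include "tbb/concurrent_hash_map.h"

struct record {
    tbb::atomic<time_t> timestamp; // assuming time_t is word-sized here,
                                   // unprotected reads see a consistent value
    // other fields are only safe to read under an accessor
};

bool is_old( time_t t ) {
    return time(NULL) - t > 60;    // e.g. older than a minute
}

typedef tbb::concurrent_hash_map<int, record> table_t;[/cpp:nocontrols]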
Please note that the traversal code takes its snapshot without a lock or accessor and so may read some intermediate data, but if the timestamp is a tbb::atomic<> of the machine word size, it will work fine anyway. The big advantage of this “lockless” approach is that it doesn’t block concurrent operations. If you want to test how it works on your real hardware, please find the test example attached:
[cpp:collapse]// Example that checks and demonstrates parallel traversal over tbb::concurrent_hash_map.
// Code is by Anton.Malakhov@intel.com

#include <utility>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "tbb/concurrent_hash_map.h"
#include "tbb/compat/thread"

typedef tbb::concurrent_hash_map<int,int> table_t;
table_t table;
const int max_keys = 1000000;

void collector() {
    while( !table.empty() ) for( table_t::iterator it = table.begin(); it != table.end(); ) {
        // safe w/o concurrent erase BUT
        // note, it is not protected from concurrent modification by "consumer"
        std::pair<int,int> snapshot = *it++;
        if( snapshot.second < 0 ) { // visited
            table.erase( snapshot.first );
            assert( snapshot.first == -snapshot.second );
#if VERBOSE
            printf("%d\r", snapshot.first );
#endif
        } else assert( snapshot.first == snapshot.second );
    }
}

void consumer() {
    while( !table.empty() ) for( int i = 1; i <= max_keys; i++ ) {
#if VERBOSE
        if( !(i%10000) ) printf("\non %d\n", i );
#endif
        table_t::accessor acc;
        if( table.find( acc, i ) ) {
            if( acc->second < 0 )
                printf("Warning: recursion on <%d %d>: bad concurrency?\n", i, acc->second);
            else assert( i == acc->second );
            acc->second = -acc->second;
        }
    }
}

int main() {
    table.insert( std::make_pair(max_keys, max_keys) );
    std::thread consumer_thread( consumer );
    std::thread collector_thread( collector );
    for( size_t c = 0; c < 10000000UL; c++ ) {
        int key = 1 + int(rand()) % max_keys;
        table.insert( std::make_pair(key, key) );
    }
    puts("awaiting cleanup");
    consumer_thread.join();
    collector_thread.join();
    puts("done");
    return 0;
}[/cpp:collapse]

Bottom line

Although it is not documented, you can traverse tbb::concurrent_hash_map concurrently if you are fine with the following restrictions and their possible workarounds:

    • You cannot traverse concurrently with an erase() operation. Use a mutex around the erase and traversal code if you still need both running in different threads (see the sketch after this list).

    • Some items can be missed due to concurrent rehashing. Reserve enough buckets before traversal using rehash() to prevent this, or repeat the pass.

    • Some items can be visited twice or more. Reserve enough buckets before traversal using rehash() to prevent this, or put everything into a local serial map first to remove duplicates.

    • Access through iterators is not protected. Know what you are doing, e.g. use additional synchronization primitives like tbb::atomic to protect one member, or acquire an accessor to protect the rest.
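Here is a minimal sketch of the first workaround, using a reader-writer lock so that traversals, inserts, and lookups still run in parallel while erase() takes exclusive access (the wrapper functions are illustrative, not part of TBB):
[cpp:nocontrols]#include "tbb/concurrent_hash_map.h"
#include "tbb/spin_rw_mutex.h"

typedef tbb::concurrent_hash_map<int,int> table_t;
table_t table;
tbb::spin_rw_mutex erase_mutex; // serializes erase() against traversal only

void guarded_erase( int key ) {
    tbb::spin_rw_mutex::scoped_lock lock( erase_mutex, /*write=*/true );
    table.erase( key );
}

void guarded_traversal() {
    tbb::spin_rw_mutex::scoped_lock lock( erase_mutex, /*write=*/false );
    for( table_t::iterator it = table.begin(); it != table.end(); ++it ) {
        // inserts and finds may still run concurrently; erase() cannot
    }
}[/cpp:nocontrols]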



Next time I’ll write about a new concurrent hash table with much stronger concurrent traversal guarantees.
