concurrent_hash_map::erase

concurrent_hash_map::erase

Does anyone have a good example of using concurrent_hash_map::erase()?

I have been trying to get this working for the last week, and it appears that it will let me erase a key twice (returns tru in both threads), even though the documents indicate otherwise. Also how are you supposed to when to free the contents of the hash element?

For instance:

tbb::concurrent_hash_map my_map;

some threads:

tbb::concurrent_hash_map::accessor acc_w;

if(my_map.insert(acc_w, new_Key)) {

acc_w->second = new Item();

}

some more threads:

if(my_map.erase(acc_w, some_Key)) {

// get here multiple times for same Key

// don't know what pointer is to free it - could do a find and stash pointer, but since Keys

// can be reused, the contents could change between the find() and the erase()

}

I also tried defining the map just as:

tbb::concurrent_hash_map my_map;

But couldn't get this working as Item has dynamically allocated data members in it, and despite putting in a copy constructor and assignment operators couldn't get it to work.

thanks for any assistance.

Simon Cope

26 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

In the latest development release (tbb20_20080311oss_src)'s concurrent_hash_map.h, there seem to be at least a few problems.

First about the issue at hand... On lines 632 through 637, in lookup(), there is a race between setting return_value and upgrading chain_lock to a write lock, which can be removed by a change such as the following:

#if 0 // race
return_value = true;
chain_lock.upgrade_to_writer();
bucket_mutex_t::scoped_lock item_lock( b->mutex, true );
b_temp = remove_from_list(key,c);
--s.my_logical_size;
goto delete_temp;
#else // solution
if( !chain_lock.upgrade_to_writer() ) {
// Rerun search_list, in case another thread inserted the item during the upgrade.
b = search_list(key,c);
}
if( b ) {
return_value = true;
bucket_mutex_t::scoped_lock item_lock( b->mutex, true );
// TODO: eliminate remove_from_list by reusing the result of search_list,
// which should provide access to whatever needs changing,
// for some additional performance for items near the end of a list
b_temp = remove_from_list(key,c);
--s.my_logical_size;
&n
bsp; goto delete_temp;
}
#endif

Another, similar, problem with lookup() is that there is a race between evaluating grow and write-locking the segment on lines 608 through 618, which can be removed by a change such as the following:

#if 0 // race
segment& s = get_segment(h);
#if TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT
bool grow = op==op_insert && s.internal_grow_predicate();
#else
bool grow = op==op_insert && s.my_logical_size >= s.my_physical_size;
#endif /* TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT */
segment_mutex_t::scoped_lock segment_lock( s.my_mutex, /*write=*/grow );
if( grow ) {
// Load factor is too high
grow_segment(segment_lock,s);
}
#else // solution
segment& s = get_segment(h);
segment_mutex_t::scoped_lock segment_lock( s.my_mutex, /*write=*/false ); // may be upgraded later
bool grow;
// grow must be re-evaluated every time s.my_mutex is temporarily given up,
// and upgrade to writer is only required if( grow ), which is satisfied
// by a do-while with short-circuit "and"
do {
#if TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT
grow = op==op_insert && s.internal_grow_predicate();
#else
grow = op==op_insert && s.my_logical_size >= s.my_physical_size;
#endif /* TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT */
} while (grow /*short-circuit*/&& s.my_mutex.upgrade_to_writer() );
if( grow ) {
// Load factor is too high
grow_segment(segment_lock,s);
// TODO: for better readability, downgrade to reader lock again here instead of inside grow_segment?
// also document the write lock as a precondition for grow_segment (in grow_segment's body)
}
#endif

A suggestion: it would probably benefit the readability of get_segment() to use redundant brackets to indicate relative operator priority (always a good idea when bitwise operators are involved, so maybe there are other locations in TBB where this might be applied).

Some trivial corrections: the comment should be about another thread "removing" the item in this case, and, elsewhere in the file, the comment with the declaration of lookup() should read "//! Does heavy lifting for "find", "insert" and "erase"."

A remark about the second "problem": it is only innocent in the code that I corrected in that it may cause the hash map to temporarly become too large, but there is a potentially dangerous data race on the evaluation itself, because my_physical_size may be changed non-atomically and temporarily contain garbage or even if it too is atomic it might be changed in an unfortunate order relative to my_logical_size, unless it just so happens that the only situation in which a false negative is generated is when another thread is already growing the segment, which probably happens by at least 2 places anyway, but that's too iffy for comfort.

Thank you, Raf! I've fixed the first problem implicitly in my refactored version of the container and thus missed a test for this issue. So, I'll improve the unit-test.

But I do not see what's wrong with the second part.
* my_physical_size is of size of machine-word and aligned to appropriate boundary thus any storage operation to it is atomic.
* bool grow is only the first guess in double-check sequence which ended in grow_segment(). There wrong "false" value cant happen because size is supposed only to grow. And wrong "true" value is possible but doesn't break anything due to second check.

On the other hand, it makes me think that late update of my_physical_size might cause slight performance losing by acquiring write locks instead of read-only ones in concurrent threads. So, thanks for your attention to this.

Raf,

while you are right about the race in op_remove (632-637), the fix for it is simpler:

chain_lock.upgrade_to_writer();
if( b_temp = remove_from_list(key,c) ) {
return_value = true;
--s.my_logical_size;
}
goto delete_temp;

There is no need to re-run search_list because remove_from_list runs through the list anyway, and it returns either NULL or the element removed from the list.

Also I agree with Anton there is no race with regard to segment growth, as my_physical_size is only changed when the write lock is held on the segment, and the need to actually perform grow operation is re-tested inside the lock. It's a classical double-check pattern. The only thing to improve it is to update my_physical_size as soon as it is safe, for sake of its readers.

Anyway Anton is reworking concurrent_hash_map these days, so stay tuned for updates.

As for your comment about parenthesis, gcc 4.3 forced us to put those in all TBB headers.

Simon,

"double" erases are due to the issue we discussed above, and will be fixed soon. Thanks for reporting it!

As for your attempt to keep data by value, it should work if you properly define the assignment operator. concurrent_hash_map::insert() allocates space to hold a user data object, calls the default constructor, and returns the accessor that can be used to fill in the space. You might then assign any value to the object kept in the container, e.g.:

if(my_map.insert(acc_w, new_Key)) {
acc_w->second = my_existing_item;
}

About Anton's reply:

About the second part, I missed the repeated check that avoids growing too large because it is not reflected in the name "grow_segment" or the name "grow" or the comment "Load factor s too high", which hints at a maintenance issue (those increase the likelihood of introducing bugs).

Even if my_physical_size happens to be aligned and sized to ensure atomic operations (but then why is my_logical_size atomic: it has the same size and is, I presume, aligned as well, so there's a contradiction here) on processors currently considered (where is it documented that TBB only targets processors-compiler combinations with data buses at least as wide as size_t, and how likely is this requirement to (continue to) be met by future processors?), I think leaving out a fence (if that is safe!) should only be done explicitly by making my_physical_size atomic anyway and specialising template atomic instead (in a hardware-specific way), which would also increase performance for other atomic's (but might also change the memory model, if there were one and if it would specify tbb::atomic to be similar to Java's volatile, see my as yet unanswered thread on the subject).

A wrong false could happen if some other thread increments my_physical_size before my_logical_size or decrements my_logical_size before my_physical_size, so one would have to check the rest of the code. For example, could clear() be a problem, if it could be called at an inopportune time: no, my_physical_size is decremented first; but there's no documentation of that fact, and it would be, again, a potential maintenance hazard, and a big one this time, and if it couldn't occur at an inopportune time, it already took too much time to find out. Another example is grow_segment(), where my_physical_size is incremented first, so if, for some small current size, a high enough number of threads all evaluate the test at the wrong time and get a wrong false, a problem will occur; even if this is highly unlikely, statistically speaking, I would prefer the code to prevent that, anyhow, especially if it is not documented and motivated because it serves a greater good (I would not know what that could be), and if I'm wrong about this it still hints at a maintenance issue.

Even if the code is currently correct, it should be easier to verify that (by more documentation to motivate any trade of robustness for performance), so that more code can be verified in the same amount of time or less time spent verifying the same amount of code, and it should be easier to preserve correctness, and to be assured of correctness (currently, it's a bit too "iffy" for me even if I'm wrong about grow_segment()).

(I do not understand the remark about "late update of my_physical_size", i.e., what changes you envision.)

About Alexey's reply:

Yes, it is probably simpler to leave search_list() as it is and keep a dedicated remove_from_list().

How would the write lock help to protect a test that is not locked yet, not even by a read lock (which I added in my suggested solution)? The double check should be documented, not hidden by misleading names and comments (although I still suggest my solution instead), and the wrong false might still be a problem (in practice or just for maintenance). I don't know how you would update my_physical_size earlier (got to leave now), but such a change would not be very helpful to prevent wrong-false occurrences.

Sorry for the amount of text in my latest two replies, but that's what's required to point out "maybe an unlikely mishap", and I still prefer "clearly no possible mishap".

Alexey,

I'm not too fond of using pointer values as boolean test values (I prefer Java's rules about that), especially when combined with an assignment (because it might be a mistyped comparison), but this comment about your code change is really about the disappearance of the bucket mutex scoped lock: is there no assurance that, while an item in a hash map is being used, it continues to be an element in that map, and was the previous code wrong in that regard as well, or should the lock be there anyway (I think so!)?

Raf

Anton,

While you're "reworking concurrent_hash_map" as Alexey writes, how about renaming things back to more traditional and familiar hash table speak (and correcting some typos): node<-bucket<-chain<-node_list, bucket<-list, "linked list"(without quotes)<-linked-bucket, zero-intialized<-zero-initialized, Unorderd<-Unordered and maybe element<-item (from leftmost arrow to rightmost arrow, replace source/right of arrow by destination/left of arrow, anywhere in a word, mostly replace all (manually is safer, and some occurrences of chain should be unchanged in the comments); renaming the variables is more laborious)?

Also, my_physical_size is not "Size of chains", it's "Size of array" (before and after, although naming it array_size makes more sense), which could perhaps be slightly better tuned relative to my_logical_size by eliminating some of the smaller sizes and perhaps reallocating the array at an occupation rate of slightly lower or higher than one (segment array occupation rate vs. chain length)... which makes me realise, and I hope I'm right now, that I should retract most of what I wrote about wrong grow values because it doesn't really matter for data structure integrity as long as my_physical_size is at least one for non-empty segments, but then there's no need to double check in grow_segment either, is there? Anyway, sorry for realising that only now (well, it wasn't documented, and I got distracted).

(Added) I just saw in current_hash_map.cpp that Intel Thread Checker agrees with me about the race, but it is not the double check that saves the day, because the program would be correct without it to protect against a wrong true (just occasionally less efficient), and it doesn't protect against a wrong false, which with an empty array would lead to incorrect execution. What saves the day there is that (as Anton indicated) arrays do not shrink or disappear in concurrent methods, but only with the additional reasoning that this would mean that my_physical_size is wrongly read with a nonzero value, but that would mean a race between a non-concurrent operation such as clear() and this one, which the programmer is meant to avoid. The double check avoids premature growth of a segment (which is a strange precaution if the code never bothers to shrink a segment even if my_logical_size drops all the way to zero), but (harmless) temporary overpopulation of a segment due to wrong false can still occur (the fact that arrays only grow has nothing to do with that), although with 64 segments you would need a currently unlikely number of threads before this in itself would lead to overly long chains. The trouble is that, without making all this explicit (or at least the harmful part), it is too easy for the code to change in a way that might lead to incorrect execution, so I'm sticking with my original suggestion after all.

Raf

Raf,

Re: using pointer values as boolean tests etc: code style can be easily put into accordance with TBB coding guidelines (which are partly imposed by various compiler warnings we have to clean up for the sake fo our customers). I put the code just to illustrate the idea.

Re: bucket locking: yes, I removed the lock there because it is excessive. Check my logic:
- removing a bucket from its chain is alwaysdone while the chain lock is held for exclusive access; thus noone else can remove this bucket but other threads can still use it via accessors.
- for bucket destruction, we need to ensure that noone else uses it. Therefore, _after_ removing it from the chain, it should be first locked exclusively; and this is done in the end of lookup (under the delete_temp label).
- on the other hand, taking the lock _before_ removal does not help at all, because it does not prevent other threads to also attempt acquiring the bucket lock.

Alexey,

(Edited) It seems to me that, with the exclusive node lock (I'm going with familiar terminology here, that has a direct chain of element nodes attached to a bucket) in that piece of code, the erase would only succeed with an exclusive lock on the chain and an exclusive lock on the node. That means that nobody is using the element at the time, and by the time the lock on the chain is released for anyone to be able to find() the element (which requires shared access on the chain), the node is already removed from the chain, so it is really the lock in delete_temp that is superfluous because nobody else could possibly have a lock (likewise for a b_temp from a failed insert). And the effect is that when you find() an element, it continues to be part of the map until you are done with it, which is also what James Reinders thought when he wrote "Intel Threading Building Blocks" (bottom of p. 92: "Therefore, before removing the key, it waits on any other extant accesses to the key."), so removing the lock from that piece of code would be breaking (his and my understanding of) the contract (though I admit I don't see anything as explicit in the reference manual).

(Added) The easiest location for a lock is inside remove_from_list(), to protect the statement "*p = b->next;", and the one in delete_temp can then be discarded.

Raf

Well, it's another possiblelock sequencewhich is also correct.

I said "also", because what is the harm of removing a node from its chain while the node is locked? Virtually none, because a) the node is alive as long as the locks are held - so its data are alive as well, and b) node accessor is not an iterator, so noone will try iterating over the container in parallel mode (and the documentation should have explicitly saidthat iterators and other STL compatibility support is thread-unsafe).

Thus I do not see any loss of correctness - as long as a thread holds an accessor, the data are there; as soon as it releases the accessor, there is no guarantee the data are there; and guarantees are the same in my variant, aren't they? For a 3rd party observer the situation is different: in your case, if a node is locked, it will be found in the container by an observer, in my case, it is not guaranteed. But I doubt this property has any use.

Performance-wise, however, my lock sequence is better, because it minimizes the time the chain lock is held for writing. In your case, it's acquired for writing, and then held for a while until the node is also locked for writing. No other thread can ever read the whole chain all this time! In my case, it's acquired for writing,a nodeis removed, and the chain is released, so other threads can read it.

I fully acknowledge that moving the lock to delete_temp is better for concurrency, even after another change to avoid holding a write lock on the chain (see below), but if it is acceptable it is also still synchronous garbage collection. It is even better performance-wise, if memory can be dedicated to it, to set a flag in the node and pass responsibility to whoever is currently using the element (assuming that try_acquire() will only fail if another thread is holding the lock, which is an interesting question in itself, or another safe hand-off method), because then the thread removing the element won't be blocked; there is a free choice here about what a third-party observer should see.

It's all a matter of who might be inconvenienced by behavioral changes, and I don't have any particular preference at this time, as long as the possible impact is duly considered: with the first change, a third party might be able to insert another element with the same key while the one being removed is still in use (which might be surprising), with the second change, the mapped data must be self-managing from an unpredictable thread, not to mention the orthogonal combination.

bucket_mutex_t::scoped_lock item_lock( b->mutex, true );
if( chain_lock.upgrade_to_writer() ) {
// TODO: jump out, releasing both item_lock and chain_lock, and try again
// (otherwise deadlock might occur because locks are acquired
// in a different order, unless another solution to that can be found)
}
else {
b_temp = remove_from_list(key,c);
if( b_temp ) {
return_value = true;
--s.my_logical_size;
}
goto delete_temp;
}

(Added) Just a quick reminder, to anyone not closely following, that the behaviour "in my case" of holding a write lock on the chain while waiting to lock the element node to be deleted was merely inherited from tbb20_20080311oss_src, and that recursive_hash_map tries to have at most one element on average in each chain so this is probably not really an urgent problem anyway. Meanwhile, having a little too much time on my hands, and thinking that one source file is worth a thousand words, I have contributed a fairly invasive revision of recursive_hash_map (it produces a lot of colour on a graphical diff, anyway, although much of that is caused by reverting to more familiar terminology, but I would like to know if my idea of cacheing the hash values in the nodes might be beneficial), and I will try to give this forum a break now.

In the code I wrote immediately above, I had the return value of upgrade_to_writer() the other way around (oops!), and probably it does not even matter whether the upgrade_to_writer() temporarily released the chain lock or not (or does it?).

In the new development release (tbb20_20080402oss_src), for any given key, it will now be possible for an unlimited number of tasks to be independently using or modifying their own private items (all with that same key) that aren't even in the map though the tasks are all holding (const_)accessors, while blocking an equal number of other tasks (or one less than that if one item is still in the map) all waiting to garbage-collect those items. The only way to avoid that is to never call erase() directly on a key (unless you somehow know that nobody else is using it when that might make a difference), but instead to always get an accessor (not a const_accessor) and erase that accessor ("accessor a; if(m.find(a,k)) erase(a);"). Hopefully no existing code will break.

However, there's a funny way of failing (unless it happens to you), if you try to erase an accessor, because, if erase() cannot atomically upgrade_to_writer() on the chain mutex, some other erase() might intervene, remove the element (which is no longer lock-protected), and part of the eriginal erase() runs with a key referring to memory that has been deallocated, possibly causing some other item to be erased instead, or worse. Best to make a copy anyway and/or keep the lock.

And maybe you've also noticed that the workaround really isn't a workaround at all: when erasing an accessor, if the chain lock is temporarily given up, behaviour reverts to a simple erase() on the key (why?), and an unlimited number of other tasks might get an accessor to the same item anyway before it is removed from the map, so I guess a safe erase() really requires a mutex on the concurrent_hash_map. :-)

Both these problems can be remedied, but it is still changing the existing erase() instead of dedicating a new method or new erase() overload to the new behaviour.

Raf, thank you for helping to improve the library and providing your opinion.

The reason to add two overloaded erase() methods that take accessors wasn't to workaround anything; we were asked to provide the methods to erase an element by an accessor. As you seem to think there are more problems in the reworked container than before, can you please provide some examples or code snippets of what do you think is wrong? In particular, I wonder about the followingstatements. What exactly "part"do you mean there:

...part of the eriginal erase() runs with a key referring to memory that has been deallocated...

?
Also from the following statement,

these problems can be remedied, but it is still changing the existing erase() instead of dedicating a new method or new erase() overload to the new behaviour.

it seems that we think different of guarantees provided by concurrent_hash_map in presense of erase() methods being executed in parallel with insert() and find(). What is your vision of guarantees the container should provide?

Also you said in one of above posts that you contributed some portion of code. Usually I automatically receiveall contributions done via the special page on the TBB site, but I am quite sure I do not saw anything like you described. If you used that special page, have you ever get a confirmation that you contribution was received?

Yes, the details, sorry...

First some corrections again (with my apologies, also to myself...): please forget the code in "03-22-2008, 6:08 PM"; in "04-04-2008, 1:06 AM" the problems with erase(accessor), both the bug and the ineligibility as a workaround, are about try_acquire() and giving up the item lock, and it could probably never be made to work as a workaround anyway without also removing the need for a workaround itself.

About "The reason to add two overloaded erase() methods that take accessors wasn't to workaround anything": Of course I did not mean to imply that, that's just what I was trying to use it for here.

About the quote "...part of the eriginal erase() runs with a key referring to memory that has been deallocated...": There is a bug in erase(accessor), which passes the item's key from inside the node by reference to (private method) exclude(), and if exclude() fails to try_acquire() the chain lock (the non-blocking way) it gives up the item lock before calling acquire() (the blocking way), opening up the possibility that another thread intervenes and successfully erases the item including deallocation of the node carrying the item and the key inside it while it is still being referred to by the original thread, so maybe another node will end up being erased, or worse; erase(const_accessor) does not have this problem because it passes a copy of the key.

About "What is your vision of guarantees the container should provide?": That mainly depends on other people's experience, but I just think/feel that abandoning the guarantee that a (const_)accessor keeps the item locked inside the map might unnecessarily cause some existing code to fail and put a burden on future code, even if item deallocation is currently already asynchronous. On the other hand, if, instead of needing a read lock on the chain while waiting for an item lock, a task could transfer that chain lock to a place in the waiting list for the item lock (or a waiting count while spinning) from which it could be expelled by the current owner telling it to try again, concurrency might be improved (well, it's supposed to be just a very short chain anyway), and a task wishing to erase an item could first acquire a write lock on the item, hold onto it while acquiring a chain write lock without risking deadlock, clear the waiting list for the item lock (any expelled tasks should then try again to acquire the chain lock etc.), remove the node from the chain, release all locks, and deallocate the item node, thus keeping the guarantee on (const_)accessor, at the cost of a bigger item lock and more complexity.

Does that make sense? Would it be a waste of time if I tried this (it's not something simple like, say, porting to Solaris)?

The contributions page has/had a (known) issue (I tried GNU/Linux Ubuntu 7.10/Firefox 2.0.0.13/Flash LNX 9,0,115,0 and Sun Solaris Express Developer Edition 1/08/Firefox 2.0.0.9/Flash SOL 9,9,47,0, both giving error "Error Code: HTTP Error, File name: [...], Message: 400" ([...] mine) when I click "Upload Files"), but I now have some alternatives (Mac OS X seems to work).

Raf_Schietekat:There is a bug in erase(accessor), which passes the item's key from inside the node by reference ...; erase(const_accessor) does not have this problem because it passes a copy of the key.

Yes, this is a bug. Thanks for catching it! The key should be copied before the item lock is released.

Raf_Schietekat:... and a task wishing to erase an item could first acquire a write lock on the item, hold onto it while acquiring a chain write lock without risking deadlock, ...

In my opinion, there IS a risk of deadlock in the above scenario, because another thread can lookup for the same key, and lookup acquires chain lock first, and item lock second. This is why the item lock is released when erasing by const_accessor, and also why try_acquire is used for a chain locking attempt when erasing by accessor. And as far as I can judge, lookup can not be changed to different lock acquisition order, thus exclude should follow the same locking order as inlookup: the chain lock should be acquired first, either for reading and subsequent upgrade, or for writing, and only then the item lock could be acquired.

Raf_Schietekat:Does that make sense? Would it be a waste of time if I tried this (it's not something simple like, say, porting to Solaris)?

I understand in current implementation there is no guarantee that erasing by accessor will erase the exact element the accessor points to. Instead it might erase another element with the same key;this is rather non-intuitive behavior and it would be good if it can be improved. In your proposal, if correctness problems are resolved and performance impact isn't significant, it would be definitely worth considering. But it might turn out into a waste of time as well; as you see, the main problem of erasing by an accessor is that it inherently uses the opposite order of lock acquisition than that in lookup. So you decide.

About "In my opinion, there IS a risk of deadlock in the above scenario, because another thread can lookup for the same key, and lookup acquires chain lock first, and item lock second.": I admit that I didn't take this seriously enough earlier in this thread, but perhaps this time I'm not wasting people's time as much. That would be the main novelty: that a task never holds on to the segment and chain locks while acquiring an item lock, but that it would (atomically) abandon its other locks for a (revocable) place in the waiting set for the item lock. The goal is to prevent any blocking on segment and chain (except if the user-provided hashing misbehaves), so the map is free to hold on to the item lock while acquiring segment and chain locks (a small correction: to prevent some false restarts, it should probably remove the node from the chain before clearing the waiting set for the item lock). The item mutex will probably become bigger, though.

I think I might try this, using only platform-neutral functions, if I can get a lease of a week or two on this issue (I don't know how much time I'll have or need).

Let me check my understanding. So you are saying that a thread, after acquiring segment and chain lock and finding the item of interest, would not acquire item lock but instead queue itself into a waitlist for the item, and release those locks. And then what? Before starting to acquire the item lock, the thread will need to check if the lock is still valid, which in your vision seem to be equivalent of the thread still being in the waitlist. The question is, how the waitlist consistency is preserved - by yet another lock?

Yes, I think you could try. The current code of the hash map should be platform-neutral I think, except might be for some workarounds of platform-specific issues. For atomic operations, tbb::atomic should suffice. If some loads or stores need fences, there are __TBB_load_with_acquire and __TBB_store_with_release functions for that purpose.

Sorry for the delay... I made the changes earlier today, which wasn't too difficult after all because I leveraged spin_rw_mutex as a black box and just built on top of it, the result of which seems to pass muster with test_concurrent_hash_map.exe, apparently without performance penalty. It just adds a few atomics (waiting set count and vacate instruction, which should probably be packed together with spin_rw_mutex), and does some spinning of its own.

Raf,

We received your contribution and Anton is evaluating it; and we will share the results here. Thank you very much for providing your feedback and ideas, and demonstrating a good example of community collaboration.

So, thank you, Raf, for fruitful discussion and contribution. Now, I see that segments can be overpopulated in some cases at maximum on (threads-1) items due to "wrong false" inspired by my_logical_size. But it's ok for the moment.
We fixed "guarantee that erasing by accessor will erase the exact element the accessor points to" and deadlock which occurs for insert() while holding another accessor by the same thread and concurrently looking for the same existing item.
I believe changes will be included in coming development release. There is also may be performance benchmarks in which indicated that the contribution is slower than current version of the container.
I'm not sure whether "garbage collector" for the container and optional "blocking style" of erasing will be implemented because they require additional member of node structure at least. Though, I think they could be useful.

I made some changes that should make concurrent_hash_map unblockable (except between threads directly competing for the same key of course, whereas now, e.g., thread 1 holding an accessor lock can block thread 2 competing for the same lock, so far so good, which in turn blocks any thread trying to change the same chain or trying to grow the same segment, which in turn blocks any thread trying to even just read from the same segment), and will also have lower latency while another thread is concurrently reallocating a segment array (using fine-grained locking), all while restoring the guarantee that a key exists at most once in the context of a concurrent_hash_map.

Unfortunately, the target has moved in tbb20_20080512oss_src.

Before I try to merge both if I still feel like doing so myself, here are my changes relative to tbb20_20080408oss_src, for anyone interested.

(Added) Also see thread "Possible deadlock in TBB concurrent_hash_map".

Attachments: 

Leave a Comment

Please sign in to add a comment. Not a member? Join today