concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
In addition to not having a sort() method it seems that concurrent_hash_map() doesn't support sorted insertion?
Can anyone from Intel confirm this, and give a valid reason why such a feature existing in standard hash_map) would be
left out in TBB?
Furthermore, looking at the source code, accessors are thread-safe, but what about
iterators?
For example, how to advance through a sorted list of items in one thread which is controlled by
user input (navigation keys) and add/remove items in sorted order from another thread while keeping everything
thread-safe?
I am aware that I am not doing any parallel computation using concurrent_hash_map(), and
therefore I am perhaps not using it as intended, but I would still like to have a thread-safe hash_map().
If you find my post helpfull, please rate it and/or select it as a best answer where applies. Thank you. | |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
"In addition to not having a sort() method it seems that concurrent_hash_map() doesn't support sorted insertion?" C++'s map is essentially an ordered tree, TBB's concurrent_hash_map is essentially a hash table. std::map doesn't
support hashing, and access has logarithmic complexity, so it's not a one-sided deficiency: you just can't have it both
ways at the same time, I guess.
"Can anyone from Intel confirm this, and give a valid reason why such a
feature existing in standard hash_map) would be left out in TBB?" If you don't mind my giving this a try: C++
is getting an "unordered_map" (say what? OMG!), and maybe a concurrent skip list could be considered in TBB's future
(but you should check for yourself what has been written about this before, or maybe somebody could remind us).
"Furthermore, looking at the source code, accessors are thread-safe, but what about iterators?" Do you
have an example that makes sense with a hash map? And even so, complexity and performance considerations may require
separate implementations without or with concurrent iterators, i.e., concurrent iterators don't come cheap.
"For example, how to advance through a sorted list of items in one thread which is controlled by user input
(navigation keys) and add/remove items in sorted order from another thread while keeping everything thread-safe?" In this situation a conventional locked std::map would do fine: users are so slow you might as well go to sleep
waiting for them to do anything, you don't need the performance of a concurrent_hash_map(), let alone anything even
faster, all with their own trade-offs between user features and performance.
"I am aware that I am not
doing any parallel computation using concurrent_hash_map(), and therefore I am perhaps not using it as intended, but I
would still like to have a thread-safe hash_map()." It looks like you should write your own wrapper around
std::map to hide the lock.
| |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
Hello, thanks for taking the time to reply.
However, I am not sure I understand what you mean.
1.
Why are you comparing C++ map to the TBB hash_map when C++ also has a hash_map?
2. As far as I know C++
hash_map can be kept sorted or at least I have managed to do it with Microsoft implementation when I was experimenting
with it. Perhaps that was an error or a pure luck?
As for the example, lets say we have a key container:
typedef struct _Key {
wchar_t FileName[MAX_PATH];
DWORD NameLength;
} Key;
And a value container:
typedef struct _Value {
ULONGLONG FileSize;
FILETIME FileTime;
BOOL Selected;
} Value;
Now wrap those in classes:
class CKey
{
public:
CKey(const wchar_t *FileName)
{
wcscpy_s(_Data.FileName, MAX_PATH, FileName);
_Data.NameLength = wcslen(_Data.FileName);
}
~CKey(void)
{
}
// operator < supporting natural sort order
bool operator<(const CKey& x) const
{
return (StrCmpLogicalW(_Data.FileName, x._Data.FileName) < 0);
}
friend wostream& operator<<(wostream& o, const CKey& x)
{
o << L"FileName : " << x._Data.FileName << endl;
return o;
}
Key _Data;
};
And the value as well:
class CValue
{
public:
CValue(void)
{
}
~CValue(void)
{
}
friend wostream& operator<<(wostream& o, const CValue& x) { FILETIME lt; SYSTEMTIME st;
FileTimeToLocalFileTime(&x._Data.FileTime, <); FileTimeToSystemTime(<, &st);
o << L"FileTime : " << st.wYear << L"-"
<< st.wMonth << L"-" << st.wDay << L" " << st.wHour << L":" << st.wMinute
<< L":" << st.wSecond << L"." << st.wMilliseconds << endl; o << L"FileSize
: " << x._Data.FileSize << endl; o << L"Selected : "; if (x._Data.Selected) { o << L"TRUE"; } else { o << L"FALSE"; } o << endl; return
o; }
private:
Value _Data;
};
I have left out CValue methods for the sake of brevity. Now we define types for the list:
typedef hash_map<const CKey, CValue> TList;
typedef hash_map<const CKey, CValue>::iterator TIter;
typedef pair<const CKey, CValue> TPair;
typedef pair TInsert;
And extend stdext with our custom hash function:
namespace stdext {
// (One-at-a-Time Hash method by Bob Jenkins)
size_t hash_value(const CKey& x)
{
size_t h = 0;
for (size_t i = 0; i < x._Data.NameLength; i++) {
h += x._Data.FileName[i];
h += (h << 10);
h ^= (h >> 6);
}
h += (h << 3);
h ^= (h >> 11);
h += (h << 15);
return h;
}
};
Lets say that we have a list class:
class CImageList
{
public:
CImageList(void)
{
}
~CImageList(void)
{
}
BOOL Add(const wchar_t *FileName)
{
CKey key(FileName);
CValue val;
TInsert RetVal;
val.SetSelected(TRUE);
// insert does not invalidate iterators!
RetVal = _List.insert(TPair(key, val));
return RetVal.second;
}
void Dump(void)
{
ENTER_CS;
wcout << L"BEGIN" << endl;
for (TIter i = _List.begin(); i != _List.end(); i++) {
wcout << i->first;
wcout << i->second;
}
wcout << L"END" << endl;
LEAVE_CS;
}
private:
TList _List;
};
Again, I left out some of the methods, we just need add to check. Then we use the list class:
CImageList all;
all.Add(L"z100.jpg");
all.Add(L"z2.jpg");
all.Add(L"z1.jpg");
all.Dump();
This worked for me -- it somehow magically sorted the elements on insertion so I got the output:
z1 z2 z100
when iterating.
Now my questions:
1. Do you have any ideas if what I did
was by accident, or it really works that way? 2. How to make this code thread-safe except by rolling my own
locking?
Finally, what you said about users being slow is true -- what is fast is the file adding/removing
part which is done in another thread by watching a folder which being browsed by the user. Files can be added by a
download manager, archiver (on unpacking) and removed by user (program allows to select and delete multiple files from
the list). Lets say that I need fast add/remove for up to 10,000 elements, naturally sorted list and sequential
iterators and all of it should be thread-safe. If there is a better way to accomplish this than the hash_map() I would
be very glad to learn how.
If you find my post helpfull, please rate it and/or select it as a best answer where applies. Thank you. | |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
"Why are you comparing C++ map to the TBB hash_map when C++ also has a hash_map?" Not officially yet, but
it does seem widely supported, so my answer turned out to be somewhat pedantic, sorry.
"As far as I know
C++ hash_map can be kept sorted or at least I have managed to do it with Microsoft implementation when I was
experimenting with it. Perhaps that was an error or a pure luck?" If this is not luck (a 17% chance with 3
entries), it still seems suboptimal and probably nonportable.
Intuitively I would presume that a locked std::map is up to the task of tracking file system directory events as
well as user events, so I would start with that and then see/profile what happens, unless someone has a better idea.
| |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
1. Not officially yet, but it does seem widely supported, so my answer
turned out to be somewhat pedantic, sorry.
2. If this is not luck (a 17% chance with 3 entries), it still
seems suboptimal and probably nonportable.
3.
Intuitively I would presume that a locked std::map is up to the task of tracking file system directory events as well as
user events, so I would start with that and then see/profile what happens, unless someone has a better idea.
1. No problem, I know that it is not part of the C++ standard, that is why Microsoft has moved it to stdext but I
don't care as long as it works.
2. Portability is a non-issue for me, this is a Windows application we are
talking about which has other dependencies much more serious than a simple C++ class. As for suboptimal -- perhaps, but
perhaps still optimal enough for my needs?
3. I see... but my key is the filename -- wouldn't map() be slower
on add/remove/find operations than a hash_map()?
I was hoping that Intel's concurrent_hash_map() will be just
a concurrent implementation of MSVC hash_map() (i.e. that it will have this sorting functionality I need) so I get
thread safety by using it without having to reinvent the wheel. I Guess I was wrong, it seems I will have to roll my own
after all. Too bad that I will have to waste time on this part of the code instead of working on the application
features.
If you find my post helpfull, please rate it and/or select it as a best answer where applies. Thank you. | |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
"I see... but my key is the filename -- wouldn't map() be slower on add/remove/find operations than a hash_map()?" Complexity is logarithmic resp. constant, so from a certain size a map would become slower, but I don't have any
numbers for that.
| |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
"I see... but my key is the filename -- wouldn't map() be slower on
add/remove/find operations than a hash_map()?" Complexity is logarithmic resp. constant, so from a certain size a
map would become slower, but I don't have any numbers for that.
Guess I will have to stick with hash_map() then, too bad that I will have to implement locking myself.
My other question regarding concurrent_hash_map() not being sorted still stands.
If you find my post helpfull, please rate it and/or select it as a best answer where applies. Thank you. | |
Re: concurrent_hash_map() -- sorted insertion, sorting and thread-safe iteration
"My other question regarding concurrent_hash_map() not being sorted still stands." And I'm curious about the exact cost of this hybrid map vs. purely ordering or purely hashing
(stdext::hash_map::insert() is documented as O(log N) (just like std::map), vs. C++0x's std::unordered_map::insert()'s
O(1), not considering any constant factors), and how that works out for different use cases. It also seems potentially
annoying, if not blocking, to have to come up with a total order on the elements if no natural total order exists. But I
grant that it may be useful sometimes (as a third, separate option), like it may be useful to make your own poor man's
hybrid of a locked std::map and tbb::concurrent_hash_map.
| | |