May 3, 2008 11:53 PM PDT
Are smart pointers threadsafe?
I'm using the Boost smart pointers but I have a feeling they're not threadsafe? Intrusive_ptr could quite easily be made threadsafe because by design you control the actual counter and accesses to it. Shared_ptr is harder because it would involve modifying the source code.
Now that smart pointers are becoming part of the C++ standard I think they should be considered for TBB support (just like TBB has its own thread-safe container implementations). I realize not everything can be included but smart pointers are widely used and notoriously hard to implement so they're good candidates.
The reference says shared_ptr offers the same thread-safety as built-in types. I've interpreted this as they're like integers for example which need to be updated in atomic operations if they're shared among threads and simultaneously read and written. This is why TBB offers atomic<T> isn't it?
Well, I may be wrong but I don't think shared_ptr is "atomic". Could someone please shed some light on this.
The documentation could be made clearer, but shared_ptr is thread-safe with regard to its reference counter (which is all you really need), even if the shared_ptr object itself is not thread-safe.
Once you have safely copied or assigned p to p1 and p2, e.g., as automatic variables or parameters, which therefore both refer to the same object underneath (the referent), you can do whatever you like to p1 in thread 1 and whatever you like to p2 in thread 2: dereference (as long as the referent is thread-safe for those accesses), reassign, reset, make further copies, anything. If you access the identical pointer instance however, e.g., as a member variable in a shared object, you need a mutex if any of those accesses are writes, perhaps a read/write mutex; there would probably be little or no benefit in extending thread safety to shared_ptr instances, which would become bigger and slower as a result, although you could easily encapsulate a shared_ptr if you would like to do that anyway.
I just don't understand example 4 (how can p2 be used in thread A and go out of scope in thread B?), and they probably should have added some examples of simultaneous use of separate instances.
(Added) I don't think a competing smart pointer should be added to TBB: boost has intentionally limited its number of possible smart pointers as much as it could for the sake of interoperability (however tempting it might be to add more with special behaviour), which is a very good reason.
I have used boost::shared_ptr in TBB applications before and it seemed to work just fine. I believe there are atomic increment and decrement operations within it. For some reason I think there is a mutex buried in there as well, but I'm not sure that there is... I would ask on the Boost mailing lists for a better answer... it should be possible to replace the boost atomic operations with ones from TBB, again ask on the boost mailing list for more details.
In my experience, boost::shared_ptr had horrible performance. However, this is all relative. In my case, the pointers would increment and decrement millions of times in a short time... the majority of my algorithm related to pointer manipulations, so it became a bottleneck in that case... which looking back on it I say "of course". If you have a large number of pointers, which are being used a lot... I would consider a different approach, but of course that was just based on my experience, and code profiling revealed that something insane like 60% of execution time was spent on shared_ptr atomic operations.
I have written pointer containers that use TBB constructs instead, you might be interested in those. They are used for exception safety, to call delete on all contained pointers if it goes out of scope. Boost has something similar too.
Thank you both. I feel confident now that the Boost shared_ptr is as thread-safe as I need it to be.
With Boost shared_ptr as thread-safe as they are it shouldn't be necessary to have them in TBB too, unless there is some significant advantage like they could be made much more efficient or even safer or something.
Regarding the efficiency of reference counting per se. I'm aware that this kind of memory management comes at a price so I'm careful about not overusing it and I don't use it in "algorithm style" code.
Architecture-specific reference count management underneath shared_ptr is in boost/detail/sp_counted_base*.hpp. Note that it uses gcc where TBB confusingly uses linux (in tbb/machine). From a quick glance at gcc_x86 it seems decent enough?
(Added) A mutex is used in boost if there is no architecture-specific support for atomic access to a variable, at the cost of increasing its size and lowering its performance. TBB atomics are the size of the underlying type and always require a minimal set of supported architecture-specific operations, so it will not (be able to) fall back to using mutexes.
I know now that Shared_ptr is thread-safe enougth for my needs and often the best solution but sometimes one wants a more lightweight alternative and I've been using Intrusive_ptr. To make it easier to use I've stolen this class from Beyond the C++ Standard Library by B. Karlsson, If a class inherits it publicly it becomes "intrusive enabled".
class IntrusiveCount { // base class to enable derived class for use with the Boost intrusive smart pointer
The problem with the above class is that its not thread-safe so I've tried to modified it using TBB. I would appreciate if someone could have a look at it and tell me if there's something wrong with my approach.
class IntrusiveCount { // thread-safe base class to enable derived class for use with the Boost intrusive smart pointer
Just as an FYI, this is what I use. The code is available via SVN, and you are free to use it.
There are some improvements that can be made to this code, but here it is.
/* Copyright 2007-2008 Adrien Guillon. All Rights Reserved.
This file is part of TBB Community Code.
TBB Community Code is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
TBB Community Code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with TBB Community Code; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
As a special exception, you may use this file as part of a free software library without restriction. Specifically, if other files instantiate templates or use macros or inline functions from this file, or you compile this file and link it with other files to produce an executable, this file does not by itself cause the resulting executable to be covered by the GNU General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU General Public License. */
/** * A vector that stores pointers to "owned objects." This container assumes * ownership of the pointers it holds, and will call delete on them when this * object is destructed, or appopriate member function calls are made. * * The goal of this class is to provide a method of storing objects in an * exception safe manner, without using smart pointers. * * Instantiate the template with the type of object it will contain, not * with pointers itself. * * The Mutex can be any TBB mutex type, including RW types. * * The deleter template argument is for a function object which * implements void operator()(T* x), and deletes the object. By * default, normal C++ delete is called. */
/** * Construct an empty ptr_vector */ ptr_vector() { }
/** * The destructor calls delete on all contained object pointers. * * TODO: Should the destructor use the mutex? */ ~ptr_vector() { if(!empty()) { deleteAll(); } }
/** * Takes ownership of an object. This container will now own the object, * and is responsible for the destruction of the object unless the user * destroys it themselves with delete_member(x) or steal_member(x). * * This function is thread safe. */ T& giveOwnership(T* x) { { // BEGIN MUTEX typename Mutex::scoped_lock lock(_ownedObjectsMutex); _ownedObjects.push_back(x); } // END MUTEX
return *x; }
/** * Take ownership of an object away from this container. The object * will no longer be owned by this container, and a pointer will be * returned to the user. * * This function is thread safe. */ T* takeOwnership(T& x) { typename container_type::iterator i; T* objectToRemove;
{ // BEGIN MUTEX
typename Mutex::scoped_lock lock(_ownedObjectsMutex); i = std::remove(_ownedObjects.begin(), _ownedObjects.end(), &x); objectToRemove = *i; _ownedObjects.erase( i, _ownedObjects.end() ); } // END MUTEX
return objectToRemove; }
/** * Call delete on an object owned by this container. The object deleted * is removed from ownership. * * This function is thread safe. */ void deleteMember(T& x) { &nbs
p; T* objectToDelete; objectToDelete = takeOwnership(x); _deleter(objectToDelete); }
/** * Call delete on all objects owned by this container. All objects * deleted are removed from ownership. A mutex is used to govern * access of internal data structures. This is the same function which * is called by the destructor. * * This function is thread safe. */ void deleteAll() { { // BEGIN MUTEX
/** * Check if this container owns anything. * * This function is thread safe. * * @return true if there are no objects owned, false otherwise. */ bool empty() const { return _ownedObjects.empty(); }
/** * Swaps the contents of this ptr_vector with a ptr_vector instance * provided as an argument. * * This function is thread safe, however it requires two mutexes, * one for this ptr_vector and one for the ptr_vector argument. */ void swap(ptr_vector& x) { { // BEGIN MUTEX ON x typename Mutex::scoped_lock lock_x(x._ownedObjectsMutex);
{ // BEGIN MUTEX ON this typename Mutex::scoped_lock lock_this(_ownedObjectsMutex);
/** * Actual object container. */ Container _ownedObjects;
/** * Copying a ptr_vector is forbidden in case the user accidently &
nbsp; * does so. If copying a ptr_vector were permitted, the contents * of the rvalue would have to be cleared, which might not be * what the user expected. Instead, the user must call the function * swap(). */ ptr_vector(const ptr_vector& x);
/** * This is illegal so that users can't accidently lose memory. They * should instead use swap(). */ ptr_vector& operator=(const ptr_vector&);
/** * Instantiate an actual instance of the deleter. */ Deleter _deleter;
/** * Mutex for the internal data structure. */ Mutex _ownedObjectsMutex; };
template <typename T> class ptr_vector_default : public ptr_vector<T, std::vector<T*>, tbb::spin_mutex, boost::checked_deleter<T> > { };
I would appreciate if someone could have a look at it and tell me if there's something wrong with my approach.
class IntrusiveCount { // thread-safe base class to enable derived class for use with the Boost intrusive smart pointer public: ...//skipped when quoting
(p->ref_count.fetch_and_add(-1) == 1) delete p; // decrement counter. If it was 1 before it's 0 now so delete
}
...//skipped when quoting private: IntrusiveCount(const IntrusiveCount&); tbb::atomic<int> ref_count; };
I had to change the counter to an int (from short) otherwise I get this level 2 warning on VS2008 standard,
C:UsersadminDocumentsMint bbsrcinclude bbatomic.h(157) : warning C4244: 'argument' : conversion from 'tbb::internal::atomic_traits<Size,M>::word' to 'short', possible loss of data
You did it almost right, except for one thing. tbb::atomic<>::fetch_and_add returns the value before the atomic addition, i.e. it is equivalent to a postfix increment, while the original code used a prefix increment.
In fact, you do not need to use fetch_and_add at all, because tbb::atomic<> overloads increment and decrement operators, in both postix and prefix forms. So the only change in the original code would be in the declaration of ref_count as tbb::atomic<int>.
By the way, which TBB package did you use? I wonder about this VS2008 warning. I believe our tests would show it; so most probably, it has been fixed in latest updates of TBB.
The code was actually correct, if a bit convoluted, because the result of the postfix decrement was compared with 1, but keeping the original code would have been easier all right.
By the way, which TBB package did you use? I wonder about this VS2008 warning. I believe our tests would show it; so most probably, it has been fixed in latest updates of TBB.
I'm using the latest commersially alligned download, tbb20_020oss.
In fact, you do not need to use fetch_and_add at all, because tbb::atomic<> overloads increment and decrement operators, in both postix and prefix forms. So the only change in the original code would be in the declaration of ref_count as tbb::atomic<int>.
Re the warning: I think this sort of changes for shutting down various compiler warnings was left out of TBB 2.0 commercial updates and thus from com-aligned releases.
So you might try the recent stable release 20080408 and check if the warning is "fixed" there (it should be I believe). You could as well just ignore it for the moment, it's harmless in this particular case of tbb::atomic<short>.
Well, I think I stick with tbb::atomic<int>. It's okay for the time being, my program is far from complete, and I can change it later.
And by the way I see now it's only a level 4 warning. I have other level 4 warnings from the TBB package. I've disabled those because I was told they won't be removed.
Lets say making the Intrusive_ptr thread-safe was as simple as making the counter atomic and something similar is in place for Shared_ptr so that they're equal in this respect from an efficiency standpoint.
Now could it be that Intrusive_ptr has an efficiency advantage because the counter is kept close to the actual object in memory? I don't mean during allocation (where Shared_ptr is at a disadvantage because the counter must be separately allocated on the heap) but later if the counter is frequently updated. Isn't Intrusive_ptr likely to produce fewer page faults for example?
So what do you say. Is the proximity of the counter to the object an advantage for Intrusive_ptr?
Intuitively, I would say that you don't need to touch the counter while using the referent, and not having the counter inside the referent makes that more likely to fit in fewer cache lines (minor influence), one of which other threads are less likely to pry away from you (major influence), so shared_ptr may well perform better (at the cost of more storage and a one-time (de)allocation penalty).
On the other hand, a page fault sounds scary, so I'm not willing to wager a bet on it; the best in performance might be to use aligned allocation and pad the counter up to cache line/sector size, depending on where it is in the object layout, but then how much effort and memory have you spent, maybe causing even more page faults... In another context I heard that you don't need to worry about proximity for dynamic memory allocation, that it tends to happen automagically, so maybe there's no need to worry after all (as long as it doesn't get close enough to cause false sharing).
It would be interesting to hear about relevant test results.
I think it would depend on the pattern of accesses of pointers and referent. If a thread adding a reference was always followed by an action to modify the referent, an intrusive pointer could provide a modicum of latency-hiding. If, however, a bunch of threads are adding/removing references while some other thread possesses "ownership" of the referent (meaning it's modifying it), some small argument could be made for the nonintrusive model. The refering threads will still fight over the cache line containing the reference count, and should any of those threads try reading the referent, then the owner will have to spill the beans and flush the modified cache lines to memory.
Given these very specialized cases, the real benefit of the shared pointer has nothing to with access efficiency and everything to do with not having to modify the original object just to add a smart pointer.