From: SourceForge.net <no...@so...> - 2011-10-26 00:16:25
|
Bugs item #1408845, was opened at 2006-01-18 00:50 Message generated for change (Comment added) made by rich_sposato You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=396644&aid=1408845&group_id=29557 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Open >Resolution: Accepted Priority: 7 Private: No Submitted By: Steven Barbaglia (stevenuk) >Assigned to: Richard Sposato (rich_sposato) Summary: Race condition in RefCountedMTAdj::Release Initial Comment: Consider the following usage example: MySmartPtr shared; // both writer and reader can access this smart pointer // Writer thread MySmartPtr update(new MyObj); // new referenced object shared = update; // shared smart pointer updated // Reader thread MySmartPtr read(shared); // shared pointer copied to reserve referenced object // access object through smart pointer The referenced object is never updated once it is wrapped in a smart pointer, but the shared smart pointer is. The reader takes a local copy of the shared smart pointer in the expectation that thereafter it can access the referenced object. Should the writer thread update the shared object again (by assignment with a smart pointer referencing a new object), the local smart pointer in the reader will still be holding onto the old referenced object. This means that the writer and reader threads only potentially contend during the assignment in the writer and the copy-construction in the reader. At other times the reader and writer can work at their own pace: for example, the writer might be very fast and update the shared pointer often, whilst the reader might save to database the occasionally-retrieved snapshot. Can the reader thread rely on the smart pointer handling itself in a thread-safe way? My contention is that there is a race condition not addressed in RefCountedMT. Consider the following code extract from the Loki library: template <template <class> class ThreadingModel> struct RefCountedMTAdj { //... bool Release(const P&) { if (!ThreadingModel<RefCountedMT>:: AtomicDecrement(*pCount_)) { SmallObject<ThreadingModel>:: operator delete( const_cast<CountType *>(pCount_), sizeof(*pCount_)); return true; } return false; } //... } There is nothing to prevent the writer thread from being interupted once the AtomicDecrement operation has been completed, but before the delete operation (ie, counter has reached zero). Imagine the reader in my example takes its local copy of the shared smart pointer precisely at that moment. The code for both the copy constructor the the cloning method suggest the operation will complete and the reference counter will be incremented (ie, counter back to one). Once the writer thread wakes up and completes the delete operation, the smart pointer in the reader will be holding a dangling pointer. Furthermore, the Release method will return 'true', causing the deletion of the referenced object. The reader might be in the middle of accessing the refenrenced object through its local smart pointer... ops... As it stands, simultaneous reads and writes to smart pointers can have unpredicable results (as in example above). This limitation should be either documented or removed with an enhancement to the code. ---------------------------------------------------------------------- >Comment By: Richard Sposato (rich_sposato) Date: 2011-10-25 17:16 Message: I might be able to fix this bug by using GCC's atomic operation functions - or with InterlockedIncrement and InterlockedDecrement on Visual-Studio. Reopening bug and assigning to myself for further study. ---------------------------------------------------------------------- Comment By: Richard Sposato (rich_sposato) Date: 2007-04-27 16:38 Message: Logged In: YES user_id=749922 Originator: NO Hi Folks, I asked on comp.lang.c++.moderated if anybody had recommendations for how to fix this bug. http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/93c6579e620a40fa/be06b51a7bccb6bb?hl=en#be06b51a7bccb6bb I did not see any good recommendations. My conclusion is that no change to Loki's SmartPtr can fix this bug. There is no thread-safe way to write copy-constructor that works while the copied object's destructor is active in another thread. Even if we could modify the copy-ctor and dtor so they run okay in most situations, that won't fix the situation where the source object is completely destroyed in one thread before the copy-constructor even begins in another thread. Indeed, the fact that one object is trying to copy another while that other object is being destroyed indicates a higher-level design problem. I reluctantly recommend we close this bug as "Won't Fix". We can document this issue with a note in the SmartPtr header file. Cheers, Rich ---------------------------------------------------------------------- Comment By: Andreas Kunit (segelohr) Date: 2006-07-14 04:04 Message: Logged In: YES user_id=1555499 I don't see a race condition. Hypothesis: The copy constructor is called right before the delete operation of the smart pointer is called. Proof by contradiction: 1. When the delete operation is to call, there's only one smart pointer pointing to the pointee. 2. The one and only smart pointer must be the base for any copy operation. 3. When the delete operation is to call, the one and only smart pointer leaves his scope of validity. 4. When the smart pointer leaves his scope of validity, it may not be the base for a copy operation anymore. 5. No copy constructor may be called right before the delete operation. ------------------------------------------------- The other remark in the text is that smart pointers don't feature a synchronisation mechanism. IMO they don't have to. There are other patterns which cover this concern better. ---------------------------------------------------------------------- Comment By: Peter Kuemmel (syntheticpp) Date: 2006-05-30 07:32 Message: Logged In: YES user_id=1159765 Modified Files: SmartPtr.h Log Message: add warning about mt bug Index: SmartPtr.h =================================================================== RCS file: /cvsroot/loki-lib/loki/include/loki/SmartPtr.h,v retrieving revision 1.30 retrieving revision 1.31 diff -u -d -r1.30 -r1.31 --- SmartPtr.h 17 May 2006 16:04:32 -0000 1.30 +++ SmartPtr.h 30 May 2006 14:30:19 -0000 1.31 @@ -305,6 +305,10 @@ /// Implementation of the OwnershipPolicy used by SmartPtr /// Implements external reference counting for multithreaded programs /// Policy Usage: RefCountedMTAdj<ThreadingModel>::RefCountedMT +/// +/// \par Warning +/// There could be a race condition, see bug "Race condition in RefCountedMTAdj::Release" +/// http://sourceforge.net/tracker/index.php?func=detail&aid=1408845&group_id=29557&atid=396644 //////////////////////////////////////////////////////////////////////////////// template <template <class, class> class ThreadingModel, @@ -1521,6 +1525,9 @@ ---------------------------------------------------------------------- Comment By: Steven Barbaglia (stevenuk) Date: 2006-04-20 09:20 Message: Logged In: YES user_id=1420849 some more observations... some re-iteration of Peter's comments using different words to elicit fresh ideas... a bit negative only because of my inability to see a solution at present... 1) There is still no warning in the documentation that SmartPtr does not support simultaneous read and writes (ie, there will be impredictable results in changing what an instance of SmartPtr is pointing to when the said instance is accessabile by other threads) 2) Can a volatile resource (eg, the counter in RefCountedMT) be used to synchronise distinct instances? Isn't there a contraddiction here? Is this the theoretical principle that was breached? 3) The race condition on the writer side extends from the decision to delete the counter, through the deletion of the counter and referenced object and finally to the assignment of the new counter and referenced object. The reader should not be allowed to get hold of the pointers to the doomed counter and doomed referenced object. Currently, this 'transaction' is covered by distinct methods in RefCountedMT (as illustrated by Peter's example code), so I do not see how synchronisation can be achieved within the class. 4) Object-level synchronisation is not an option, since multiple instances are involved. A class level synchronisation object is a heavy-handed solution and could be a bottleneck. Could a fallback position be for the reader to be able to detect that it now holds dangling pointers? Maybe throw an exception? ---------------------------------------------------------------------- Comment By: Peter Kuemmel (syntheticpp) Date: 2006-02-28 08:03 Message: Logged In: YES user_id=1159765 /* Hi Rich, I can't see atomic-member-function-calls-only eliminates the problem. To demonstrate this I've written a "byte code" simulation of the problem, and it seems to me that atomic calls can't avoid the order in void bug(). */ // storage policy: double *ptr; void del(bool del) { if(del) delete ptr; } // essence of RefCountedMT int* pcount; struct Ref { void ctor(int* rhs) { pCount = rhs; } void clone() { ++*pCount; } bool Release() { --*pCount; return *pCount==0; } private: int* pCount; }; Ref t1; // thread 1 Ref t2; // thread 2 bool ok1; bool ok2; void init() { // both smart pointers have somewhere the pure pointer ptr = new double; // init counter pcount = new int; *pcount = 0; // t1 already points to ptr t1.ctor(pcount); t1.clone(); } void bug() { // this is the faulty order // t1 releases and t2 gets a copy // all calls correspond to atomic member function calls!! t2.ctor(pcount); ok1 = t1.Release(); t2.clone(); del(ok1); // t2 releases ok2 = t2.Release(); del(ok2); } int main() { init(); bug(); return 0; } ---------------------------------------------------------------------- Comment By: Richard Sposato (rich_sposato) Date: 2006-02-07 17:39 Message: Logged In: YES user_id=749922 We might be able to use a trick from the SmallObj.h file to solve this. What if the code was changed to something like: template <template <class> class ThreadingModel> struct RefCountedMTAdj { typedef ThreadingModel< RefCountedMT, MutexPolicy > MyLockingModel; //... bool Release(const P&) { typename MyLockingModel::Lock lock; (void)lock; --*pCount; if (!*pCount_)) { SmallObject<ThreadingModel>::operator delete( const_cast<CountType *>(pCount_), sizeof(*pCount_)); return true; } return false; } //... } The MyLockingModel::Lock object will keep the SmartPtr locked for the entire body of the function. Once locked, the count can be decremented directly without resorting to the AtomicDecrement function. When the function returns, the lock on the count is automatically released. ---------------------------------------------------------------------- Comment By: Peter Kuemmel (syntheticpp) Date: 2006-01-19 03:34 Message: Logged In: YES user_id=1159765 Any ideas for a fix? Here a first step: A fix should avoid that Release deletes pCount_ and returns true when "at the same time" the use of the copy constructor starts. So there must be a synchronization between RefCountedMT(const RefCountedMT& rhs) and bool Release(const P&) Incrementing *pCount_ in the ctor is not sufficient, I think. Also locking with ThreadingModel<RefCountedMT>::Lock does not avoid some cases where *pCount is decremented before the constructor it increments. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=396644&aid=1408845&group_id=29557 |