Consider the following usage example:
MySmartPtr shared; // both writer and reader can
access this smart pointer
// Writer thread
MySmartPtr update(new MyObj); // new referenced
object
shared = update; // shared smart pointer updated
// Reader thread
MySmartPtr read(shared); // shared pointer copied
to reserve referenced object
// access object through smart pointer
The referenced object is never updated once it is
wrapped in a smart pointer, but the shared smart
pointer is.
The reader takes a local copy of the shared smart
pointer in the expectation that thereafter it can
access the referenced object. Should the writer thread
update the shared object again (by assignment with a
smart pointer referencing a new object), the local
smart pointer in the reader will still be holding onto
the old referenced object. This means that the writer
and reader threads only potentially contend during the
assignment in the writer and the copy-construction in
the reader. At other times the reader and writer can
work at their own pace: for example, the writer might
be very fast and update the shared pointer often,
whilst the reader might save to database the
occasionally-retrieved snapshot.
Can the reader thread rely on the smart pointer
handling itself in a thread-safe way? My contention is
that there is a race condition not addressed in
RefCountedMT. Consider the following code extract from
the Loki library:
template <template <class> class ThreadingModel>
struct RefCountedMTAdj
{
//...
bool Release(const P&)
{
if (!ThreadingModel<RefCountedMT>::
AtomicDecrement(*pCount_))
{
SmallObject<ThreadingModel>::
operator delete(
const_cast<CountType
*>(pCount_),
sizeof(*pCount_));
return true;
}
return false;
}
//...
}
There is nothing to prevent the writer thread from
being interupted once the AtomicDecrement operation has
been completed, but before the delete operation (ie,
counter has reached zero).
Imagine the reader in my example takes its local copy
of the shared smart pointer precisely at that moment.
The code for both the copy constructor the the cloning
method suggest the operation will complete and the
reference counter will be incremented (ie, counter back
to one). Once the writer thread wakes up and completes
the delete operation, the smart pointer in the reader
will be holding a dangling pointer. Furthermore, the
Release method will return 'true', causing the deletion
of the referenced object. The reader might be in the
middle of accessing the refenrenced object through its
local smart pointer... ops...
As it stands, simultaneous reads and writes to smart
pointers can have unpredicable results (as in example
above). This limitation should be either documented or
removed with an enhancement to the code.
Logged In: YES
user_id=1159765
Any ideas for a fix?
Here a first step:
A fix should avoid that Release deletes pCount_ and returns
true
when "at the same time" the use of the copy constructor starts.
So there must be a synchronization between
RefCountedMT(const RefCountedMT& rhs) and
bool Release(const P&)
Incrementing *pCount_ in the ctor is not sufficient, I think.
Also locking with ThreadingModel<RefCountedMT>::Lock
does not avoid some cases where *pCount is decremented before
the constructor it increments.
Logged In: YES
user_id=749922
We might be able to use a trick from the SmallObj.h file to
solve this. What if the code was changed to something like:
template <template <class> class ThreadingModel>
struct RefCountedMTAdj
{
typedef ThreadingModel< RefCountedMT, MutexPolicy >
MyLockingModel;
//...
bool Release(const P&)
{
typename MyLockingModel::Lock lock;
(void)lock;
--*pCount;
if (!*pCount_))
{
SmallObject<ThreadingModel>::operator delete(
const_cast<CountType *>(pCount_),
sizeof(*pCount_));
return true;
}
return false;
}
//...
}
The MyLockingModel::Lock object will keep the SmartPtr
locked for the entire body of the function. Once locked,
the count can be decremented directly without resorting to
the AtomicDecrement function. When the function returns,
the lock on the count is automatically released.
Logged In: YES
user_id=1159765
/*
Hi Rich,
I can't see atomic-member-function-calls-only eliminates
the problem.
To demonstrate this I've written a "byte code" simulation
of the problem, and it seems to me that atomic calls can't
avoid the order in void bug().
*/
// storage policy:
double *ptr;
void del(bool del)
{
if(del)
delete ptr;
}
// essence of RefCountedMT
int* pcount;
struct Ref
{
void ctor(int* rhs)
{
pCount = rhs;
}
void clone()
{
++*pCount;
}
bool Release()
{
--*pCount;
return *pCount==0;
}
private:
int* pCount;
};
Ref t1; // thread 1
Ref t2; // thread 2
bool ok1;
bool ok2;
void init()
{
// both smart pointers have somewhere the pure pointer
ptr = new double;
// init counter
pcount = new int;
*pcount = 0;
// t1 already points to ptr
t1.ctor(pcount);
t1.clone();
}
void bug()
{
// this is the faulty order
// t1 releases and t2 gets a copy
// all calls correspond to atomic member function calls!!
t2.ctor(pcount);
ok1 = t1.Release();
t2.clone();
del(ok1);
// t2 releases
ok2 = t2.Release();
del(ok2);
}
int main()
{
init();
bug();
return 0;
}
Logged In: YES
user_id=1420849
some more observations... some re-iteration of Peter's
comments using different words to elicit fresh ideas... a
bit negative only because of my inability to see a solution
at present...
1) There is still no warning in the documentation that
SmartPtr does not support simultaneous read and writes (ie,
there will be impredictable results in changing what an
instance of SmartPtr is pointing to when the said instance
is accessabile by other threads)
2) Can a volatile resource (eg, the counter in RefCountedMT)
be used to synchronise distinct instances? Isn't there a
contraddiction here? Is this the theoretical principle that
was breached?
3) The race condition on the writer side extends from the
decision to delete the counter, through the deletion of the
counter and referenced object and finally to the assignment
of the new counter and referenced object. The reader should
not be allowed to get hold of the pointers to the doomed
counter and doomed referenced object. Currently, this
'transaction' is covered by distinct methods in RefCountedMT
(as illustrated by Peter's example code), so I do not see
how synchronisation can be achieved within the class.
4) Object-level synchronisation is not an option, since
multiple instances are involved. A class level
synchronisation object is a heavy-handed solution and could
be a bottleneck. Could a fallback position be for the
reader to be able to detect that it now holds dangling
pointers? Maybe throw an exception?
Logged In: YES
user_id=1159765
Modified Files:
SmartPtr.h
Log Message:
add warning about mt bug
Index: SmartPtr.h
RCS file: /cvsroot/loki-lib/loki/include/loki/SmartPtr.h,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -d -r1.30 -r1.31
--- SmartPtr.h 17 May 2006 16:04:32 -0000 1.30
+++ SmartPtr.h 30 May 2006 14:30:19 -0000 1.31
@@ -305,6 +305,10 @@
/// Implementation of the OwnershipPolicy used by SmartPtr
/// Implements external reference counting for
multithreaded programs
/// Policy Usage:
RefCountedMTAdj<ThreadingModel>::RefCountedMT
+///
+/// \par Warning
+/// There could be a race condition, see bug "Race
condition in RefCountedMTAdj::Release"
+///
http://sourceforge.net/tracker/index.php?func=detail&aid=1408845&group_id=29557&atid=396644
////////////////////////////////////////////////////////////////////////////////
template <template <class, class> class ThreadingModel,
@@ -1521,6 +1525,9 @@
Logged In: YES
user_id=1555499
I don't see a race condition.
Hypothesis:
The copy constructor is called
right before the delete operation of
the smart pointer is called.
Proof by contradiction:
1. When the delete operation is to call,
there's only one smart pointer pointing
to the pointee.
2. The one and only smart pointer must be
the base for any copy operation.
3. When the delete operation is to call,
the one and only smart pointer leaves
his scope of validity.
4. When the smart pointer leaves his
scope of validity, it may not be the base
for a copy operation anymore.
5. No copy constructor may be called
right before the delete operation.
-------------------------------------------------
The other remark in the text is that smart pointers
don't feature a synchronisation mechanism. IMO
they don't have to. There are other patterns which
cover this concern better.
Logged In: YES
user_id=749922
Originator: NO
Hi Folks,
I asked on comp.lang.c++.moderated if anybody had recommendations for how to fix this bug.
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/93c6579e620a40fa/be06b51a7bccb6bb?hl=en#be06b51a7bccb6bb
I did not see any good recommendations. My conclusion is that no change to Loki's SmartPtr can fix this bug. There is no thread-safe way to write copy-constructor that works while the copied object's destructor is active in another thread. Even if we could modify the copy-ctor and dtor so they run okay in most situations, that won't fix the situation where the source object is completely destroyed in one thread before the copy-constructor even begins in another thread. Indeed, the fact that one object is trying to copy another while that other object is being destroyed indicates a higher-level design problem.
I reluctantly recommend we close this bug as "Won't Fix". We can document this issue with a note in the SmartPtr header file.
Cheers,
Rich
I might be able to fix this bug by using GCC's atomic operation functions - or with InterlockedIncrement and InterlockedDecrement on Visual-Studio.
Reopening bug and assigning to myself for further study.