Loki / Bugs / #85 Race condition in RefCountedMTAdj::Release

Peter Kuemmel - 2006-01-19

priority: 5 --> 7
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Peter Kuemmel - 2006-01-19

Logged In: YES
user_id=1159765

Any ideas for a fix?

Here a first step:

A fix should avoid that Release deletes pCount_ and returns
true
when "at the same time" the use of the copy constructor starts.
So there must be a synchronization between
RefCountedMT(const RefCountedMT& rhs) and
bool Release(const P&)

Incrementing *pCount_ in the ctor is not sufficient, I think.
Also locking with ThreadingModel<RefCountedMT>::Lock
does not avoid some cases where *pCount is decremented before
the constructor it increments.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Richard Sposato - 2006-02-08

Logged In: YES
user_id=749922

We might be able to use a trick from the SmallObj.h file to
solve this. What if the code was changed to something like:

template <template <class> class ThreadingModel>
struct RefCountedMTAdj
{
typedef ThreadingModel< RefCountedMT, MutexPolicy >
MyLockingModel;

//...
bool Release(const P&)
{
typename MyLockingModel::Lock lock;
(void)lock;
--*pCount;
if (!*pCount_))
{
SmallObject<ThreadingModel>::operator delete(
const_cast<CountType *>(pCount_),
sizeof(*pCount_));
return true;
}
return false;
}
//...
}

The MyLockingModel::Lock object will keep the SmartPtr
locked for the entire body of the function. Once locked,
the count can be decremented directly without resorting to
the AtomicDecrement function. When the function returns,
the lock on the count is automatically released.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Peter Kuemmel - 2006-02-28

Logged In: YES
user_id=1159765

/*
Hi Rich,
I can't see atomic-member-function-calls-only eliminates
the problem.
To demonstrate this I've written a "byte code" simulation
of the problem, and it seems to me that atomic calls can't
avoid the order in void bug().
*/

// storage policy:
double *ptr;

void del(bool del)
{
if(del)
delete ptr;
}

// essence of RefCountedMT
int* pcount;
struct Ref
{
void ctor(int* rhs)
{
pCount = rhs;
}
void clone()
{
++*pCount;
}
bool Release()
{
--*pCount;
return *pCount==0;
}

private:
int* pCount;
};

Ref t1; // thread 1
Ref t2; // thread 2

bool ok1;
bool ok2;

void init()
{
// both smart pointers have somewhere the pure pointer
ptr = new double;

// init counter
pcount = new int;
*pcount = 0;

// t1 already points to ptr
t1.ctor(pcount);
t1.clone();
}

void bug()
{
// this is the faulty order
// t1 releases and t2 gets a copy
// all calls correspond to atomic member function calls!!
t2.ctor(pcount);
ok1 = t1.Release();
t2.clone();
del(ok1);

// t2 releases
ok2 = t2.Release();
del(ok2);
}

int main()
{
init();

bug();

return 0;
}

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steven Barbaglia - 2006-04-20

Logged In: YES
user_id=1420849

some more observations... some re-iteration of Peter's
comments using different words to elicit fresh ideas... a
bit negative only because of my inability to see a solution
at present...

1) There is still no warning in the documentation that
SmartPtr does not support simultaneous read and writes (ie,
there will be impredictable results in changing what an
instance of SmartPtr is pointing to when the said instance
is accessabile by other threads)

2) Can a volatile resource (eg, the counter in RefCountedMT)
be used to synchronise distinct instances? Isn't there a
contraddiction here? Is this the theoretical principle that
was breached?

3) The race condition on the writer side extends from the
decision to delete the counter, through the deletion of the
counter and referenced object and finally to the assignment
of the new counter and referenced object. The reader should
not be allowed to get hold of the pointers to the doomed
counter and doomed referenced object. Currently, this
'transaction' is covered by distinct methods in RefCountedMT
(as illustrated by Peter's example code), so I do not see
how synchronisation can be achieved within the class.

4) Object-level synchronisation is not an option, since
multiple instances are involved. A class level
synchronisation object is a heavy-handed solution and could
be a bottleneck. Could a fallback position be for the
reader to be able to detect that it now holds dangling
pointers? Maybe throw an exception?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Peter Kuemmel - 2006-05-30

Logged In: YES
user_id=1159765

Modified Files:
SmartPtr.h
Log Message:
add warning about mt bug

Index: SmartPtr.h

RCS file: /cvsroot/loki-lib/loki/include/loki/SmartPtr.h,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -d -r1.30 -r1.31
--- SmartPtr.h 17 May 2006 16:04:32 -0000 1.30
+++ SmartPtr.h 30 May 2006 14:30:19 -0000 1.31
@@ -305,6 +305,10 @@
/// Implementation of the OwnershipPolicy used by SmartPtr
/// Implements external reference counting for
multithreaded programs
/// Policy Usage:
RefCountedMTAdj<ThreadingModel>::RefCountedMT
+///
+/// \par Warning
+/// There could be a race condition, see bug "Race
condition in RefCountedMTAdj::Release"
+///
http://sourceforge.net/tracker/index.php?func=detail&aid=1408845&group_id=29557&atid=396644
////////////////////////////////////////////////////////////////////////////////

template <template <class, class> class ThreadingModel,
@@ -1521,6 +1525,9 @@

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Andreas Kunit - 2006-07-14

Logged In: YES
user_id=1555499

I don't see a race condition.

Hypothesis:
The copy constructor is called
right before the delete operation of
the smart pointer is called.

Proof by contradiction:
1. When the delete operation is to call,
there's only one smart pointer pointing
to the pointee.

2. The one and only smart pointer must be
the base for any copy operation.

3. When the delete operation is to call,
the one and only smart pointer leaves
his scope of validity.

4. When the smart pointer leaves his
scope of validity, it may not be the base
for a copy operation anymore.

5. No copy constructor may be called
right before the delete operation.

-------------------------------------------------

The other remark in the text is that smart pointers
don't feature a synchronisation mechanism. IMO
they don't have to. There are other patterns which
cover this concern better.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Richard Sposato - 2007-04-27

status: open --> closed-wont-fix
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Richard Sposato - 2007-04-27

Logged In: YES
user_id=749922
Originator: NO

Hi Folks,

I asked on comp.lang.c++.moderated if anybody had recommendations for how to fix this bug.
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/93c6579e620a40fa/be06b51a7bccb6bb?hl=en#be06b51a7bccb6bb

I did not see any good recommendations. My conclusion is that no change to Loki's SmartPtr can fix this bug. There is no thread-safe way to write copy-constructor that works while the copied object's destructor is active in another thread. Even if we could modify the copy-ctor and dtor so they run okay in most situations, that won't fix the situation where the source object is completely destroyed in one thread before the copy-constructor even begins in another thread. Indeed, the fact that one object is trying to copy another while that other object is being destroyed indicates a higher-level design problem.

I reluctantly recommend we close this bug as "Won't Fix". We can document this issue with a note in the SmartPtr header file.

Cheers,

Rich

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Richard Sposato - 2011-10-26

I might be able to fix this bug by using GCC's atomic operation functions - or with InterlockedIncrement and InterlockedDecrement on Visual-Studio.

Reopening bug and assigning to myself for further study.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Richard Sposato - 2011-10-26

assigned_to: nobody --> rich_sposato

status: closed-wont-fix --> open-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Race condition in RefCountedMTAdj::Release

Group

Searches

Help

#85 Race condition in RefCountedMTAdj::Release

Discussion

Index: SmartPtr.h