[ loki-lib-Bugs-1408845 ] Race condition in RefCountedMTAdj::Release

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Bugs item #1408845, was opened at 2006-01-18 00:50
Message generated for change (Comment added) made by rich_sposato
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=396644&aid=1408845&group_id=29557

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Open
>Resolution: Accepted
Priority: 7
Private: No
Submitted By: Steven Barbaglia (stevenuk)
>Assigned to: Richard Sposato (rich_sposato)
Summary: Race condition in RefCountedMTAdj::Release

Initial Comment:
Consider the following usage example:

	MySmartPtr shared; // both writer and reader can 
access this smart pointer 
    
    // Writer thread
	MySmartPtr update(new MyObj); // new referenced 
object
	shared = update; // shared smart pointer updated

    // Reader thread
	MySmartPtr read(shared); // shared pointer copied 
to reserve referenced object
    // access object through smart pointer

The referenced object is never updated once it is 
wrapped in a smart pointer, but the shared smart 
pointer is.  

The reader takes a local copy of the shared smart 
pointer in the expectation that thereafter it can 
access the referenced object.  Should the writer thread 
update the shared object again (by assignment with a 
smart pointer referencing a new object), the local 
smart pointer in the reader will still be holding onto 
the old referenced object.  This means that the writer 
and reader threads only potentially contend during the 
assignment in the writer and the copy-construction in 
the reader.  At other times the reader and writer can 
work at their own pace: for example, the writer might 
be very fast and update the shared pointer often, 
whilst the reader might save to database the 
occasionally-retrieved snapshot.

Can the reader thread rely on the smart pointer 
handling itself in a thread-safe way?  My contention is 
that there is a race condition not addressed in 
RefCountedMT.  Consider the following code extract from 
the Loki library:

    template <template <class> class ThreadingModel>
    struct RefCountedMTAdj
    { 
    //...
            bool Release(const P&)
            {
                if (!ThreadingModel<RefCountedMT>::
AtomicDecrement(*pCount_))
                {
                    SmallObject<ThreadingModel>::
operator delete(
                        const_cast<CountType 
*>(pCount_), 
                        sizeof(*pCount_));
                    return true;
                }
                return false;
            }
    //...
    }
    
There is nothing to prevent the writer thread from 
being interupted once the AtomicDecrement operation has 
been completed, but before the delete operation (ie, 
counter has reached zero).

Imagine the reader in my example takes its local copy 
of the shared smart pointer precisely at that moment.  
The code for both the copy constructor the the cloning 
method suggest the operation will complete and the 
reference counter will be incremented (ie, counter back 
to one).  Once the writer thread wakes up and completes 
the delete operation, the smart pointer in the reader 
will be holding a dangling pointer.  Furthermore, the 
Release method will return 'true', causing the deletion 
of the referenced object.  The reader might be in the 
middle of accessing the refenrenced object through its 
local smart pointer... ops... 

As it stands, simultaneous reads and writes to smart 
pointers can have unpredicable results (as in example 
above).  This limitation should be either documented or 
removed with an enhancement to the code.

----------------------------------------------------------------------

>Comment By: Richard Sposato (rich_sposato)
Date: 2011-10-25 17:16

Message:
I might be able to fix this bug by using GCC's atomic operation functions -
or with InterlockedIncrement and InterlockedDecrement on Visual-Studio.

Reopening bug and assigning to myself for further study.

----------------------------------------------------------------------

Comment By: Richard Sposato (rich_sposato)
Date: 2007-04-27 16:38

Message:
Logged In: YES 
user_id=749922
Originator: NO

Hi Folks,

I asked on comp.lang.c++.moderated if anybody had recommendations for how
to fix this bug.
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/93c6579e620a40fa/be06b51a7bccb6bb?hl=en#be06b51a7bccb6bb

I did not see any good recommendations.  My conclusion is that no change
to Loki's SmartPtr can fix this bug.  There is no thread-safe way to write
copy-constructor that works while the copied object's destructor is active
in another thread.  Even if we could modify the copy-ctor and dtor so they
run okay in most situations, that won't fix the situation where the source
object is completely destroyed in one thread before the copy-constructor
even begins in another thread.  Indeed, the fact that one object is trying
to copy another while that other object is being destroyed indicates a
higher-level design problem.

I reluctantly recommend we close this bug as "Won't Fix".  We can document
this issue with a note in the SmartPtr header file.

Cheers,

Rich


----------------------------------------------------------------------

Comment By: Andreas Kunit (segelohr)
Date: 2006-07-14 04:04

Message:
Logged In: YES 
user_id=1555499

I don't see a race condition.

Hypothesis:
The copy constructor is called
right before the delete operation of
the smart pointer is called.

Proof by contradiction:
1. When the delete operation is to call,
there's only one smart pointer pointing
to the pointee.

2. The one and only smart pointer must be
the base for any copy operation.

3. When the delete operation is to call,
the one and only smart pointer leaves
his scope of validity.

4. When the smart pointer leaves his
scope of validity, it may not be the base
for a copy operation anymore.

5. No copy constructor may be called
right before the delete operation.

-------------------------------------------------

The other remark in the text is that smart pointers
don't feature a synchronisation mechanism. IMO
they don't have to. There are other patterns which
cover this concern better.



----------------------------------------------------------------------

Comment By: Peter Kuemmel (syntheticpp)
Date: 2006-05-30 07:32

Message:
Logged In: YES 
user_id=1159765

Modified Files:
	SmartPtr.h 
Log Message:
add warning about mt bug

Index: SmartPtr.h
===================================================================
RCS file: /cvsroot/loki-lib/loki/include/loki/SmartPtr.h,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -d -r1.30 -r1.31

--- SmartPtr.h	17 May 2006 16:04:32 -0000	1.30
+++ SmartPtr.h	30 May 2006 14:30:19 -0000	1.31
@@ -305,6 +305,10 @@
 ///  Implementation of the OwnershipPolicy used by SmartPtr
 ///  Implements external reference counting for
multithreaded programs
 ///  Policy Usage:
RefCountedMTAdj<ThreadingModel>::RefCountedMT
+///
+///  \par Warning
+///  There could be a race condition, see bug "Race
condition in RefCountedMTAdj::Release"
+/// 
http://sourceforge.net/tracker/index.php?func=detail&aid=1408845&group_id=29557&atid=396644

////////////////////////////////////////////////////////////////////////////////
     
     template <template <class, class> class ThreadingModel,
@@ -1521,6 +1525,9 @@

----------------------------------------------------------------------

Comment By: Steven Barbaglia (stevenuk)
Date: 2006-04-20 09:20

Message:
Logged In: YES 
user_id=1420849

some more observations... some re-iteration of Peter's 
comments using different words to elicit fresh ideas... a 
bit negative only because of my inability to see a solution 
at present...

1) There is still no warning in the documentation that 
SmartPtr does not support simultaneous read and writes (ie, 
there will be impredictable results in changing what an 
instance of SmartPtr is pointing to when the said instance 
is accessabile by other threads)

2) Can a volatile resource (eg, the counter in RefCountedMT) 
be used to synchronise distinct instances? Isn't there a 
contraddiction here?  Is this the theoretical principle that 
was breached?

3) The race condition on the writer side extends from the 
decision to delete the counter, through the deletion of the 
counter and referenced object and finally to the assignment 
of the new counter and referenced object.  The reader should 
not be allowed to get hold of the pointers to the doomed 
counter and doomed referenced object.  Currently, this 
'transaction' is covered by distinct methods in RefCountedMT 
(as illustrated by Peter's example code), so I do not see 
how synchronisation can be achieved within the class.

4) Object-level synchronisation is not an option, since 
multiple instances are involved.  A class level 
synchronisation object is a heavy-handed solution and could 
be a bottleneck.  Could a fallback position be for the 
reader to be able to detect that it now holds dangling 
pointers?  Maybe throw an exception?


----------------------------------------------------------------------

Comment By: Peter Kuemmel (syntheticpp)
Date: 2006-02-28 08:03

Message:
Logged In: YES 
user_id=1159765

/*
Hi Rich,
I can't see atomic-member-function-calls-only eliminates
the problem.
To demonstrate this I've written a "byte code" simulation
of the problem, and it seems to me that atomic calls can't
avoid the order in void bug().
*/

// storage policy:
double *ptr;

void del(bool del)
{  
    if(del) 
        delete ptr;  
}

// essence of RefCountedMT
int* pcount;
struct Ref
{
    void ctor(int* rhs)
    {  
        pCount = rhs;  
    }
    void clone()
    {  
        ++*pCount;  
    }
    bool Release()
    {  
        --*pCount; 
        return *pCount==0;  
    }

private:
    int* pCount;
};


Ref t1; // thread 1
Ref t2; // thread 2

bool ok1;
bool ok2;


void init()
{
    // both smart pointers have somewhere the pure pointer
    ptr = new double;

    // init counter
    pcount = new int;
    *pcount = 0;

    // t1 already points to ptr
    t1.ctor(pcount);
    t1.clone();
}

void bug()
{
    // this is the faulty order
    // t1 releases and t2 gets a copy
    // all calls correspond to atomic member function calls!!
    t2.ctor(pcount);
    ok1 = t1.Release();
    t2.clone();
    del(ok1);

    // t2 releases
    ok2 = t2.Release();
    del(ok2);
}



int main()
{
    init();

    bug();

    return 0;
}




----------------------------------------------------------------------

Comment By: Richard Sposato (rich_sposato)
Date: 2006-02-07 17:39

Message:
Logged In: YES 
user_id=749922

We might be able to use a trick from the SmallObj.h file to
solve this.  What if the code was changed to something like:

template <template <class> class ThreadingModel> 
struct RefCountedMTAdj 
{  
typedef ThreadingModel< RefCountedMT, MutexPolicy >
MyLockingModel;

//... 
bool Release(const P&) 
{
typename MyLockingModel::Lock lock;
(void)lock;
--*pCount;
if (!*pCount_)) 
{ 
SmallObject<ThreadingModel>::operator delete( 
const_cast<CountType *>(pCount_),  
sizeof(*pCount_)); 
return true; 
} 
return false; 
} 
//... 
} 

The MyLockingModel::Lock object will keep the SmartPtr
locked for the entire body of the function.  Once locked,
the count can be decremented directly without resorting to
the AtomicDecrement function.  When the function returns,
the lock on the count is automatically released.


----------------------------------------------------------------------

Comment By: Peter Kuemmel (syntheticpp)
Date: 2006-01-19 03:34

Message:
Logged In: YES 
user_id=1159765

Any ideas for a fix?

Here a first step:

A fix should avoid that Release deletes pCount_ and returns
true 
when "at the same time" the use of the copy constructor starts.
So there must be a synchronization between 
RefCountedMT(const RefCountedMT& rhs) and
bool Release(const P&)

Incrementing *pCount_ in the ctor is not sufficient, I think.
Also locking with ThreadingModel<RefCountedMT>::Lock
does not avoid some cases where *pCount is decremented before
the constructor it increments.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=396644&aid=1408845&group_id=29557