#85 Race condition in RefCountedMTAdj::Release

open-accepted
None
7
2014-12-17
2006-01-18
No

Consider the following usage example:

MySmartPtr shared; // both writer and reader can
access this smart pointer

// Writer thread
MySmartPtr update(new MyObj); // new referenced
object
shared = update; // shared smart pointer updated

// Reader thread
MySmartPtr read(shared); // shared pointer copied
to reserve referenced object
// access object through smart pointer

The referenced object is never updated once it is
wrapped in a smart pointer, but the shared smart
pointer is.

The reader takes a local copy of the shared smart
pointer in the expectation that thereafter it can
access the referenced object. Should the writer thread
update the shared object again (by assignment with a
smart pointer referencing a new object), the local
smart pointer in the reader will still be holding onto
the old referenced object. This means that the writer
and reader threads only potentially contend during the
assignment in the writer and the copy-construction in
the reader. At other times the reader and writer can
work at their own pace: for example, the writer might
be very fast and update the shared pointer often,
whilst the reader might save to database the
occasionally-retrieved snapshot.

Can the reader thread rely on the smart pointer
handling itself in a thread-safe way? My contention is
that there is a race condition not addressed in
RefCountedMT. Consider the following code extract from
the Loki library:

template <template <class> class ThreadingModel>
struct RefCountedMTAdj
{
//...
bool Release(const P&)
{
if (!ThreadingModel<RefCountedMT>::
AtomicDecrement(*pCount_))
{
SmallObject<ThreadingModel>::
operator delete(
const_cast<CountType
*>(pCount_),
sizeof(*pCount_));
return true;
}
return false;
}
//...
}

There is nothing to prevent the writer thread from
being interupted once the AtomicDecrement operation has
been completed, but before the delete operation (ie,
counter has reached zero).

Imagine the reader in my example takes its local copy
of the shared smart pointer precisely at that moment.
The code for both the copy constructor the the cloning
method suggest the operation will complete and the
reference counter will be incremented (ie, counter back
to one). Once the writer thread wakes up and completes
the delete operation, the smart pointer in the reader
will be holding a dangling pointer. Furthermore, the
Release method will return 'true', causing the deletion
of the referenced object. The reader might be in the
middle of accessing the refenrenced object through its
local smart pointer... ops...

As it stands, simultaneous reads and writes to smart
pointers can have unpredicable results (as in example
above). This limitation should be either documented or
removed with an enhancement to the code.

Discussion

  • Peter Kuemmel

    Peter Kuemmel - 2006-01-19
    • priority: 5 --> 7
     
  • Peter Kuemmel

    Peter Kuemmel - 2006-01-19

    Logged In: YES
    user_id=1159765

    Any ideas for a fix?

    Here a first step:

    A fix should avoid that Release deletes pCount_ and returns
    true
    when "at the same time" the use of the copy constructor starts.
    So there must be a synchronization between
    RefCountedMT(const RefCountedMT& rhs) and
    bool Release(const P&)

    Incrementing *pCount_ in the ctor is not sufficient, I think.
    Also locking with ThreadingModel<RefCountedMT>::Lock
    does not avoid some cases where *pCount is decremented before
    the constructor it increments.

     
  • Richard Sposato

    Richard Sposato - 2006-02-08

    Logged In: YES
    user_id=749922

    We might be able to use a trick from the SmallObj.h file to
    solve this. What if the code was changed to something like:

    template <template <class> class ThreadingModel>
    struct RefCountedMTAdj
    {
    typedef ThreadingModel< RefCountedMT, MutexPolicy >
    MyLockingModel;

    //...
    bool Release(const P&)
    {
    typename MyLockingModel::Lock lock;
    (void)lock;
    --*pCount;
    if (!*pCount_))
    {
    SmallObject<ThreadingModel>::operator delete(
    const_cast<CountType *>(pCount_),
    sizeof(*pCount_));
    return true;
    }
    return false;
    }
    //...
    }

    The MyLockingModel::Lock object will keep the SmartPtr
    locked for the entire body of the function. Once locked,
    the count can be decremented directly without resorting to
    the AtomicDecrement function. When the function returns,
    the lock on the count is automatically released.

     
  • Peter Kuemmel

    Peter Kuemmel - 2006-02-28

    Logged In: YES
    user_id=1159765

    /*
    Hi Rich,
    I can't see atomic-member-function-calls-only eliminates
    the problem.
    To demonstrate this I've written a "byte code" simulation
    of the problem, and it seems to me that atomic calls can't
    avoid the order in void bug().
    */

    // storage policy:
    double *ptr;

    void del(bool del)
    {
    if(del)
    delete ptr;
    }

    // essence of RefCountedMT
    int* pcount;
    struct Ref
    {
    void ctor(int* rhs)
    {
    pCount = rhs;
    }
    void clone()
    {
    ++*pCount;
    }
    bool Release()
    {
    --*pCount;
    return *pCount==0;
    }

    private:
    int* pCount;
    };

    Ref t1; // thread 1
    Ref t2; // thread 2

    bool ok1;
    bool ok2;

    void init()
    {
    // both smart pointers have somewhere the pure pointer
    ptr = new double;

    // init counter
    pcount = new int;
    *pcount = 0;

    // t1 already points to ptr
    t1.ctor(pcount);
    t1.clone();
    }

    void bug()
    {
    // this is the faulty order
    // t1 releases and t2 gets a copy
    // all calls correspond to atomic member function calls!!
    t2.ctor(pcount);
    ok1 = t1.Release();
    t2.clone();
    del(ok1);

    // t2 releases
    ok2 = t2.Release();
    del(ok2);
    }

    int main()
    {
    init();

    bug();

    return 0;
    }

     
  • Steven Barbaglia

    Logged In: YES
    user_id=1420849

    some more observations... some re-iteration of Peter's
    comments using different words to elicit fresh ideas... a
    bit negative only because of my inability to see a solution
    at present...

    1) There is still no warning in the documentation that
    SmartPtr does not support simultaneous read and writes (ie,
    there will be impredictable results in changing what an
    instance of SmartPtr is pointing to when the said instance
    is accessabile by other threads)

    2) Can a volatile resource (eg, the counter in RefCountedMT)
    be used to synchronise distinct instances? Isn't there a
    contraddiction here? Is this the theoretical principle that
    was breached?

    3) The race condition on the writer side extends from the
    decision to delete the counter, through the deletion of the
    counter and referenced object and finally to the assignment
    of the new counter and referenced object. The reader should
    not be allowed to get hold of the pointers to the doomed
    counter and doomed referenced object. Currently, this
    'transaction' is covered by distinct methods in RefCountedMT
    (as illustrated by Peter's example code), so I do not see
    how synchronisation can be achieved within the class.

    4) Object-level synchronisation is not an option, since
    multiple instances are involved. A class level
    synchronisation object is a heavy-handed solution and could
    be a bottleneck. Could a fallback position be for the
    reader to be able to detect that it now holds dangling
    pointers? Maybe throw an exception?

     
  • Peter Kuemmel

    Peter Kuemmel - 2006-05-30

    Logged In: YES
    user_id=1159765

    Modified Files:
    SmartPtr.h
    Log Message:
    add warning about mt bug

    Index: SmartPtr.h

    RCS file: /cvsroot/loki-lib/loki/include/loki/SmartPtr.h,v
    retrieving revision 1.30
    retrieving revision 1.31
    diff -u -d -r1.30 -r1.31
    --- SmartPtr.h 17 May 2006 16:04:32 -0000 1.30
    +++ SmartPtr.h 30 May 2006 14:30:19 -0000 1.31
    @@ -305,6 +305,10 @@
    /// Implementation of the OwnershipPolicy used by SmartPtr
    /// Implements external reference counting for
    multithreaded programs
    /// Policy Usage:
    RefCountedMTAdj<ThreadingModel>::RefCountedMT
    +///
    +/// \par Warning
    +/// There could be a race condition, see bug "Race
    condition in RefCountedMTAdj::Release"
    +///
    http://sourceforge.net/tracker/index.php?func=detail&aid=1408845&group_id=29557&atid=396644
    ////////////////////////////////////////////////////////////////////////////////

    template <template <class, class> class ThreadingModel,
    @@ -1521,6 +1525,9 @@

     
  • Andreas Kunit

    Andreas Kunit - 2006-07-14

    Logged In: YES
    user_id=1555499

    I don't see a race condition.

    Hypothesis:
    The copy constructor is called
    right before the delete operation of
    the smart pointer is called.

    Proof by contradiction:
    1. When the delete operation is to call,
    there's only one smart pointer pointing
    to the pointee.

    2. The one and only smart pointer must be
    the base for any copy operation.

    3. When the delete operation is to call,
    the one and only smart pointer leaves
    his scope of validity.

    4. When the smart pointer leaves his
    scope of validity, it may not be the base
    for a copy operation anymore.

    5. No copy constructor may be called
    right before the delete operation.

    -------------------------------------------------

    The other remark in the text is that smart pointers
    don't feature a synchronisation mechanism. IMO
    they don't have to. There are other patterns which
    cover this concern better.

     
  • Richard Sposato

    Richard Sposato - 2007-04-27
    • status: open --> closed-wont-fix
     
  • Richard Sposato

    Richard Sposato - 2007-04-27

    Logged In: YES
    user_id=749922
    Originator: NO

    Hi Folks,

    I asked on comp.lang.c++.moderated if anybody had recommendations for how to fix this bug.
    http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/93c6579e620a40fa/be06b51a7bccb6bb?hl=en#be06b51a7bccb6bb

    I did not see any good recommendations. My conclusion is that no change to Loki's SmartPtr can fix this bug. There is no thread-safe way to write copy-constructor that works while the copied object's destructor is active in another thread. Even if we could modify the copy-ctor and dtor so they run okay in most situations, that won't fix the situation where the source object is completely destroyed in one thread before the copy-constructor even begins in another thread. Indeed, the fact that one object is trying to copy another while that other object is being destroyed indicates a higher-level design problem.

    I reluctantly recommend we close this bug as "Won't Fix". We can document this issue with a note in the SmartPtr header file.

    Cheers,

    Rich

     
  • Richard Sposato

    Richard Sposato - 2011-10-26

    I might be able to fix this bug by using GCC's atomic operation functions - or with InterlockedIncrement and InterlockedDecrement on Visual-Studio.

    Reopening bug and assigning to myself for further study.

     
  • Richard Sposato

    Richard Sposato - 2011-10-26
    • assigned_to: nobody --> rich_sposato
    • status: closed-wont-fix --> open-accepted
     

Log in to post a comment.