|
From: Joris K. <Jor...@ai...> - 2012-01-06 17:01:40
|
Hi all,
While trying to run some of our projects through helgrind, I encountered
a race condition which I fail to understand for a 100% but I think is a
false positive in my case. I've tried to disable that (assumed) false
positive using annotations but in all my attempts helgrind kept
reporting the race condition. I'd like some help in explaining why
helgrind thinks it is a racecondition, if it really is a false positive
and (if so) how to annotate it to prohibit this false positive.
The problem is with a base class that does (atomic) reference counting
from which abstract classes are derived virtually. I've stripped down
our sources to a single example file that demonstrates the problem.
Here's refcnttst.cc:
------begin refcnttst.cc --------------------
#include <pthread.h>
class Base {
public:
Base() : m_nRefCnt(0) {
};
virtual ~Base() {
};
void addRef() {
__sync_add_and_fetch(&m_nRefCnt, 1);
}
void releaseRef() {
int nRefCnt = __sync_add_and_fetch(&m_nRefCnt, -1);
if (nRefCnt == 0) {
delete this;
}
}
private:
int m_nRefCnt;
};
class Derived : INHERIT Base {
public:
virtual ~Derived() {
};
void fooBar() {
};
};
Derived* myDerived = 0;
pthread_mutex_t mu;
pthread_cond_t cv;
int done = 0;
void* Thread(void*) {
Derived* d = myDerived;
d->addRef();
pthread_mutex_lock(&mu);
done++;
pthread_cond_signal(&cv);
pthread_mutex_unlock(&mu);
d->releaseRef();
}
static const int nThreads = 6;
int main(int argc, char* argv[]) {
Derived* d = new Derived();
d->addRef();
myDerived = d;
pthread_t t[nThreads];
pthread_mutex_init(&mu, 0);
pthread_cond_init(&cv, 0);
// start threads.
for (int i = 0; i < nThreads; i++) {
pthread_create(&t[i], 0, Thread, 0);
}
// wait for threads to all have copied 'myDerived' and executed the
addRef
pthread_mutex_lock(&mu);
while (done != nThreads)
pthread_cond_wait(&cv, &mu);
pthread_mutex_unlock(&mu);
// this releaseRef or the last one inside the thread will actually
// delete this object
d->releaseRef();
return 0;
}
------end refcnttst.cc --------------------
Next compile and run valgrind twice, once using ordinary inheritance,
once using virtual inheritance:
joris@grazer:/tmp/refcnttst$ g++ -g -DINHERIT="public" refcnttst.cc
-lpthread
joris@grazer:/tmp/refcnttst$ valgrind --tool=helgrind -q ./a.out
joris@grazer:/tmp/refcnttst$ g++ -g -DINHERIT="virtual public"
refcnttst.cc -lpthread
joris@grazer:/tmp/refcnttst$ valgrind --tool=helgrind -q ./a.out
==2638== Thread #1 is the program's root thread
==2638==
==2638== Thread #7 was created
==2638== at 0x543585E: clone (clone.S:77)
==2638== by 0x4E36E7F: do_clone.constprop.3 (createthread.c:75)
==2638== by 0x4E38604: pthread_create@@GLIBC_2.2.5
(createthread.c:256)
==2638== by 0x4C29B23: pthread_create_WRK (hg_intercepts.c:257)
==2638== by 0x4C29CA7: pthread_create@* (hg_intercepts.c:288)
==2638== by 0x400A50: main (refcnttst.cc:67)
==2638==
==2638== Possible data race during write of size 8 at 0x5b8d040 by
thread #1
==2638== at 0x400BB3: Derived::~Derived() (refcnttst.cc:29)
==2638== by 0x400C1F: Derived::~Derived() (refcnttst.cc:30)
==2638== by 0x400B90: Base::releaseRef() (refcnttst.cc:19)
==2638== by 0x400AAE: main (refcnttst.cc:77)
==2638== This conflicts with a previous read of size 8 by thread #7
==2638== at 0x400986: Thread(void*) (refcnttst.cc:51)
==2638== by 0x4C29C90: mythread_wrapper (hg_intercepts.c:221)
==2638== by 0x4E37EFB: start_thread (pthread_create.c:304)
==2638== by 0x543589C: clone (clone.S:112)
==2638== Address 0x5b8d040 is 0 bytes inside a block of size 24 alloc'd
==2638== at 0x4C28B75: operator new(unsigned long)
(vg_replace_malloc.c:261)
==2638== by 0x4009B7: main (refcnttst.cc:57)
==2638==
So with 'normal' inheritance I do not get the race condition, but with
virtual inheritance I do. I'm suspecting that internally the g++
compiler will compile virtual inherited method calls by using an extra
vtable member per virtually inherited class, i.e. in source we're it
says:
d->releaseRef();
it compiles to something equivalent to
d->vtable_Base->releaseRef();
If this is indeed the case, then this 'vtable_Base' is potentially not
accessible after the releaseRef() call (since the other thread destroyed
the object). But in my specific case after calling 'releaseRef' I can
guarantee that the object will no longer be used by that thread (since
in our real source this is controlled by a smartptr wrapper class).
I suspect that, at the caller side, there are some extra instructions
generated *AFTER* the method call that *do stuff* with the potentially
destroyed object to put the appropriate 'this' pointer back into ecx or
something similar...
I hope you can give me some insight / ideas / corrections on this
reported race condition. I'm pretty sure it is a false positive, but I'm
clueless on how to annotate this.
Best Regards,
Joris
CONFIDENTIALITY: This e-mail and any attachments are confidential and may be privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose or store or copy the information in any medium.
|
|
From: Bart V. A. <bva...@ac...> - 2012-01-08 09:34:25
|
On Fri, Jan 6, 2012 at 4:04 PM, Joris Koster <Jor...@ai...> wrote:
> void releaseRef() {
> int nRefCnt = __sync_add_and_fetch(&m_nRefCnt, -1);
> if (nRefCnt == 0) {
> delete this;
> }
> }
Please read the documentation of ANNOTATE_HAPPENS_BEFORE() and
ANNOTATE_HAPPENS_AFTER() in the Valgrind documentation first.
Thanks,
Bart.
|