|
From: Tangi V. <tan...@co...> - 2003-07-17 12:56:53
|
Hi,

I've got the same issue :
- helgrind with default sanity-level : data race writing outside my code
- helgrind with sanity-level=2 : valgrind: the `impossible' happened:
  mallocSanityCheckArena
- memcheck : nothing reported except KLUDGED & IGNORED calls within
  libpthread.so

Details below.

Tangi

$ valgrind --skin=helgrind --num-callers=15 test1
==19289== Helgrind, a data race detector for x86-linux.
==19289== Copyright (C) 2002, and GNU GPL'd, by Nicholas Nethercote.
==19289== Using valgrind-1.9.6, a program instrumentation system for x86-linux.
==19289== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==19289== Estimated CPU clock rate is 2002 MHz
==19289== For more details, rerun with: -v
==19289==
==19289== valgrind's libpthread.so: KLUDGED call to: pthread_getschedparam
==19289== valgrind's libpthread.so: IGNORED call to: pthread_setschedparam
==19289== valgrind's libpthread.so: IGNORED call to: pthread_attr_setinheritsched
==19289== valgrind's libpthread.so: IGNORED call to: pthread_attr_destroy
==19289== Thread 2:
==19289== Possible data race writing variable at 0x400120B0 (_rtld_global+144)
==19289==    at 0x40006E0D: _dl_lookup_versioned_symbol_internal (in /lib/ld-2.2.93.so)
==19289==    by 0x4000A01E: fixup (in /lib/ld-2.2.93.so)
==19289==    by 0x4000A18F: _dl_runtime_resolve (in /lib/ld-2.2.93.so)
==19289==    by 0x4022C41C: thread_wrapper (vg_libpthread.c:671)
==19289==    by 0x4009B16B: do__quit (vg_scheduler.c:2154)
==19289== Address 0x400120B0 is in data section of /lib/ld-2.2.93.so
==19289== Previous state: shared RO, no locks

$ valgrind --skin=helgrind --num-callers=15 --sanity-level=2 test1
==21303== Helgrind, a data race detector for x86-linux.
==21303== Copyright (C) 2002, and GNU GPL'd, by Nicholas Nethercote.
==21303== Using valgrind-1.9.6, a program instrumentation system for x86-linux.
==21303== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==21303== Estimated CPU clock rate is 1996 MHz
==21303== For more details, rerun with: -v
==21303==
--21303-- mSC [core ]: 1 sbs, 190 tot bs, 1/1 free bs, 16 lists, 1048576 mmap, 7240 loan
--21303-- mSC [skin ]: 1 sbs, 52 tot bs, 1/1 free bs, 16 lists, 1048576 mmap, 1212 loan
--21303-- mSC [symtab ]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [JITter ]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [client ]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [demangle]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [exectxt ]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [errors ]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
--21303-- mSC [transien]: 0 sbs, 0 tot bs, 0/0 free bs, 16 lists, 0 mmap, 0 loan
blockSane: fail -- redzone-hi
mallocSanityCheckArena: sb 0x405D8000, block 2944 (bszW 10): BAD

valgrind: the `impossible' happened:
   mallocSanityCheckArena
Basic block ctr is approximately 50000

sched status:

Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==21303==    at 0x4000E448: strcmp (in /lib/ld-2.2.93.so)
==21303==    by 0x40006E7B: _dl_lookup_versioned_symbol_internal (in /lib/ld-2.2.93.so)
==21303==    by 0x4000A01E: fixup (in /lib/ld-2.2.93.so)
==21303==    by 0x4000A18F: _dl_runtime_resolve (in /lib/ld-2.2.93.so)
==21303==    by 0x401C8861: _STL::_Locale_impl::_Locale_impl(char const*) (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401C8B3E: _STL::_Locale_impl::make_classic_locale() (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401C996A: _STL::locale::_S_initialize() (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401C98C4: _STL::ios_base::_Loc_init::_Loc_init() (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401CACD4: __static_initialization_and_destruction_0(int, int) (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401CAD41: _GLOBAL__I__ZN4_STL7_LocaleC2ERKNS_12_Locale_implE (in /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x401DCD54: (within /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x40179564: (within /usr/lib/libstlport_gcc.so.4.5)
==21303==    by 0x4000A7A1: _dl_init_internal (in /lib/ld-2.2.93.so)
==21303==    by 0x40000B6C: (within /lib/ld-2.2.93.so)
|
|
From: Jeremy F. <je...@go...> - 2003-07-17 16:12:36
|
On Thu, 2003-07-17 at 06:02, Tangi Vass wrote:
> Hi,
>
> I've got the same issue :
> - helgrind with default sanity-level : data race writing outside my code
> - helgrind with sanity-level=2 : valgrind: the `impossible' happened:
> mallocSanityCheckArena
> - memcheck : nothing reported except KLUDGED & IGNORED calls within
> libpthread.so
>
> Details below.

Can you send me your test program?

J
|
|
From: Tangi V. <tan...@co...> - 2003-07-17 16:37:51
|
> Can you send me your test program?

I just upgraded to release 20030716 and I'm now able to go further with
helgrind if I don't attach gdb (which seems to start much before the
segfault).

Found plenty of data race writings in my code.

My test suite is quite big and needs many libraries. I'll try a bit
further and send you the whole package when I can reproduce it at will
again.

Tangi
|
|
From: Jeremy F. <je...@go...> - 2003-07-17 18:31:04
|
On Thu, 2003-07-17 at 09:43, Tangi Vass wrote:
> > Can you send me your test program?
>
> I just upgraded to release 20030716 and I'm now able to go further with
> helgrind if I don't attach gdb (which seems to start much before the
> segfault).

Hm.

> Found plenty of data race writings in my code.

Excellent! I'd be interested to know what your real bug:false bug ratio
is. It seems to me that helgrind is a bit sensitive to false positives,
but I haven't tried it on many real programs. I'd also be interested in
what classes of errors hit you most (races vs. lock ordering).

> My test suite is quite big and needs many libraries. I'll try a bit
> further and send you the whole package when I can reproduce it at will
> again.

Hm, OK. The last time this crashing bug came up it was with a large
complex thing as well. I'm hoping that there's a relatively simple way
of reproducing it.

J
|
|
From: Tangi V. <tan...@co...> - 2003-07-17 19:15:58
|
> Excellent! I'd be interested to know what your real bug:false bug ratio
> is. It seems to me that helgrind is a bit sensitive to false positives,
> but I haven't tried it on many real programs. I'd also be interested in
> what classes of errors hit you most (races vs. lock ordering).

I might have found an interesting false bug family.
Consider:
- thread A producing sets of data
- thread B using these
- a ref counted MT-safe global smart pointer to pass those sets from
  thread A to thread B.

It seems that Helgrind complains about thread B deleting (through the
ref-counting mechanism) data created by thread A, because no common
mutex was locked while the sets were filled and emptied. A mutex is
locked only during the ownership transfer between threads A and B.

As I've got tons of data, each piece producing such a data race writing
warning, I can hardly see the other warnings.

The idea of comparing the mutexes held during two "concurrent" writes
to the same data is clever, but in many cases a thread has already given
up its reference to a piece of data by the time another thread modifies
it (because of ownership transfer).

That's of course only how I understand the problem, but being a very new
user of Valgrind/helgrind, I may say stupid things.

Tangi
|
|
From: Jeremy F. <je...@go...> - 2003-07-17 19:39:36
|
On Thu, 2003-07-17 at 12:21, Tangi Vass wrote:
> > Excellent! I'd be interested to know what your real bug:false bug ratio
> > is. It seems to me that helgrind is a bit sensitive to false positives,
> > but I haven't tried it on many real programs. I'd also be interested in
> > what classes of errors hit you most (races vs. lock ordering).
>
> I might have found an interesting false bug family.
> Consider:
> - thread A producing sets of data
> - thread B using these
> - a ref counted MT-safe global smart pointer to pass those sets from
> thread A to thread B.
>
> It seems like Helgrind complains about the fact that thread B is
> deleting (because of the ref-counted mechanism) data created by thread A
> as no common mutex was locked during filling and emptying of the sets. A
> mutex is locked only during the ownership transfer between thread A and
> B.
>
> As I've got tons of data, each piece producing such a data race writing
> warning, I can hardly see the others.

Yes, that's the prime source of false error reports from helgrind.
There are two ways in which false errors can arise here:

 1. If your MT-safe smart pointer library is using atomic
    instructions for manipulating the count (rather than
    pthread_mutex), which is likely, then I don't think helgrind
    will realize it is a safe operation. You could use
    VALGRIND_HG_KNOWN_RACE(&refcount, sizeof(refcount)) when
    initializing the refcount in each object so that helgrind won't
    bother reporting "problems" with the refcount.
 2. The ownership transfer. You can use VALGRIND_HG_CLEAN_MEMORY to
    reset your object's memory state so that the new thread will
    become the owner of the memory. If you have many places in
    which the ownership transfer happens, this might not be
    practical (but if there's one site which does 90% of the
    transfers, that would cut your noise level a lot).
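
The two suggestions above can be sketched roughly as follows. This is only an illustration in C, not the poster's C++ code: the DataSet type, the single hand-off slot, and run_handoff_demo() are all hypothetical. The two client-request names are taken from this message, and the no-op fallback definitions are an assumption so that the sketch compiles without Valgrind's helgrind.h; check which requests your Valgrind version actually provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* In a real build these macros come from Valgrind's helgrind.h, included
 * above this point. The no-op fallbacks are an assumption that keeps the
 * sketch compilable without Valgrind installed. */
#ifndef VALGRIND_HG_KNOWN_RACE
#define VALGRIND_HG_KNOWN_RACE(addr, len) ((void)(addr), (void)(len))
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#define VALGRIND_HG_CLEAN_MEMORY(addr, len) ((void)(addr), (void)(len))
#endif

/* Hypothetical ref-counted data set passed from producer to consumer. */
typedef struct {
    atomic_int refcount;   /* updated with atomic ops, never under a mutex */
    int payload[4];
} DataSet;

static pthread_mutex_t slot_mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  slot_cv = PTHREAD_COND_INITIALIZER;
static DataSet *slot = NULL;   /* hand-off point, guarded by slot_mx */

static void *producer(void *arg)
{
    (void)arg;
    DataSet *d = malloc(sizeof *d);
    assert(d != NULL);
    atomic_init(&d->refcount, 1);
    /* Case 1: mark the count as deliberately accessed without a lock. */
    VALGRIND_HG_KNOWN_RACE(&d->refcount, sizeof d->refcount);
    for (int i = 0; i < 4; i++)
        d->payload[i] = i + 1;          /* filled with no lock held */

    pthread_mutex_lock(&slot_mx);       /* lock held only for the hand-off */
    slot = d;
    pthread_cond_signal(&slot_cv);
    pthread_mutex_unlock(&slot_mx);
    return NULL;
}

static void *consumer(void *arg)
{
    int *sum = arg;

    pthread_mutex_lock(&slot_mx);
    while (slot == NULL)
        pthread_cond_wait(&slot_cv, &slot_mx);
    DataSet *d = slot;
    slot = NULL;
    pthread_mutex_unlock(&slot_mx);

    /* Case 2: this thread now owns the payload, so reset its state. */
    VALGRIND_HG_CLEAN_MEMORY(d->payload, sizeof d->payload);

    *sum = 0;
    for (int i = 0; i < 4; i++)
        *sum += d->payload[i];
    /* Drop the last reference; atomic_fetch_sub returns the old value. */
    if (atomic_fetch_sub(&d->refcount, 1) == 1)
        free(d);
    return NULL;
}

/* Runs one producer/consumer hand-off and returns the payload sum. */
int run_handoff_demo(void)
{
    pthread_t prod, cons;
    int sum = -1;
    pthread_create(&cons, NULL, consumer, &sum);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return sum;
}
```

Only the payload is cleaned at the hand-off, so the marking on the refcount is left alone. With a single transfer site like consumer(), one VALGRIND_HG_CLEAN_MEMORY call covers every object that passes through it, which is the "one site does 90% of the transfers" case.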

> That's of course only how I figure out the problem but being a very new
> user of Valgrind/helgrind, I may say stupid things.

Helgrind is a relatively new skin; it *does* say many stupid things.

J
|