|
From: Bart V. A. <bar...@gm...> - 2006-03-04 18:25:12
|
Good news: my data race detection tool, although far from finished, is
already producing some output. It can already show the list of
conflicting accesses between threads. This list is based on the
following information:
- instrumentation of all memory loads.
- instrumentation of all memory stores.
- the times at which new threads are created.
I really want to filter out memory accesses protected by mutexes. How
should my tool be informed about calls to pthread_mutex_lock() and
pthread_mutex_unlock()? Is anyone willing to make
VG_(track_{pre|post}_mutex_{lock|unlock}) work again? I also
really need to know when pthread_join is called.
$ inst/bin/valgrind --tool=drd drd/tests/fp_race
==4811== drd, a data race detector.
==4811== Copyright (C) 2006, and GNU GPL'd, by Bart Van Assche.
==4811== Using LibVEX rev 1579, a library for dynamic binary translation.
==4811== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==4811== Using valgrind-3.2.0.SVN, a dynamic binary instrumentation framework.
==4811== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==4811== For more details, rerun with: -v
==4811==
s_d1 = 1 (should be 1)
s_d2 = 2 (should be 2)
s_d3 = 5 (should be 5)
==4811==
Conflicting accesses between thread 1 and thread 2 (vector clocks [ 1:
2 ] and [ 1: 1, 2: 1 ])
0x040161C8 W W
0x040161C9 W W
0x040161CA W W
0x040161CB W W
0x04016EF0 W R
0x04016EF1 W W
0x04019000 W W
0x04019001 W W
0x04019002 W W
0x04019003 W W
0x04019004 W W
0x04019005 W W
0x04019006 W W
0x04019007 W W
0x04019008 W W
0x04019009 W W
0x0401900A W W
0x0401900B W W
0x0401900C W W
0x0401900D W W
0x0401900E W W
0x0401900F W W
0x04019010 W W
0x04019011 W W
0x04019012 W W
0x04019013 W W
0x04019014 W W
0x04019015 W W
0x04019016 W W
0x04044960 W R
0x04044961 W W
0x0426E600 W W
0x0426E601 W W
0x0426E602 W W
0x0426E603 W W
0x0426E604 W W
0x0426E605 W W
0x0426E606 W W
0x0426E607 W W
0x0426E608 W W
0x0426E609 W W
0x0426E60A W W
0x0426E60B W W
0x0426E60C W W
0x0426E60D W W
0x0426E60E W W
0x0426E60F W W
0x0426E610 W W
0x0426E611 W W
0x0426E612 W W
0x0426E613 W W
0x0426E614 W W
0x0426E615 W W
0x0426E616 W W
0x0426E617 W W
0x0426E618 W W
0x0426E619 W W
0x0426E61A W W
0x0426E61B W W
0x0426E61C W W
0x0426E61D W W
0x0426E61E W W
0x0426E61F W W
0x0426E620 W W
0x0426E621 W W
0x0426E622 W W
0x0426E623 W W
0x0426E64C W R
0x0426E64D W R
0x0426E64E W R
0x0426E64F W R
0x0426E650 W R
0x0426E651 W R
0x0426E652 W R
0x0426E653 W R
0x0426E65C W W
0x0426E65D W W
0x0426E65E W W
0x0426E65F W W
0x0426F7C8 W W
0x0426F7C9 W W
0x0426F7CA W W
0x0426F7CB W W
0x0426F7CC W W
0x0426F7CD W W
0x0426F7CE W W
0x0426F7CF W W
0x0426F7D0 W W
0x0426F7D1 W W
0x0426F7D2 W W
0x0426F7D3 W W
0x04472C08 W W
0x04472C09 W W
0x04472C0A W W
0x04472C0B W W
0x04472D98 W R
0x04472D99 W R
0x04472D9A W R
0x04472D9B W R
0x08049B24 R W
0x08049B25 R W
0x08049B26 R W
0x08049B27 R W
0x08049B48 R W
0x08049B49 R W
0x08049B4A R W
0x08049B4B R W
0x08049B4C R W
0x08049B4D R W
0x08049B4E R W
0x08049B4F R W
0x08049B50 W W
0x08049B51 W W
0x08049B52 W W
0x08049B53 W W
0x08049B54 W W
0x08049B55 W W
0x08049B56 W W
0x08049B57 W W
0x08049B60 W W
0x08049B61 W W
0x08049B62 W W
0x08049B63 W W
|
|
From: Julian S. <js...@ac...> - 2006-03-05 14:09:52
Attachments:
notify_mxlock.patch
|
> Good news: my data race detection tool, although far from finished, is
> already producing some output. It can already show the list of
> conflicting accesses between threads.
Cool.
> pthread_mutex_unlock()? Is anyone willing to make
> VG_(track_{pre|post}_mutex_{lock|unlock}) work again?
Attached is a patch against r5712 which does track_{pre|post}_mutex_lock
and track_post_mutex_unlock. There is no track_pre_mutex_unlock
(since it never blocks) although one could be created if you want.
Tracking pthread_join is not done yet. It's more difficult; I
have not yet figured out how to find out the TId of the thread
being joined to. We know the pthread_t of that thread, but
I don't see an obvious way to find its TId (valgrind's internal
thread-id).
Anyway, this should give some idea how to build/modify the
notifications you need.
- vg_preloaded.c runs on the simulated CPU, and intercepts (wraps)
the relevant pthread functions.
- These wrappers use the client request mechanism to pass event
notifications to the scheduler (scheduler.c).
- The scheduler passes these notifications on to the tool, if it
has asked to see them.
I haven't committed this. Maybe you can mess with the patch to get
it more like you want.
J
|
|
From: Tom H. <to...@co...> - 2006-03-06 07:20:58
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> > pthread_mutex_unlock()? Is anyone willing to make
> > VG_(track_{pre|post}_mutex_{lock|unlock}) work again?
>
> Attached is a patch against r5712 which does track_{pre|post}_mutex_lock
> and track_post_mutex_unlock. There is no track_pre_mutex_unlock
> (since it never blocks) although one could be created if you want.
Hmm... That patch just notifies the tool directly though rather than
going through the thread model code.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Bart V. A. <bar...@gm...> - 2006-03-11 19:31:42
Attachments:
svn-diffs.txt
|
Hello,
I am now one step further: my drd tool is now notified about
thread creation, thread termination and mutex locking / unlocking. The
list of conflicting accesses shortened significantly. I have attached
the svn diffs against version 1594:5748M.
Can someone please review/comment on the changes I made?
Some difficulties I encountered:
- I need the thread ID of the joined thread for
track_post_pthread_join(). There are two difficulties:
* There are applications that call pthread_join with zero as the first
argument (i.e. thread not specified).
* Even if the first argument of pthread_join() is nonzero,
VG_(get_lwp_tid)() cannot be called since this information is cleaned
up as soon as the thread stops.
- For each mutex, I need the following information: recursion count
(depth of recursive locking) and at the time pthread_mutex_lock() is
called, the thread ID of the last thread that called
pthread_mutex_unlock(). This information is now stored in my tool. Is
this the right place, or should this information be managed by the
Valgrind core such that it is also accessible by Helgrind ?
- To be implemented: a notification when either
pthread_mutex_destroy() is called or the mutex memory is freed (POSIX
mutexes do not have to be initialized / destroyed via
pthread_mutex_init() / pthread_mutex_destroy()). I have to investigate
this further.
Examining thread 2 (vc [ 1: 3, 2: 1 ]) versus thread 1 (vc [ 1: 4 ])
0x040161C8 W W
0x040161C9 W W
0x040161CA W W
0x040161CB W W
0x04016EF1 W W
0x04472D98 R W
0x04472D99 R W
0x04472D9A R W
0x04472D9B R W
0x08049B80 W W
0x08049B81 W W
0x08049B82 W W
0x08049B83 W W
0x08049B84 W W
0x08049B85 W W
0x08049B86 W W
0x08049B87 W W
0x08049B98 W W
0x08049B99 W W
0x08049B9A W W
0x08049B9B W W
On 3/5/06, Julian Seward <js...@ac...> wrote:
>
> > Good news: my data race detection tool, although far from finished, is
> > already producing some output. It can already show the list of
> > conflicting accesses between threads.
>
> Cool.
>
> > pthread_mutex_unlock()? Is anyone willing to make
> > VG_(track_{pre|post}_mutex_{lock|unlock}) work again?
>
> Attached is a patch against r5712 which does track_{pre|post}_mutex_lock
> and track_post_mutex_unlock. There is no track_pre_mutex_unlock
> (since it never blocks) although one could be created if you want.
>
> Tracking pthread_join is not done yet. It's more difficult; I
> have not yet figured out how to find out the TId of the thread
> being joined to. We know the pthread_t of that thread, but
> I don't see an obvious way to find its TId (valgrind's internal
> thread-id).
>
> Anyway, this should give some idea how to build/modify the
> notifications you need.
>
> - vg_preloaded.c runs on the simulated CPU, and intercepts (wraps)
> the relevant pthread functions.
>
> - These wrappers use the client request mechanism to pass event
> notifications to the scheduler (scheduler.c).
>
> - The scheduler passes these notifications on to the tool, if it
> has asked to see them.
>
> I haven't committed this. Maybe you can mess with the patch to get
> it more like you want.
|
|
From: Julian S. <js...@ac...> - 2006-03-06 10:58:55
|
On Monday 06 March 2006 07:20, Tom Hughes wrote:
> In message <200...@ac...>
>
> Julian Seward <js...@ac...> wrote:
> > > pthread_mutex_unlock()? Is anyone willing to make
> > > VG_(track_{pre|post}_mutex_{lock|unlock}) work again?
> >
> > Attached is a patch against r5712 which does track_{pre|post}_mutex_lock
> > and track_post_mutex_unlock. There is no track_pre_mutex_unlock
> > (since it never blocks) although one could be created if you want.
>
> Hmm... That patch just notifies the tool directly though rather than
> going through the thread model code.
Hmm. That's a good point. I was never clear how the plumbing for
shipping thread events around the place was arranged. I lack the
Big Picture on this.
Would it be more robust to have the core hand events both to the tool
and to the thread model? That means tools can get thread events even
if the thread model isn't operating. My impression from Bart is that
drd just wants to know about thread-sync points, and at least for
handling pthreads, we don't need a thread model involved for that.
J
|
|
From: Tom H. <to...@co...> - 2006-03-06 11:10:40
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Hmm. That's a good point. I was never clear how the plumbing for
> shipping thread events around the place was arranged. I lack the
> Big Picture on this.
I believe the idea was that the thread model would be an idealised,
generic representation of threads and thread-related objects, not
tied to any particular implementation like pthreads.
The thread model code would take care of issuing the core thread
warning messages and of notifying the tools of thread events.
There would then be code for pthreads (and any other threading
systems that we wanted to support) that updated the thread model as
necessary - that primarily means watching function calls, but there
might also be implied events like automatically unlocking resources
on thread exit and so on.
Actually, looking at it, the thread model code does implement some
higher-level stuff as well - for example it unlocks a mutex when it
is destroyed (and hence notifies tools of the unlock).
> Would it be more robust to have the core hand events both to the tool
> and to the thread model? That means tools can get thread events even
> if the thread model isn't operating. My impression from Bart is that
> drd just wants to know about thread-sync points, and at least for
> handling pthreads, we don't need a thread model involved for that.
I think the problem is that it is nowhere near as simple as mapping
each function call to an event - there are all sorts of implied
changes in state and so on.
For example waiting on a CV unlocks the associated mutex, which your
patch won't currently notify about.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-03-06 11:27:06
|
> I believe the idea was that the thread model would be an idealised,
> generic representation of threads and thread-related objects, not
> tied to any particular implementation like pthreads.
>
> The thread model code would take care of issuing the core thread
> warning messages and of notifying the tools of thread events.
>
> [...]
>
> I think the problem is that it is nowhere near as simple as mapping
> each function call to an event - there are all sorts of implied
> changes in state and so on.
>
> For example waiting on a CV unlocks the associated mutex, which your
> patch won't currently notify about.
Hmm yes I see. Looks like there's a lot of good stuff in
m_pthreadmodel.c and m_threadmodel.c which we should bring back to
life. I'm inclined to wait until Bart has a better picture of what
drd really needs by way of thread events, and then maybe rearrange
this stuff accordingly.
J
|
|
From: Bart V. A. <bar...@gm...> - 2006-03-06 18:32:26
|
As far as I know there is currently one thread-related event that is
tracked by valgrind, and that is track_post_thread_create. I found the
following statement in the function thread_wrapper() in
coregrind/m_syswrap/:
  VG_TRACK ( post_thread_create, tst->os_state.parent, tid );
You also explained another approach to me, namely instrumenting
functions by commenting out the stuff between #if 0 / #endif in
coregrind/vg_preloaded.c.
I'm sorry to say that neither approach is sufficient. The problem is
that there exist functions like pthread_cond_wait(). The drd tool has
to be notified both after pthread_cond_wait() unlocks its mutex and
after pthread_cond_wait() locks its mutex again. I think the only way
to implement this properly is via a thread model.
On 3/6/06, Julian Seward <js...@ac...> wrote:
> Would it be more robust to have the core hand events both to the tool
> and to the thread model? That means tools can get thread events even
> if the thread model isn't operating. My impression from Bart is that
> drd just wants to know about thread-sync points, and at least for
> handling pthreads, we don't need a thread model involved for that.
|
|
From: Nicholas N. <nj...@cs...> - 2006-03-06 23:36:54
|
On Mon, 6 Mar 2006, Bart Van Assche wrote:
> As far as I know there is currently one thread-related event that
> is tracked by valgrind, and that is track_post_thread_create. I found
> the following statement in thread_wrapper() in coregrind/m_syswrap/:
> VG_TRACK ( post_thread_create, tst->os_state.parent, tid );
That sounds right. There was post_mutex_lock and others but they're
currently not working.
> I'm sorry to say that neither approach is sufficient. The problem
> is that there exist functions like pthread_cond_wait(). The drd tool
> has to be notified both after pthread_cond_wait() unlocks its mutex
> and after pthread_cond_wait() locks its mutex again. I think the
> only way to implement this properly is via a thread model.
Perhaps I missed some email about this -- this tool is a data-race
detector, right? How is it different to Helgrind?
Nick
|
|
From: Bart V. A. <bar...@gm...> - 2006-03-07 17:04:04
|
Hello Nicholas,
The reason why I want to develop a new data race detector is that
helgrind (at least in the tests I performed) reports false positives.
This is because helgrind uses an Eraser-style algorithm. There exist
algorithms that do not produce false positives -- see e.g.
http://escher.elis.ugent.be/publ/Edocs/DOC/P104_116.pdf
On 3/7/06, Nicholas Nethercote <nj...@cs...> wrote:
> On Mon, 6 Mar 2006, Bart Van Assche wrote:
>
> > As far as I know there is currently one thread-related event that
> > is tracked by valgrind, and that is track_post_thread_create. I found
> > the following statement in thread_wrapper() in coregrind/m_syswrap/:
> > VG_TRACK ( post_thread_create, tst->os_state.parent, tid );
>
> That sounds right. There was post_mutex_lock and others but they're
> currently not working.
>
> > I'm sorry to say that neither approach is sufficient. The problem
> > is that there exist functions like pthread_cond_wait(). The drd tool
> > has to be notified both after pthread_cond_wait() unlocks its mutex
> > and after pthread_cond_wait() locks its mutex again. I think the
> > only way to implement this properly is via a thread model.
>
> Perhaps I missed some email about this -- this tool is a data-race
> detector, right? How is it different to Helgrind?
|
|
From: Julian S. <js...@ac...> - 2006-03-19 03:03:37
|
> I am now one step further: my drd tool is now notified about
> thread creation, thread termination and mutex locking / unlocking. The
> list of conflicting accesses shortened significantly. I have attached
> the svn diffs against version 1594:5748M.
Good. Is it possible to see also the contents of drd, enough that we
can actually run the system? Do you also have some small test programs
which demonstrate races etc that drd can find?
> Can someone please review/comment on the changes I made?
They seem plausible. My view is, this first phase is to construct a
proof-of-concept patch which we can play with a bit, to see how well
it works. If that looks good then the next stage is to consider the
cleanest way to integrate it. As a result, in this first stage it
doesn't matter much if there are ugly infrastructure hacks.
> Some difficulties I encountered:
> - I need the thread ID of the joined thread for
> track_post_pthread_join(). There are two difficulties:
> * There are applications that call pthread_join with zero as the first
> argument (i.e. thread not specified).
Are you sure? POSIX doesn't appear to say anything about that:
http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_join.html
The impression I get from that URL and also the Linux man page is
that the joined thread must be specified.
> * Even if the first argument of pthread_join() is nonzero,
> VG_(get_lwp_tid)() cannot be called since this information is cleaned
> up as soon as the thread stops.
I guess that's part of what m_pthreadmodel.c is supposed to do:
  /* [...] One tricky problem we need to solve is the mapping between
     pthread_t identifiers and internal thread identifiers. */
Precisely what information do you need to handle pthread_join
correctly? Getting the tid is tricky because there could be a race
condition (V reallocates the same lwp_tid to a new thread before you
get the required info from the scheduler) so we will have to be
careful with that.
> - For each mutex, I need the following information: recursion count
> (depth of recursive locking) and at the time pthread_mutex_lock() is
> called, the thread ID of the last thread that called
> pthread_mutex_unlock(). This information is now stored in my tool. Is
> this the right place, or should this information be managed by the
> Valgrind core such that it is also accessible by Helgrind?
Maybe this stuff should be in m_pthreadmodel.c. I think you will have
to make friends with that module to be really successful with drd.
> - To be implemented: a notification when either
> pthread_mutex_destroy() is called or the mutex memory is freed (POSIX
> mutexes do not have to be initialized / destroyed via
> pthread_mutex_init() / pthread_mutex_destroy()). I have to investigate
> this further.
Could be expensive to check all frees etc to know when mutex memory
is destroyed.
J
|
|
From: Bart V. A. <bar...@gm...> - 2006-03-19 17:48:25
|
On 3/19/06, Julian Seward <js...@ac...> wrote:
> Good. Is it possible to see also the contents of drd, enough
> that we can actually run the system? Do you also have some
> small test programs which demonstrate races etc that drd can find?
There is still some work to do - I'd like to realize the first three
items before I send you the tool:
- Implement segment merging and detection of obsolete segments (see
  also the DIOTA paper) -- the memory use of the tool keeps increasing
  each time a pthread function is called.
- Test the tool with nontrivial applications.
- Implement support for reuse of thread-ID's - currently it is assumed
  that thread-ID's are not reused.
- Free the memory allocated (inside the tool) for mutex state
  information if a mutex is no longer in use.
- Error reporting: call stacks of both conflicting accesses.
- Writing test programs for the drd tool.
> > Some difficulties I encountered:
> > - I need the thread ID of the joined thread for
> > track_post_pthread_join(). There are two difficulties:
> > * There are applications that call pthread_join with zero as the first
> > argument (i.e. thread not specified).
>
> Are you sure? POSIX doesn't appear to say anything about that:
> http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_join.html
> The impression I get from that URL and also the Linux man page is
> that the joined thread must be specified.
Thanks for the link -- I won't try to support pthread_join(0). By the
way, you reminded me of the fact that pthread_t does not have to be a
scalar datatype -- even assuming that (pthread_t)0 is not a valid
thread ID is nonportable.
> Precisely what information do you need to handle pthread_join
> correctly? Getting the tid is tricky because there could be
> a race condition (V reallocates the same lwp_tid to a new
> thread before you get the required info from the scheduler)
> so we will have to be careful with that.
I think I have a solution for this race condition: I removed the
assignment "tst->status = VgTs_Empty;" from run_a_thread_NORETURN()
and moved it to scheduler.c, just after the point where
VG_TRACK(post_thread_join) is called from the pthread_join wrapper.
But I'm afraid this will break calls to clone() that do not originate
from pthread_create()?
Note: it's no problem for the drd tool that pthread_t thread ID's are
reused before track_post_thread_join() is called; it only must be
ensured that V's ThreadId is not reused after pthread_join() finishes
and before track_post_thread_join() is called.
> > - For each mutex, I need the following information: recursion count
> > (depth of recursive locking) and at the time pthread_mutex_lock() is
> > called, the thread ID of the last thread that called
> > pthread_mutex_unlock(). This information is now stored in my tool. Is
> > this the right place, or should this information be managed by the
> > Valgrind core such that it is also accessible by Helgrind?
>
> Maybe this stuff should be in m_pthreadmodel.c. I think you
> will have to make friends with that module to be really successful
> with drd.
By the way, this state information is already managed by the drd tool;
it is updated inside the mutex tracking functions. It would be nice if
that stuff were in m_pthreadmodel.c, but for me it's not essential.
|
|
From: Bart V. A. <bar...@gm...> - 2006-08-14 17:03:35
|
On 3/19/06, Julian Seward <js...@ac...> wrote:
> My view is, this first phase is to construct a proof-of-concept
> patch which we can play with a bit, to see how well it works.
> If that looks good then the next stage is to consider the cleanest
> way to integrate it. As a result, in this first stage it doesn't
> matter much if there are ugly infrastructure hacks.
A proof-of-concept version of drd is available at the following location:
http://home.scarlet.be/~bvassche/drd/valgrind-5999.patch
(apply it with patch -p0 < ...)
http://home.scarlet.be/~bvassche/drd/valgrind-5999-drd-2006-08-14.tar.bz2
(extract with tar xjf ...)
This version is not yet ready for a release -- e.g. it does not yet
support floating-point instructions, and uses more memory than
acceptable for some test cases. But it works well enough to
demonstrate the DIOTA data race detection algorithm. And for the
cases I tested, it works consistently and reliably.
After running the following commands
  ./autogen.sh && ./configure --prefix=$PWD/inst && make install
the drd tool can be started e.g. on the unit test fp_race. This test
program accesses the static variable s_d3 from two different threads.
When started without command line options, s_d3 is accessed without
locking (triggers a race condition). When started with command line
option -m, s_d3 accesses are guarded by a mutex (no race condition on
s_d3). This is reported correctly by the drd tool:
$ inst/bin/valgrind --tool=drd drd/tests/fp_race
==5637== drd, a data race detector.
==5637== Copyright (C) 2006, and GNU GPL'd, by Bart Van Assche.
THIS SOFTWARE IS A PROTOTYPE, AND IS NOT YET RELEASED
==5637== Using LibVEX rev 1579, a library for dynamic binary translation.
==5637== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==5637== Using valgrind-3.3.0.SVN, a dynamic binary instrumentation framework.
==5637== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==5637== For more details, rerun with: -v
==5637==
&s_d1 = 0x8049cf8; &s_d2 = 0x8049d00; &s_d3 = 0x8049d08
==5637== Detected data races. Context:
==5637== 1st segment start (VG t 2, kernel t 5638, POSIX t 78117792)
==5637==    at 0x421A648: clone (in /lib/libc-2.4.so)
==5637==
==5637== 1st segment end (VG t 2, kernel t 5638, POSIX t 78117792)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 2nd segment start (VG t 1, kernel t 5637, POSIX t 69721984)
==5637==    at 0x421A648: clone (in /lib/libc-2.4.so)
==5637==    by 0x40429BC: pthread_create@@GLIBC_2.1 (in /lib/libpthread-2.4.so)
==5637==    by 0x401CAAF: pthread_create@* (vg_preloaded.c:135)
==5637==    by 0x804884C: main (fp_race.cpp:97)
==5637==
==5637== 2nd segment end (VG t 1, kernel t 5637, POSIX t 69721984)
==5637==    at 0x401C4FA: pthread_join (vg_preloaded.c:164)
==5637==    by 0x8048876: main (fp_race.cpp:108)
==5637==
==5637== Actual data races:
==5637== Thread 2:
==5637== 0x0401B310 sz 4 W W (_rtld_local (offset 752, size 1524) in /lib/ld-2.4.so, ld-linux.so.2:Data)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0x0401E47D sz 1 R W (unknown)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0x04A7FC04 sz 4 W W (stack of VG t 2; kernel t 5638; POSIX t 78117792)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0x04A7FD9C sz 4 R W (stack of VG t 2; kernel t 5638; POSIX t 78117792)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0x08049D08 sz 8 W W (s_d3 (offset 0, size 8) in /home/bart/software/valgrind-svn/drd/tests/fp_race, NONE:BSS)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0xBEFF5F60 sz 8 R W (stack of VG t 1; kernel t 5637; POSIX t 69721984)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637==
==5637== 0xBEFF5F68 sz 4 W W (stack of VG t 1; kernel t 5637; POSIX t 69721984)
==5637==    at 0x40423BD: start_thread (in /lib/libpthread-2.4.so)
==5637==    by 0x421A65D: clone (in /lib/libc-2.4.so)
==5637== End of detected data races
==5637==
==5637== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

$ inst/bin/valgrind --tool=drd drd/tests/fp_race -m
==5650== drd, a data race detector.
==5650== Copyright (C) 2006, and GNU GPL'd, by Bart Van Assche.
THIS SOFTWARE IS A PROTOTYPE, AND IS NOT YET RELEASED
==5650== Using LibVEX rev 1579, a library for dynamic binary translation.
==5650== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==5650== Using valgrind-3.3.0.SVN, a dynamic binary instrumentation framework.
==5650== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==5650== For more details, rerun with: -v
==5650==
&s_d1 = 0x8049cf8; &s_d2 = 0x8049d00; &s_d3 = 0x8049d08
==5650==
==5650== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
|
|
From: Nicholas N. <nj...@cs...> - 2006-08-14 23:20:06
|
On Mon, 14 Aug 2006, Bart Van Assche wrote:
> A proof-of-concept version of drd is available at the following location:
The code itself looks pretty good, judging from a quick look-over.
> This version is not yet ready for a release -- e.g. it does not yet
> support floating-point instructions, and uses more memory than
> acceptable for some test cases. But it works well enough to
> demonstrate the DIOTA data race detection algorithm. And for the
> cases I tested, it works consistently and reliably.
Good. How much memory does it use? Does it still have one shadow word
per memory word like Helgrind?
> ==5637== Detected data races. Context:
> [...]
> ==5637== Actual data races:
What's the difference between detected and actual races?
Nick
|
|
From: Bart V. A. <bar...@gm...> - 2006-08-15 08:15:36
|
On 8/15/06, Nicholas Nethercote <nj...@cs...> wrote:
> On Mon, 14 Aug 2006, Bart Van Assche wrote:
>
> > A proof-of-concept version of drd is available at the following location:
>
> The code itself looks pretty good, judging from a quick look-over.
Please note that the valgrind patch I made available breaks a feature
of Valgrind (see also drd/TODO.txt): currently Valgrind handles
clone() calls originating from outside pthread_create() correctly.
With my patch such clone() calls cause a memory leak (the associated
thread remains in Vg_Empty state and never reaches Vg_Zombie). This
can be solved easily, however.
> > This version is not yet ready for a release -- e.g. it does not yet
> > support floating-point instructions, and uses more memory than
> > acceptable for some test cases. But it works well enough to
> > demonstrate the DIOTA data race detection algorithm. And for the
> > cases I tested, it works consistently and reliably.
>
> Good. How much memory does it use? Does it still have one shadow word
> per memory word like Helgrind?
Two bits per accessed byte per segment (an access is either a read or
a write). A new segment is created upon most pthread_...() calls.
> > ==5637== Detected data races. Context:
> > [...]
> > ==5637== Actual data races:
>
> What's the difference between detected and actual races?
The output of the drd tool is structured as follows:
- First the text "Detected data races. Context:".
- Next, four call stacks (start of first segment involved in the race,
  end of first segment, start of second segment and end of second
  segment). This information contains the exact thread IDs involved in
  the data race, and shows the regions of code involved in the data
  race. There is no exact information on the location of the data race
  -- I have no idea how to report the exact location in code of a data
  race without recording a call stack upon each memory access.
- Next, the text "Actual data races:"
- Next, a list of addresses for which a data race was detected. Each
  entry in this list contains the following information:
  * a line with the start address, range size, access pattern by first
    and second thread (W or R), and how the memory was allocated.
  * the same call stack as the end of the first segment. I would like
    to suppress this output, but have no idea how this is possible
    with VG_(maybe_record_error)().
  * for dynamically allocated data only, the call stack at the time of
    allocation of the data (there is no example of this in the output
    included in my previous E-mail).
|
|
From: Julian S. <js...@ac...> - 2006-08-22 02:35:22
|
> A proof-of-concept version of drd is available at the following location:
Just trying it now.
> This version is not yet ready for a release -- e.g. it does not yet support
> floating-point instructions,
Huh? I guess you mean it does not handle the dirty helper calls used
for doing x87 80-bit loads/stores. I rewrote the Ist_Dirty case in
drd_instrument() in drd_main.c as shown below and that appears to fix
it (at least it does not die on fldt/fstpt). For the Ifx_Modify case
I called drd_trace_load and then drd_trace_store with the same args
since there is no drd_trace_modify function. Is that OK?
So now, I can start konqueror and it runs for ~ 60 seconds
(doing fontconfig crap) but I had to control-C it before the konq
window appeared, due to memory use exceeding 450MB.
So it looks promising. At this point I have two questions:
(1) The memory use .. seems huge.
Can you say what it is that the memory use depends on?
Is there a worst-case bound?
Can the current behaviour be improved?
(2) Nick asked:
> What's the difference between detected and actual races?
>
> The output of the drd tool is structured as follows:
> - First the text "Detected data races. Context:".
> [...]
So .. I read all this but still didn't understand what the
meaning of these phrases is. Can you clarify?
If the tool is to become popular we need to have a way to
explain to programmers in a simple way what it does and how
to interpret the results. I for one would like to know .. at
the moment all I know is that it finds data races by some
entirely mysterious means, and gives fewer false positives
than the Eraser style algorithms.
J
-----
Patch for drd_instrument() in drd_main.c to make x86 FP work:
(replaces entire previous Ist_Dirty case)
      case Ist_Dirty:
      {
         IRDirty* d = st->Ist.Dirty.details;
         IREffect const mFx = d->mFx;
         switch (mFx) {
            case Ifx_None:
               break;
            case Ifx_Read:
            case Ifx_Write:
            case Ifx_Modify:
               tl_assert(d->mAddr);
               tl_assert(d->mSize > 0);
               argv = mkIRExprVec_2(d->mAddr, mkIRExpr_HWord(d->mSize));
               if (mFx == Ifx_Read || mFx == Ifx_Modify) {
                  di = unsafeIRDirty_0_N(
                          /*regparms*/2,
                          "drd_trace_load",
                          VG_(fnptr_to_fnentry)(drd_trace_load),
                          argv);
                  addStmtToIRBB(bb, IRStmt_Dirty(di));
               }
               if (mFx == Ifx_Write || mFx == Ifx_Modify) {
                  di = unsafeIRDirty_0_N(
                          /*regparms*/2,
                          "drd_trace_store",
                          VG_(fnptr_to_fnentry)(drd_trace_store),
                          argv);
                  addStmtToIRBB(bb, IRStmt_Dirty(di));
               }
               break;
            default:
               tl_assert(0);
         }
      }
      addStmtToIRBB(bb, st);
      break;
|
|
From: Julian S. <js...@ac...> - 2006-08-22 10:48:02
|
> So now, I can start konqueror and it runs for ~ 60 seconds
> (doing fontconfig crap) but I had to control-C it before the konq
> window appeared, due to memory use exceeding 450MB.

Update: I can get the konq main window, but it exhausts my 2G swap
partition before it can render the first page. At the point the process
died its total size was about 2600M. Watching it with top, there were
many places where it seemed to increase in size at a rate of almost
20MB/sec, and I believe it averaged 10MB/sec overall.

One thing I observe is that you're intercepting malloc/free. Is that
necessary? Does that make the algorithm work better somehow?

J |
|
From: Bart V. A. <bar...@gm...> - 2006-08-22 16:18:47
|
Hello Julian,
Regarding interception of malloc() and free(): the only reason drd
intercepts these is to record the call stack at the time malloc() is
called, so that this call stack can be included when reporting a data
race. I don't think this causes a lot of overhead?
Before I can say something about the memory use of drd, I have to
explain its algorithm. In short, the data race detection algorithm works as
follows:
- A process being analyzed by drd consists of a number of threads.
- Each thread consists of a sequence of actions. The actions relevant to drd
are: memory load, memory store, and synchronization actions (lock mutex,
unlock mutex, create thread, join thread, ...).
- The sequence of loads and stores performed by a single thread between
two successive synchronization actions is called a segment.
- The drd tool records the order in which segments are executed. Within a
thread, this order is represented by a single integer number. The order over
threads is represented by something called a "vector clock". This is a
standard way of representing the partial order relationship between actions
performed by different threads.
- For each segment it is recorded via a three-level bitmap which memory
locations have been read from or written to (at the lowest level, two bits
are needed per byte: one bit representing read access, one representing
write access).
- Actions within a thread are always ordered; actions performed by
different threads are only considered ordered when an order has been
enforced by synchronization actions.
- A data race occurs when two threads access the same memory location,
at least one of the two accesses is a write, and the order between the
two accesses is not enforced by a synchronization action.
- Segments that can no longer be involved in a data race are freed
(VG_(free)()).
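[Editorial illustration.] The happens-before test behind this can be sketched in a few lines of C. This is only a sketch; the names (`VectorClock`, `vc_lte`, `segments_concurrent`) and the fixed thread count are invented for illustration, not taken from drd's source:

```c
#include <assert.h>

#define MAX_THREADS 4   /* illustrative fixed upper bound */

/* One counter per thread: how many segments of that thread had
   completed when this segment started. */
typedef struct { int clk[MAX_THREADS]; } VectorClock;

/* vc_lte(a, b): true iff every component of a is <= the matching
   component of b, i.e. segment a happened before (or equals) b. */
static int vc_lte(const VectorClock* a, const VectorClock* b)
{
    for (int i = 0; i < MAX_THREADS; i++)
        if (a->clk[i] > b->clk[i])
            return 0;
    return 1;
}

/* Two segments are ordered only if one clock dominates the other;
   otherwise they ran concurrently, and conflicting accesses in them
   (with at least one write) constitute a data race. */
static int segments_concurrent(const VectorClock* a, const VectorClock* b)
{
    return !vc_lte(a, b) && !vc_lte(b, a);
}
```

For the clocks from the sample output earlier in this thread, [ 1: 2 ] and [ 1: 1, 2: 1 ], neither component-wise dominates the other, so the two segments are concurrent and their conflicting accesses are reported.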
This means that the memory consumption of the drd tool is proportional to:
- the number of threads running simultaneously.
- the number of segments allocated within a thread.
- the amount of unused memory inside the bitmap allocated for each
segment. When e.g. iterating over a byte array and reading only every
1024th byte, only 0.1% of the memory allocated for the bitmap will be
used -- very inefficient.
Or: it's not yet clear to me which of the above three reasons applies to
konqueror.
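[Editorial illustration.] The two-bits-per-byte encoding at the lowest bitmap level, and the conflict check between two segments, could look roughly like this. This is a hypothetical sketch -- `AccessBits`, `record_access` and `conflict_at` are invented names, not drd's API:

```c
#include <assert.h>
#include <string.h>

#define RGN_SIZE 4096   /* client bytes covered by one lowest-level block */

/* Lowest bitmap level: two bits per byte of client memory,
   one "was read" bit and one "was written" bit. */
typedef struct {
    unsigned char rd[RGN_SIZE / 8];
    unsigned char wr[RGN_SIZE / 8];
} AccessBits;

static void record_access(AccessBits* b, unsigned off, int is_write)
{
    if (is_write) b->wr[off / 8] |= 1u << (off % 8);
    else          b->rd[off / 8] |= 1u << (off % 8);
}

/* Two segments conflict at a given offset iff at least one of them
   wrote it and the other touched it at all. */
static int conflict_at(const AccessBits* a, const AccessBits* b, unsigned off)
{
    unsigned char m = 1u << (off % 8);
    int a_r = a->rd[off / 8] & m, a_w = a->wr[off / 8] & m;
    int b_r = b->rd[off / 8] & m, b_w = b->wr[off / 8] & m;
    return (a_w && (b_r || b_w)) || (b_w && (a_r || a_w));
}
```

The sketch also makes the sparse-access problem visible: the 1024-byte-stride example above would set only a handful of bits while the whole block stays allocated.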
It must be possible however to run software like konqueror under drd, since
it is possible with DIOTA. DIOTA uses the same approach as drd. One of the
differences between DIOTA and drd that I know of, is that DIOTA uses a
9-level bitmap while drd only uses a 3-level bitmap.
I hope the above explanation is comprehensible?
On 8/22/06, Julian Seward <js...@ac...> wrote:
>
>
> > So now, I can start konqueror and it runs for ~ 60 seconds
> > (doing fontconfig crap) but I had to control-C it before the konq
> > window appeared, due to memory use exceeding 450MB.
>
> Update: I can get the konq main window, but it exhausts my 2G swap
> partition before it can render the first page. At the point the process
> died its total size was using about 2600M. Watching it with top,
> there were many places where it seemed to increase in size at a
> rate of almost 20MB/sec, and I believe it averaged 10MB/sec overall.
>
> One thing I observe is that you're intercepting malloc/free. Is that
> necessary? Does that make the algorithm work better somehow?
>
> J
>
--
Met vriendelijke groeten,
Bart Van Assche.
|
|
From: Josef W. <Jos...@gm...> - 2006-08-22 19:54:50
|
On Tuesday 22 August 2006 18:18, Bart Van Assche wrote:
> ...
> This means that the memory consumption of the drd tool is proportional to:
> - the number of threads running simultaneously.
> - the number of segments allocated within a thread.
> - the amount of unused memory allocated within the bitmap allocated for a
> segment. When e.g. iterating over a byte-array, and only reading every
> 1024th byte, only 0.1% of the memory allocated for the bitmap will be used
> -- very inefficient.
>
> Or: it's not yet clear to me which of the above three reasons applies to
> konqueror.

Hmmm... AFAIK, konqueror does not do multithreading.

Why does drd for the 1-threaded "segment" at process start already need
to store read/write accesses? I would say that there can never be any
race until a second thread is created.

Or I am wrong and the drd tool is not storing read/write accesses for
konqueror, but only the 500627 (*) mallocs with stack traces for
potential later error messages.

Josef

(*) This is only konqueror startup, until the blank window appears
(KDE 3.5.3). Just checked with callgrind... |
|
From: Nicholas N. <nj...@cs...> - 2006-08-22 23:10:30
|
On Tue, 22 Aug 2006, Bart Van Assche wrote:

> - The drd tool records the order in which segments are executed. Within a
> thread, this order is represented by a single integer number. The order
> over threads is represented by something called a "vector clock". This is
> a standard way of representing the partial order relationship between
> actions performed by different threads.
> - For each segment it is recorded via a three-level bitmap which memory
> locations have been read from or written to (at the lowest level, two
> bits are needed per byte: one bit representing read access, one
> representing write access).

I'd guess that this is where the memory consumption is coming from.

> - A data race is defined as two threads that access the same memory
> location, where at least one of the two threads performs a write action,
> and the order between the two accesses is not enforced by a
> synchronization action.
> - Segments that can no longer be involved in a data race are freed
> (VG_(free)()).

That sounds important. When can they be freed?

> This means that the memory consumption of the drd tool is proportional to:
> - the number of threads running simultaneously.
> - the number of segments allocated within a thread.
> - the amount of unused memory allocated within the bitmap allocated for a
> segment. When e.g. iterating over a byte-array, and only reading every
> 1024th byte, only 0.1% of the memory allocated for the bitmap will be
> used -- very inefficient.

So 1024 bytes is the smallest memory chunk you can individually represent?

> It must be possible however to run software like konqueror under drd,
> since it is possible with DIOTA. DIOTA uses the same approach as drd. One
> of the differences between DIOTA and drd that I know of, is that DIOTA
> uses a 9-level bitmap while drd only uses a 3-level bitmap.

Wow, 9 levels sounds like total overkill. How small are the memory chunks
then?

BTW, do you know/are you involved with the DIOTA people? Judging from your
name it seems plausible :)

> I hope the above explanation is comprehensible ?

It helps a lot! Thanks.

Nick |
|
From: Bart V. A. <bar...@gm...> - 2006-08-23 19:12:14
|
On 8/22/06, Julian Seward <js...@ac...> wrote:
>
> So it looks promising. At this point I have two questions:
>
> (1) The memory use .. seems huge.
> Can you say what it is that the memory use depends on?
> Is there a worst-case bound?
> Can the current behaviour be improved?

I have done some tests, and the huge memory consumption is probably due
to a memory leak in segment deallocation. The memory used by drd keeps
increasing for programs like konqueror or kate even though the number of
segments kept in memory is limited to 16 for single-threaded programs.
Even when I decreased this limit to at most 1 segment in memory at a
time, memory consumption kept increasing ...

> (2) Nick asked:
> > What's the difference between detected and actual races?
>
> If the tool is to become popular we need to have a way to
> explain to programmers in a simple way what it does and how
> to interpret the results. I for one would like to know .. at
> the moment all I know is that it finds data races by some
> entirely mysterious means, and gives fewer false positives
> than the Eraser style algorithms.

Was my explanation comprehensible, or should I illustrate it with a
picture?

BTW: thanks for the patch! |
|
From: Bart V. A. <bar...@gm...> - 2006-08-23 19:21:48
|
On 8/23/06, Nicholas Nethercote <nj...@cs...> wrote:
>
> On Tue, 22 Aug 2006, Bart Van Assche wrote:
>
> > - A data race is defined as two threads that access the same memory
> > location, where at least one of the two threads performs a write
> > action, and the order between the two accesses is not enforced by a
> > synchronization action.
> > - Segments that can no longer be involved in a data race are freed
> > (VG_(free)()).
>
> That sounds important. When can they be freed?

At the time free() is called from the client.

BTW: pub_tool_execontext.h defines VG_(record_ExeContext)() but no
corresponding cleanup function?

> > This means that the memory consumption of the drd tool is proportional
> > to:
> > - the number of threads running simultaneously.
> > - the number of segments allocated within a thread.
> > - the amount of unused memory allocated within the bitmap allocated
> > for a segment. When e.g. iterating over a byte-array, and only reading
> > every 1024th byte, only 0.1% of the memory allocated for the bitmap
> > will be used -- very inefficient.
>
> So 1024 bytes is the smallest memory chunk you can individually
> represent?

The current bitmap implementation splits each 32-bit address in 10+10+12
bits -- that means that the lowest level corresponds to 4096 client bytes.

> > It must be possible however to run software like konqueror under drd,
> > since it is possible with DIOTA. DIOTA uses the same approach as drd.
> > One of the differences between DIOTA and drd that I know of, is that
> > DIOTA uses a 9-level bitmap while drd only uses a 3-level bitmap.
>
> Wow, 9 levels sounds like total overkill. How small are the memory
> chunks then?

From the DIOTA sources (DIOTA-0.91-20060222/backend_dr/bitmap10c.c): each
32-bit address is split as follows: 32 = 1+3+3+3+3+3+3+3+3+2+5 (that's 10
levels in total -- the 9 levels was OTOH).

> BTW, do you know/are you involved with the DIOTA people? Judging from
> your name it seems plausible :)

During several years I shared an office with Michiel Ronsse at the
University of Ghent -- that's where I learned about DIOTA. I already told
him about drd. In the past Michiel has tried to attract a student to
write a tool like drd, but without success. |
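[Editorial illustration.] The 10+10+12 split described above can be shown with a small helper. This is hypothetical code, not taken from drd; only the bit arithmetic follows the description (10-bit top-level index, 10-bit second-level index, 12-bit offset into a 4096-byte leaf):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit client address into the three bitmap-level indices:
   bits 31..22 select the top-level table entry, bits 21..12 the
   second-level entry, and bits 11..0 the byte within a 4096-byte leaf. */
static void split_address(uint32_t addr,
                          uint32_t* top, uint32_t* mid, uint32_t* low)
{
    *top = (addr >> 22) & 0x3FF;  /* 10 bits */
    *mid = (addr >> 12) & 0x3FF;  /* 10 bits */
    *low = addr & 0xFFF;          /* 12 bits: offset in leaf */
}
```

The 12 low bits are exactly why one leaf covers 4096 client bytes, matching the answer given above.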
|
From: Nicholas N. <nj...@cs...> - 2006-08-24 06:33:03
|
On Wed, 23 Aug 2006, Bart Van Assche wrote:

> BTW: pub_tool_execontext.h defines VG_(record_ExeContext)() but no
> corresponding cleanup function ?

That's correct. From coregrind/pub_core_execontext.h:

// PURPOSE: This module provides an abstract data type, ExeContext,
// which is a stack trace stored in such a way that duplicates are
// avoided. This also facilitates fast comparisons if necessary.

If an ExeContext is obtained that is the same as a previous one, the
previous one that is already in the table will be used. This avoids
having multiple copies of identical ExeContexts, which is really
important, eg. if an instruction that triggers an error is run 100,000
times. (An ExeContext is really just a stack-trace that has been put
into this table and so has this only-one-copy property.)

Nick |
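[Editorial illustration.] The only-one-copy property described above is plain interning. A toy version might look like the following -- the names are invented and a linear search stands in for Valgrind's real lookup table; capacity checking is omitted for brevity:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEPTH 4      /* illustrative fixed stack-trace depth */
#define TABLE 64     /* illustrative table capacity (no overflow check) */

/* A stack trace as a fixed-depth array of instruction pointers. */
typedef struct { uintptr_t ips[DEPTH]; } Trace;

static Trace table[TABLE];
static int n_traces = 0;

/* record_trace() returns a canonical pointer: identical traces map
   to the same stored entry, so duplicates cost nothing extra. */
static const Trace* record_trace(const uintptr_t ips[DEPTH])
{
    for (int i = 0; i < n_traces; i++)
        if (memcmp(table[i].ips, ips, sizeof table[i].ips) == 0)
            return &table[i];            /* duplicate: reuse stored copy */
    memcpy(table[n_traces].ips, ips, sizeof table[0].ips);
    return &table[n_traces++];
}
```

Because the returned pointers are canonical, comparing two traces for equality is a single pointer comparison, which is the fast-comparison property the header comment mentions.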
|
From: Bart V. A. <bar...@gm...> - 2006-08-23 19:24:53
|
On 8/22/06, Josef Weidendorfer <Jos...@gm...> wrote:
>
> On Tuesday 22 August 2006 18:18, Bart Van Assche wrote:
>
> Hmmm... AFAIK, konqueror does not do multithreading.
> Why does drd for the 1-threaded "segment" at process start already need
> to store read/write accesses? I would say that there can never be any
> race until a second thread is created.

Josef, you're right about this. Not recording read and write accesses
until a second thread is created would be a nice optimization of the drd
tool.

> Or I am wrong and the drd tool is not storing read/write accesses for
> konqueror, but only the 500627 (*) mallocs with stack traces for
> potential later error messages.

I have tried to disable malloc()/free() wrapping, but I didn't see an
obvious difference in the memory consumption of the drd tool. |
|
From: Julian S. <js...@ac...> - 2006-08-23 19:30:31
|
On Wednesday 23 August 2006 20:24, Bart Van Assche wrote:
> On 8/22/06, Josef Weidendorfer <Jos...@gm...> wrote:
> > On Tuesday 22 August 2006 18:18, Bart Van Assche wrote:
> >
> > Hmmm... AFAIK, konqueror does not do multithreading.
> > Why does drd for the 1-threaded "segment" at process start already
> > need to store read/write accesses? I would say that there can never
> > be any race until a second thread is created.
>
> Josef, you're right about this. Not recording read and write accesses
> until a second thread is created would be a nice optimization of the
> drd tool.

That is indeed true (and the DIOTA paper mentions exactly this
optimisation); however I would prefer first to simplify the memory
management scenario as much as possible until we can figure out what's
going on, and fix it.

> I have tried to disable malloc()/free() wrapping, but I didn't see an
> obvious difference in memory consumption of the drd tool.

No, maybe not, but let's leave it disabled for now since that gives a
simpler situation to debug.

J |