From: <sv...@va...> - 2008-12-22 00:39:45
Author: sewardj
Date: 2008-12-22 00:39:41 +0000 (Mon, 22 Dec 2008)
New Revision: 8860
Log:
Finish off updates to the Helgrind manual.
Modified:
trunk/helgrind/docs/hg-manual.xml
Modified: trunk/helgrind/docs/hg-manual.xml
===================================================================
--- trunk/helgrind/docs/hg-manual.xml 2008-12-21 23:11:14 UTC (rev 8859)
+++ trunk/helgrind/docs/hg-manual.xml 2008-12-22 00:39:41 UTC (rev 8860)
@@ -51,7 +51,7 @@
<para><link linkend="hg-manual.data-races">
Data races -- accessing memory without adequate locking
or synchronisation</link>.
- Note that Helgrind in Valgrind 3.4.0 and later uses a
+ Note that race detection in versions 3.4.0 and later uses a
different algorithm than in 3.3.x. Hence, if you have been using
Helgrind in 3.3.x, you may want to re-read this section.
</para>
@@ -320,7 +320,7 @@
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>
-<para>The error message shows two other important:</para>
+<para>Two important parts of the message are:</para>
<itemizedlist>
<listitem>
@@ -337,8 +337,9 @@
one of these will be a write (since two concurrent, unsynchronised
reads are harmless), and they will of course be from different
threads.</para>
- <para>By examining your program at the two locations, it should be
- fairly clear what the root cause of the problem is.</para>
+ <para>By examining your program at the two locations, you should be
+ able to get at least some idea of what the root cause of the
+ problem is.</para>
</listitem>
<listitem>
<para>For races which occur on global or stack variables, Helgrind
@@ -367,8 +368,8 @@
<para>Most programmers think about threaded programming in terms of
the abstractions provided by the threading library (POSIX Pthreads):
-thread creation, thread joining, locks, condition variables and
-barriers.</para>
+thread creation, thread joining, locks, condition variables,
+semaphores and barriers.</para>
<para>The effect of using locks, barriers, etc, is to impose on a
threaded program, constraints upon the order in which memory accesses
@@ -376,22 +377,25 @@
"happens-before relationship". Once you understand the happens-before
relationship, it is easy to see how Helgrind finds races in your code.
Fortunately, the happens-before relationship is itself easy to
-understand, and, additionally, is by itself a useful tool for
-reasoning about the behaviour of parallel programs. We now introduce
-it using a simple example.</para>
+understand, and is by itself a useful tool for reasoning about the
+behaviour of parallel programs. We now introduce it using a simple
+example.</para>
<para>Consider first the following buggy program:</para>
<programlisting><![CDATA[
- int var;
+Parent thread: Child thread:
- create child
-
- var = 20; var = 10;
- exit
+int var;
- wait for child
- print(var);
+// create child thread
+pthread_create(...)
+var = 20; var = 10;
+ exit
+
+// wait for child
+pthread_join(...)
+printf("%d\n", var);
]]></programlisting>
<para>The parent thread creates a child. Both then write different
@@ -418,18 +422,21 @@
send a message from one thread to the other:</para>
<programlisting><![CDATA[
- int var;
+Parent thread: Child thread:
- create child
-
- var = 20;
- send message
- wait for message
- var = 10;
- exit
+int var;
- wait for child
- print(var);
+// create child thread
+pthread_create(...)
+var = 20;
+// send message to child
+ // wait for message to arrive
+ var = 10;
+ exit
+
+// wait for child
+pthread_join(...)
+printf("%d\n", var);
]]></programlisting>
<para>Now the program reliably prints "10", regardless of the speed of
@@ -464,7 +471,7 @@
<computeroutput>x</computeroutput> is less than, equal to, or greater
than
<computeroutput>y</computeroutput>. A partial ordering is like a
-total ordering, but it can also express the concepts that two elements
+total ordering, but it can also express the concept that two elements
are neither equal, less or greater, but merely unordered with respect
to each other.</para>
@@ -495,14 +502,14 @@
although with some complication so as to allow correct handling of
reads vs writes.</para>
</listitem>
- <listitem><para>When a condition variable is signed on by thread T1
- and some other thread T2 is thereby released from a wait on the same
- CV, then the memory accesses in T1 prior to the signalling must
- happen-before those in T2 after it returns from the wait. If no
- thread was waiting on the CV then there is no
+ <listitem><para>When a condition variable (CV) is signalled on by
+ thread T1 and some other thread T2 is thereby released from a wait
+ on the same CV, then the memory accesses in T1 prior to the
+ signalling must happen-before those in T2 after it returns from the
+ wait. If no thread was waiting on the CV then there is no
effect.</para>
</listitem>
- <listitem><para>If instead T1 broadcasts on a CV then all of the
+ <listitem><para>If instead T1 broadcasts on a CV, then all of the
waiting threads, rather than just one of them, acquire a
happens-before dependency on the broadcasting thread at the point it
did the broadcast.</para>
@@ -532,20 +539,20 @@
</listitem>
</itemizedlist>
-<para>Helgrind intercepts the above listed events, and builds a
+<para>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
dependencies. It also monitors all memory accesses.</para>
<para>If a location is accessed by two different threads, but Helgrind
cannot find any path through the happens-before graph from one access
-to the other, then it complains of a race.</para>
+to the other, then it reports a race.</para>
<para>There are a couple of caveats:</para>
<itemizedlist>
- <listitem><para>Helgrind doesn't check in the case where both
- accesses are reads. That would be silly, since concurrent reads are
- harmless.</para>
+ <listitem><para>Helgrind doesn't check for a race in the case where
+ both accesses are reads. That would be silly, since concurrent
+ reads are harmless.</para>
</listitem>
<listitem><para>Two accesses are considered to be ordered by the
happens-before dependency even through arbitrarily long chains of
@@ -627,8 +634,8 @@
requires considerable amounts of memory, for large programs.
</para>
-<para>Once you have your two call stacks, how do you begin to get to
-the root problem?</para>
+<para>Once you have your two call stacks, how do you find the root
+cause of the race?</para>
<para>The first thing to do is examine the source locations referred
to by each call stack. They should both show an access to the same
@@ -644,14 +651,14 @@
Did you perhaps forget the locking at one or other of the
accesses?</para>
</listitem>
- <listitem><para>Alternatively, you intended to use a some other
- scheme to make it safe, such as signalling on a condition variable.
- In all such cases, try to find a synchronisation event (or a chain
- thereof) which separates the earlier-observed access (as shown in the
- second call stack) from the later-observed access (as shown in the
- first call stack). In other words, try to find evidence that the
- earlier access "happens-before" the later access. See the previous
- subsection for an explanation of the happens-before
+ <listitem><para>Alternatively, perhaps you intended to use some
+ other scheme to make it safe, such as signalling on a condition
+ variable. In all such cases, try to find a synchronisation event
+ (or a chain thereof) which separates the earlier-observed access (as
+ shown in the second call stack) from the later-observed access (as
+ shown in the first call stack). In other words, try to find
+ evidence that the earlier access "happens-before" the later access.
+ See the previous subsection for an explanation of the happens-before
relationship.</para>
<para>
The fact that Helgrind is reporting a race means it did not observe
@@ -932,62 +939,71 @@
<!-- start of xi:include in the manpage -->
<variablelist id="hg.opts.list">
- <varlistentry id="opt.happens-before" xreflabel="--happens-before">
+ <varlistentry id="opt.track-lockorders"
+ xreflabel="--track-lockorders">
<term>
- <option><![CDATA[--happens-before=none|threads|all
- [default: all] ]]></option>
+ <option><![CDATA[--track-lockorders=no|yes
+ [default: yes] ]]></option>
</term>
<listitem>
- <para>Helgrind always regards locks as the basis for
- inter-thread synchronisation. However, by default, before
- reporting a race error, Helgrind will also check whether
- certain other kinds of inter-thread synchronisation events
- happened. It may be that if such events took place, then no
- race really occurred, and so no error needs to be reported.
- See <link linkend="hg-manual.data-races.exclusive">above</link>
- for a discussion of transfers of exclusive ownership states
- between threads.
- </para>
- <para>With <varname>--happens-before=all</varname>, the
- following events are regarded as sources of synchronisation:
- thread creation/joinage, condition variable
- signal/broadcast/waits, and semaphore posts/waits.
- </para>
- <para>With <varname>--happens-before=threads</varname>, only
- thread creation/joinage events are regarded as sources of
- synchronisation.
- </para>
- <para>With <varname>--happens-before=none</varname>, no events
- (apart, of course, from locking) are regarded as sources of
- synchronisation.
- </para>
- <para>Changing this setting from the default will increase your
- false-error rate but give little or no gain. The only advantage
- is that <option>--happens-before=threads</option> and
- <option>--happens-before=none</option> should make Helgrind
- less and less sensitive to the scheduling of threads, and hence
- the output more and more repeatable across runs.
- </para>
+ <para>When enabled (the default), Helgrind performs lock order
+ consistency checking. For some buggy programs, the large number
+ of lock order errors reported can become annoying, particularly
+ if you're only interested in race errors. You may therefore find
+ it helpful to disable lock order checking.</para>
</listitem>
</varlistentry>
- <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
+ <varlistentry id="opt.show-conflicts"
+ xreflabel="--show-conflicts">
<term>
- <option><![CDATA[--trace-addr=0xXXYYZZ
- ]]></option> and
- <option><![CDATA[--trace-level=0|1|2 [default: 1]
- ]]></option>
+ <option><![CDATA[--show-conflicts=no|yes
+ [default: yes] ]]></option>
</term>
<listitem>
- <para>Requests that Helgrind produces a log of all state changes
- to location 0xXXYYZZ. This can be helpful in tracking down
- tricky races. <varname>--trace-level</varname> controls the
- verbosity of the log. At the default setting (1), a one-line
- summary of is printed for each state change. At level 2 a
- complete stack trace is printed for each state change.</para>
+ <para>When enabled (the default), Helgrind collects enough
+ information about "old" accesses that it can produce two stack
+ traces in a race report -- both the stack trace for the
+ current access, and the trace for the older, conflicting
+ access.</para>
+ <para>Collecting such information is expensive in both speed and
+ memory. This flag disables collection of such information.
+ Helgrind will run significantly faster and use less memory,
+ but without the conflicting access stacks, it will be very
+ much more difficult to track down the root causes of
+ races. However, this option may be useful in situations where
+ you just want to check for the presence or absence of races,
+ for example, when doing regression testing of a previously
+ race-free program.</para>
</listitem>
</varlistentry>
+ <varlistentry id="opt.conflict-cache-size"
+ xreflabel="--conflict-cache-size">
+ <term>
+ <option><![CDATA[--conflict-cache-size=N
+ [default: 1000000] ]]></option>
+ </term>
+ <listitem>
+ <para>Information about "old" conflicting accesses is stored in
+ a cache of limited size, with LRU-style management. This is
+ necessary because it isn't practical to store a stack trace
+ for every single memory access made by the program.
+ Historical information on not recently accessed locations is
+ periodically discarded, to free up space in the cache.</para>
+ <para>This flag controls the size of the cache, in terms of the
+ number of different memory addresses for which
+ conflicting access information is stored. If you find that
+ Helgrind is showing race errors with only one stack instead of
+ the expected two stacks, try increasing this value.</para>
+ <para>The minimum value is 10,000 and the maximum is 10,000,000
+ (ten times the default value). Increasing the value by 1
+ increases Helgrind's memory requirement by very roughly 100
+ bytes, so the maximum value will easily eat up an extra
+ gigabyte or so of memory.</para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<!-- end of xi:include in the manpage -->
@@ -1007,26 +1023,6 @@
</listitem>
</varlistentry>
- <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
- <term>
- <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
- ]]></option>
- </term>
- <listitem>
- <para>At exit, write to stderr a dump of the happens-before
- graph computed by Helgrind, in a format suitable for the VCG
- graph visualisation tool. A suitable command line is:</para>
- <para><computeroutput>valgrind --tool=helgrind
- --gen-vcg=yes my_app 2>&1
- | grep xxxxxx | sed "s/xxxxxx//g"
- | xvcg -</computeroutput></para>
- <para>With <varname>--gen-vcg=yes</varname>, the basic
- happens-before graph is shown. With
- <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
- for each node is also shown.</para>
- </listitem>
- </varlistentry>
-
<varlistentry id="opt.cmp-race-err-addrs"
xreflabel="--cmp-race-err-addrs">
<term>
@@ -1054,8 +1050,6 @@
<para>Run extensive sanity checks on Helgrind's internal
data structures at events defined by the bitstring, as
follows:</para>
- <para><computeroutput>100000 </computeroutput>at every query
- to the happens-before graph</para>
<para><computeroutput>010000 </computeroutput>after changes to
the lock order acquisition graph</para>
<para><computeroutput>001000 </computeroutput>after every client
@@ -1095,14 +1089,11 @@
<listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
request.</para>
</listitem>
- <listitem><para>Possibly a client request to forcibly transfer
- ownership of memory from one thread to another. Requires further
- consideration.</para>
+ <listitem><para>The conflicting access mechanism sometimes
+ mysteriously fails to show the conflicting access' stack, even
+ when provided with unbounded storage for conflicting access info.
+ This should be investigated.</para>
</listitem>
- <listitem><para>Add a new client request that marks an address range
- as being "shared-modified with empty lockset" (the error state),
- and describe how to use it.</para>
- </listitem>
<listitem><para>Document races caused by gcc's thread-unsafe code
generation for speculative stores. In the interim see
<computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
@@ -1119,8 +1110,8 @@
generate false lock-order errors and confuse users.</para>
</listitem>
<listitem><para> Performance can be very poor. Slowdowns on the
- order of 100:1 are not unusual. There is quite some scope for
- performance improvements, though.
+ order of 100:1 are not unusual. There is limited scope for
+ performance improvements.
</para>
</listitem>