|
From: <sv...@va...> - 2007-11-05 00:25:52
|
Author: sewardj
Date: 2007-11-05 00:25:53 +0000 (Mon, 05 Nov 2007)
New Revision: 7089
Log:
Finish first version of the manual. Gaaaah!
Modified:
branches/THRCHECK/thrcheck/docs/tc-manual.xml
Modified: branches/THRCHECK/thrcheck/docs/tc-manual.xml
===================================================================
--- branches/THRCHECK/thrcheck/docs/tc-manual.xml 2007-11-04 17:11:19 UTC (rev 7088)
+++ branches/THRCHECK/thrcheck/docs/tc-manual.xml 2007-11-05 00:25:53 UTC (rev 7089)
@@ -38,26 +38,35 @@
<orderedlist>
<listitem>
- <para>Section FIXME: Misuses of the POSIX pthreads API.</para>
+ <para><link linkend="tc-manual.api-checks">
+ Misuses of the POSIX pthreads API.</link></para>
</listitem>
<listitem>
- <para>Section FIXME: Potential deadlocks arising from lock ordering
- problems.</para>
+ <para><link linkend="tc-manual.lock-orders">
+ Potential deadlocks arising from lock
+ ordering problems.</link></para>
</listitem>
<listitem>
- <para>Section FIXME: Data races -- accessing memory without adequate
- locking.</para>
+ <para><link linkend="tc-manual.data-races">
+ Data races -- accessing memory without adequate locking.
+ </link></para>
</listitem>
</orderedlist>
-<para>Section FIXME contains guidance on how to get the best out of Thrcheck.
-This is really a discussion about debugging strategies and how to
-organise your program to enhance its verifiability.</para>
+<para>Following those is a section containing
+<link linkend="tc-manual.effective-use">
+hints and tips on how to get the best out of Thrcheck.</link>
+</para>
-<para>Section FIXME contains a summary of command-line options.</para>
+<para>Then there is a
+<link linkend="tc-manual.options">summary of command-line
+options.</link>
+</para>
-<para>Finally, Section FIXME contains a brief list of areas in which Thrcheck
-could be improved.</para>
+<para>Finally, there is
+<link linkend="tc-manual.todolist">a brief summary of areas in which Thrcheck
+could be improved.</link>
+</para>
</sect1>
@@ -71,7 +80,7 @@
is therefore able to report on various common problems. Although
these are unglamourous errors, their presence can lead to undefined
program behaviour and hard-to-find bugs later in execution. The
-detected errors include:</para>
+detected errors are:</para>
<itemizedlist>
<listitem><para>unlocking an invalid mutex</para></listitem>
@@ -87,11 +96,20 @@
versa</para></listitem>
<listitem><para>when a POSIX pthread function fails with an
error code that must be handled</para></listitem>
+ <listitem><para>when a thread exits whilst still holding locked
+ locks</para></listitem>
+ <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
+ with a not-locked mutex, or one locked by a different
+ thread</para></listitem>
</itemizedlist>
<para>Checks pertaining to the validity of mutexes are generally also
performed for reader-writer locks.</para>
+<para>Various kinds of this-can't-possibly-happen events are also
+reported. These usually indicate bugs in the system threading
+library.</para>
+
<para>Reported errors always contain a primary stack trace indicating
where the error was detected. They may also contain auxiliary stack
traces giving additional information. In particular, most errors
@@ -114,8 +132,9 @@
<para>Thrcheck has a way of summarising thread identities, as
evidenced here by the text "<computeroutput>Thread
#1</computeroutput>". This is so that it can speak about threads and
-sets of threads without overwhelming you with details. See FIXME
-below for details.</para>
+sets of threads without overwhelming you with details. See
+<link linkend="tc-manual.data-races.errmsgs">below</link>
+for more information on interpreting error messages.</para>
</sect1>
@@ -219,7 +238,7 @@
<sect2 id="tc-manual.data-races.example" xreflabel="Simple Race">
-<title>A simple data race</title>
+<title>A Simple Data Race</title>
<para>About the simplest possible example of a race is as follows. In
this program, it is impossible to know what the value
@@ -371,13 +390,20 @@
attempts to write to it, a race is then reported.</para>
<para>This technique is known as "lockset inference" and was
-introduced in FIXME. It has been widely implemented since then.
-Thrcheck incorporates several refinements aimed at reducing the false
-error rate generated by a naive version of the algorithm. In section
-FIXME a summary of the complete algorithm used by Thrcheck is
-presented. First, however, it is important to understand details of
-transitions pertaining to the Exclusive-ownership state.</para>
+introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
+Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
+Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
+15(4):391-411, November 1997).</para>
+<para>Lockset inference has since been widely implemented, studied and
+extended. Thrcheck incorporates several refinements aimed at avoiding
+the high false error rate that naive versions of the algorithm suffer
+from. A
+<link linkend="tc-manual.data-races.summary">summary of the complete
+algorithm used by Thrcheck</link> is presented below. First, however,
+it is important to understand details of transitions pertaining to the
+Exclusive-ownership state.</para>
+
</sect2>
@@ -449,10 +475,13 @@
pthread_create and pthread_join calls, an error is still
reported.</para>
-<para>This technique was introduced in FIXME with the name Thread
-Lifetime Segments. Thrcheck implements an extended version of it.
-Specifically, Thrcheck allows transfer of exclusive ownership in the
-following situations:</para>
+<para>This technique was introduced with the name "thread lifetime
+segments" in "Runtime Checking of Multithreaded Applications with
+Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
+International SPIN Workshop on Model Checking of Software Stanford,
+California, USA, August 2000, LNCS 1885, pp331--342). Thrcheck
+implements an extended version of it. Specifically, Thrcheck allows
+transfer of exclusive ownership in the following situations:</para>
<itemizedlist>
<listitem><para>At thread creation: a child can acquire ownership of
@@ -464,16 +493,16 @@
(thread that is exiting) at the point it exited.</para>
</listitem>
<listitem><para>At condition variable signallings and broadcasts. A
- thread Twait which completes a pthread_cond_wait call as a result of
+ thread Tw which completes a pthread_cond_wait call as a result of
a signal or broadcast on the same condition variable by some other
- thread Tsig, may acquire ownership of memory held exclusively by
- Tsig prior to the pthread_cond_signal/broadcast
+ thread Ts, may acquire ownership of memory held exclusively by
+ Ts prior to the pthread_cond_signal/broadcast
call.</para>
</listitem>
- <listitem><para>At semaphore posts (sem_post) calls. A thread Twait
+ <listitem><para>At semaphore posts (sem_post) calls. A thread Tw
which completes a sem_wait call call as a result of a sem_post call
- on the same semaphore by some other thread Tpost, may acquire
- ownership of memory held exclusively by Tpost prior to the sem_post
+ on the same semaphore by some other thread Tp, may acquire
+ ownership of memory held exclusively by Tp prior to the sem_post
call.</para>
</listitem>
</itemizedlist>
@@ -809,8 +838,9 @@
<para>Also, this message implies that Thrcheck did not see any
synchronisation event between threads #4 and #5 that would have
-allowed #5 to acquire exclusive ownership from #4. See FIXME for a
-discussion of transfers of exclusive ownership states between
+allowed #5 to acquire exclusive ownership from #4. See
+<link linkend="tc-manual.data-races.exclusive">above</link>
+for a discussion of transfers of exclusive ownership states between
threads.</para>
</sect2>
@@ -936,9 +966,10 @@
<para>The root cause of this synchronisation lossage is
particularly hard to understand, so an example is helpful. It was
- discussed at length by Arndt Muehlenfeldt [FIXME]. The canonical
- POSIX-recommended usage scheme for condition variables is as
- follows:</para>
+ discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
+ in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The
+ canonical POSIX-recommended usage scheme for condition variables
+ is as follows:</para>
<programlisting><![CDATA[
b is a Boolean condition, which is False most of the time
@@ -981,9 +1012,9 @@
<listitem>
<para>Make sure you are using a supported Linux distribution. At
present, Thrcheck only properly supports x86-linux and amd64-linux
- with glibc-2.3 or later. The latter restriction really says that
- we only support the NPTL threading library. The old LinuxThreads
- library is not supported.</para>
+ with glibc-2.3 or later. The latter restriction means we only
+ support the NPTL threading library. The old LinuxThreads library
+ is not supported.</para>
<para>Unsupported targets may work to varying degrees. In
particular ppc32-linux and ppc64-linux running NTPL should work,
@@ -992,6 +1023,19 @@
the lwarx/stwcx instructions.</para>
</listitem>
+ <listitem>
+ <para>POSIX requires that implementations of standard I/O (printf,
+ fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU
+ libc implements this by using internal locking primitives that
+ Thrcheck is unable to intercept. Consequently Thrcheck generates
+ many false race reports when you use these functions.</para>
+
+ <para>Thrcheck attempts to hide these errors using the standard
+ Valgrind error-suppression mechanism. So, at least for simple
+ test cases, you don't see any. Nevertheless, some may slip
+ through. Just something to be aware of.</para>
+ </listitem>
+
</orderedlist>
</sect1>
@@ -1002,48 +1046,39 @@
<sect1 id="tc-manual.options" xreflabel="Thrcheck Options">
<title>Thrcheck Options</title>
-<para>Currently there is only one Thrcheck-specific option:</para>
+<para>The following end-user options are available:</para>
<!-- start of xi:include in the manpage -->
<variablelist id="tc.opts.list">
<varlistentry id="opt.happens-before" xreflabel="--happens-before">
<term>
- <option><![CDATA[--happens-before=none|threads|condvars
- [default: condvars] ]]></option>
+ <option><![CDATA[--happens-before=none|threads|all
+ [default: all] ]]></option>
</term>
<listitem>
- <para>This option is mostly useful for debugging Thrcheck
- itself. It isn't much use to end users and is a bit difficult
- to explain.
- </para>
<para>Thrcheck always regards locks as the basis for
inter-thread synchronisation. However, by default, before
reporting a race error, Thrcheck will also check whether
certain other kinds of inter-thread synchronisation events
happened. It may be that if such events took place, then no
race really occurred, and so no error needs to be reported.
- This enables Thrcheck to correctly handle the
- worker-thread and worker-thread-pool idioms.
+ See <link linkend="tc-manual.data-races.exclusive">above</link>
+ for a discussion of transfers of exclusive ownership states
+ between threads.
</para>
- <para>With <varname>--happens-before=condvars</varname>, both
- thread creation/joinage, and condition variable
- signal/broadcast/waits are regarded as sources of
- synchronisation, and so both the worker-thread and
- worker-thread-pool idioms are correctly handled. "Correctly
- handled" means that Thrcheck will not falsely report race
- errors for correct uses of these idioms.
+ <para>With <varname>--happens-before=all</varname>, the
+ following events are regarded as sources of synchronisation:
+ thread creation/joinage, condition variable
+ signal/broadcast/waits, and semaphore posts/waits.
</para>
<para>With <varname>--happens-before=threads</varname>, only
thread creation/joinage events are regarded as sources of
- synchronisation, and so only the worker-thread idiom is
- correctly handled. The worker-thread-pool is not correctly
- handled.
+ synchronisation.
</para>
<para>With <varname>--happens-before=none</varname>, no events
(apart, of course, from locking) are regarded as sources of
- synchronisation. And so neither the worker-thread nor
- worker-thread-pool idioms are correctly handled.
+ synchronisation.
</para>
<para>Changing this setting from the default will increase your
false-error rate but give little or no gain. The only advantage
@@ -1055,34 +1090,146 @@
</listitem>
</varlistentry>
+ <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
+ <term>
+ <option><![CDATA[--trace-addr=0xXXYYZZ
+ ]]></option> and
+ <option><![CDATA[--trace-level=0|1|2 [default: 1]
+ ]]></option>
+ </term>
+ <listitem>
+ <para>Requests that Thrcheck produces a log of all state changes
+ to location 0xXXYYZZ. This can be helpful in tracking down
+ tricky races. <varname>--trace-level</varname> controls the
+ verbosity of the log. At the default setting (1), a one-line
+ summary of is printed for each state change. At level 2 a
+ complete stack trace is printed for each state change.</para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<!-- end of xi:include in the manpage -->
-</sect1>
+<!-- start of xi:include in the manpage -->
+<para>In addition, the following debugging options are available for
+Thrcheck:</para>
+<variablelist id="tc.debugopts.list">
-<sect1 id="tc-manual.otherstuff" xreflabel="Other Stuff">
-<title>Other Stuff</title>
+ <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
+ <term>
+ <option><![CDATA[--trace-malloc=no|yes [no]
+ ]]></option>
+ </term>
+ <listitem>
+ <para>Show all client malloc (etc) and free (etc) requests.</para>
+ </listitem>
+ </varlistentry>
-<para>FIXME: this section will contain other stuff that it is
-important to document:</para>
+ <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
+ <term>
+ <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
+ ]]></option>
+ </term>
+ <listitem>
+ <para>At exit, write to stderr a dump of the happens-before
+ graph computed by Thrcheck, in a format suitable for the VCG
+ graph visualisation tool. A suitable command line is:</para>
+ <para><computeroutput>valgrind --tool=thrcheck
+ --gen-vcg=yes my_app 2>&1
+ | grep xxxxxx | sed "s/xxxxxx//g"
+ | xvcg -</computeroutput></para>
+ <para>With <varname>--gen-vcg=yes</varname>, the basic
+ happens-before graph is shown. With
+ <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
+ for each node is also shown.</para>
+ </listitem>
+ </varlistentry>
-<itemizedlist>
- <listitem><para>LOCK prefixes on x86/amd64
- instructions</para></listitem>
- <listitem><para>Reader-writer locks, and semaphores?
- </para></listitem>
- <listitem><para>Other stuff I forgot?
- </para></listitem>
-</itemizedlist>
+ <varlistentry id="opt.cmp-race-err-addrs"
+ xreflabel="--cmp-race-err-addrs">
+ <term>
+ <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
+ ]]></option>
+ </term>
+ <listitem>
+ <para>Controls whether or not race (data) addresses should be
+ taken into account when removing duplicates of race errors.
+ With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
+ identical race errors will be considered to be the same if
+ their race addresses differ. With
+ With <varname>--cmp-race-err-addrs=yes</varname> they will be
+ considered different. This is provided to help make certain
+ regression tests work reliably.</para>
+ </listitem>
+ </varlistentry>
-Inconsistent cv/mx associations
-Thread exiting whilst holding locks
+ <varlistentry id="opt.tc-sanity-flags" xreflabel="--tc-sanity-flags">
+ <term>
+ <option><![CDATA[--tc-sanity-flags=<XXXXX> (X = 0|1) [00000]
+ ]]></option>
+ </term>
+ <listitem>
+ <para>Run extensive sanity checks on Thrcheck's internal
+ data structures at events defined by the bitstring, as
+ follows:</para>
+ <para><computeroutput>10000 </computeroutput>after changes to
+ the lock order acquisition graph</para>
+ <para><computeroutput>01000 </computeroutput>after every client
+ memory access (NB: not currently used)</para>
+ <para><computeroutput>00100 </computeroutput>after every client
+ memory range permission setting of 256 bytes or greater</para>
+ <para><computeroutput>00010 </computeroutput>after every client
+ lock or unlock event</para>
+ <para><computeroutput>00001 </computeroutput>after every client
+ thread creation or joinage event</para>
+ <para>Note these will make Thrcheck run very slowly, often to
+ the point of being completely unusable.</para>
+ </listitem>
+ </varlistentry>
-waiting for a condition variable without holding
- the associated mutex
+</variablelist>
+<!-- end of xi:include in the manpage -->
-better printing of lock cycles
+
</sect1>
+<sect1 id="tc-manual.todolist" xreflabel="To Do List">
+<title>A To-Do List for Thrcheck</title>
+
+<para>The following is a list of loose ends which should be tidied up
+some time.</para>
+
+<itemizedlist>
+ <listitem><para>Track which mutexes are associated with which
+ condition variables, and emit a warning if this becomes
+ inconsistent.</para>
+ </listitem>
+ <listitem><para>For lock order errors, print the complete lock
+ cycle, rather than only doing for size-2 cycles as at
+ present.</para>
+ </listitem>
+ <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
+ request.</para>
+ </listitem>
+ <listitem><para>Possibly a client request to forcibly transfer
+ ownership of memory from one thread to another. Requires further
+ consideration.</para>
+ </listitem>
+ <listitem><para>Add a new client request that marks an address range
+ as being "shared-modified with empty lockset" (the error state),
+ and describe how to use it.</para>
+ </listitem>
+ <listitem><para>Document races caused by gcc's thread-unsafe code
+ generation for speculative stores. In the interim see
+ <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
+ </computeroutput>
+ and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
+ </para>
+ </listitem>
+
+</itemizedlist>
+
+</sect1>
+
</chapter>
|