From: <sv...@va...> - 2009-08-07 05:23:39
Author: njn
Date: 2009-08-07 06:23:31 +0100 (Fri, 07 Aug 2009)
New Revision: 10735
Log:
Minor formatting and grammar changes to the DRD manual chapter, to make it
consistent with the rest of the documentation.
Modified:
trunk/drd/docs/drd-manual.xml
Modified: trunk/drd/docs/drd-manual.xml
===================================================================
--- trunk/drd/docs/drd-manual.xml 2009-08-07 04:55:15 UTC (rev 10734)
+++ trunk/drd/docs/drd-manual.xml 2009-08-07 05:23:31 UTC (rev 10735)
@@ -48,13 +48,13 @@
<para>
Multithreaded programs can use one or more of the following programming
-paradigms. Which paradigm is appropriate depends a.o. on the application type.
+paradigms. Which paradigm is appropriate depends e.g. on the application type.
Some examples of multithreaded programming paradigms are:
<itemizedlist>
<listitem>
<para>
Locking. Data that is shared over threads is protected from concurrent
- accesses via locking. A.o. the POSIX threads library, the Qt library
+ accesses via locking. E.g. the POSIX threads library, the Qt library
and the Boost.Thread library support this paradigm directly.
</para>
</listitem>
@@ -85,10 +85,9 @@
threads is updated via transactions. After each transaction it is
verified whether there were any conflicting transactions. If there were
conflicts, the transaction is aborted, otherwise it is committed. This
- is a so-called optimistic approach. There is a prototype of the Intel C
- Compiler (<computeroutput>icc</computeroutput>) available that supports
- STM. Research about the addition of STM support
- to GCC is ongoing.
+ is a so-called optimistic approach. There is a prototype of the Intel C++
+ Compiler available that supports STM. Research about the addition of
+ STM support to GCC is ongoing.
</para>
</listitem>
</itemizedlist>
@@ -254,7 +253,7 @@
</orderedlist>
The combination of program order and synchronization order is called the
<emphasis>happens-before relationship</emphasis>. This concept was first
-defined by S. Adve e.a. in the paper <emphasis>Detecting data races on weak
+defined by S. Adve et al in the paper <emphasis>Detecting data races on weak
memory systems</emphasis>, ACM SIGARCH Computer Architecture News, v.19 n.3,
p.234-243, May 1991.
</para>
@@ -330,7 +329,7 @@
</term>
<listitem>
<para>
- Controls whether <constant>DRD</constant> detects data races on stack
+ Controls whether DRD detects data races on stack
variables. Verifying stack variables is disabled by default because
most programs do not share stack variables over threads.
</para>
@@ -371,11 +370,11 @@
<listitem>
<para>
Whether to report calls to
- <function>pthread_cond_signal()</function> and
- <function>pthread_cond_broadcast()</function> where the mutex
+ <function>pthread_cond_signal</function> and
+ <function>pthread_cond_broadcast</function> where the mutex
associated with the signal through
- <function>pthread_cond_wait()</function> or
- <function>pthread_cond_timed_wait()</function>is not locked at
+ <function>pthread_cond_wait</function> or
+ <function>pthread_cond_timed_wait</function>is not locked at
the time the signal is sent. Sending a signal without holding
a lock on the associated mutex is a common programming error
which can cause subtle race conditions and unpredictable
@@ -635,8 +634,9 @@
<listitem>
<para>
Next, the call stack of the conflicting access is displayed. If
- your program has been compiled with debug information (-g), this
- call stack will include file names and line numbers. The two
+ your program has been compiled with debug information
+ (<option>-g</option>), this call stack will include file names and
+ line numbers. The two
bottommost frames in this call stack (<function>clone</function>
and <function>start_thread</function>) show how the NPTL starts
a thread. The third frame
@@ -644,7 +644,7 @@
fourth frame (<function>thread_func</function>) is the first
interesting line because it shows the thread entry point, that
is the function that has been passed as the third argument to
- <function>pthread_create()</function>.
+ <function>pthread_create</function>.
</para>
</listitem>
<listitem>
@@ -782,7 +782,7 @@
</listitem>
<listitem>
<para>
- Calling <function>pthread_cond_wait()</function> on a mutex
+ Calling <function>pthread_cond_wait</function> on a mutex
that is not locked, that is locked by another thread or that
has been locked recursively.
</para>
@@ -790,7 +790,7 @@
<listitem>
<para>
Associating two different mutexes with a condition variable
- through <function>pthread_cond_wait()</function>.
+ through <function>pthread_cond_wait</function>.
</para>
</listitem>
<listitem>
@@ -865,14 +865,14 @@
<para>
Just as for other Valgrind tools it is possible to let a client program
interact with the DRD tool through client requests. In addition to the
-client requests several macro's have been defined that allow to use the
+client requests several macros have been defined that allow to use the
client requests in a convenient way.
</para>
<para>
The interface between client programs and the DRD tool is defined in
the header file <literal><valgrind/drd.h></literal>. The
-available macro's and client requests are:
+available macros and client requests are:
<itemizedlist>
<listitem>
<para>
@@ -896,7 +896,7 @@
</listitem>
<listitem>
<para>
- The macro's <literal>DRD_IGNORE_VAR(x)</literal>,
+ The macros <literal>DRD_IGNORE_VAR(x)</literal>,
<literal>ANNOTATE_TRACE_MEMORY(&x)</literal> and the corresponding
client request <varname>VG_USERREQ__DRD_START_SUPPRESSION</varname>. Some
applications contain intentional races. There exist e.g. applications
@@ -917,7 +917,7 @@
</listitem>
<listitem>
<para>
- The macro's <literal>DRD_TRACE_VAR(x)</literal>,
+ The macros <literal>DRD_TRACE_VAR(x)</literal>,
<literal>ANNOTATE_TRACE_MEMORY(&x)</literal>
and the corresponding client request
<varname>VG_USERREQ__DRD_START_TRACE_ADDR</varname>. Trace all
@@ -947,7 +947,7 @@
the next access to the variable at the specified address should be
considered to have happened after the access just before the latest
<literal>ANNOTATE_HAPPENS_BEFORE(addr)</literal> annotation that
- references the same variable. The purpose of these two macro's is to
+ references the same variable. The purpose of these two macros is to
tell DRD about the order of inter-thread memory accesses implemented via
atomic memory operations.
</para>
@@ -1104,12 +1104,12 @@
<computeroutput>gthread</computeroutput> libraries. These libraries
are built on top of POSIX threads, and hence are directly supported by
DRD. Please keep in mind that you have to call
-<function>g_thread_init()</function> before creating any threads, or
+<function>g_thread_init</function> before creating any threads, or
DRD will report several data races on glib functions. See also the
<ulink
url="http://library.gnome.org/devel/glib/stable/glib-Threads.html">GLib
Reference Manual</ulink> for more information about
-<function>g_thread_init()</function>.
+<function>g_thread_init</function>.
</para>
<para>
@@ -1303,7 +1303,7 @@
To know where the scope ends of POSIX objects that have not been
destroyed explicitly. It is e.g. not required by the POSIX
threads standard to call
- <function>pthread_mutex_destroy()</function> before freeing the
+ <function>pthread_mutex_destroy</function> before freeing the
memory in which a mutex object resides.
</para>
</listitem>
@@ -1322,8 +1322,8 @@
It is essential for correct operation of DRD that the tool knows about
memory allocation and deallocation events. When analyzing a client program
with DRD that uses a custom memory allocator, either instrument the custom
-memory allocator with the <literal>VALGRIND_MALLOCLIKE_BLOCK()</literal>
-and <literal>VALGRIND_FREELIKE_BLOCK()</literal> macro's or disable the
+memory allocator with the <literal>VALGRIND_MALLOCLIKE_BLOCK</literal>
+and <literal>VALGRIND_FREELIKE_BLOCK</literal> macros or disable the
custom memory allocator.
</para>
@@ -1346,14 +1346,14 @@
<para>
It is essential for correct operation of DRD that there are no memory
errors such as dangling pointers in the client program. Which means that
-it is a good idea to make sure that your program is memcheck-clean
+it is a good idea to make sure that your program is Memcheck-clean
before you analyze it with DRD. It is possible however that some of
-the memcheck reports are caused by data races. In this case it makes
-sense to run DRD before memcheck.
+the Memcheck reports are caused by data races. In this case it makes
+sense to run DRD before Memcheck.
</para>
<para>
-So which tool should be run first ? In case both DRD and memcheck
+So which tool should be run first? In case both DRD and Memcheck
complain about a program, a possible approach is to run both tools
alternatingly and to fix as many errors as possible after each run of
each tool until none of the two tools prints any more error messages.
@@ -1389,7 +1389,7 @@
<para>
Most applications will run between 20 and 50 times slower under
DRD than a native single-threaded run. The slowdown will be most
- noticeable for applications which perform very much mutex lock /
+ noticeable for applications which perform frequent mutex lock /
unlock operations.
</para>
</listitem>
@@ -1438,7 +1438,7 @@
<literal>std::cout</literal>. Doing so would not only
generate multiple data race reports, it could also result in
output from several threads getting mixed up. Either use
- <function>printf()</function> or do the following:
+ <function>printf</function> or do the following:
<orderedlist>
<listitem>
<para>Derive a class from <literal>std::ostreambuf</literal>
@@ -1480,7 +1480,7 @@
<para>
The Single UNIX Specification version two defines the following four
mutex types (see also the documentation of <ulink
-url="http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_mutexattr_settype.html"><function>pthread_mutexattr_settype()</function></ulink>):
+url="http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_mutexattr_settype.html"><function>pthread_mutexattr_settype</function></ulink>):
<itemizedlist>
<listitem>
<para>
@@ -1547,11 +1547,11 @@
</sect2>
<sect2 id="drd-manual.pctw" xreflabel="pthread_cond_timedwait">
-<title>pthread_cond_timedwait() and timeouts</title>
+<title><function>pthread_cond_timedwait</function> and timeouts</title>
<para>
Historically the function
-<function>pthread_cond_timedwait()</function> only allowed the
+<function>pthread_cond_timedwait</function> only allowed the
specification of an absolute timeout, that is a timeout independent of
the time when this function was called. However, almost every call to
this function expresses a relative timeout. This typically happens by
@@ -1564,8 +1564,8 @@
<listitem>
<para>
When initializing a condition variable through
- pthread_cond_init(), specify that the timeout of
- pthread_cond_timedwait() will use the clock
+ <function>pthread_cond_init</function>, specify that the timeout of
+ <function>pthread_cond_timedwait</function> will use the clock
<literal>CLOCK_MONOTONIC</literal> instead of
<literal>CLOCK_REALTIME</literal>. You can do this via
<computeroutput>pthread_condattr_setclock(...,
@@ -1574,7 +1574,7 @@
</listitem>
<listitem>
<para>
- When calling <function>pthread_cond_timedwait()</function>, pass
+ When calling <function>pthread_cond_timedwait</function>, pass
the sum of
<computeroutput>clock_gettime(CLOCK_MONOTONIC)</computeroutput>
and a relative timeout as the third argument.
@@ -1597,7 +1597,7 @@
application it can be very convenient to know which thread logged
which information. One possible approach is to identify threads in
logging output by including the result of
-<function>pthread_self()</function> in every log line. However, this approach
+<function>pthread_self</function> in every log line. However, this approach
has two disadvantages: there is no direct relationship between these
values and the source code and these values can be different in each
run. A better approach is to assign a brief name to each thread and to
@@ -1607,19 +1607,19 @@
<listitem>
<para>
Allocate a key for the pointer to the thread name through
- <function>pthread_key_create()</function>.
+ <function>pthread_key_create</function>.
</para>
</listitem>
<listitem>
<para>
Just after thread creation, set the thread name through
- <function>pthread_setspecific()</function>.
+ <function>pthread_setspecific</function>.
</para>
</listitem>
<listitem>
<para>
In the code that generates the logging information, query the thread
- name by calling <function>pthread_getspecific()</function>.
+ name by calling <function>pthread_getspecific</function>.
</para>
</listitem>
</itemizedlist>
@@ -1639,9 +1639,7 @@
<itemizedlist>
<listitem>
<para>
- DRD has only been tested on the Linux operating system, and not
- on any of the other operating systems supported by
- Valgrind.
+ DRD has only been tested on Linux and Mac OS X.
</para>
</listitem>
<listitem>
@@ -1653,12 +1651,12 @@
</listitem>
<listitem>
<para>
- DRD, just like memcheck, will refuse to start on Linux
+ DRD, just like Memcheck, will refuse to start on Linux
distributions where all symbol information has been removed from
- ld.so. This is a.o. the case for the PPC editions of openSUSE
- and Gentoo. You will have to install the glibc debuginfo package
- on these platforms before you can use DRD. See also openSUSE bug
- <ulink url="http://bugzilla.novell.com/show_bug.cgi?id=396197">
+ <filename>ld.so</filename>. This is e.g. the case for the PPC editions
+ of openSUSE and Gentoo. You will have to install the glibc debuginfo
+ package on these platforms before you can use DRD. See also openSUSE
+ bug <ulink url="http://bugzilla.novell.com/show_bug.cgi?id=396197">
396197</ulink> and Gentoo bug <ulink
url="http://bugs.gentoo.org/214065">214065</ulink>.
</para>
From: <sv...@va...> - 2009-08-07 04:55:23
Author: njn
Date: 2009-08-07 05:55:15 +0100 (Fri, 07 Aug 2009)
New Revision: 10734
Log:
Overhauled the how-to-write-a-new-tool chapter.
Modified:
trunk/docs/xml/manual-writing-tools.xml
Modified: trunk/docs/xml/manual-writing-tools.xml
===================================================================
--- trunk/docs/xml/manual-writing-tools.xml 2009-08-07 04:07:20 UTC (rev 10733)
+++ trunk/docs/xml/manual-writing-tools.xml 2009-08-07 04:55:15 UTC (rev 10734)
@@ -14,14 +14,15 @@
<title>Introduction</title>
<para>The key idea behind Valgrind's architecture is the division
-between its "core" and "tool plug-ins".</para>
+between its <emphasis>core</emphasis> and <emphasis>tools</emphasis>.</para>
<para>The core provides the common low-level infrastructure to
support program instrumentation, including the JIT
compiler, low-level memory manager, signal handling and a
-scheduler (for pthreads). It also provides certain services that
+thread scheduler. It also provides certain services that
are useful to some but not all tools, such as support for error
-recording and suppression.</para>
+recording, and support for replacing heap allocation functions such as
+<function>malloc</function>.</para>
<para>But the core leaves certain operations undefined, which
must be filled by tools. Most notably, tools define how program
@@ -34,13 +35,13 @@
-<sect1 id="manual-writing-tools.writingatool" xreflabel="Writing a Tool">
-<title>Writing a Tool</title>
+<sect1 id="manual-writing-tools.writingatool" xreflabel="Basics">
+<title>Basics</title>
<sect2 id="manual-writing-tools.howtoolswork" xreflabel="How tools work">
<title>How tools work</title>
-<para>Tool plug-ins must define various functions for instrumenting programs
+<para>Tools must define various functions for instrumenting programs
that are called by Valgrind's core. They are then linked against
Valgrind's core to define a complete Valgrind tool which will be used
when the <option>--tool</option> option is used to select it.</para>
@@ -87,19 +88,20 @@
</listitem>
<listitem>
- <para>Create empty files
- <filename>foobar/docs/Makefile.am</filename> and
- <filename>foobar/tests/Makefile.am</filename>.
+ <para>Create an empty file <filename>foobar/tests/Makefile.am</filename>.
</para>
</listitem>
<listitem>
<para>Copy <filename>none/Makefile.am</filename> into
<filename>foobar/</filename>. Edit it by replacing all
- occurrences of the string <computeroutput>"none"</computeroutput> with
- <computeroutput>"foobar"</computeroutput>, and all occurrences of
- the string <computeroutput>"nl_"</computeroutput> with
- <computeroutput>"fb_"</computeroutput>.</para>
+ occurrences of the strings
+ <computeroutput>"none"</computeroutput>,
+ <computeroutput>"nl_"</computeroutput> and
+ <computeroutput>"nl-"</computeroutput> with
+ <computeroutput>"foobar"</computeroutput>,
+ <computeroutput>"fb_"</computeroutput> and
+ <computeroutput>"fb-"</computeroutput> respectively.</para>
</listitem>
<listitem>
@@ -107,11 +109,11 @@
<computeroutput>foobar/</computeroutput>, renaming it as
<filename>fb_main.c</filename>. Edit it by changing the
<computeroutput>details</computeroutput> lines in
- <function>nl_pre_clo_init()</function> to something appropriate for the
+ <function>nl_pre_clo_init</function> to something appropriate for the
tool. These fields are used in the startup message, except for
<computeroutput>bug_reports_to</computeroutput> which is used if a
- tool assertion fails. Also replace the string
- <computeroutput>"nl_"</computeroutput> with
+ tool assertion fails. Also, replace the string
+ <computeroutput>"nl_"</computeroutput> throughout with
<computeroutput>"fb_"</computeroutput> again.</para>
</listitem>
@@ -119,13 +121,12 @@
<para>Edit <filename>Makefile.am</filename>, adding the new directory
<filename>foobar</filename> to the
<computeroutput>TOOLS</computeroutput> or
- <computeroutput>EXP_TOOLS</computeroutput>variables.</para>
+ <computeroutput>EXP_TOOLS</computeroutput> variables.</para>
</listitem>
<listitem>
<para>Edit <filename>configure.in</filename>, adding
- <filename>foobar/Makefile</filename>,
- <filename>foobar/docs/Makefile</filename> and
+ <filename>foobar/Makefile</filename> and
<filename>foobar/tests/Makefile</filename> to the
<computeroutput>AC_OUTPUT</computeroutput> list.</para>
</listitem>
@@ -166,7 +167,7 @@
</orderedlist>
-<para>These steps don't have to be followed exactly - you can choose
+<para>These steps don't have to be followed exactly -- you can choose
different names for your source files, and use a different
<option>--prefix</option> for
<computeroutput>./configure</computeroutput>.</para>
@@ -190,8 +191,7 @@
<para>The names can be different to the above, but these are the usual
names. The first one is registered using the macro
-<computeroutput>VG_DETERMINE_INTERFACE_VERSION</computeroutput> (which also
-checks that the core/tool interface of the tool matches that of the core).
+<computeroutput>VG_DETERMINE_INTERFACE_VERSION</computeroutput>.
The last three are registered using the
<computeroutput>VG_(basic_tool_funcs)</computeroutput> function.</para>
@@ -207,25 +207,25 @@
<title>Initialisation</title>
<para>Most of the initialisation should be done in
-<function>pre_clo_init()</function>. Only use
-<function>post_clo_init()</function> if a tool provides command line
+<function>pre_clo_init</function>. Only use
+<function>post_clo_init</function> if a tool provides command line
options and must do some initialisation after option processing takes
place (<computeroutput>"clo"</computeroutput> stands for "command line
options").</para>
<para>First of all, various "details" need to be set for a tool, using
-the functions <function>VG_(details_*)()</function>. Some are all
+the functions <function>VG_(details_*)</function>. Some are all
compulsory, some aren't. Some are used when constructing the startup
message, <computeroutput>detail_bug_reports_to</computeroutput> is used
-if <computeroutput>VG_(tool_panic)()</computeroutput> is ever called, or
+if <computeroutput>VG_(tool_panic)</computeroutput> is ever called, or
a tool assertion fails. Others have other uses.</para>
<para>Second, various "needs" can be set for a tool, using the functions
-<function>VG_(needs_*)()</function>. They are mostly booleans, and can
+<function>VG_(needs_*)</function>. They are mostly booleans, and can
be left untouched (they default to <varname>False</varname>). They
determine whether a tool can do various things such as: record, report
and suppress errors; process command line options; wrap system calls;
-record extra information about heap blocks, etc.</para>
+record extra information about heap blocks; etc.</para>
<para>For example, if a tool wants the core's help in recording and
reporting errors, it must call
@@ -233,13 +233,11 @@
eight functions for comparing errors, printing out errors, reading
suppressions from a suppressions file, etc. While writing these
functions requires some work, it's much less than doing error handling
-from scratch because the core is doing most of the work. See the
-function <function>VG_(needs_tool_errors)</function> in
-<filename>include/pub_tool_tooliface.h</filename> for full details of
-all the needs.</para>
+from scratch because the core is doing most of the work.
+</para>
<para>Third, the tool can indicate which events in core it wants to be
-notified about, using the functions <function>VG_(track_*)()</function>.
+notified about, using the functions <function>VG_(track_*)</function>.
These include things such as heap blocks being allocated, the stack
pointer changing, a mutex being locked, etc. If a tool wants to know
about this, it should provide a pointer to a function, which will be
@@ -247,7 +245,7 @@
<para>For example, if the tool want to be notified when a new heap block
is allocated, it should call
-<function>VG_(track_new_mem_heap)()</function> with an appropriate
+<function>VG_(track_new_mem_heap)</function> with an appropriate
function pointer, and the assigned function will be called each time
this happens.</para>
@@ -262,9 +260,9 @@
<sect2 id="manual-writing-tools.instr" xreflabel="Instrumentation">
<title>Instrumentation</title>
-<para><function>instrument()</function> is the interesting one. It
+<para><function>instrument</function> is the interesting one. It
allows you to instrument <emphasis>VEX IR</emphasis>, which is
-Valgrind's RISC-like intermediate language. VEX IR is described fairly well
+Valgrind's RISC-like intermediate language. VEX IR is described
in the comments of the header file
<filename>VEX/pub/libvex_ir.h</filename>.</para>
@@ -308,21 +306,21 @@
implementation of a reasonable subset of the C library, details of which
are in <filename>pub_tool_libc*.h</filename>.</para>
-<para>When writing a tool, you shouldn't need to look at any of the code in
-Valgrind's core. Although it might be useful sometimes to help understand
-something.</para>
+<para>When writing a tool, in theory you shouldn't need to look at any of
+the code in Valgrind's core, but in practice it might be useful sometimes to
+help understand something.</para>
<para>The <filename>include/pub_tool_basics.h</filename> and
-<filename>VEX/pub/libvex_basictypes.h</filename> files file have some basic
+<filename>VEX/pub/libvex_basictypes.h</filename> files have some basic
types that are widely used.</para>
<para>Ultimately, the tools distributed (Memcheck, Cachegrind, Lackey, etc.)
are probably the best documentation of all, for the moment.</para>
-<para>Note that the <computeroutput>VG_</computeroutput> macro is used
+<para>The <computeroutput>VG_</computeroutput> macro is used
heavily. This just prepends a longer string in front of names to avoid
potential namespace clashes. It is defined in
-<filename>include/pub_tool_basics_asm.h</filename>.</para>
+<filename>include/pub_tool_basics.h</filename>.</para>
<para>There are some assorted notes about various aspects of the
implementation in <filename>docs/internals/</filename>. Much of it
@@ -331,16 +329,22 @@
</sect2>
-<sect2 id="manual-writing-tools.advice" xreflabel="Words of Advice">
-<title>Words of Advice</title>
+</sect1>
-<para>Writing and debugging tools is not trivial. Here are some
-suggestions for solving common problems.</para>
-<sect3 id="manual-writing-tools.segfaults">
-<title>Segmentation Faults</title>
+<sect1 id="manual-writing-tools.advtopics" xreflabel="Advanced Topics">
+<title>Advanced Topics</title>
+<para>Once a tool becomes more complicated, there are some extra
+things you may want/need to do.</para>
+
+<sect2 id="manual-writing-tools.advice" xreflabel="Debugging Tips">
+<title>Debugging Tips</title>
+
+<para>Writing and debugging tools is not trivial. Here are some
+suggestions for solving common problems.</para>
+
<para>If you are getting segmentation faults in C functions used by your
tool, the usual GDB command:</para>
@@ -348,75 +352,15 @@
gdb <prog> core]]></screen>
<para>usually gives the location of the segmentation fault.</para>
-</sect3>
+<para>If you want to debug C functions used by your tool, there are
+instructions on how to do so in the file
+<filename>README_DEVELOPERS</filename>.</para>
-
-<sect3 id="manual-writing-tools.debugfns">
-<title>Debugging C functions</title>
-
-<para>If you want to debug C functions used by your tool, you can
-achieve this by following these steps:</para>
-<orderedlist>
- <listitem>
- <para>Set <computeroutput>VALGRIND_LAUNCHER</computeroutput> to
- <computeroutput><![CDATA[<prefix>/bin/valgrind]]></computeroutput>:</para>
-<programlisting>
- export VALGRIND_LAUNCHER=/usr/local/bin/valgrind</programlisting>
- </listitem>
-
- <listitem>
- <para>Then run <computeroutput><![CDATA[ gdb <prefix>/lib/valgrind/<platform>/<tool>:]]></computeroutput></para>
-<programlisting>
- gdb /usr/local/lib/valgrind/ppc32-linux/lackey</programlisting>
- </listitem>
-
- <listitem>
- <para>Do <computeroutput>handle SIGSEGV SIGILL nostop
- noprint</computeroutput> in GDB to prevent GDB from stopping on a
- SIGSEGV or SIGILL:</para>
-<programlisting>
- (gdb) handle SIGILL SIGSEGV nostop noprint</programlisting>
- </listitem>
-
- <listitem>
- <para>Set any breakpoints you want and proceed as normal for GDB:</para>
-<programlisting>
- (gdb) b vgPlain_do_exec</programlisting>
- <para>The macro VG_(FUNC) is expanded to vgPlain_FUNC, so If you
- want to set a breakpoint VG_(do_exec), you could do like this in
- GDB.</para>
- </listitem>
-
- <listitem>
- <para>Run the tool with required options:</para>
-<programlisting>
- (gdb) run `pwd`</programlisting>
- </listitem>
-
-</orderedlist>
-
-<para>GDB may be able to give you useful information. Note that by
-default most of the system is built with
-<option>-fomit-frame-pointer</option>, and you'll need to get rid of
-this to extract useful tracebacks from GDB.</para>
-
-</sect3>
-
-
-<sect3 id="manual-writing-tools.ucode-probs">
-<title>IR Instrumentation Problems</title>
-
<para>If you are having problems with your VEX IR instrumentation, it's
likely that GDB won't be able to help at all. In this case, Valgrind's
<option>--trace-flags</option> option is invaluable for observing the
results of instrumentation.</para>
-</sect3>
-
-
-<sect3 id="manual-writing-tools.misc">
-<title>Miscellaneous</title>
-
<para>If you just want to know whether a program point has been reached,
using the <computeroutput>OINK</computeroutput> macro (in
<filename>include/pub_tool_libcprint.h</filename>) can be easier than
@@ -426,26 +370,14 @@
<computeroutput>valgrind --help-debug</computeroutput> for the
list).</para>
-</sect3>
-
</sect2>
-</sect1>
-
-
-
-<sect1 id="manual-writing-tools.advtopics" xreflabel="Advanced Topics">
-<title>Advanced Topics</title>
-
-<para>Once a tool becomes more complicated, there are some extra
-things you may want/need to do.</para>
-
<sect2 id="manual-writing-tools.suppressions" xreflabel="Suppressions">
<title>Suppressions</title>
<para>If your tool reports errors and you want to suppress some common
ones, you can add suppressions to the suppression files. The relevant
-files are <filename>valgrind/*.supp</filename>; the final suppression
+files are <filename>*.supp</filename>; the final suppression
file is aggregated from these files by combining the relevant
<filename>.supp</filename> files depending on the versions of linux, X
and glibc on a system.</para>
@@ -454,7 +386,7 @@
<computeroutput>tool_name:suppression_name</computeroutput>. The
<computeroutput>tool_name</computeroutput> here is the name you specify
for the tool during initialisation with
-<function>VG_(details_name)()</function>.</para>
+<function>VG_(details_name)</function>.</para>
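[Editorial note: for illustration, a suppression entry for a hypothetical tool registered as <computeroutput>foobar</computeroutput>. The error kind and frame names below are invented; the entry format is the same one the standard tools use.]

```
{
   any-name-you-like-for-this-suppression
   foobar:SomeErrorKind
   fun:function_producing_the_false_positive
   obj:*/libfoo.so.*
}
```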
</sect2>
@@ -462,101 +394,60 @@
<sect2 id="manual-writing-tools.docs" xreflabel="Documentation">
<title>Documentation</title>
-<para>As of version 3.0.0, Valgrind documentation has been converted to
-XML. Why? See <ulink url="http://www.ucc.ie/xml/">The XML FAQ</ulink>.
-</para>
-
-
-<sect3 id="manual-writing-tools.xml" xreflabel="The XML Toolchain">
-<title>The XML Toolchain</title>
-
<para>If you are feeling conscientious and want to write some
-documentation for your tool, please use XML. The Valgrind
-Docs use the following toolchain and versions:</para>
+documentation for your tool, please use XML as the rest of Valgrind does.
+The file <filename>docs/README</filename> has more details on getting
+the XML toolchain to work; this can be difficult, unfortunately.</para>
-<programlisting>
- xmllint: using libxml version 20607
- xsltproc: using libxml 20607, libxslt 10102 and libexslt 802
- pdfxmltex: pdfTeX (Web2C 7.4.5) 3.14159-1.10b
- pdftops: version 3.00
- DocBook: version 4.2
-</programlisting>
+<para>To write the documentation, follow these steps (using
+<computeroutput>foobar</computeroutput> as the example tool name
+again):</para>
-<para><command>Latency:</command> you should note that latency is
-a big problem: DocBook is constantly being updated, but the tools
-tend to lag behind somewhat. It is important that the versions
-get on with each other, so if you decide to upgrade something,
-then you need to ascertain whether things still work nicely -
-this *cannot* be assumed.</para>
-
-<para><command>Stylesheets:</command> The Valgrind docs use
-various custom stylesheet layers, all of which are in
-<computeroutput>valgrind/docs/lib/</computeroutput>. You
-shouldn't need to modify these in any way.</para>
-
-<para><command>Catalogs:</command> Catalogs provide a mapping from
-generic addresses to specific local directories on a given machine.
-Most recent Linux distributions have adopted a common place for storing
-catalogs (<filename>/etc/xml/</filename>). Assuming that you have the
-various tools listed above installed, you probably won't need to modify
-your catalogs. But if you do, then just add another
-<computeroutput>group</computeroutput> to this file, reflecting your
-local installation.</para>
-
-</sect3>
-
-
-<sect3 id="manual-writing-tools.writing" xreflabel="Writing the Documentation">
-<title>Writing the Documentation</title>
-
-<para>Follow these steps (using <computeroutput>foobar</computeroutput>
-as the example tool name again):</para>
-
<orderedlist>
<listitem>
<para>The docs go in
- <computeroutput>valgrind/foobar/docs/</computeroutput>, which you will
+ <computeroutput>foobar/docs/</computeroutput>, which you will
have created when you started writing the tool.</para>
</listitem>
<listitem>
- <para>Write <filename>foobar/docs/Makefile.am</filename>. Use
- <filename>memcheck/docs/Makefile.am</filename> as an
- example.</para>
- </listitem>
-
- <listitem>
<para>Copy the XML documentation file for the tool Nulgrind from
- <filename>valgrind/none/docs/nl-manual.xml</filename> to
+ <filename>none/docs/nl-manual.xml</filename> to
<computeroutput>foobar/docs/</computeroutput>, and rename it to
<filename>foobar/docs/fb-manual.xml</filename>.</para>
- <para><command>Note</command>: there is a *really stupid* tetex bug
- with underscores in filenames, so don't use '_'.</para>
+ <para><command>Note</command>: there is a tetex bug
+ involving underscores in filenames, so don't use '_'.</para>
</listitem>
<listitem>
<para>Write the documentation. There are some helpful bits and
- pieces on using xml markup in
- <filename>valgrind/docs/xml/xml_help.txt</filename>.</para>
+ pieces on using XML markup in
+ <filename>docs/xml/xml_help.txt</filename>.</para>
</listitem>
<listitem>
<para>Include it in the User Manual by adding the relevant entry to
- <filename>valgrind/docs/xml/manual.xml</filename>. Copy and edit an
+ <filename>docs/xml/manual.xml</filename>. Copy and edit an
existing entry.</para>
</listitem>
<listitem>
+ <para>Include it in the man page by adding the relevant entry to
+ <filename>docs/xml/valgrind-manpage.xml</filename>. Copy and
+ edit an existing entry.</para>
+ </listitem>
+
+ <listitem>
<para>Validate <filename>foobar/docs/fb-manual.xml</filename> using
- the following command from within <filename>valgrind/docs/</filename>:
+ the following command from within <filename>docs/</filename>:
</para>
<screen><![CDATA[
-% make valid
+make valid
]]></screen>
- <para>You will probably get errors that look like this:</para>
+ <para>You may get errors that look like this:</para>
<screen><![CDATA[
./xml/index.xml:5: element chapter: validity error : No declaration for
@@ -565,9 +456,9 @@
<para>Ignore (only) these -- they're not important.</para>
- <para>Because the xml toolchain is fragile, it is important to ensure
+ <para>Because the XML toolchain is fragile, it is important to ensure
that <filename>fb-manual.xml</filename> won't break the documentation
- set build. Note that just because an xml file happily transforms to
+ set build. Note that just because an XML file happily transforms to
html does not necessarily mean the same holds true for pdf/ps.</para>
</listitem>
@@ -575,30 +466,31 @@
<para>You can (re-)generate the HTML docs while you are writing
<filename>fb-manual.xml</filename> to help you see how it's looking.
The generated files end up in
- <filename>valgrind/docs/html/</filename>. Use the following
- command, within <filename>valgrind/docs/</filename>:</para>
+ <filename>docs/html/</filename>. Use the following
+ command, within <filename>docs/</filename>:</para>
<screen><![CDATA[
-% make html-docs
+make html-docs
]]></screen>
</listitem>
<listitem>
- <para>When you have finished, also generate pdf and ps output to
- check all is well, from within <filename>valgrind/docs/</filename>:
+ <para>When you have finished, try to generate PDF and PostScript output to
+ check all is well, from within <filename>docs/</filename>:
</para>
<screen><![CDATA[
-% make print-docs
+make print-docs
]]></screen>
<para>Check the output <filename>.pdf</filename> and
<filename>.ps</filename> files in
- <computeroutput>valgrind/docs/print/</computeroutput>.</para>
+ <computeroutput>docs/print/</computeroutput>.</para>
+
+ <para>Note that the toolchain is even more fragile for the print docs,
+ so don't feel too bad if you can't get it working.</para>
</listitem>
</orderedlist>
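[Editorial note: the build commands from the steps above, gathered in one place. All are run from within <filename>docs/</filename>, assuming the XML toolchain is installed.]

```
cd docs
make valid        # validate fb-manual.xml and the rest of the doc set
make html-docs    # HTML output lands in docs/html/
make print-docs   # PDF/PostScript output lands in docs/print/
```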
-</sect3>
-
</sect2>
@@ -650,13 +542,15 @@
<sect2 id="manual-writing-tools.profiling" xreflabel="Profiling">
<title>Profiling</title>
-<para>To profile a tool, use Cachegrind on it. Read README_DEVELOPERS for
-details on running Valgrind under Valgrind.</para>
+<para>Lots of profiling tools have trouble running Valgrind. For example,
+trying to use gprof is hopeless.</para>
-<para>Alternatively, you can use OProfile. In most cases, it is better than
-Cachegrind because it's much faster, and gives real times, as opposed to
-instruction and cache hit/miss counts.</para>
+<para>Probably the best way to profile a tool is with OProfile on Linux.</para>
+<para>You can also use Cachegrind on it. Read
+<filename>README_DEVELOPERS</filename> for details on running Valgrind under
+Valgrind; it's a bit fragile but can usually be made to work.</para>
+
</sect2>
@@ -665,41 +559,36 @@
<title>Other Makefile Hackery</title>
<para>If you add any directories under
-<computeroutput>valgrind/foobar/</computeroutput>, you will need to add
+<computeroutput>foobar/</computeroutput>, you will need to add
an appropriate <filename>Makefile.am</filename> to it, and add a
corresponding entry to the <computeroutput>AC_OUTPUT</computeroutput>
-list in <filename>valgrind/configure.in</filename>.</para>
+list in <filename>configure.in</filename>.</para>
<para>If you add any scripts to your tool (see Cachegrind for an
example) you need to add them to the
<computeroutput>bin_SCRIPTS</computeroutput> variable in
-<filename>valgrind/foobar/Makefile.am</filename>.</para>
+<filename>foobar/Makefile.am</filename> and possibly also to the
+<computeroutput>AC_OUTPUT</computeroutput> list in
+<filename>configure.in</filename>.</para>
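[Editorial note: a sketch of the two pieces involved. The script name <computeroutput>foobar-helper</computeroutput> and the directory <computeroutput>newdir</computeroutput> are invented for illustration.]

```
## In foobar/Makefile.am: ship the tool's script
bin_SCRIPTS = foobar-helper

## In configure.in: each new directory needs its Makefile listed
## in the AC_OUTPUT list, e.g.
AC_OUTPUT(
   ...
   foobar/newdir/Makefile
   ...
)
```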
</sect2>
<sect2 id="manual-writing-tools.ifacever" xreflabel="Core/tool Interface Versions">
-<title>Core/tool Interface Versions</title>
+<title>The Core/tool Interface</title>
-<para>In order to allow for the core/tool interface to evolve over time,
-Valgrind uses a basic interface versioning system. All a tool has to do
-is use the
-<computeroutput>VG_DETERMINE_INTERFACE_VERSION</computeroutput> macro
-exactly once in its code. If not, a link error will occur when the tool
-is built.</para>
+<para>The core/tool interface evolves over time, but it's pretty stable.
+We deliberately do not provide backward compatibility with old interfaces,
+because it is too difficult and too restrictive. We view this as a good
+thing -- if we had to be backward compatible with earlier versions, many
+improvements now in the system could not have been added.</para>
-<para>The interface version number is changed when binary incompatible
-changes are made to the interface. If the core and tool has the same major
-version number X they should work together. If X doesn't match, Valgrind
-will abort execution with an explanation of the problem.</para>
+<para>Because tools are statically linked with the core, if a tool compiles
+successfully then it should be compatible with the core. We would not
+deliberately violate this property by, for example, changing the behaviour
+of a core function without changing its prototype.</para>
-<para>This approach was chosen so that if the interface changes in the
-future, old tools won't work and the reason will be clearly explained,
-instead of possibly crashing mysteriously. We have attempted to
-minimise the potential for binary incompatible changes by means such as
-minimising the use of naked structs in the interface.</para>
-
</sect2>
</sect1>
@@ -709,17 +598,10 @@
<sect1 id="manual-writing-tools.finalwords" xreflabel="Final Words">
<title>Final Words</title>
-<para>The core/tool interface is not fixed. It's pretty stable these days,
-but it does change. We deliberately do not provide backward compatibility
-with old interfaces, because it is too difficult and too restrictive.
-The interface checking should catch any incompatibilities. We view this as
-a good thing -- if we had to be backward compatible with earlier versions,
-many improvements now in the system could not have been added.
-</para>
+<para>Writing a new Valgrind tool is not easy, but the tools you can write
+with Valgrind are among the most powerful programming tools there are.
+Happy programming!</para>
-
-<para>Happy programming.</para>
-
</sect1>
</chapter>
|
|
From: <sv...@va...> - 2009-08-07 04:07:28
|
Author: njn
Date: 2009-08-07 05:07:20 +0100 (Fri, 07 Aug 2009)
New Revision: 10733
Log:
Overhaul design+implementation chapter.
Modified:
trunk/docs/xml/design-impl.xml
Modified: trunk/docs/xml/design-impl.xml
===================================================================
--- trunk/docs/xml/design-impl.xml 2009-08-07 02:58:11 UTC (rev 10732)
+++ trunk/docs/xml/design-impl.xml 2009-08-07 04:07:20 UTC (rev 10733)
@@ -1,6 +1,7 @@
<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
<chapter id="design-impl"
@@ -11,54 +12,76 @@
<para>A number of academic publications nicely describe many
aspects of Valgrind's design and implementation. Online copies of all of
-them, and others, are available at
-http://valgrind.org/docs/pubs.html.</para>
+them, and others, are available on the <ulink url="&vg-pubs-url;">Valgrind
+publications page</ulink>.</para>

-<para>A good top-level overview of Valgrind is given in:</para>
+<para>The following paper gives a good overview of Valgrind, and explains
+how it differs from other dynamic binary instrumentation frameworks such as
+Pin and DynamoRIO.</para>

-<para>
-"Valgrind: A Framework for Heavyweight Dynamic Binary
-Instrumentation." Nicholas Nethercote and Julian Seward. Proceedings
-of ACM SIGPLAN 2007 Conference on Programming Language Design and
-Implementation (PLDI 2007), San Diego, California, USA, June 2007.
-This paper describes how Valgrind works, and how it differs from other
-DBI frameworks such as Pin and DynamoRIO.</para>
+<itemizedlist>
+  <listitem>
+    <para>
+    <command>Valgrind: A Framework for Heavyweight Dynamic Binary
+    Instrumentation. Nicholas Nethercote and Julian Seward. Proceedings
+    of ACM SIGPLAN 2007 Conference on Programming Language Design and
+    Implementation (PLDI 2007), San Diego, California, USA, June
+    2007.</command>
+    </para>
+  </listitem>
+</itemizedlist>

-<para>The following two papers together give a comprehensive
-description of how Memcheck works:</para>
+<para>The following two papers together give a comprehensive description of
+how most of Memcheck works. The first paper describes in detail how
+Memcheck's undefined value error detection (a.k.a. V bits) works. The
+second paper describes in detail how Memcheck's shadow memory is
+implemented, and compares it to other alternative approaches.
+</para>

-<para>"Using Valgrind to detect undefined value errors with
-bit-precision." Julian Seward and Nicholas Nethercote. Proceedings
-of the USENIX'05 Annual Technical Conference, Anaheim, California,
-USA, April 2005. This paper describes in detail how Memcheck's
-undefined value error detection (a.k.a. V bits) works.</para>
+<itemizedlist>
+  <listitem>
+    <para><command>Using Valgrind to detect undefined value errors with
+    bit-precision. Julian Seward and Nicholas Nethercote. Proceedings
+    of the USENIX'05 Annual Technical Conference, Anaheim, California,
+    USA, April 2005.</command>
+    </para>

-<para>"How to Shadow Every Byte of Memory Used by a Program."
-Nicholas Nethercote and Julian Seward. Proceedings of the Third
-International ACM SIGPLAN/SIGOPS Conference on Virtual Execution
-Environments (VEE 2007), San Diego, California, USA, June 2007. This
-paper describes in detail how Memcheck's shadow memory is implemented,
-and compares it to other alternative approaches.</para>
+    <para><command>How to Shadow Every Byte of Memory Used by a Program.
+    Nicholas Nethercote and Julian Seward. Proceedings of the Third
+    International ACM SIGPLAN/SIGOPS Conference on Virtual Execution
+    Environments (VEE 2007), San Diego, California, USA, June
+    2007.</command>
+    </para>
+  </listitem>
+</itemizedlist>

-<para>The following paper describes Callgrind:</para>
+<para>The following paper describes Callgrind.</para>

-<para>"A Tool Suite for Simulation Based Analysis of Memory Access
-Behavior." Josef Weidendorfer, Markus Kowarschik and Carsten
-Trinitis. Proceedings of the 4th International Conference on
-Computational Science (ICCS 2004), Krakow, Poland, June 2004. This
-paper describes Callgrind.</para>
+<itemizedlist>
+  <listitem>
+    <para><command>A Tool Suite for Simulation Based Analysis of Memory Access
+    Behavior. Josef Weidendorfer, Markus Kowarschik and Carsten
+    Trinitis. Proceedings of the 4th International Conference on
+    Computational Science (ICCS 2004), Krakow, Poland, June 2004.</command>
+    </para>
+  </listitem>
+</itemizedlist>

<para>The following dissertation describes Valgrind in some detail
-(some of these details are now out-of-date) as well as Cachegrind,
+(many of these details are now out-of-date) as well as Cachegrind,
Annelid and Redux. It also covers some underlying theory about
dynamic binary analysis in general and what all these tools have in
-common:</para>
+common.</para>

-<para>"Dynamic Binary Analysis and Instrumentation." Nicholas
-Nethercote. PhD Dissertation, University of Cambridge, November
-2004.</para>
+<itemizedlist>
+  <listitem>
+    <para><command>Dynamic Binary Analysis and Instrumentation. Nicholas
+    Nethercote.</command> PhD Dissertation, University of Cambridge, November
+    2004.</para>
+  </listitem>
+</itemizedlist>

</chapter>
|
|
From: <sv...@va...> - 2009-08-07 02:58:38
|
Author: njn
Date: 2009-08-07 03:58:11 +0100 (Fri, 07 Aug 2009)
New Revision: 10732
Log:
Went over the FAQ. Also tweaked vg-entities.xml a bit.
Modified:
trunk/cachegrind/docs/cg-manual.xml
trunk/docs/xml/FAQ.xml
trunk/docs/xml/dist-docs.xml
trunk/docs/xml/manual-writing-tools.xml
trunk/docs/xml/manual.xml
trunk/docs/xml/quick-start-guide.xml
trunk/docs/xml/tech-docs.xml
trunk/docs/xml/valgrind-manpage.xml
trunk/docs/xml/vg-entities.xml
Modified: trunk/cachegrind/docs/cg-manual.xml
===================================================================
--- trunk/cachegrind/docs/cg-manual.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/cachegrind/docs/cg-manual.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -1245,7 +1245,7 @@
<title>How Cachegrind Works</title>
<para>The best reference for understanding how Cachegrind works is chapter 3 of
"Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It
-is available on the <ulink url="&vg-pubs;">Valgrind publications
+is available on the <ulink url="&vg-pubs-url;">Valgrind publications
page</ulink>.</para>
</sect2>
Modified: trunk/docs/xml/FAQ.xml
===================================================================
--- trunk/docs/xml/FAQ.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/FAQ.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -11,7 +11,7 @@
<releaseinfo>&rel-type; &rel-version; &rel-date;</releaseinfo>
<copyright>
<year>&vg-lifespan;</year>
- <holder><ulink url="&vg-developers;">Valgrind Developers</ulink></holder>
+ <holder><ulink url="&vg-devs-url;">Valgrind Developers</ulink></holder>
</copyright>
<legalnotice>
<para>Email: <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink></para>
@@ -52,7 +52,7 @@
<para>From Nordic mythology. Originally (before release) the project
was named Heimdall, after the watchman of the Nordic gods. He could
"see a hundred miles by day or night, hear the grass growing, see the
- wool growing on a sheep's back" (etc). This would have been a great
+ wool growing on a sheep's back", etc. This would have been a great
name, but it was already taken by a security package "Heimdal".</para>
<para>Keeping with the Nordic theme, Valgrind was chosen. Valgrind is
@@ -78,7 +78,7 @@
<qandaentry id="faq.make_dies">
<question id="q-make_dies">
- <para>When I trying building Valgrind, 'make' dies partway with
+ <para>When building Valgrind, 'make' dies partway with
an assertion failure, something like this:</para>
<screen>
% make: expand.c:489: allocated_variable_append:
@@ -88,20 +88,20 @@
<answer id="a-make_dies">
<para>It's probably a bug in 'make'. Some, but not all, instances of
version 3.79.1 have this bug, see
- www.mail-archive.com/bug...@gn.../msg01658.html. Try upgrading to
- a more recent version of 'make'. Alternatively, we have heard that
- unsetting the CFLAGS environment variable avoids the problem.</para>
+ <ulink url="http://www.mail-archive.com/bug...@gn.../msg01658.html">this</ulink>.
+ Try upgrading to a more recent version of 'make'. Alternatively, we have
+ heard that unsetting the CFLAGS environment variable avoids the
+ problem.</para>
</answer>
</qandaentry>
<qandaentry id="faq.glibc_devel">
<question>
- <para>When I try to build Valgrind, 'make' fails with
-<programlisting>
+ <para>When building Valgrind, 'make' fails with this:</para>
+<screen>
/usr/bin/ld: cannot find -lc
collect2: ld returned 1 exit status
-</programlisting>
- </para>
+</screen>
</question>
<answer>
<para>You need to install the glibc-static-devel package.</para>
@@ -118,17 +118,17 @@
<qandaentry id="faq.exit_errors">
<question id="q-exit_errors">
<para>Programs run OK on Valgrind, but at exit produce a bunch of
- errors involving <literal>__libc_freeres()</literal> and then die
+ errors involving <literal>__libc_freeres</literal> and then die
with a segmentation fault.</para>
</question>
<answer id="a-exit_errors">
<para>When the program exits, Valgrind runs the procedure
- <function>__libc_freeres()</function> in glibc. This is a hook for
+ <function>__libc_freeres</function> in glibc. This is a hook for
memory debuggers, so they can ask glibc to free up any memory it has
used. Doing that is needed to ensure that Valgrind doesn't
incorrectly report space leaks in glibc.</para>
- <para>Problem is that running <literal>__libc_freeres()</literal> in
+ <para>The problem is that running <literal>__libc_freeres</literal> in
older glibc versions causes this crash.</para>
<para>Workaround for 1.1.X and later versions of Valgrind: use the
@@ -237,9 +237,9 @@
memory pool allocators. Memory for quite a number of destructed
objects is not immediately freed and given back to the OS, but kept
in the pool(s) for later re-use. The fact that the pools are not
- freed at the exit() of the program cause Valgrind to report this
+ freed at the exit of the program causes Valgrind to report this
memory as still reachable. The behaviour not to free pools at the
- exit() could be called a bug of the library though.</para>
+ exit could be called a bug of the library though.</para>
<para>Using GCC, you can force the STL to use malloc and to free
memory as soon as possible by globally disabling memory caching.
@@ -269,8 +269,8 @@
by reading
<ulink
url="http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#4_4_leak">
- http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#4_4_leak</ulink> if
- you absolutely want to do that. But beware:
+ http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#4_4_leak</ulink>
+ if you absolutely want to do that. But beware:
allocators belong to the more messy parts of the STL and
people went to great lengths to make the STL portable across
platforms. Chances are good that your solution will work on your
@@ -297,7 +297,7 @@
object is unloaded before the program terminates, Valgrind will
discard the debug information and the error message will be full of
<literal>???</literal> entries. The workaround here is to avoid
- calling dlclose() on these shared objects.</para>
+ calling <function>dlclose</function> on these shared objects.</para>
<para>Also, <option>-fomit-frame-pointer</option> and
<option>-fstack-check</option> can make stack traces worse.</para>
@@ -369,9 +369,11 @@
<para>Occasionally Valgrind stack traces get the wrong function
names. This is caused by glibc using aliases to effectively give
one function two names. Most of the time Valgrind chooses a
- suitable name, but very occasionally it gets it wrong. Examples we
- know of are printing 'bcmp' instead of 'memcmp', 'index' instead of
- 'strchr', and 'rindex' instead of 'strrchr'.</para>
+ suitable name, but very occasionally it gets it wrong. Examples we know
+ of are printing <function>bcmp</function> instead of
+ <function>memcmp</function>, <function>index</function> instead of
+ <function>strchr</function>, and <function>rindex</function> instead of
+ <function>strrchr</function>.</para>
</answer>
</qandaentry>
@@ -401,19 +403,12 @@
</answer>
</qandaentry>
-</qandadiv>
-
-<!-- Memcheck doesn't find my bug -->
-<qandadiv id="faq.notfound" xreflabel="Memcheck doesn't find my bug">
-<title>Memcheck doesn't find my bug</title>
-
<qandaentry id="faq.hiddenbug">
<question id="q-hiddenbug">
- <para>I try running "valgrind --tool=memcheck my_program" and get
- Valgrind's startup message, but I don't get any errors and I know my
- program has errors.</para>
+ <para> Memcheck doesn't report any errors and I know my program has
+ errors.</para>
</question>
<answer id="a-hiddenbug">
<para>There are two possible causes of this.</para>
@@ -442,13 +437,13 @@
<para>Second, if your program is statically linked, most Valgrind
tools won't work as well, because they won't be able to replace
- certain functions, such as malloc(), with their own versions. A key
- indicator of this is if Memcheck says:
+ certain functions, such as <function>malloc</function>, with their own
+ versions. A key indicator of this is if Memcheck says:
<programlisting>
All heap blocks were freed -- no leaks are possible
</programlisting>
- when you know your program calls malloc(). The workaround is to
- avoid statically linking your program.</para>
+ when you know your program calls <function>malloc</function>. The
+ workaround is to avoid statically linking your program.</para>
</answer>
</qandaentry>
@@ -475,6 +470,10 @@
<para>Unfortunately, Memcheck doesn't do bounds checking on static
or stack arrays. We'd like to, but it's just not possible to do in
a reasonable way that fits with how Memcheck works. Sorry.</para>
+
+ <para>However, the experimental tool Ptrcheck can detect errors like
+ this. Run Valgrind with the <option>--tool=exp-ptrcheck</option> option
+ to try it, but beware that it is not as robust as Memcheck.</para>
</answer>
</qandaentry>
@@ -612,48 +611,31 @@
<qandadiv id="faq.help" xreflabel="How To Get Further Assistance">
<title>How To Get Further Assistance</title>
+<!-- WARNING: this file should not xref other parts of the docs, because it
+is built standalone as FAQ.txt. That's why we link to, for example, the
+online copy of the manual. -->
+
<qandaentry id="e-help">
<!-- <question><para/></question> -->
<answer id="a-help">
- <para>Please read all of this section before posting.</para>
-
- <para>If you think an answer is incomplete or inaccurate, please
- e-mail <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink>.</para>
-
<para>Read the appropriate section(s) of the
- <ulink url="&vg-bookset;">Valgrind Documentation</ulink>.</para>
+ <ulink url="&vg-docs-url;">Valgrind Documentation</ulink>.</para>
- <para>Read the
- <ulink url="&vg-dist-docs;">Distribution Documents</ulink>.</para>
-
<para><ulink url="http://search.gmane.org">Search</ulink> the
<ulink url="http://news.gmane.org/gmane.comp.debugging.valgrind">valgrind-users</ulink> mailing list archives, using the group name
<computeroutput>gmane.comp.debugging.valgrind</computeroutput>.</para>
- <para>Only when you have tried all of these things and are still
- stuck, should you post to the
- <ulink url="&vg-users-list;">valgrind-users mailing list</ulink>. In
- which case, please read the following carefully. Making a complete
- posting will greatly increase the chances that an expert or fellow
- user reading it will have enough information and motivation to
- reply.</para>
+ <para>If you think an answer in this FAQ is incomplete or inaccurate, please
+ e-mail <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink>.</para>
- <para>Make sure you give full details of the problem, including the
- full output of <computeroutput>valgrind -v <your-prog></computeroutput>, if
- applicable. Also which Linux distribution you're using (Red Hat,
- Debian, etc) and its version number.</para>
-
- <para>You are in little danger of making your posting too long unless
- you include large chunks of Valgrind's (unsuppressed) output, so err
- on the side of giving too much information.</para>
-
- <para>Clearly written subject lines and message bodies are
- appreciated, too.</para>
-
- <para>Finally, remember that, despite the fact that most of the
- community are very helpful and responsive to emailed questions, you
- are probably requesting help from unpaid volunteers, so you have no
- guarantee of receiving an answer.</para>
+ <para>If you have tried all of these things and are still
+ stuck, you can try mailing the
+ <ulink url="&vg-lists-url;">valgrind-users mailing list</ulink>.
+ Note that an email has a better chance of being answered usefully if it is
+ clearly written. Also remember that, despite the fact that most of the
+ community are very helpful and responsive to emailed questions, you are
+ probably requesting help from unpaid volunteers, so you have no guarantee
+ of receiving an answer.</para>
</answer>
</qandaentry>
Modified: trunk/docs/xml/dist-docs.xml
===================================================================
--- trunk/docs/xml/dist-docs.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/dist-docs.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -11,7 +11,7 @@
<releaseinfo>&rel-type; &rel-version; &rel-date;</releaseinfo>
<copyright>
<year>&vg-lifespan;</year>
- <holder><ulink url="&vg-developers;">Valgrind Developers</ulink></holder>
+ <holder><ulink url="&vg-devs-url;">Valgrind Developers</ulink></holder>
</copyright>
<legalnotice>
<para>Email: <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink></para>
Modified: trunk/docs/xml/manual-writing-tools.xml
===================================================================
--- trunk/docs/xml/manual-writing-tools.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/manual-writing-tools.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -54,7 +54,7 @@
<para>To write your own tool, you'll need the Valgrind source code. You'll
need a check-out of the Subversion repository for the automake/autoconf
build instructions to work. See the information about how to do check-out
-from the repository at <ulink url="&vg-svn-repo;">the Valgrind
+from the repository at <ulink url="&vg-repo-url;">the Valgrind
website</ulink>.</para>
</sect2>
Modified: trunk/docs/xml/manual.xml
===================================================================
--- trunk/docs/xml/manual.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/manual.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -11,7 +11,7 @@
<releaseinfo>&rel-type; &rel-version; &rel-date;</releaseinfo>
<copyright>
<year>&vg-lifespan;</year>
- <holder><ulink url="&vg-developers;">Valgrind Developers</ulink></holder>
+ <holder><ulink url="&vg-devs-url;">Valgrind Developers</ulink></holder>
</copyright>
<legalnotice>
<para>Email: <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink></para>
Modified: trunk/docs/xml/quick-start-guide.xml
===================================================================
--- trunk/docs/xml/quick-start-guide.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/quick-start-guide.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -10,7 +10,7 @@
<releaseinfo>&rel-type; &rel-version; &rel-date;</releaseinfo>
<copyright>
<year>&vg-lifespan;</year>
- <holder><ulink url="&vg-developers;">Valgrind Developers</ulink></holder>
+ <holder><ulink url="&vg-devs-url;">Valgrind Developers</ulink></holder>
</copyright>
<legalnotice>
<para>Email: <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink></para>
Modified: trunk/docs/xml/tech-docs.xml
===================================================================
--- trunk/docs/xml/tech-docs.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/tech-docs.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -10,7 +10,7 @@
<releaseinfo>&rel-type; &rel-version; &rel-date;</releaseinfo>
<copyright>
<year>&vg-lifespan;</year>
- <holder><ulink url="&vg-developers;">Valgrind Developers</ulink></holder>
+ <holder><ulink url="&vg-devs-url;">Valgrind Developers</ulink></holder>
</copyright>
<legalnotice>
<para>Email: <ulink url="mailto:&vg-vemail;">&vg-vemail;</ulink></para>
Modified: trunk/docs/xml/valgrind-manpage.xml
===================================================================
--- trunk/docs/xml/valgrind-manpage.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/valgrind-manpage.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -42,8 +42,8 @@
<para>This manual page covers only basic usage and options. For more
comprehensive information, please see the HTML documentation on your
-system: <filename>&vg-doc-path;</filename>, or online:
-<filename>&vg-bookset;</filename>.</para>
+system: <filename>&vg-docs-path;</filename>, or online:
+<filename>&vg-docs-url;</filename>.</para>
</refsect1>
@@ -223,9 +223,9 @@
<title>See Also</title>
<para>
-<filename>&vg-doc-path;</filename>,
+<filename>&vg-docs-path;</filename>,
and/or
-<filename>&vg-bookset;</filename>.
+<filename>&vg-docs-url;</filename>.
</para>
</refsect1>
Modified: trunk/docs/xml/vg-entities.xml
===================================================================
--- trunk/docs/xml/vg-entities.xml 2009-08-07 02:18:00 UTC (rev 10731)
+++ trunk/docs/xml/vg-entities.xml 2009-08-07 02:58:11 UTC (rev 10732)
@@ -1,9 +1,7 @@
<!-- misc. strings -->
-<!ENTITY vg-url "http://www.valgrind.org/">
<!ENTITY vg-jemail "ju...@va...">
<!ENTITY vg-vemail "val...@va...">
<!ENTITY vg-lifespan "2000-2009">
-<!ENTITY vg-users-list "http://lists.sourceforge.net/lists/listinfo/valgrind-users">
<!-- valgrind release + version stuff -->
<!ENTITY rel-type "Release">
@@ -11,14 +9,14 @@
<!ENTITY rel-date "2 January 2009">
<!-- where the docs are installed -->
-<!ENTITY vg-doc-path "/usr/share/doc/valgrind/html/index.html">
+<!ENTITY vg-docs-path "/usr/share/doc/valgrind/html/index.html">
<!-- valgrind website links used in lots of places in the docs -->
<!-- kept in here because every time the website gets changed, it's -->
<!-- a real pain tracking the links down in the docs -->
-<!ENTITY vg-developers "http://www.valgrind.org/info/developers.html">
-<!ENTITY vg-svn-repo "http://www.valgrind.org/downloads/repository.html">
-<!ENTITY vg-pubs "http://www.valgrind.org/docs/pubs.html">
-<!ENTITY vg-bookset "http://www.valgrind.org/docs/manual/index.html">
-<!ENTITY vg-dist-docs "http://www.valgrind.org/docs/manual/dist.html">
-<!ENTITY vg-commentary "http://www.valgrind.org/docs/manual/manual-core.html#manual-core.comment">
+<!ENTITY vg-url "http://www.valgrind.org/">
+<!ENTITY vg-lists-url "http://www.valgrind.org/support/mailing_lists.html">
+<!ENTITY vg-devs-url "http://www.valgrind.org/info/developers.html">
+<!ENTITY vg-repo-url "http://www.valgrind.org/downloads/repository.html">
+<!ENTITY vg-pubs-url "http://www.valgrind.org/docs/pubs.html">
+<!ENTITY vg-docs-url "http://www.valgrind.org/docs/manual/index.html">
From: Tom H. <th...@cy...> - 2009-08-07 02:46:00
Nightly build on lloyd ( x86_64, Fedora 7 )
Started at 2009-08-07 03:05:06 BST
Ended at 2009-08-07 03:45:38 BST
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 520 tests, 1 stderr failure, 0 stdout failures, 0 post failures ==
helgrind/tests/tc06_two_races_xml (stderr)
From: Tom H. <th...@cy...> - 2009-08-07 02:31:06
Nightly build on mg ( x86_64, Fedora 9 )
Started at 2009-08-07 03:10:07 BST
Ended at 2009-08-07 03:30:49 BST
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 527 tests, 1 stderr failure, 0 stdout failures, 0 post failures ==
helgrind/tests/tc06_two_races_xml (stderr)
From: <sv...@va...> - 2009-08-07 02:18:25
Author: njn
Date: 2009-08-07 03:18:00 +0100 (Fri, 07 Aug 2009)
New Revision: 10731
Log:
Overhaul Helgrind's manual chapter.
Modified:
trunk/helgrind/docs/hg-manual.xml
trunk/helgrind/hg_main.c
Modified: trunk/helgrind/docs/hg-manual.xml
===================================================================
--- trunk/helgrind/docs/hg-manual.xml 2009-08-07 00:18:25 UTC (rev 10730)
+++ trunk/helgrind/docs/hg-manual.xml 2009-08-07 02:18:00 UTC (rev 10731)
@@ -41,9 +41,6 @@
<para><link linkend="hg-manual.data-races">
Data races -- accessing memory without adequate locking
or synchronisation</link>.
- Note that race detection in versions 3.4.0 and later uses a
- different algorithm than in 3.3.x. Hence, if you have been using
- Helgrind in 3.3.x, you may want to re-read this section.
</para>
</listitem>
</orderedlist>
@@ -106,7 +103,7 @@
error code that must be handled</para></listitem>
<listitem><para>when a thread exits whilst still holding locked
locks</para></listitem>
- <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
+ <listitem><para>calling <function>pthread_cond_wait</function>
with a not-locked mutex, an invalid mutex,
or one locked by a different
thread</para></listitem>
@@ -119,7 +116,7 @@
waiting</para></listitem>
<listitem><para>waiting on an uninitialised pthread
barrier</para></listitem>
- <listitem><para>for all of the pthread_ functions that Helgrind
+ <listitem><para>for all of the pthreads functions that Helgrind
intercepts, an error is reported, along with a stack
trace, if the system threading library routine returns
an error code, even if Helgrind itself detected no
@@ -288,10 +285,10 @@
]]></programlisting>
<para>The problem is there is nothing to
-stop <computeroutput>var</computeroutput> being updated simultaneously
+stop <varname>var</varname> being updated simultaneously
by both threads. A correct program would
-protect <computeroutput>var</computeroutput> with a lock of type
-<computeroutput>pthread_mutex_t</computeroutput>, which is acquired
+protect <varname>var</varname> with a lock of type
+<function>pthread_mutex_t</function>, which is acquired
before each access and released afterwards. Helgrind's output for
this program is:</para>
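(Editorial aside between hunks: the mutex-based fix just described can be sketched in plain C with pthreads. This is an illustrative sketch only, not the manual's exact example program; `child_fn` and `both_increments` are made-up names.)

```c
#include <pthread.h>

/* 'var' is now protected by a pthread_mutex_t, acquired before each
   access and released afterwards, as the text prescribes. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int var = 0;

static void *child_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    var++;                      /* child's update, under the lock */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int both_increments(void)       /* returns the final value of var */
{
    pthread_t child;
    pthread_create(&child, NULL, child_fn, NULL);
    pthread_mutex_lock(&lock);
    var++;                      /* parent's update, under the same lock */
    pthread_mutex_unlock(&lock);
    pthread_join(child, NULL);
    return var;
}
```

Because both updates are serialised by `lock`, the lock operations impose a happens-before ordering between them, and Helgrind reports no race.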
@@ -374,8 +371,8 @@
Pthreads): thread creation, thread joining, locks, condition
variables, semaphores and barriers.</para>
-<para>The effect of using these functions is to impose on a threaded
-program, constraints upon the order in which memory accesses can
+<para>The effect of using these functions is to impose
+constraints upon the order in which memory accesses can
happen. This implied ordering is generally known as the
"happens-before relation". Once you understand the happens-before
relation, it is easy to see how Helgrind finds races in your code.
@@ -465,7 +462,7 @@
two accesses are ordered by the happens-before relation. If so,
that's fine; if not, it reports a race.</para>
-<para>It is important to understand the the happens-before relation
+<para>It is important to understand that the happens-before relation
creates only a partial ordering, not a total ordering. An example of
a total ordering is comparison of numbers: for any two numbers
<computeroutput>x</computeroutput> and
@@ -535,9 +532,9 @@
of the child.</para>
</listitem>
<listitem><para>Similarly, when an exiting thread is reaped via a
- call to pthread_join, once the call returns, the reaping thread
- acquires a happens-after dependency relative to all memory accesses
- made by the exiting thread.</para>
+ call to <function>pthread_join</function>, once the call returns, the
+ reaping thread acquires a happens-after dependency relative to all memory
+ accesses made by the exiting thread.</para>
</listitem>
</itemizedlist>
@@ -559,9 +556,9 @@
<listitem><para>Two accesses are considered to be ordered by the
happens-before dependency even through arbitrarily long chains of
synchronisation events. For example, if T1 accesses some location
- L, and then pthread_cond_signals T2, which later
- pthread_cond_signals T3, which then accesses L, then a suitable
- happens-before dependency exists between the first and second
+ L, and then <function>pthread_cond_signals</function> T2, which later
+ <function>pthread_cond_signals</function> T3, which then accesses L, then
+ a suitable happens-before dependency exists between the first and second
accesses, even though it involves two different inter-thread
synchronisation events.</para>
</listitem>
@@ -708,11 +705,11 @@
use the POSIX threading primitives. Helgrind needs to be able to
see all events pertaining to thread creation, exit, locking and
other synchronisation events. To do so it intercepts many POSIX
- pthread_ functions.</para>
+ pthreads functions.</para>
<para>Do not roll your own threading primitives (mutexes, etc)
- from combinations of the Linux futex syscall, atomic counters and
- wotnot. These throw Helgrind's internal what's-going-on models
+ from combinations of the Linux futex syscall, atomic counters, etc.
+ These throw Helgrind's internal what's-going-on models
way off course and will give bogus results.</para>
<para>Also, do not reimplement existing POSIX abstractions using
@@ -742,11 +739,11 @@
Qt 4 and/or KDE4 applications.</para>
</listitem>
<listitem><para>Runtime support library for GNU OpenMP (part of
- GCC), at least GCC versions 4.2 and 4.3. The GNU OpenMP runtime
- library (libgomp.so) constructs its own synchronisation
- primitives using combinations of atomic memory instructions and
- the futex syscall, which causes total chaos since in Helgrind
- since it cannot "see" those.</para>
+ GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime
+ library (<filename>libgomp.so</filename>) constructs its own
+ synchronisation primitives using combinations of atomic memory
+ instructions and the futex syscall, which causes total chaos in
+ Helgrind since it cannot "see" those.</para>
<para>Fortunately, this can be solved using a configuration-time
flag (for GCC). Rebuild GCC from source, and configure using
<varname>--disable-linux-futex</varname>.
@@ -761,43 +758,47 @@
<listitem>
<para>Avoid memory recycling. If you can't avoid it, you must
- tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY
- client request
- (in <computeroutput>helgrind.h</computeroutput>).</para>
+ tell Helgrind what is going on via the
+ <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
+ <computeroutput>helgrind.h</computeroutput>).</para>
- <para>Helgrind is aware of standard memory allocation and
- deallocation that occurs via malloc/free/new/delete and from entry
- and exit of stack frames. In particular, when memory is
- deallocated via free, delete, or function exit, Helgrind considers
- that memory clean, so when it is eventually reallocated, its
- history is irrelevant.</para>
+ <para>Helgrind is aware of standard heap memory allocation and
+ deallocation that occurs via
+ <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
+ and from entry and exit of stack frames. In particular, when memory is
+ deallocated via <function>free</function>, <function>delete</function>,
+ or function exit, Helgrind considers that memory clean, so when it is
+ eventually reallocated, its history is irrelevant.</para>
<para>However, it is common practice to implement memory recycling
schemes. In these, memory to be freed is not handed to
- malloc/delete, but instead put into a pool of free buffers to be
- handed out again as required. The problem is that Helgrind has no
+ <function>free</function>/<function>delete</function>, but instead put
+ into a pool of free buffers to be handed out again as required. The
+ problem is that Helgrind has no
way to know that such memory is logically no longer in use, and
its history is irrelevant. Hence you must make that explicit,
- using the VALGRIND_HG_CLEAN_MEMORY client request to specify the
- relevant address ranges. It's easiest to put these requests into
- the pool manager code, and use them either when memory is returned
- to the pool, or is allocated from it.</para>
+ using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
+ to specify the relevant address ranges. It's easiest to put these
+ requests into the pool manager code, and use them either when memory is
+ returned to the pool, or is allocated from it.</para>
</listitem>
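(Editorial aside: the pool-manager advice above might be sketched as follows. This is a hypothetical sketch: `pool_release` and `pool_acquire` are made-up names, and the no-op fallback macro merely stands in for the real `VALGRIND_HG_CLEAN_MEMORY` definition in `helgrind.h` so the sketch compiles outside Valgrind.)

```c
#include <stdlib.h>

/* In a real build you would #include "helgrind.h"; this fallback only
   keeps the sketch self-contained when Valgrind headers are absent. */
#ifndef VALGRIND_HG_CLEAN_MEMORY
#define VALGRIND_HG_CLEAN_MEMORY(addr, len) ((void)(addr), (void)(len))
#endif

#define POOL_MAX 64
static void  *pool[POOL_MAX];
static size_t pool_top = 0;

/* Return a buffer to the recycling pool instead of free()ing it. */
void pool_release(void *buf, size_t len)
{
    /* The buffer's old contents are logically dead: tell Helgrind so
       their access history is not carried over to the next user. */
    VALGRIND_HG_CLEAN_MEMORY(buf, len);
    if (pool_top < POOL_MAX)
        pool[pool_top++] = buf;
    else
        free(buf);              /* pool full: really release it */
}

/* Hand out a recycled buffer, or NULL if the pool is empty. */
void *pool_acquire(void)
{
    return pool_top ? pool[--pool_top] : NULL;
}
```

As the text suggests, placing the client request in the pool manager itself means every recycled range is cleaned exactly once, at a single point in the code.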
<listitem>
<para>Avoid POSIX condition variables. If you can, use POSIX
- semaphores (sem_t, sem_post, sem_wait) to do inter-thread event
- signalling. Semaphores with an initial value of zero are
- particularly useful for this.</para>
+ semaphores (<function>sem_t</function>, <function>sem_post</function>,
+ <function>sem_wait</function>) to do inter-thread event signalling.
+ Semaphores with an initial value of zero are particularly useful for
+ this.</para>
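(Editorial aside: the zero-initialised-semaphore pattern recommended above might look like this sketch; the names are illustrative, and it assumes POSIX semaphores are available.)

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;      /* initial value 0: acts as a pure event */
static int   payload;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;        /* this write happens-before the sem_post */
    sem_post(&ready);    /* announce: data is ready */
    return NULL;
}

int wait_for_payload(void)
{
    pthread_t t;
    sem_init(&ready, 0, 0);          /* zero-initialised semaphore */
    pthread_create(&t, NULL, producer, NULL);
    sem_wait(&ready);                /* blocks until the post */
    pthread_join(t, NULL);
    return payload;                  /* race-free read of payload */
}
```

Unlike a condition variable, the `sem_post` is never "lost" if it arrives before the `sem_wait`, which is why Helgrind can track this dependency in both orderings.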
<para>Helgrind only partially correctly handles POSIX condition
variables. This is because Helgrind can see inter-thread
- dependencies between a pthread_cond_wait call and a
- pthread_cond_signal/broadcast call only if the waiting thread
- actually gets to the rendezvous first (so that it actually calls
- pthread_cond_wait). It can't see dependencies between the threads
- if the signaller arrives first. In the latter case, POSIX
- guidelines imply that the associated boolean condition still
+ dependencies between a <function>pthread_cond_wait</function> call and a
+ <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
+ call only if the waiting thread actually gets to the rendezvous first
+ (so that it actually calls
+ <function>pthread_cond_wait</function>). It can't see dependencies
+ between the threads if the signaller arrives first. In the latter case,
+ POSIX guidelines imply that the associated boolean condition still
provides an inter-thread synchronisation event, but one which is
invisible to Helgrind.</para>
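(Editorial aside: a sketch of the boolean-condition pattern just described. The waiter re-checks the predicate under the mutex, so the rendezvous works whichever thread arrives first; names here are illustrative.)

```c
#include <pthread.h>

static pthread_mutex_t mu   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int             done = 0;   /* the associated boolean condition */

void *signaller(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mu);
    done = 1;                      /* record the event itself */
    pthread_cond_signal(&cv);      /* wake a waiter, if one is waiting */
    pthread_mutex_unlock(&mu);
    return NULL;
}

void wait_for_done(void)
{
    pthread_mutex_lock(&mu);
    while (!done)                  /* correct even if signaller ran first */
        pthread_cond_wait(&cv, &mu);
    pthread_mutex_unlock(&mu);
}
```

When the signaller arrives first, the waiter never blocks: it sees `done` already set. That is exactly the case the text says is invisible to Helgrind, since no `pthread_cond_wait`/`pthread_cond_signal` pairing occurs.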
@@ -859,16 +860,18 @@
</listitem>
<listitem>
- <para>Round up all finished threads using pthread_join. Avoid
+ <para>Round up all finished threads using
+ <function>pthread_join</function>. Avoid
detaching threads: don't create threads in the detached state, and
- don't call pthread_detach on existing threads.</para>
+ don't call <function>pthread_detach</function> on existing threads.</para>
- <para>Using pthread_join to round up finished threads provides a
- clear synchronisation point that both Helgrind and programmers can
- see. If you don't call pthread_join on a thread, Helgrind has no
- way to know when it finishes, relative to any significant
- synchronisation points for other threads in the program. So it
- assumes that the thread lingers indefinitely and can potentially
+ <para>Using <function>pthread_join</function> to round up finished
+ threads provides a clear synchronisation point that both Helgrind and
+ programmers can see. If you don't call
+ <function>pthread_join</function> on a thread, Helgrind has no way to
+ know when it finishes, relative to any
+ significant synchronisation points for other threads in the program. So
+ it assumes that the thread lingers indefinitely and can potentially
interfere indefinitely with the memory state of the program. It
has every right to assume that -- after all, it might really be
the case that, for scheduling reasons, the exiting thread did run
@@ -899,11 +902,12 @@
</listitem>
<listitem>
- <para>POSIX requires that implementations of standard I/O (printf,
- fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU
- libc implements this by using internal locking primitives that
- Helgrind is unable to intercept. Consequently Helgrind generates
- many false race reports when you use these functions.</para>
+ <para>POSIX requires that implementations of standard I/O
+ (<function>printf</function>, <function>fprintf</function>,
+ <function>fwrite</function>, <function>fread</function>, etc) are thread
+ safe. Unfortunately GNU libc implements this by using internal locking
+ primitives that Helgrind is unable to intercept. Consequently Helgrind
+ generates many false race reports when you use these functions.</para>
<para>Helgrind attempts to hide these errors using the standard
Valgrind error-suppression mechanism. So, at least for simple
@@ -923,7 +927,8 @@
where <computeroutput>libpthread.so</computeroutput> or
<computeroutput>ld.so</computeroutput> is the object associated
with the innermost stack frame, please file a bug report at
- http://www.valgrind.org.</para>
+ <ulink url="&vg-url;">&vg-url;</ulink>.
+ </para>
</listitem>
</orderedlist>
@@ -956,27 +961,36 @@
</listitem>
</varlistentry>
- <varlistentry id="opt.show-conflicts"
- xreflabel="--show-conflicts">
+ --history-level=none|approx|full [full]
+ full: show both stack traces for a data race (can be very slow)
+ approx: full trace for one thread, approx for the other (faster)
+ none: only show trace for one thread in a race (fastest)
+
+
+
+ <varlistentry id="opt.history-level"
+ xreflabel="--history-level">
<term>
- <option><![CDATA[--show-conflicts=no|yes
- [default: yes] ]]></option>
+ <option><![CDATA[--history-level=none|approx|full
+ [default: full] ]]></option>
</term>
<listitem>
- <para>When enabled (the default), Helgrind collects enough
- information about "old" accesses that it can produce two stack
- traces in a race report -- both the stack trace for the
+ <para>When set to <option>full</option> (the default), Helgrind
+ collects enough information about "old" accesses that it can produce
+ two stack traces in a race report -- both the stack trace for the
current access, and the trace for the older, conflicting
access.</para>
<para>Collecting such information is expensive in both speed and
- memory. This flag disables collection of such information.
- Helgrind will run significantly faster and use less memory,
- but without the conflicting access stacks, it will be very
- much more difficult to track down the root causes of
- races. However, this option may be useful in situations where
- you just want to check for the presence or absence of races,
- for example, when doing regression testing of a previously
- race-free program.</para>
+ memory. However, without it, it is very much more difficult to
+ track down the root causes of races. Nonetheless, you may not need
+ it in situations where you just want to check for the presence or
+ absence of races, for example, when doing regression testing of a
+ previously race-free program.</para>
+ <para>Setting this option to <option>approx</option> means that
+ Helgrind will show a full trace for one thread, and an approximation
+ for the other, and run faster. Setting it to <option>none</option>
+ means that Helgrind will show a full trace for one thread, and
+ nothing for the other, and run faster again.</para>
</listitem>
</varlistentry>
@@ -1010,42 +1024,46 @@
<!-- end of xi:include in the manpage -->
<!-- start of xi:include in the manpage -->
+<!-- commented out, because we don't document debugging options in the
+ manual. Nb: all the double-dashes below had a space inserted in them
+ to avoid problems with premature closing of this comment.
<para>In addition, the following debugging options are available for
Helgrind:</para>
<variablelist id="hg.debugopts.list">
- <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
+ <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
<term>
- <option><![CDATA[--trace-malloc=no|yes [no]
+ <option><![CDATA[- -trace-malloc=no|yes [no]
]]></option>
</term>
<listitem>
- <para>Show all client malloc (etc) and free (etc) requests.</para>
+ <para>Show all client <function>malloc</function> (etc) and
+ <function>free</function> (etc) requests.</para>
</listitem>
</varlistentry>
<varlistentry id="opt.cmp-race-err-addrs"
- xreflabel="--cmp-race-err-addrs">
+ xreflabel="- -cmp-race-err-addrs">
<term>
- <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
+ <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
]]></option>
</term>
<listitem>
<para>Controls whether or not race (data) addresses should be
taken into account when removing duplicates of race errors.
- With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
+ With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
identical race errors will be considered to be the same if
their race addresses differ.
- With <varname>--cmp-race-err-addrs=yes</varname> they will be
+ With <varname>- -cmp-race-err-addrs=yes</varname> they will be
considered different. This is provided to help make certain
regression tests work reliably.</para>
</listitem>
</varlistentry>
- <varlistentry id="opt.hg-sanity-flags" xreflabel="--hg-sanity-flags">
+ <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
<term>
- <option><![CDATA[--hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
+ <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
]]></option>
</term>
<listitem>
@@ -1068,11 +1086,36 @@
</varlistentry>
</variablelist>
+-->
<!-- end of xi:include in the manpage -->
</sect1>
+
+
+<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
+<title>Helgrind Client Requests</title>
+
+<para>The following client requests are defined in
+<filename>helgrind.h</filename>. See that file for exact details of their
+arguments.</para>
+
+<itemizedlist>
+
+ <listitem>
+ <para><function>VALGRIND_HG_CLEAN_MEMORY</function>:
+ This makes Helgrind forget everything it knows about a specified memory
+ range. This is particularly useful for memory allocators that wish to
+ recycle memory.</para>
+ </listitem>
+
+</itemizedlist>
+
+</sect1>
+
+
+
<sect1 id="hg-manual.todolist" xreflabel="To Do List">
<title>A To-Do List for Helgrind</title>
@@ -1088,9 +1131,6 @@
cycle, rather than only doing for size-2 cycles as at
present.</para>
</listitem>
- <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
- request.</para>
- </listitem>
<listitem><para>The conflicting access mechanism sometimes
mysteriously fails to show the conflicting access' stack, even
when provided with unbounded storage for conflicting access info.
@@ -1104,8 +1144,8 @@
</para>
</listitem>
<listitem><para>Don't update the lock-order graph, and don't check
- for errors, when a "try"-style lock operation happens (eg
- pthread_mutex_trylock). Such calls do not add any real
+ for errors, when a "try"-style lock operation happens (e.g.
+ <function>pthread_mutex_trylock</function>). Such calls do not add any real
restrictions to the locking order, since they can always fail to
acquire the lock, resulting in the caller going off and doing Plan
B (presumably it will have a Plan B). Doing such checks could
Modified: trunk/helgrind/hg_main.c
===================================================================
--- trunk/helgrind/hg_main.c 2009-08-07 00:18:25 UTC (rev 10730)
+++ trunk/helgrind/hg_main.c 2009-08-07 02:18:00 UTC (rev 10731)
@@ -4210,7 +4210,7 @@
{
VG_(printf)(
" --track-lockorders=no|yes show lock ordering errors? [yes]\n"
-" --history-level=none|partial|full [full]\n"
+" --history-level=none|approx|full [full]\n"
" full: show both stack traces for a data race (can be very slow)\n"
" approx: full trace for one thread, approx for the other (faster)\n"
" none: only show trace for one thread in a race (fastest)\n"
From: <sv...@va...> - 2009-08-07 00:33:51
Author: njn
Date: 2009-08-07 01:18:25 +0100 (Fri, 07 Aug 2009)
New Revision: 10730
Log:
Thoroughly overhauled the Cachegrind manual chapter, mostly by putting
things in a more sensible order. Also tweaked the Massif chapter a bit
more.
Modified:
trunk/cachegrind/cg_annotate.in
trunk/cachegrind/docs/cg-manual.xml
trunk/massif/docs/ms-manual.xml
Modified: trunk/cachegrind/cg_annotate.in
===================================================================
--- trunk/cachegrind/cg_annotate.in 2009-08-06 18:13:17 UTC (rev 10729)
+++ trunk/cachegrind/cg_annotate.in 2009-08-07 00:18:25 UTC (rev 10730)
@@ -146,7 +146,7 @@
options for the user, with defaults in [ ], are:
-h --help show this message
- -v --version show version
+ --version show version
--show=A,B,C only show figures for events A,B,C [all]
--sort=A,B,C sort columns by events A,B,C [event column order]
--threshold=<0--100> percentage of counts (of primary sort event) we
@@ -179,7 +179,7 @@
if ($arg =~ /^-/) {
# --version
- if ($arg =~ /^-v$|^--version$/) {
+ if ($arg =~ /^--version$/) {
die("cg_annotate-$version\n");
# --show=A,B,C
Modified: trunk/cachegrind/docs/cg-manual.xml
===================================================================
--- trunk/cachegrind/docs/cg-manual.xml 2009-08-06 18:13:17 UTC (rev 10729)
+++ trunk/cachegrind/docs/cg-manual.xml 2009-08-07 00:18:25 UTC (rev 10730)
@@ -15,30 +15,56 @@
<title>Overview</title>
<para>Cachegrind simulates how your program interacts with a machine's cache
-hierarchy and (optionally) branch predictor. It gathers the following
-statistics:</para>
+hierarchy and (optionally) branch predictor. It simulates a machine with
+independent first level instruction and data caches (I1 and D1), backed by a
+unified second level cache (L2). This configuration is used by almost all
+modern machines.</para>
+
+<para>
+It gathers the following statistics (the abbreviation used for each statistic
+is given in parentheses):</para>
<itemizedlist>
<listitem>
- <para>L1 instruction cache reads and read misses;</para>
+ <para>I cache reads (<computeroutput>Ir</computeroutput>,
+ which equals the number of instructions executed),
+ I1 cache read misses (<computeroutput>I1mr</computeroutput>) and
+ L2 cache instruction read misses (<computeroutput>I2mr</computeroutput>).
+ </para>
</listitem>
<listitem>
- <para>L1 data cache reads and read misses, writes and write
- misses;</para>
+ <para>D cache reads (<computeroutput>Dr</computeroutput>, which
+ equals the number of memory reads),
+ D1 cache read misses (<computeroutput>D1mr</computeroutput>), and
+ L2 cache data read misses (<computeroutput>D2mr</computeroutput>).
+ </para>
</listitem>
<listitem>
- <para>L2 unified cache reads and read misses, writes and
- writes misses.</para>
+ <para>D cache writes (<computeroutput>Dw</computeroutput>, which equals
+ the number of memory writes),
+ D1 cache write misses (<computeroutput>D1mw</computeroutput>), and
+ L2 cache data write misses (<computeroutput>D2mw</computeroutput>).
+ </para>
</listitem>
<listitem>
- <para>Conditional branches and mispredicted conditional branches.</para>
+ <para>Conditional branches executed (<computeroutput>Bc</computeroutput>) and
+ conditional branches mispredicted (<computeroutput>Bcm</computeroutput>).
+ </para>
</listitem>
<listitem>
- <para>Indirect branches and mispredicted indirect branches. An
- indirect branch is a jump or call to a destination only known at
- run time.</para>
+ <para>Indirect branches executed (<computeroutput>Bi</computeroutput>) and
+ indirect branches mispredicted (<computeroutput>Bim</computeroutput>).
+ </para>
</listitem>
</itemizedlist>
+<para>Note that D1 total misses is given by
+<computeroutput>D1mr</computeroutput> +
+<computeroutput>D1mw</computeroutput>, and that L2 total
+misses is given by <computeroutput>I2mr</computeroutput> +
+<computeroutput>D2mr</computeroutput> +
+<computeroutput>D2mw</computeroutput>.
+</para>
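(Editorial aside: a quick worked check of the derived totals, using made-up counts rather than real Cachegrind output.)

```c
/* Derived totals from the per-event counts described above. */
long D_refs(long Dr, long Dw)        { return Dr + Dw; }          /* all data accesses */
long D1_misses(long D1mr, long D1mw) { return D1mr + D1mw; }      /* L1 data misses    */
long L2_misses(long I2mr, long D2mr, long D2mw)
                                     { return I2mr + D2mr + D2mw; } /* all L2 misses   */
```

For example, with Dr = 400, Dw = 200, D1mr = 20, D1mw = 8, I2mr = 2, D2mr = 4 and D2mw = 3, the derived totals are 600 data references, 28 D1 misses and 9 L2 misses.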
+
<para>These statistics are presented for the entire program and for each
function in the program. You can also annotate each line of source code in
the program with the counts that were caused directly by it.</para>
@@ -54,258 +80,49 @@
instruction executed, you can find out how many instructions are
executed per line, which can be useful for traditional profiling.</para>
-<para>Branch profiling is not enabled by default. To use it, you must
-additionally specify <option>--branch-sim=yes</option>
-on the command line.</para>
+</sect1>
-<sect2 id="cg-manual.basics" xreflabel="Basics">
-<title>Basics</title>
+<sect1 id="cg-manual.profile"
+ xreflabel="Using Cachegrind, cg_annotate and cg_merge">
+<title>Using Cachegrind, cg_annotate and cg_merge</title>
+
<para>First off, as for normal Valgrind use, you probably want to
compile with debugging info (the
<option>-g</option> flag). But by contrast with
-normal Valgrind use, you probably <command>do</command> want to turn
+normal Valgrind use, you probably do want to turn
optimisation on, since you should profile your program as it will
be normally run.</para>
-<para>The two steps are:</para>
-<orderedlist>
- <listitem>
- <para>Run your program with <computeroutput>valgrind
- --tool=cachegrind</computeroutput> in front of the normal
- command line invocation. When the program finishes,
- Cachegrind will print summary cache statistics. It also
- collects line-by-line information in a file
- <computeroutput>cachegrind.out.<pid></computeroutput>, where
- <computeroutput><pid></computeroutput> is the program's process
- ID.</para>
+<para>Then, you need to run Cachegrind itself to gather the profiling
+information, and then run cg_annotate to get a detailed presentation of that
+information. As an optional intermediate step, you can use cg_merge to sum
+together the outputs of multiple Cachegrind runs, into a single file which
+you then use as the input for cg_annotate.</para>
- <para>Branch prediction statistics are not collected by default.
- To do so, add the flag
- <option>--branch-sim=yes</option>.
- </para>
- <para>This step should be done every time you want to collect
- information about a new program, a changed program, or about
- the same program with different input.</para>
- </listitem>
+<sect2 id="cg-manual.running-cachegrind" xreflabel="Running Cachegrind">
+<title>Running Cachegrind</title>
- <listitem>
- <para>Generate a function-by-function summary, and possibly
- annotate source files, using the supplied
- cg_annotate program. Source
- files to annotate can be specified manually, or manually on
- the command line, or "interesting" source files can be
- annotated automatically with the
- <option>--auto=yes</option> option. You can
- annotate C/C++ files or assembly language files equally
- easily.</para>
+<para>To run Cachegrind on a program <filename>prog</filename>, run:</para>
+<screen><![CDATA[
+valgrind --tool=cachegrind prog
+]]></screen>
- <para>This step can be performed as many times as you like
- for each Step 2. You may want to do multiple annotations
- showing different information each time.</para>
- </listitem>
-
-</orderedlist>
-
-<para>As an optional intermediate step, you can use the supplied
-cg_merge program to sum together the
-outputs of multiple Cachegrind runs, into a single file which you then
-use as the input for cg_annotate.</para>
-
-<para>These steps are described in detail in the following
-sections.</para>
-
-</sect2>
-
-
-<sect2 id="cache-sim" xreflabel="Cache simulation specifics">
-<title>Cache simulation specifics</title>
-
-<para>Cachegrind simulates a machine with independent
-first level instruction and data caches (I1 and D1), backed by a
-unified second level cache (L2). This configuration is used by almost
-all modern machines. Some old Cyrix CPUs had a unified I and D L1
-cache, but they are ancient history now.</para>
-
-<para>Specific characteristics of the simulation are as
-follows:</para>
-
-<itemizedlist>
-
- <listitem>
- <para>Write-allocate: when a write miss occurs, the block
- written to is brought into the D1 cache. Most modern caches
- have this property.</para>
- </listitem>
-
- <listitem>
- <para>Bit-selection hash function: the set of line(s) in the cache
- to which a memory block maps is chosen by the middle bits
- M--(M+N-1) of the byte address, where:</para>
- <itemizedlist>
- <listitem>
- <para>line size = 2^M bytes</para>
- </listitem>
- <listitem>
- <para>(cache size / line size / associativity) = 2^N bytes</para>
- </listitem>
- </itemizedlist>
- </listitem>
-
- <listitem>
- <para>Inclusive L2 cache: the L2 cache typically replicates all
- the entries of the L1 caches, because fetching into L1 involves
- fetching into L2 first (this does not guarantee strict inclusiveness,
- as lines evicted from L2 still could reside in L1). This is
- standard on Pentium chips, but AMD Opterons, Athlons and Durons
- use an exclusive L2 cache that only holds
- blocks evicted from L1. Ditto most modern VIA CPUs.</para>
- </listitem>
-
-</itemizedlist>
-
-<para>The cache configuration simulated (cache size,
-associativity and line size) is determined automagically using
-the x86 CPUID instruction. If you have an machine that (a)
-doesn't support the CPUID instruction, or (b) supports it in an
-early incarnation that doesn't give any cache information, then
-Cachegrind will fall back to using a default configuration (that
-of a model 3/4 Athlon). Cachegrind will tell you if this
-happens. You can manually specify one, two or all three levels
-(I1/D1/L2) of the cache from the command line using the
-<option>--I1</option>,
-<option>--D1</option> and
-<option>--L2</option> options.
-For cache parameters to be valid for simulation, the number
-of sets (with associativity being the number of cache lines in
-each set) has to be a power of two.</para>
-
-<para>On PowerPC platforms
-Cachegrind cannot automatically
-determine the cache configuration, so you will
-need to specify it with the
-<option>--I1</option>,
-<option>--D1</option> and
-<option>--L2</option> options.</para>
-
-
-<para>Other noteworthy behaviour:</para>
-
-<itemizedlist>
- <listitem>
- <para>References that straddle two cache lines are treated as
- follows:</para>
- <itemizedlist>
- <listitem>
- <para>If both blocks hit --> counted as one hit</para>
- </listitem>
- <listitem>
- <para>If one block hits, the other misses --> counted
- as one miss.</para>
- </listitem>
- <listitem>
- <para>If both blocks miss --> counted as one miss (not
- two)</para>
- </listitem>
- </itemizedlist>
- </listitem>
-
- <listitem>
- <para>Instructions that modify a memory location
- (eg. <computeroutput>inc</computeroutput> and
- <computeroutput>dec</computeroutput>) are counted as doing
- just a read, ie. a single data reference. This may seem
- strange, but since the write can never cause a miss (the read
- guarantees the block is in the cache) it's not very
- interesting.</para>
-
- <para>Thus it measures not the number of times the data cache
- is accessed, but the number of times a data cache miss could
- occur.</para>
- </listitem>
-
-</itemizedlist>
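The straddling rule amounts to saturating the miss count at one per reference. A toy model (illustrative only):

```python
# Toy model of the rule above: a reference straddling two cache lines is
# one access, and contributes at most one miss no matter how many of the
# two blocks miss.

def straddle_counts(first_block_hits, second_block_hits):
    accesses = 1
    misses = 0 if (first_block_hits and second_block_hits) else 1
    return accesses, misses

print(straddle_counts(True, True))    # both hit: one hit, no miss
print(straddle_counts(True, False))   # one half misses: one miss
print(straddle_counts(False, False))  # both miss: still only one miss
```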
-
-<para>If you are interested in simulating a cache with different
-properties, it is not particularly hard to write your own cache
-simulator, or to modify the existing ones in
-<computeroutput>cg_sim.c</computeroutput>. We'd be
-interested to hear from anyone who does.</para>
-
-</sect2>
-
-
-<sect2 id="branch-sim" xreflabel="Branch simulation specifics">
-<title>Branch simulation specifics</title>
-
-<para>Cachegrind simulates branch predictors intended to be
-typical of mainstream desktop/server processors of around 2004.</para>
-
-<para>Conditional branches are predicted using an array of 16384 2-bit
-saturating counters. The array index used for a branch instruction is
-computed partly from the low-order bits of the branch instruction's
-address and partly using the taken/not-taken behaviour of the last few
-conditional branches. As a result the predictions for any specific
-branch depend both on its own history and the behaviour of previous
-branches. This is a standard technique for improving prediction
-accuracy.</para>
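A scheme of this kind can be sketched as follows. This is a generic "gshare"-style model; the XOR index hash is an assumption for illustration, not necessarily Cachegrind's exact function:

```python
# Generic model of the scheme described above: 16384 two-bit saturating
# counters indexed by branch address XOR recent taken/not-taken history.
# The XOR hash ("gshare") is an assumed detail, not Cachegrind's exact one.

NUM_COUNTERS = 16384
counters = [1] * NUM_COUNTERS   # states 0-1 predict not-taken, 2-3 taken
history = 0                     # recent branch outcomes, one bit each

def branch_correct(branch_addr, taken):
    global history
    idx = (branch_addr ^ history) % NUM_COUNTERS
    predicted_taken = counters[idx] >= 2
    if taken:                               # saturating update
        counters[idx] = min(3, counters[idx] + 1)
    else:
        counters[idx] = max(0, counters[idx] - 1)
    history = ((history << 1) | int(taken)) % NUM_COUNTERS
    return predicted_taken == taken

# An always-taken branch predicts correctly once the history register
# settles and its counter trains up.
results = [branch_correct(0x400123, True) for _ in range(20)]
print(sum(results))
```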
-
-<para>For indirect branches (that is, jumps to unknown destinations)
-Cachegrind uses a simple branch target address predictor. Targets are
-predicted using an array of 512 entries indexed by the low order 9
-bits of the branch instruction's address. Each branch is predicted to
-jump to the same address it did last time. Any other behaviour causes
-a mispredict.</para>
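The target predictor is simply a "last target" table, modelled below in Python (an illustrative sketch; the addresses are made up):

```python
# Model of the indirect-branch predictor described above: 512 entries
# indexed by the low 9 bits of the branch address, each predicting a jump
# to the same target as last time. Addresses are made up for illustration.

btb = [None] * 512

def indirect_correct(branch_addr, actual_target):
    idx = branch_addr & 0x1FF              # low 9 bits
    hit = btb[idx] == actual_target
    btb[idx] = actual_target               # remember the latest target
    return hit

print(indirect_correct(0x8048A00, 0x8049000))  # first sighting: mispredict
print(indirect_correct(0x8048A00, 0x8049000))  # repeat target: predicted
print(indirect_correct(0x8048A00, 0x8049500))  # target changed: mispredict
```

Note that two branches whose addresses differ by a multiple of 512 share an entry and will evict each other's targets.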
-
-<para>More recent processors have better branch predictors, in
-particular better indirect branch predictors. Cachegrind's predictor
-design is deliberately conservative so as to be representative of the
-large installed base of processors which pre-date widespread
-deployment of more sophisticated indirect branch predictors. In
-particular, late model Pentium 4s (Prescott), Pentium M, Core and Core
-2 have more sophisticated indirect branch predictors than modelled by
-Cachegrind. </para>
-
-<para>Cachegrind does not simulate a return stack predictor. It
-assumes that processors perfectly predict function return addresses,
-an assumption which is probably close to being true.</para>
-
-<para>See Hennessy and Patterson's classic text "Computer
-Architecture: A Quantitative Approach", 4th edition (2007), Section
-2.3 (pages 80-89) for background on modern branch predictors.</para>
-
-</sect2>
-
-
-</sect1>
-
-
-
-<sect1 id="cg-manual.profile" xreflabel="Profiling programs">
-<title>Profiling programs</title>
-
-<para>To gather cache profiling information about the program
-<computeroutput>ls -l</computeroutput>, invoke Cachegrind like
-this:</para>
-
-<programlisting><![CDATA[
-valgrind --tool=cachegrind ls -l]]></programlisting>
-
<para>The program will execute (slowly). Upon completion,
summary statistics that look like this will be printed:</para>
<programlisting><![CDATA[
==31751== I refs: 27,742,716
==31751== I1 misses: 276
-==31751== L2 misses: 275
+==31751== L2i misses: 275
==31751== I1 miss rate: 0.0%
==31751== L2i miss rate: 0.0%
==31751==
==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
-==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
+==31751== L2d misses: 23,085 ( 3,987 rd + 19,098 wr)
==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
==31751==
@@ -326,47 +143,30 @@
total).</para>
<para>Combined instruction and data figures for the L2 cache
-follow that.</para>
+follow that. Note that the L2 miss rate is computed relative to the total
+number of memory accesses, not the number of L1 misses. I.e. it is
+<computeroutput>(I2mr + D2mr + D2mw) / (Ir + Dr + Dw)</computeroutput>
+not
+<computeroutput>(I2mr + D2mr + D2mw) / (I1mr + D1mr + D1mw)</computeroutput>
+</para>
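Plugging the counts from the example run above into this formula gives a quick arithmetic check (this is just the stated formula evaluated, not Cachegrind's code):

```python
# L2 misses over total memory accesses, not over L1 misses, using the
# counts from the example run above.
Ir, Dr, Dw = 27_742_716, 10_955_517, 4_474_773     # total accesses
I2mr, D2mr, D2mw = 275, 3_987, 19_098              # L2 misses

l2_miss_rate = (I2mr + D2mr + D2mw) / (Ir + Dr + Dw)
print(f"{l2_miss_rate:.2%}")   # well under 0.1% for this run
```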
+<para>Branch prediction statistics are not collected by default.
+To do so, add the flag <option>--branch-sim=yes</option>.</para>
+</sect2>
-<sect2 id="cg-manual.outputfile" xreflabel="Output file">
-<title>Output file</title>
-<para>As well as printing summary information, Cachegrind also
-writes line-by-line cache profiling information to a user-specified
-file. By default this file is named
-<computeroutput>cachegrind.out.<pid></computeroutput>. This file
-is human-readable, but is intended to be interpreted by the accompanying
-program cg_annotate, described in the next section.</para>
+<sect2 id="cg-manual.outputfile" xreflabel="Output File">
+<title>Output File</title>
-<para>Things to note about the
-<computeroutput>cachegrind.out.<pid></computeroutput>
-file:</para>
+<para>As well as printing summary information, Cachegrind also writes
+more detailed profiling information to a file. By default this file is named
+<filename>cachegrind.out.<pid></filename> (where
+<filename><pid></filename> is the program's process ID), but its name
+can be changed with the <option>--cachegrind-out-file</option> option. This
+file is human-readable, but is intended to be interpreted by the
+accompanying program cg_annotate, described in the next section.</para>
-<itemizedlist>
- <listitem>
- <para>It is written every time Cachegrind is run, and will
- overwrite any existing
- <computeroutput>cachegrind.out.<pid></computeroutput>
- in the current directory (but that won't happen very often
- because it takes some time for process ids to be
- recycled).</para>
- </listitem>
- <listitem>
- <para>To use an output file name other than the default
- <computeroutput>cachegrind.out</computeroutput>,
- use the <option>--cachegrind-out-file</option>
- switch.</para>
- </listitem>
- <listitem>
- <para>It can be big: <computeroutput>ls -l</computeroutput>
- generates a file of about 350KB. Browsing a few files and
- web pages with a Konqueror built with full debugging
- information generates a file of around 15 MB.</para>
- </listitem>
-</itemizedlist>
-
<para>The default <computeroutput>.<pid></computeroutput> suffix
on the output file name serves two purposes. Firstly, it means you
don't have to rename old log files that you don't want to overwrite.
@@ -374,122 +174,34 @@
<option>--trace-children=yes</option> option of
programs that spawn child processes.</para>
+<para>The output file can be big, many megabytes for large applications
+built with full debugging information.</para>
+
</sect2>
+
+<sect2 id="cg-manual.running-cg_annotate" xreflabel="Running cg_annotate">
+<title>Running cg_annotate</title>
-<sect2 id="cg-manual.cgopts" xreflabel="Cachegrind options">
-<title>Cachegrind options</title>
+<para>Before using cg_annotate,
+it is worth widening your window to be at least 120-characters
+wide if possible, as the output lines can be quite long.</para>
-<!-- start of xi:include in the manpage -->
-<para id="cg.opts.para">Using command line options, you can
-manually specify the I1/D1/L2 cache
-configuration to simulate. For each cache, you can specify the
-size, associativity and line size. The size and line size
-are measured in bytes. The three items
-must be comma-separated, but with no spaces, eg:
-<literallayout> valgrind --tool=cachegrind --I1=65536,2,64</literallayout>
+<para>To get a function-by-function summary, run:</para>
-You can specify one, two or three of the I1/D1/L2 caches. Any level not
-manually specified will be simulated using the configuration found in
-the normal way (via the CPUID instruction for automagic cache
-configuration, or failing that, via defaults).</para>
+<screen>cg_annotate <filename></screen>
-<para>Cache-simulation specific options are:</para>
+<para>on a Cachegrind output file.</para>
-<variablelist id="cg.opts.list">
-
- <varlistentry id="opt.I1" xreflabel="--I1">
- <term>
- <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
- </term>
- <listitem>
- <para>Specify the size, associativity and line size of the level 1
- instruction cache. </para>
- </listitem>
- </varlistentry>
-
- <varlistentry id="opt.D1" xreflabel="--D1">
- <term>
- <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
- </term>
- <listitem>
- <para>Specify the size, associativity and line size of the level 1
- data cache.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry id="opt.L2" xreflabel="--L2">
- <term>
- <option><![CDATA[--L2=<size>,<associativity>,<line size> ]]></option>
- </term>
- <listitem>
- <para>Specify the size, associativity and line size of the level 2
- cache.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
- <term>
- <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
- </term>
- <listitem>
- <para>Write the profile data to
- <computeroutput>file</computeroutput> rather than to the default
- output file,
- <computeroutput>cachegrind.out.<pid></computeroutput>. The
- <option>%p</option> and <option>%q</option> format specifiers
- can be used to embed the process ID and/or the contents of an
- environment variable in the name, as is the case for the core
- option <option><xref linkend="opt.log-file"/></option>.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry id="opt.cache-sim" xreflabel="--cache-sim">
- <term>
- <option><![CDATA[--cache-sim=no|yes [yes] ]]></option>
- </term>
- <listitem>
- <para>Enables or disables collection of cache access and miss
- counts.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry id="opt.branch-sim" xreflabel="--branch-sim">
- <term>
- <option><![CDATA[--branch-sim=no|yes [no] ]]></option>
- </term>
- <listitem>
- <para>Enables or disables collection of branch instruction and
- misprediction counts. By default this is disabled as it
- slows Cachegrind down by approximately 25%. Note that you
- cannot specify <option>--cache-sim=no</option>
- and <option>--branch-sim=no</option>
- together, as that would leave Cachegrind with no
- information to collect.</para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-<!-- end of xi:include in the manpage -->
-
</sect2>
-
-<sect2 id="cg-manual.annotate" xreflabel="Annotating C/C++ programs">
-<title>Annotating C/C++ programs</title>
+<sect2 id="cg-manual.the-output-preamble" xreflabel="The Output Preamble">
+<title>The Output Preamble</title>
-<para>Before using cg_annotate,
-it is worth widening your window to be at least 120-characters
-wide if possible, as the output lines can be quite long.</para>
+<para>The first part of the output looks like this:</para>
-<para>To get a function-by-function summary, run <computeroutput>cg_annotate
-<filename></computeroutput> on a Cachegrind output file.</para>
-
-<para>The output looks like this:</para>
-
<programlisting><![CDATA[
--------------------------------------------------------------------------------
I1 cache: 65536 B, 64 B, 2-way associative
@@ -501,37 +213,11 @@
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Threshold: 99%
Chosen for annotation:
-Auto-annotation: on
+Auto-annotation: off
+]]></programlisting>
---------------------------------------------------------------------------------
-Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
---------------------------------------------------------------------------------
-27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
---------------------------------------------------------------------------------
-Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
---------------------------------------------------------------------------------
-8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
-5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
-2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
-2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
-2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
-1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
- 897,991 51 51 897,831 95 30 62 1 1 ???:???
- 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
- 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
- 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
- 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
- 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
- 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
- 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
- 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
- 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
- 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
- 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting>
-
-
-<para>First up is a summary of the annotation options:</para>
+<para>This is a summary of the annotation options:</para>
<itemizedlist>
@@ -547,68 +233,10 @@
</listitem>
<listitem>
- <para>Events recorded: event abbreviations are:</para>
+ <para>Events recorded: which events were recorded.</para>
<itemizedlist>
- <listitem>
- <para><computeroutput>Ir</computeroutput>: I cache reads
- (ie. instructions executed)</para>
- </listitem>
- <listitem>
- <para><computeroutput>I1mr</computeroutput>: I1 cache read
- misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>I2mr</computeroutput>: L2 cache
- instruction read misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>Dr</computeroutput>: D cache reads
- (ie. memory reads)</para>
- </listitem>
- <listitem>
- <para><computeroutput>D1mr</computeroutput>: D1 cache read
- misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>D2mr</computeroutput>: L2 cache data
- read misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>Dw</computeroutput>: D cache writes
- (ie. memory writes)</para>
- </listitem>
- <listitem>
- <para><computeroutput>D1mw</computeroutput>: D1 cache write
- misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>D2mw</computeroutput>: L2 cache data
- write misses</para>
- </listitem>
- <listitem>
- <para><computeroutput>Bc</computeroutput>: Conditional branches
- executed</para>
- </listitem>
- <listitem>
- <para><computeroutput>Bcm</computeroutput>: Conditional branches
- mispredicted</para>
- </listitem>
- <listitem>
- <para><computeroutput>Bi</computeroutput>: Indirect branches
- executed</para>
- </listitem>
- <listitem>
- <para><computeroutput>Bim</computeroutput>: Indirect branches
- mispredicted</para>
- </listitem>
</itemizedlist>
- <para>Note that D1 total accesses is given by
- <computeroutput>D1mr</computeroutput> +
- <computeroutput>D1mw</computeroutput>, and that L2 total
- accesses is given by <computeroutput>I2mr</computeroutput> +
- <computeroutput>D2mr</computeroutput> +
- <computeroutput>D2mw</computeroutput>.</para>
</listitem>
<listitem>
@@ -628,7 +256,7 @@
<option>--sort</option> option.</para>
<para>Note that this dictates the order the functions appear.
- It is <command>not</command> the order in which the columns
+ It is <emphasis>not</emphasis> the order in which the columns
appear; that is dictated by the "events shown" line (and can
be changed with the <option>--show</option>
option).</para>
@@ -660,49 +288,87 @@
</itemizedlist>
+</sect2>
+
+
+<sect2 id="cg-manual.the-global"
+ xreflabel="The Global and Function-level Counts">
+<title>The Global and Function-level Counts</title>
+
<para>Then follows summary statistics for the whole
-program. These are similar to the summary provided when running
-<computeroutput>valgrind --tool=cachegrind</computeroutput>.</para>
+program:</para>
-<para>Then follows function-by-function statistics. Each function
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
+--------------------------------------------------------------------------------
+27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS]]></programlisting>
+
+<para>
+These are similar to the summary provided when Cachegrind finishes running.
+</para>
+
+<para>Then comes function-by-function statistics:</para>
+
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
+--------------------------------------------------------------------------------
+8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
+5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
+2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
+2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
+2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
+1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
+ 897,991 51 51 897,831 95 30 62 1 1 ???:???
+ 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
+ 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
+ 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
+ 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
+ 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
+ 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
+ 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
+ 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
+ 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
+ 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
+ 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting>
+
+<para>Each function
is identified by a
<computeroutput>file_name:function_name</computeroutput> pair. If
a column contains only a dot it means the function never performs
-that event (eg. the third row shows that
+that event (e.g. the third row shows that
<computeroutput>strcmp()</computeroutput> contains no
instructions that write to memory). The name
<computeroutput>???</computeroutput> is used if the file name
and/or function name could not be determined from debugging
information. If most of the entries have the form
<computeroutput>???:???</computeroutput> the program probably
-wasn't compiled with <option>-g</option>. If any
-code was invalidated (either due to self-modifying code or
-unloading of shared objects) its counts are aggregated into a
-single cost centre written as
-<computeroutput>(discarded):(discarded)</computeroutput>.</para>
+wasn't compiled with <option>-g</option>.</para>
<para>It is worth noting that functions will come both from
-the profiled program (eg. <filename>concord.c</filename>)
-and from libraries (eg. <filename>getc.c</filename>)</para>
+the profiled program (e.g. <filename>concord.c</filename>)
+and from libraries (e.g. <filename>getc.c</filename>).</para>
-<para>There are two ways to annotate source files -- by choosing
-them manually, or with the
-<option>--auto=yes</option> option. To do it
-manually, just specify the filenames as additional arguments to
-cg_annotate. For example, the
-output from running <filename>cg_annotate <filename>
-concord.c</filename> for our example produces the same output as above
-followed by an annotated version of <filename>concord.c</filename>, a
-section of which looks like:</para>
+</sect2>
+
+<sect2 id="cg-manual.line-by-line" xreflabel="Line-by-line Counts">
+<title>Line-by-line Counts</title>
+
+<para>There are two ways to annotate source files -- by specifying them
+manually as arguments to cg_annotate, or with the
+<option>--auto=yes</option> option. For example, the output from running
+<filename>cg_annotate <filename> concord.c</filename> for our example
+produces the same output as above followed by an annotated version of
+<filename>concord.c</filename>, a section of which looks like:</para>
+
<programlisting><![CDATA[
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
-[snip]
-
. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
3 1 1 . . . 1 0 0 {
. . . . . . . . . FILE *file_ptr;
@@ -759,8 +425,7 @@
controlled by the <option>--context</option>
option.</para>
-<para>To get automatic annotation, run
-<computeroutput>cg_annotate <filename> --auto=yes</computeroutput>.
+<para>To get automatic annotation, use the <option>--auto=yes</option> option.
cg_annotate will automatically annotate every source file it can
find that is mentioned in the function-by-function summary.
Therefore, the files chosen for auto-annotation are affected by
@@ -782,7 +447,7 @@
<para>This is quite common for library files, since libraries are
usually compiled with debugging information, but the source files
are often not present on a system. If a file is chosen for
-annotation <command>both</command> manually and automatically, it
+annotation both manually and automatically, it
is marked as <computeroutput>User-annotated
source</computeroutput>. Use the
<option>-I</option>/<option>--include</option> option to tell Valgrind where
@@ -790,15 +455,15 @@
information aren't specific enough.</para>
<para>Beware that cg_annotate can take some time to digest large
-<computeroutput>cachegrind.out.<pid></computeroutput> files,
+<filename>cachegrind.out.<pid></filename> files,
e.g. 30 seconds or more. Also beware that auto-annotation can
produce a lot of output if your program is large!</para>
</sect2>
-<sect2 id="cg-manual.assembler" xreflabel="Annotating assembler programs">
-<title>Annotating assembly code programs</title>
+<sect2 id="cg-manual.assembler" xreflabel="Annotating Assembly Code Programs">
+<title>Annotating Assembly Code Programs</title>
<para>Valgrind can annotate assembly code programs too, or annotate
the assembly code generated for your C program. Sometimes this is
@@ -828,139 +493,27 @@
</sect2>
-</sect1>
+<sect2 id="cg-manual.annopts.warnings" xreflabel="cg_annotate Warnings">
+<title>cg_annotate Warnings</title>
-
-<sect1 id="cg-manual.annopts" xreflabel="cg_annotate options">
-<title>cg_annotate options</title>
-
-<variablelist>
-
- <varlistentry>
- <term>
- <option><![CDATA[-h --help ]]></option>
- </term>
- <listitem>
- <para>Show the help message.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[-v --version ]]></option>
- </term>
- <listitem>
- <para>Show the version number.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[--sort=A,B,C [default: order in
- cachegrind.out.<pid>] ]]></option>
- </term>
- <listitem>
- <para>Specifies the events upon which the sorting of the
- function-by-function entries will be based. Useful if you
- want to concentrate on eg. I cache misses
- (<option>--sort=I1mr,I2mr</option>), or D cache misses
- (<option>--sort=D1mr,D2mr</option>), or L2 misses
- (<option>--sort=D2mr,I2mr</option>).</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[--show=A,B,C [default: all, using order in
- cachegrind.out.<pid>] ]]></option>
- </term>
- <listitem>
- <para>Specifies which events to show (and the column
- order). Default is to use all present in the
- <computeroutput>cachegrind.out.<pid></computeroutput> file (and
- use the order in the file).</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[--threshold=X [default: 99%] ]]></option>
- </term>
- <listitem>
- <para>Sets the threshold for the function-by-function
- summary. Functions are shown that account for more than X%
- of the primary sort event. If auto-annotating, also affects
- which files are annotated.</para>
-
- <para>Note: thresholds can be set for more than one of the
- events by appending any events for the
- <option>--sort</option> option with a colon
- and a number (no spaces, though). E.g. if you want to see
- the functions that cover 99% of L2 read misses and 99% of L2
- write misses, use this option:</para>
- <para><option>--sort=D2mr:99,D2mw:99</option></para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[--auto=<no|yes> [default: no] ]]></option>
- </term>
- <listitem>
- <para>When enabled, automatically annotates every file that
- is mentioned in the function-by-function summary that can be
- found. Also gives a list of those that couldn't be found.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[--context=N [default: 8] ]]></option>
- </term>
- <listitem>
- <para>Print N lines of context before and after each
- annotated line. Avoids printing large sections of source
- files that were not executed. Use a large number
- (eg. 10,000) to show all source lines.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>
- <option><![CDATA[-I<dir> --include=<dir> [default: none] ]]></option>
- </term>
- <listitem>
- <para>Adds a directory to the list in which to search for
- files. Multiple -I/--include options can be given to add
- multiple directories.</para>
- </listitem>
- </varlistentry>
-
-</variablelist>
-
-
-
-<sect2 id="cg-manual.annopts.warnings" xreflabel="Warnings">
-<title>Warnings</title>
-
<para>There are a couple of situations in which
cg_annotate issues warnings.</para>
<itemizedlist>
<listitem>
<para>If a source file is more recent than the
- <computeroutput>cachegrind.out.<pid></computeroutput> file.
+ <filename>cachegrind.out.<pid></filename> file.
This is because the information in
- <computeroutput>cachegrind.out.<pid></computeroutput> is only
+ <filename>cachegrind.out.<pid></filename> is only
recorded with line numbers, so if the line numbers change at
- all in the source (eg. lines added, deleted, swapped), any
+ all in the source (e.g. lines added, deleted, swapped), any
annotations will be incorrect.</para>
</listitem>
<listitem>
<para>If information is recorded about line numbers past the
end of a file. This can be caused by the above problem,
- ie. shortening the source file while using an old
- <computeroutput>cachegrind.out.<pid></computeroutput> file. If
+ i.e. shortening the source file while using an old
+ <filename>cachegrind.out.<pid></filename> file. If
this happens, the figures for the bogus lines are printed
anyway (clearly marked as bogus) in case they are
important.</para>
@@ -972,8 +525,8 @@
<sect2 id="cg-manual.annopts.things-to-watch-out-for"
- xreflabel="Things to watch out for">
-<title>Things to watch out for</title>
+ xreflabel="Unusual Annotation Cases">
+<title>Unusual Annotation Cases</title>
<para>Some odd things that can occur during annotation:</para>
@@ -1015,6 +568,10 @@
%esi,%esi</computeroutput> to it.</para>
</listitem>
+ <!--
+ I think this isn't true any more, not since cost centres were moved from
+ being associated with instruction addresses to being associated with
+ source line numbers.
<listitem>
<para>Inlined functions can cause strange results in the
function-by-function summary. If a function
@@ -1026,7 +583,7 @@
<filename>bar.c</filename>, there will not be a
<computeroutput>foo.h:inline_me()</computeroutput> function
entry. Instead, there will be separate function entries for
- each inlining site, ie.
+ each inlining site, i.e.
<computeroutput>foo.h:f1()</computeroutput>,
<computeroutput>foo.h:f2()</computeroutput> and
<computeroutput>foo.h:f3()</computeroutput>. To find the
@@ -1041,6 +598,7 @@
<filename>foo.h</filename>, so Valgrind keeps using the old
one.</para>
</listitem>
+ -->
<listitem>
<para>Sometimes, the same filename might be represented with
@@ -1086,94 +644,12 @@
</sect2>
+<sect2 id="cg-manual.cg_merge" xreflabel="cg_merge">
+<title>Merging Profiles with cg_merge</title>
-<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
-<title>Accuracy</title>
-
-<para>Valgrind's cache profiling has a number of
-shortcomings:</para>
-
-<itemizedlist>
- <listitem>
- <para>It doesn't account for kernel activity -- the effect of
- system calls on the cache contents is ignored.</para>
- </listitem>
-
- <listitem>
- <para>It doesn't account for other process activity.
- This is probably desirable when considering a single
- program.</para>
- </listitem>
-
- <listitem>
- <para>It doesn't account for virtual-to-physical address
- mappings. Hence the simulation is not a true
- representation of what's happening in the
- cache. Most caches are physically indexed, but Cachegrind
- simulates caches using virtual addresses.</para>
- </listitem>
-
- <listitem>
- <para>It doesn't account for cache misses not visible at the
- instruction level, eg. those arising from TLB misses, or
- speculative execution.</para>
- </listitem>
-
- <listitem>
- <para>Valgrind will schedule
- threads differently from how they would be when running natively.
- This could warp the results for threaded programs.</para>
- </listitem>
-
- <listitem>
- <para>The x86/amd64 instructions <computeroutput>bts</computeroutput>,
- <computeroutput>btr</computeroutput> and
- <computeroutput>btc</computeroutput> will incorrectly be
- counted as doing a data read if both the arguments are
- registers, eg:</para>
-<programlisting><![CDATA[
- btsl %eax, %edx]]></programlisting>
-
- <para>This should only happen rarely.</para>
- </listitem>
-
- <listitem>
- <para>x86/amd64 FPU instructions with data sizes of 28 and 108 bytes
- (e.g. <computeroutput>fsave</computeroutput>) are treated as
- though they only access 16 bytes. These instructions seem to
- be rare so hopefully this won't affect accuracy much.</para>
- </listitem>
-
-</itemizedlist>
-
-<para>Another thing worth noting is that results are very sensitive.
-Changing the size of the executable being profiled, or the sizes
-of any of the shared libraries it uses, or even the length of their
-file names, can perturb the results. Variations will be small, but
-don't expect perfectly repeatable results if your program changes at
-all.</para>
-
-<para>More recent GNU/Linux distributions do address space
-randomisation, in which identical runs of the same program have their
-shared libraries loaded at different locations, as a security measure.
-This also perturbs the results.</para>
-
-<para>While these factors mean you shouldn't trust the results to
-be super-accurate, hopefully they should be close enough to be
-useful.</para>
-
-</sect2>
-
-</sect1>
-
-
-
-<sect1 id="cg-manual.cg_merge" xreflabel="cg_merge">
-<title>Merging profiles with cg_merge</title>
-
<para>
cg_merge is a simple program which
-reads multiple profile files, as created by cachegrind, merges them
+reads multiple profile files, as created by Cachegrind, merges them
together, and writes the results into another file in the same format.
You can then examine the merged results using
<computeroutput>cg_annotate <filename></computeroutput>, as
@@ -1220,22 +696,224 @@
attempt to print a helpful error message if any of the input files
fail these checks.</para>
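<para>For example, several per-run output files can be merged and then
annotated as one. This is a sketch: the file names below are
hypothetical, but <option>-o</option> is cg_merge's documented way of
naming the output file.</para>

```shell
# Merge the profiles from three separate runs into a single file,
# then view the combined counts with cg_annotate.
cg_merge -o cachegrind.out.merged \
    cachegrind.out.1234 cachegrind.out.5678 cachegrind.out.9012
cg_annotate cachegrind.out.merged
```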
+</sect2>
+
+
</sect1>
+
+<sect1 id="cg-manual.cgopts" xreflabel="Cachegrind Options">
+<title>Cachegrind Options</title>
+
+<!-- start of xi:include in the manpage -->
+<para>Cachegrind-specific options are:</para>
+
+<variablelist id="cg.opts.list">
+
+ <varlistentry id="opt.I1" xreflabel="--I1">
+ <term>
+ <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
+ </term>
+ <listitem>
+ <para>Specify the size, associativity and line size of the level 1
+ instruction cache. </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.D1" xreflabel="--D1">
+ <term>
+ <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
+ </term>
+ <listitem>
+ <para>Specify the size, associativity and line size of the level 1
+ data cache.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.L2" xreflabel="--L2">
+ <term>
+ <option><![CDATA[--L2=<size>,<associativity>,<line size> ]]></option>
+ </term>
+ <listitem>
+ <para>Specify the size, associativity and line size of the level 2
+ cache.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.cache-sim" xreflabel="--cache-sim">
+ <term>
+ <option><![CDATA[--cache-sim=no|yes [yes] ]]></option>
+ </term>
+ <listitem>
+ <para>Enables or disables collection of cache access and miss
+ counts.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.branch-sim" xreflabel="--branch-sim">
+ <term>
+ <option><![CDATA[--branch-sim=no|yes [no] ]]></option>
+ </term>
+ <listitem>
+ <para>Enables or disables collection of branch instruction and
+ misprediction counts. By default this is disabled as it
+ slows Cachegrind down by approximately 25%. Note that you
+ cannot specify <option>--cache-sim=no</option>
+ and <option>--branch-sim=no</option>
+ together, as that would leave Cachegrind with no
+ information to collect.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
+ <term>
+ <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
+ </term>
+ <listitem>
+ <para>Write the profile data to
+ <computeroutput>file</computeroutput> rather than to the default
+ output file,
+       <filename>cachegrind.out.&lt;pid&gt;</filename>. The
+ <option>%p</option> and <option>%q</option> format specifiers
+ can be used to embed the process ID and/or the contents of an
+ environment variable in the name, as is the case for the core
+ option <option><xref linkend="opt.log-file"/></option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+<!-- end of xi:include in the manpage -->
+
+</sect1>
+
+
+
+<sect1 id="cg-manual.annopts" xreflabel="cg_annotate Options">
+<title>cg_annotate Options</title>
+
+<variablelist>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[-h --help ]]></option>
+ </term>
+ <listitem>
+ <para>Show the help message.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--version ]]></option>
+ </term>
+ <listitem>
+ <para>Show the version number.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--show=A,B,C [default: all, using order in
+ cachegrind.out.<pid>] ]]></option>
+ </term>
+ <listitem>
+ <para>Specifies which events to show (and the column
+ order). Default is to use all present in the
+    <filename>cachegrind.out.&lt;pid&gt;</filename> file (and
+ use the order in the file). Useful if you want to concentrate on, for
+ example, I cache misses (<option>--show=I1mr,I2mr</option>), or data
+ read misses (<option>--show=D1mr,D2mr</option>), or L2 data misses
+ (<option>--show=D2mr,D2mw</option>). Best used in conjunction with
+ <option>--sort</option>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--sort=A,B,C [default: order in
+ cachegrind.out.<pid>] ]]></option>
+ </term>
+ <listitem>
+ <para>Specifies the events upon which the sorting of the
+ function-by-function entries will be based.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--threshold=X [default: 99%] ]]></option>
+ </term>
+ <listitem>
+    <para>Sets the threshold for the function-by-function
+    summary. Functions are shown, in descending order of the
+    primary sort event, until X% of that event's total has been
+    accounted for. If auto-annotating, this also affects which
+    files are annotated.</para>
+
+    <para>Note: thresholds can be set for more than one event by
+    appending a colon and a number (no spaces) to any event in the
+    <option>--sort</option> option. E.g. to see the
+    functions that cover 99% of L2 read misses and 99% of L2
+    write misses, use this option:</para>
+ <para><option>--sort=D2mr:99,D2mw:99</option></para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--auto=<no|yes> [default: no] ]]></option>
+ </term>
+ <listitem>
+    <para>When enabled, automatically annotates every file
+    mentioned in the function-by-function summary that can be
+    found. Also gives a list of those that couldn't be found.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--context=N [default: 8] ]]></option>
+ </term>
+ <listitem>
+ <para>Print N lines of context before and after each
+ annotated line. Avoids printing large sections of source
+ files that were not executed. Use a large number
+ (e.g. 100000) to show all source lines.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[-I<dir> --include=<dir> [default: none] ]]></option>
+ </term>
+ <listitem>
+ <para>Adds a directory to the list in which to search for
+ files. Multiple <option>-I</option>/<option>--include</option>
+ options can be given to add multiple directories.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+
+</sect1>
+
+
+
<sect1 id="cg-manual.acting-on"
- xreflabel="Acting on Cachegrind's information">
-<title>Acting on Cachegrind's information</title>
+ xreflabel="Acting on Cachegrind's Information">
+<title>Acting on Cachegrind's Information</title>
<para>
Cachegrind gives you lots of information, but acting on that information
isn't always easy. Here are some rules of thumb that we have found to be
useful.</para>
<para>
-First of all, the global hit/miss rate numbers are not that useful. If you
-have multiple programs or multiple runs of a program, comparing the numbers
-might identify if any are outliers and worthy of closer investigation.
-Otherwise, they're not enough to act on.</para>
+First of all, the global hit/miss counts and miss rates are not that useful.
+If you have multiple programs or multiple runs of a program, comparing the
+numbers might identify if any are outliers and worthy of closer
+investigation. Otherwise, they're not enough to act on.</para>
<para>
The function-by-function counts are more useful to look at, as they pinpoint
@@ -1313,17 +991,258 @@
</sect1>
+
+<sect1 id="cg-manual.sim-details"
+ xreflabel="Simulation Details">
+<title>Simulation Details</title>
+<para>
+This section talks about details you don't need to know about in order to
+use Cachegrind, but may be of interest to some people.
+</para>
+
+<sect2 id="cache-sim" xreflabel="Cache Simulation Specifics">
+<title>Cache Simulation Specifics</title>
+
+<para>Specific characteristics of the cache simulation are as
+follows:</para>
+
+<itemizedlist>
+
+ <listitem>
+ <para>Write-allocate: when a write miss occurs, the block
+ written to is brought into the D1 cache. Most modern caches
+ have this property.</para>
+ </listitem>
+
+ <listitem>
+ <para>Bit-selection hash function: the set of line(s) in the cache
+ to which a memory block maps is chosen by the middle bits
+ M--(M+N-1) of the byte address, where:</para>
+ <itemizedlist>
+ <listitem>
+ <para>line size = 2^M bytes</para>
+ </listitem>
+ <listitem>
+          <para>(cache size / line size / associativity) = 2^N (the number of sets)</para>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+
+ <listitem>
+ <para>Inclusive L2 cache: the L2 cache typically replicates all
+ the entries of the L1 caches, because fetching into L1 involves
+ fetching into L2 first (this does not guarantee strict inclusiveness,
+ as lines evicted from L2 still could reside in L1). This is
+ standard on Pentium chips, but AMD Opterons, Athlons and Durons
+ use an exclusive L2 cache that only holds
+ blocks evicted from L1. Ditto most modern VIA CPUs.</para>
+ </listitem>
+
+</itemizedlist>
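<para>The bit-selection mapping above can be sketched as a small Python
helper. This is illustrative only, not Cachegrind's actual code; the
size/associativity/line-size parameters mirror the
<option>--D1</option>-style triple.</para>

```python
def set_index(addr, size, assoc, line_size):
    """Compute the cache set a byte address maps to, using the middle
    bits M..(M+N-1) of the address, where line size = 2^M and the
    number of sets = size / line_size / assoc = 2^N."""
    n_sets = size // line_size // assoc
    # For the simulation to be valid, the number of sets must be a
    # power of two (see the surrounding text).
    assert n_sets & (n_sets - 1) == 0, "number of sets must be a power of two"
    return (addr // line_size) % n_sets

# A 32KB, 8-way cache with 64-byte lines has 64 sets.
print(set_index(0x1234, 32768, 8, 64))
```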
+
+<para>The cache configuration simulated (cache size,
+associativity and line size) is determined automatically using
+the x86 CPUID instruction. If you have a machine that (a)
+doesn't support the CPUID instruction, or (b) supports it in an
+early incarnation that doesn't give any cache information, then
+Cachegrind will fall back to using a default configuration (that
+of a model 3/4 Athlon). Cachegrind will tell you if this
+happens. You can manually specify one, two or all three levels
+(I1/D1/L2) of the cache from the command line using the
+<option>--I1</option>,
+<option>--D1</option> and
+<option>--L2</option> options.
+For cache parameters to be valid for simulation, the number
+of sets (with associativity being the number of cache lines in
+each set) has to be a power of two.</para>
+
+<para>On PowerPC platforms
+Cachegrind cannot automatically
+determine the cache configuration, so you will
+need to specify it with the
+<option>--I1</option>,
+<option>--D1</option> and
+<option>--L2</option> options.</para>
+
+
+<para>Other noteworthy behaviour:</para>
+
+<itemizedlist>
+ <listitem>
+ <para>References that straddle two cache lines are treated as
+ follows:</para>
+ <itemizedlist>
+ <listitem>
+ <para>If both blocks hit --> counted as one hit</para>
+ </listitem>
+ <listitem>
+ <para>If one block hits, the other misses --> counted
+ as one miss.</para>
+ </listitem>
+ <listitem>
+ <para>If both blocks miss --> counted as one miss (not
+ two)</para>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+
+ <listitem>
+ <para>Instructions that modify a memory location
+ (e.g. <computeroutput>inc</computeroutput> and
+ <computeroutput>dec</computeroutput>) are counted as doing
+ just a read, i.e. a single data reference. This may seem
+ strange, but since the write can never cause a miss (the read
+ guarantees the block is in the cache) it's not very
+ interesting.</para>
+
+ <para>Thus it measures not the number of times the data cache
+ is accessed, but the number of times a data cache miss could
+ occur.</para>
+ </listitem>
+
+</itemizedlist>
+
+<para>If you are interested in simulating a cache with different
+properties, it is not particularly hard to write your own cache
+simulator, or to modify the existing ones in
+<computeroutput>cg_sim.c</computeroutput>. We'd be
+interested to hear from anyone who does.</para>
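<para>To give a flavour of what such a simulator involves, here is a
minimal LRU, write-allocate, set-associative cache model in Python.
This is an illustrative sketch only; the real simulator in
<computeroutput>cg_sim.c</computeroutput> is written in C and is
considerably more careful.</para>

```python
class Cache:
    """Tiny LRU set-associative cache model (write-allocate),
    counting accesses and misses."""
    def __init__(self, size, assoc, line_size):
        self.line_size = line_size
        self.n_sets = size // line_size // assoc
        self.assoc = assoc
        # Each set is an ordered list of line tags, most recently used last.
        self.sets = [[] for _ in range(self.n_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Access one byte address; return True on a hit."""
        self.accesses += 1
        block = addr // self.line_size
        s = self.sets[block % self.n_sets]
        if block in s:
            s.remove(block)          # hit: move to MRU position
            s.append(block)
            return True
        self.misses += 1
        if len(s) == self.assoc:     # set full: evict the LRU line
            s.pop(0)
        s.append(block)              # write-allocate: bring the line in
        return False

c = Cache(size=256, assoc=2, line_size=64)   # 2 sets, 2-way
for a in (0, 64, 128, 0):
    c.access(a)
print(c.accesses, c.misses)
```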
+
+</sect2>
+
+
+<sect2 id="branch-sim" xreflabel="Branch Simulation Specifics">
+<title>Branch Simulation Specifics</title>
+
+<para>Cachegrind simulates branch predictors intended to be
+typical of mainstream desktop/server processors of around 2004.</para>
+
+<para>Conditional branches are predicted using an array of 16384 2-bit
+saturating counters. The array index used for a branch instruction is
+computed partly from the low-order bits of the branch instruction's
+address and partly using the taken/not-taken behaviour of the last few
+conditional branches. As a result the predictions for any specific
+branch depend both on its own history and the behaviour of previous
+branches. This is a standard technique for improving prediction
+accuracy.</para>
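<para>The scheme can be sketched in Python as follows. This is
illustrative only: the counter count and 2-bit saturation come from the
description above, but the exact way the address and history bits are
combined (XOR here, gshare-style) is an assumption.</para>

```python
class CondBranchPredictor:
    """Sketch of a conditional branch predictor: 16384 2-bit saturating
    counters, indexed by branch address bits combined (by assumption,
    XORed) with recent taken/not-taken history.  Counter values 0 and 1
    predict not-taken; 2 and 3 predict taken."""
    N_COUNTERS = 16384

    def __init__(self, history_bits=8):
        self.counters = [1] * self.N_COUNTERS   # start weakly not-taken
        self.history = 0
        self.history_mask = (1 << history_bits) - 1

    def predict_and_update(self, addr, taken):
        """Return True if the prediction was correct, then train."""
        idx = (addr ^ self.history) % self.N_COUNTERS
        predicted_taken = self.counters[idx] >= 2
        # Saturating update towards the actual outcome.
        if taken:
            self.counters[idx] = min(3, self.counters[idx] + 1)
        else:
            self.counters[idx] = max(0, self.counters[idx] - 1)
        self.history = ((self.history << 1) | taken) & self.history_mask
        return predicted_taken == taken

# A branch that is always taken is mispredicted at first, but once the
# history register settles, the predictor learns it.
p = CondBranchPredictor()
outcomes = [p.predict_and_update(0x400123, True) for _ in range(20)]
print(sum(outcomes))
```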
+
+<para>For indirect branches (that is, jumps to unknown destinations)
+Cachegrind uses a simple branch target address predictor. Targets are
+predicted using an array of 512 entries indexed by the low order 9
+bits of the branch instruction's address. Each branch is predicted to
+jump to the same address it did last time. Any other behaviour causes
+a mispredict.</para>
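<para>A sketch of this target predictor in Python (illustrative only;
the 512-entry size and low-order 9-bit indexing are taken from the
description above):</para>

```python
class BranchTargetPredictor:
    """Sketch of the indirect-branch target predictor: 512 entries,
    indexed by the low-order 9 bits of the branch instruction's
    address, each predicting the target the branch jumped to last
    time."""
    def __init__(self):
        self.targets = [None] * 512

    def predict_and_update(self, branch_addr, actual_target):
        """Return True if the remembered target was correct, then
        remember the actual target for next time."""
        idx = branch_addr & 0x1FF            # low-order 9 bits
        correct = self.targets[idx] == actual_target
        self.targets[idx] = actual_target
        return correct
```

Note that two branches whose addresses share their low 9 bits alias to
the same entry and evict each other's predictions.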
+
+<para>More recent processors have better branch predictors, in
+particular better indirect branch predictors. Cachegrind's predictor
+design is deliberately conservative so as to be representative of the
+large installed base of processors which pre-date widespread
+deployment of more sophisticated indirect branch predictors. In
+particular, late model Pentium 4s (Prescott), Pentium M, Core and Core
+2 have more sophisticated indirect branch predictors than modelled by
+Cachegrind. </para>
+
+<para>Cachegrind does not simulate a return stack predictor. It
+assumes that processors perfectly predict function return addresses,
+an assumption which is probably close to being true.</para>
+
+<para>See Hennessy and Patterson's classic text "Computer
+Architecture: A Quantitative Approach", 4th edition (2007), Section
+2.3 (pages 80-89) for background on modern branch predictors.</para>
+
+</sect2>
+
+<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
+<title>Accuracy</title>
+
+<para>Valgrind's cache profiling has a number of
+shortcomings:</para>
+
+<itemizedlist>
+ <listitem>
+ <para>It doesn't account for kernel activity -- the effect of system
+ calls on the cache and branch predictor contents is ignored.</para>
+ </listitem>
+
+ <listitem>
+ <para>It doesn't account for other process activity.
+ This is probably desirable when considering a single
+ program.</para>
+ </listitem>
+
+ <listitem>
+ <para>It doesn't account for virtual-to-physical address
+ mappings. Hence the simulation is not a true
+ representation of what's happening in the
+ cache. Most caches and branch predictors are physically indexed, but
+ Cachegrind simulates caches using virtual addresses.</para>
+ </listitem>
+
+ <listitem>
+ <para>It doesn't account for cache misses not visible at the
+ instruction level, e.g. those arising from TLB misses, or
+ speculative execution.</para>
+ </listitem>
+
+ <listitem>
+ <para>Valgrind will schedule
+ threads differently from how they would be when running natively.
+ This could warp the results for threaded programs.</para>
+ </listitem>
+
+ <listitem>
+ <para>The x86/amd64 instructi...
[truncated message content] |