|
From: <sv...@va...> - 2011-11-24 17:10:16
|
Author: florian
Date: 2011-11-24 17:05:34 +0000 (Thu, 24 Nov 2011)
New Revision: 443
Log:
Rename pc-manual.html to sg-manual.html.
This fixes a stale link from manual.html.
Added:
trunk/docs/manual/sg-manual.html
Removed:
trunk/docs/manual/pc-manual.html
Deleted: trunk/docs/manual/pc-manual.html
===================================================================
--- trunk/docs/manual/pc-manual.html 2011-11-24 17:01:50 UTC (rev 442)
+++ trunk/docs/manual/pc-manual.html 2011-11-24 17:05:34 UTC (rev 443)
@@ -1,405 +0,0 @@
-<html>
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-<title>11.SGCheck: an experimental heap, stack and global array overrun detector</title>
-<link rel="stylesheet" href="vg_basic.css" type="text/css">
-<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
-<link rel="home" href="index.html" title="Valgrind Documentation">
-<link rel="up" href="manual.html" title="Valgrind User Manual">
-<link rel="prev" href="dh-manual.html" title="10.DHAT: a dynamic heap analysis tool">
-<link rel="next" href="bbv-manual.html" title="12.BBV: an experimental basic block vector generation tool">
-</head>
-<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
-<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
-<td width="22px" align="center" valign="middle"><a accesskey="p" href="dh-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
-<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
-<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
-<th align="center" valign="middle">Valgrind User Manual</th>
-<td width="22px" align="center" valign="middle"><a accesskey="n" href="bbv-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
-</tr></table></div>
-<div class="chapter" title="11.SGCheck: an experimental heap, stack and global array overrun detector">
-<div class="titlepage"><div><div><h2 class="title">
-<a name="pc-manual"></a>11.SGCheck: an experimental heap, stack and global array overrun detector</h2></div></div></div>
-<div class="toc">
-<p><b>Table of Contents</b></p>
-<dl>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.overview">11.1. Overview</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.options">11.2. SGCheck Command-line Options</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.how-works.heap-checks">11.3. How SGCheck Works: Heap Checks</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.how-works.sg-checks">11.4. How SGCheck Works: Stack and Global Checks</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.cmp-w-memcheck">11.5. Comparison with Memcheck</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.limitations">11.6. Limitations</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.todo-user-visible">11.7. Still To Do: User-visible Functionality</a></span></dt>
-<dt><span class="sect1"><a href="pc-manual.html#pc-manual.todo-implementation">11.8. Still To Do: Implementation Tidying</a></span></dt>
-</dl>
-</div>
-<p>To use this tool, you must specify
-<code class="option">--tool=exp-sgcheck</code> on the Valgrind
-command line.</p>
-<div class="sect1" title="11.1.Overview">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.overview"></a>11.1.Overview</h2></div></div></div>
-<p>SGCheck is a tool for finding overruns of heap, stack
-and global arrays. Its functionality overlaps somewhat with
-Memcheck's, but it is able to catch invalid accesses in a number of
-cases that Memcheck would miss. A detailed comparison against
-Memcheck is presented below.</p>
-<p>SGCheck is composed of two almost completely independent tools
-that have been glued together. One part,
-in <code class="computeroutput">h_main.[ch]</code>, checks accesses
-through heap-derived pointers. The other part, in
-<code class="computeroutput">sg_main.[ch]</code>, checks accesses to
-stack and global arrays. The remaining
-files <code class="computeroutput">pc_{common,main}.[ch]</code>, provide
-common error-management and coordination functions, so as to make it
-appear as a single tool.</p>
-<p>The heap-check part is an extensively-hacked (largely rewritten)
-version of the experimental "Annelid" tool developed and described by
-Nicholas Nethercote and Jeremy Fitzhardinge. The stack- and global-
-check part uses a heuristic approach derived from an observation about
-the likely forms of stack and global array accesses, and, as far as is
-known, is entirely novel.</p>
-</div>
-<div class="sect1" title="11.2.SGCheck Command-line Options">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.options"></a>11.2.SGCheck Command-line Options</h2></div></div></div>
-<p>SGCheck-specific command-line options are:</p>
-<div class="variablelist">
-<a name="pc.opts.list"></a><dl>
-<dt>
-<a name="opt.enable-sg-checks"></a><span class="term">
- <code class="option">--enable-sg-checks=no|yes
- [default: yes] </code>
- </span>
-</dt>
-<dd><p>By default, SGCheck checks for overruns of stack, global
- and heap arrays.
- With <code class="varname">--enable-sg-checks=no</code>, the stack and
- global array checks are omitted, and only heap checking is
- performed. This can be useful because the stack and global
- checks are quite expensive, so omitting them speeds SGCheck up
- a lot.
- </p></dd>
-<dt>
-<a name="opt.partial-loads-ok"></a><span class="term">
- <code class="option">--partial-loads-ok=<yes|no> [default: no] </code>
- </span>
-</dt>
-<dd>
-<p>This option has the same meaning as it does for
- Memcheck.</p>
-<p>Controls how SGCheck handles word-sized, word-aligned
- loads which partially overlap the end of heap blocks -- that is,
- some of the bytes in the word are validly addressable, but
- others are not. When <code class="varname">yes</code>, such loads do not
- produce an address error. When <code class="varname">no</code> (the
- default), loads from partially invalid addresses are treated the
- same as loads from completely invalid addresses: an illegal heap
- access error is issued.
- </p>
-<p>Note that code that behaves in this way is in violation of
- the the ISO C/C++ standards, and should be considered broken. If
- at all possible, such code should be fixed. This option should be
- used only as a last resort.</p>
-</dd>
-</dl>
-</div>
-</div>
-<div class="sect1" title="11.3.How SGCheck Works: Heap Checks">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.how-works.heap-checks"></a>11.3.How SGCheck Works: Heap Checks</h2></div></div></div>
-<p>SGCheck can check for invalid uses of heap pointers, including
-out of range accesses and accesses to freed memory. The mechanism is
-however completely different from Memcheck's, and the checking is more
-powerful.</p>
-<p>For each pointer in the program, SGCheck keeps track of which
-heap block (if any) it was derived from. Then, when an access is made
-through that pointer, SGCheck compares the access address with the
-bounds of the associated block, and reports an error if the address is
-out of bounds, or if the block has been freed.</p>
-<p>Of course it is rarely the case that one wants to access a block
-only at the exact address returned by <code class="function">malloc</code> et al.
-SGCheck understands that adding or subtracting offsets from a pointer to a
-block results in a pointer to the same block.</p>
-<p>At a fundamental level, this scheme works because a correct
-program cannot make assumptions about the addresses returned by
-<code class="function">malloc</code> et al. In particular it cannot make any
-assumptions about the differences in addresses returned by subsequent calls
-to <code class="function">malloc</code> et al. Hence there are very few ways to take
-an address returned by <code class="function">malloc</code>, modify it, and still
-have a valid address. In short, the only allowable operations are adding
-and subtracting other non-pointer values. Almost all other operations
-produce a value which cannot possibly be a valid pointer.</p>
-</div>
-<div class="sect1" title="11.4.How SGCheck Works: Stack and Global Checks">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.how-works.sg-checks"></a>11.4.How SGCheck Works: Stack and Global Checks</h2></div></div></div>
-<p>When a source file is compiled
-with <code class="option">-g</code>, the compiler attaches DWARF3
-debugging information which describes the location of all stack and
-global arrays in the file.</p>
-<p>Checking of accesses to such arrays would then be relatively
-simple, if the compiler could also tell us which array (if any) each
-memory referencing instruction was supposed to access. Unfortunately
-the DWARF3 debugging format does not provide a way to represent such
-information, so we have to resort to a heuristic technique to
-approximate the same information. The key observation is that
- <span class="emphasis"><em>
- if a memory referencing instruction accesses inside a stack or
- global array once, then it is highly likely to always access that
- same array</em></span>.</p>
-<p>To see how this might be useful, consider the following buggy
-fragment:</p>
-<pre class="programlisting">
- { int i, a[10]; // both are auto vars
- for (i = 0; i <= 10; i++)
- a[i] = 42;
- }
-</pre>
-<p>At run time we will know the precise address
-of <code class="computeroutput">a[]</code> on the stack, and so we can
-observe that the first store resulting from <code class="computeroutput">a[i] =
-42</code> writes <code class="computeroutput">a[]</code>, and
-we will (correctly) assume that that instruction is intended always to
-access <code class="computeroutput">a[]</code>. Then, on the 11th
-iteration, it accesses somewhere else, possibly a different local,
-possibly an un-accounted for area of the stack (eg, spill slot), so
-SGCheck reports an error.</p>
-<p>There is an important caveat.</p>
-<p>Imagine a function such as <code class="function">memcpy</code>, which is used
-to read and write many different areas of memory over the lifetime of the
-program. If we insist that the read and write instructions in its memory
-copying loop only ever access one particular stack or global variable, we
-will be flooded with errors resulting from calls to
-<code class="function">memcpy</code>.</p>
-<p>To avoid this problem, SGCheck instantiates fresh likely-target
-records for each entry to a function, and discards them on exit. This
-allows detection of cases where (e.g.) <code class="function">memcpy</code> overflows
-its source or destination buffers for any specific call, but does not carry
-any restriction from one call to the next. Indeed, multiple threads may be
-multiple simultaneous calls to (e.g.) <code class="function">memcpy</code> without
-mutual interference.</p>
-</div>
-<div class="sect1" title="11.5.Comparison with Memcheck">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.cmp-w-memcheck"></a>11.5.Comparison with Memcheck</h2></div></div></div>
-<p>Memcheck does not do any access checks for stack or global arrays, so
-the presence of those in SGCheck is a straight win. (But see
-"Limitations" below).</p>
-<p>Memcheck and SGCheck use different approaches for checking heap
-accesses. Memcheck maintains bitmaps telling it which areas of memory
-are accessible and which are not. If a memory access falls in an
-unaccessible area, it reports an error. By marking the 16 bytes
-before and after an allocated block unaccessible, Memcheck is able to
-detect small over- and underruns of the block. Similarly, by marking
-freed memory as unaccessible, Memcheck can detect all accesses to
-freed memory.</p>
-<p>Memcheck's approach is simple. But it's also weak. It can't
-catch block overruns beyond 16 bytes. And, more generally, because it
-focusses only on the question "is the target address accessible", it
-fails to detect invalid accesses which just happen to fall within some
-other valid area. This is not improbable, especially in crowded areas
-of the process' address space.</p>
-<p>SGCheck's approach is to keep track of pointers derived from
-heap blocks. It tracks pointers which are derived directly from calls
-to <code class="function">malloc</code> et al, but also ones derived indirectly, by
-adding or subtracting offsets from the directly-derived pointers. When a
-pointer is finally used to access memory, SGCheck compares the access
-address with that of the block it was originally derived from, and
-reports an error if the access address is not within the block
-bounds.</p>
-<p>Consequently SGCheck can detect any out of bounds access
-through a heap-derived pointer, no matter how far from the original
-block it is.</p>
-<p>A second advantage is that SGCheck is better at detecting
-accesses to blocks freed very far in the past. Memcheck can detect
-these too, but only for blocks freed relatively recently. To detect
-accesses to a freed block, Memcheck must make it inaccessible, hence
-requiring a space overhead proportional to the size of the block. If
-the blocks are large, Memcheck will have to make them available for
-re-allocation relatively quickly, thereby losing the ability to detect
-invalid accesses to them.</p>
-<p>By contrast, SGCheck has a constant per-block space requirement
-of four machine words, for detection of accesses to freed blocks. A
-freed block can be reallocated immediately, yet SGCheck can still
-detect all invalid accesses through any pointers derived from the old
-allocation, providing only that the four-word descriptor for the old
-allocation is stored. For example, on a 64-bit machine, to detect
-accesses in any of the most recently freed 10 million blocks, SGCheck
-will require only 320MB of extra storage. Achieving the same level of
-detection with Memcheck is close to impossible and would likely
-involve several gigabytes of extra storage.</p>
-<p>Having said all that, remember that Memcheck performs uninitialised
-value checking, invalid and mismatched free checking, overlap checking, and
-leak checking, none of which SGCheck do. Memcheck has also benefitted from
-years of refinement, tuning, and experience with production-level usage, and
-so is much faster than SGCheck as it currently stands.
-</p>
-<p>Consequently we recommend you first make your programs run Memcheck
-clean. Once that's done, try SGCheck to see if you can shake out any
-further heap, global or stack errors.</p>
-</div>
-<div class="sect1" title="11.6.Limitations">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.limitations"></a>11.6.Limitations</h2></div></div></div>
-<p>This is an experimental tool, which relies rather too heavily on some
-not-as-robust-as-I-would-like assumptions on the behaviour of correct
-programs. There are a number of limitations which you should be aware
-of.</p>
-<div class="itemizedlist"><ul class="itemizedlist" type="disc">
-<li class="listitem"><p>Heap checks: SGCheck can occasionally lose track of, or
- become confused about, which heap block a given pointer has been
- derived from. This can cause it to falsely report errors, or to
- miss some errors. This is not believed to be a serious
- problem.</p></li>
-<li class="listitem"><p>Heap checks: SGCheck only tracks pointers that are stored
- properly aligned in memory. If a pointer is stored at a misaligned
- address, and then later read again, SGCheck will lose track of
- what it points at. Similar problem if a pointer is split into
- pieces and later reconsitituted.</p></li>
-<li class="listitem"><p>Heap checks: SGCheck needs to "understand" which system
- calls return pointers and which don't. Many, but not all system
- calls are handled. If an unhandled one is encountered, SGCheck will
- abort. Fortunately, adding support for a new syscall is very
- easy.</p></li>
-<li class="listitem"><p>Stack checks: It follows from the description above (<a class="xref" href="pc-manual.html#pc-manual.how-works.sg-checks" title="11.4.How SGCheck Works: Stack and Global Checks">How SGCheck Works: Stack and Global Checks</a>) that the first access by a
- memory referencing instruction to a stack or global array creates an
- association between that instruction and the array, which is checked on
- subsequent accesses by that instruction, until the containing function
- exits. Hence, the first access by an instruction to an array (in any
- given function instantiation) is not checked for overrun, since SGCheck
- uses that as the "example" of how subsequent accesses should
- behave.</p></li>
-<li class="listitem">
-<p>Stack checks: Similarly, and more serious, it is clearly
- possible to write legitimate pieces of code which break the basic
- assumption upon which the stack/global checking rests. For
- example:</p>
-<pre class="programlisting">
- { int a[10], b[10], *p, i;
- for (i = 0; i < 10; i++) {
- p = /* arbitrary condition */ ? &a[i] : &b[i];
- *p = 42;
- }
- }
-</pre>
-<p>In this case the store sometimes
- accesses <code class="computeroutput">a[]</code> and
- sometimes <code class="computeroutput">b[]</code>, but in no cases is
- the addressed array overrun. Nevertheless the change in target
- will cause an error to be reported.</p>
-<p>It is hard to see how to get around this problem. The only
- mitigating factor is that such constructions appear very rare, at
- least judging from the results using the tool so far. Such a
- construction appears only once in the Valgrind sources (running
- Valgrind on Valgrind) and perhaps two or three times for a start
- and exit of Firefox. The best that can be done is to suppress the
- errors.</p>
-</li>
-<li class="listitem"><p>Performance: the stack/global checks require reading all of
- the DWARF3 type and variable information on the executable and its
- shared objects. This is computationally expensive and makes
- startup quite slow. You can expect debuginfo reading time to be in
- the region of a minute for an OpenOffice sized application, on a
- 2.4 GHz Core 2 machine. Reading this information also requires a
- lot of memory. To make it viable, SGCheck goes to considerable
- trouble to compress the in-memory representation of the DWARF3
- data, which is why the process of reading it appears slow.</p></li>
-<li class="listitem"><p>Performance: SGCheck runs slower than Memcheck. This is
- partly due to a lack of tuning, but partly due to algorithmic
- difficulties. The heap-check side is potentially quite fast. The
- stack and global checks can sometimes require a number of range
- checks per memory access, and these are difficult to short-circuit
- (despite considerable efforts having been made).
- </p></li>
-<li class="listitem">
-<p>Coverage: the heap checking is relatively robust, requiring
- only that SGCheck can see calls to <code class="function">malloc</code> et al.
- In that sense it has debug-info requirements comparable with Memcheck,
- and is able to heap-check programs even with no debugging information
- attached.</p>
-<p>Stack/global checking is much more fragile. If a shared
- object does not have debug information attached, then SGCheck will
- not be able to determine the bounds of any stack or global arrays
- defined within that shared object, and so will not be able to check
- accesses to them. This is true even when those arrays are accessed
- from some other shared object which was compiled with debug
- info.</p>
-<p>At the moment SGCheck accepts objects lacking debuginfo
- without comment. This is dangerous as it causes SGCheck to
- silently skip stack and global checking for such objects. It would
- be better to print a warning in such circumstances.</p>
-</li>
-<li class="listitem"><p>Coverage: SGCheck checks that the areas read or written by
- system calls do not overrun heap blocks. But it doesn't currently
- check them for overruns stack and global arrays. This would be
- easy to add.</p></li>
-<li class="listitem"><p>Platforms: the stack/global checks won't work properly on any
- PowerPC platforms, only on x86 and amd64 targets. That's because
- the stack and global checking requires tracking function calls and
- exits reliably, and there's no obvious way to do it with the PPC
- ABIs. (In comparison, with the x86 and amd64 ABIs this is relatively
- straightforward.)</p></li>
-<li class="listitem"><p>Robustness: related to the previous point. Function
- call/exit tracking for x86/amd64 is believed to work properly even
- in the presence of longjmps within the same stack (although this
- has not been tested). However, code which switches stacks is
- likely to cause breakage/chaos.</p></li>
-</ul></div>
-</div>
-<div class="sect1" title="11.7.Still To Do: User-visible Functionality">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.todo-user-visible"></a>11.7.Still To Do: User-visible Functionality</h2></div></div></div>
-<div class="itemizedlist"><ul class="itemizedlist" type="disc">
-<li class="listitem"><p>Extend system call checking to work on stack and global arrays.</p></li>
-<li class="listitem"><p>Print a warning if a shared object does not have debug info
- attached, or if, for whatever reason, debug info could not be
- found, or read.</p></li>
-</ul></div>
-</div>
-<div class="sect1" title="11.8.Still To Do: Implementation Tidying">
-<div class="titlepage"><div><div><h2 class="title" style="clear: both">
-<a name="pc-manual.todo-implementation"></a>11.8.Still To Do: Implementation Tidying</h2></div></div></div>
-<p>Items marked CRITICAL are considered important for correctness:
-non-fixage of them is liable to lead to crashes or assertion failures
-in real use.</p>
-<div class="itemizedlist"><ul class="itemizedlist" type="disc">
-<li class="listitem"><p>h_main.c: make N_FREED_SEGS command-line configurable.</p></li>
-<li class="listitem"><p> sg_main.c: Improve the performance of the stack / global
- checks by doing some up-front filtering to ignore references in
- areas which "obviously" can't be stack or globals. This will
- require using information that m_aspacemgr knows about the address
- space layout.</p></li>
-<li class="listitem"><p>h_main.c: get rid of the last_seg_added hack; add suitable
- plumbing to the core/tool interface to do this cleanly.</p></li>
-<li class="listitem"><p>h_main.c: move vast amounts of arch-dependent ugliness
- (get_IntRegInfo et al) to its own source file, a la
- mc_machine.c.</p></li>
-<li class="listitem"><p>h_main.c: make the lossage-check stuff work again, as a way
- of doing quality assurance on the implementation.</p></li>
-<li class="listitem"><p>h_main.c: schemeEw_Atom: don't generate a call to
- nonptr_or_unknown, this is really stupid, since it could be done at
- translation time instead.</p></li>
-<li class="listitem"><p>CRITICAL: h_main.c: h_instrument (main instrumentation fn):
- generate shadows for word-sized temps defined in the block's
- preamble. (Why does this work at all, as it stands?)</p></li>
-<li class="listitem"><p>sg_main.c: fix compute_II_hash to make it a bit more sensible
- for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
- targets, so this is a bit academic at the moment).</p></li>
-</ul></div>
-</div>
-</div>
-<div>
-<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
-<tr>
-<td rowspan="2" width="40%" align="left">
-<a accesskey="p" href="dh-manual.html"><<10.DHAT: a dynamic heap analysis tool</a></td>
-<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
-<td rowspan="2" width="40%" align="right"><a accesskey="n" href="bbv-manual.html">12.BBV: an experimental basic block vector generation tool>></a>
-</td>
-</tr>
-<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
-</table>
-</div>
-</body>
-</html>
Copied: trunk/docs/manual/sg-manual.html (from rev 442, trunk/docs/manual/pc-manual.html)
===================================================================
--- trunk/docs/manual/sg-manual.html (rev 0)
+++ trunk/docs/manual/sg-manual.html 2011-11-24 17:05:34 UTC (rev 443)
@@ -0,0 +1,405 @@
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+<title>11.SGCheck: an experimental heap, stack and global array overrun detector</title>
+<link rel="stylesheet" href="vg_basic.css" type="text/css">
+<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
+<link rel="home" href="index.html" title="Valgrind Documentation">
+<link rel="up" href="manual.html" title="Valgrind User Manual">
+<link rel="prev" href="dh-manual.html" title="10.DHAT: a dynamic heap analysis tool">
+<link rel="next" href="bbv-manual.html" title="12.BBV: an experimental basic block vector generation tool">
+</head>
+<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
+<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
+<td width="22px" align="center" valign="middle"><a accesskey="p" href="dh-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
+<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
+<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
+<th align="center" valign="middle">Valgrind User Manual</th>
+<td width="22px" align="center" valign="middle"><a accesskey="n" href="bbv-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
+</tr></table></div>
+<div class="chapter" title="11.SGCheck: an experimental heap, stack and global array overrun detector">
+<div class="titlepage"><div><div><h2 class="title">
+<a name="pc-manual"></a>11.SGCheck: an experimental heap, stack and global array overrun detector</h2></div></div></div>
+<div class="toc">
+<p><b>Table of Contents</b></p>
+<dl>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.overview">11.1. Overview</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.options">11.2. SGCheck Command-line Options</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.how-works.heap-checks">11.3. How SGCheck Works: Heap Checks</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.how-works.sg-checks">11.4. How SGCheck Works: Stack and Global Checks</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.cmp-w-memcheck">11.5. Comparison with Memcheck</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.limitations">11.6. Limitations</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.todo-user-visible">11.7. Still To Do: User-visible Functionality</a></span></dt>
+<dt><span class="sect1"><a href="pc-manual.html#pc-manual.todo-implementation">11.8. Still To Do: Implementation Tidying</a></span></dt>
+</dl>
+</div>
+<p>To use this tool, you must specify
+<code class="option">--tool=exp-sgcheck</code> on the Valgrind
+command line.</p>
+<div class="sect1" title="11.1.Overview">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.overview"></a>11.1.Overview</h2></div></div></div>
+<p>SGCheck is a tool for finding overruns of heap, stack
+and global arrays. Its functionality overlaps somewhat with
+Memcheck's, but it is able to catch invalid accesses in a number of
+cases that Memcheck would miss. A detailed comparison against
+Memcheck is presented below.</p>
+<p>SGCheck is composed of two almost completely independent tools
+that have been glued together. One part,
+in <code class="computeroutput">h_main.[ch]</code>, checks accesses
+through heap-derived pointers. The other part, in
+<code class="computeroutput">sg_main.[ch]</code>, checks accesses to
+stack and global arrays. The remaining
+files <code class="computeroutput">pc_{common,main}.[ch]</code>, provide
+common error-management and coordination functions, so as to make it
+appear as a single tool.</p>
+<p>The heap-check part is an extensively-hacked (largely rewritten)
+version of the experimental "Annelid" tool developed and described by
+Nicholas Nethercote and Jeremy Fitzhardinge. The stack- and global-
+check part uses a heuristic approach derived from an observation about
+the likely forms of stack and global array accesses, and, as far as is
+known, is entirely novel.</p>
+</div>
+<div class="sect1" title="11.2.SGCheck Command-line Options">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.options"></a>11.2.SGCheck Command-line Options</h2></div></div></div>
+<p>SGCheck-specific command-line options are:</p>
+<div class="variablelist">
+<a name="pc.opts.list"></a><dl>
+<dt>
+<a name="opt.enable-sg-checks"></a><span class="term">
+ <code class="option">--enable-sg-checks=no|yes
+ [default: yes] </code>
+ </span>
+</dt>
+<dd><p>By default, SGCheck checks for overruns of stack, global
+ and heap arrays.
+ With <code class="varname">--enable-sg-checks=no</code>, the stack and
+ global array checks are omitted, and only heap checking is
+ performed. This can be useful because the stack and global
+ checks are quite expensive, so omitting them speeds SGCheck up
+ a lot.
+ </p></dd>
+<dt>
+<a name="opt.partial-loads-ok"></a><span class="term">
+ <code class="option">--partial-loads-ok=<yes|no> [default: no] </code>
+ </span>
+</dt>
+<dd>
+<p>This option has the same meaning as it does for
+ Memcheck.</p>
+<p>Controls how SGCheck handles word-sized, word-aligned
+ loads which partially overlap the end of heap blocks -- that is,
+ some of the bytes in the word are validly addressable, but
+ others are not. When <code class="varname">yes</code>, such loads do not
+ produce an address error. When <code class="varname">no</code> (the
+ default), loads from partially invalid addresses are treated the
+ same as loads from completely invalid addresses: an illegal heap
+ access error is issued.
+ </p>
+<p>Note that code that behaves in this way is in violation of
+ the the ISO C/C++ standards, and should be considered broken. If
+ at all possible, such code should be fixed. This option should be
+ used only as a last resort.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect1" title="11.3.How SGCheck Works: Heap Checks">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.how-works.heap-checks"></a>11.3.How SGCheck Works: Heap Checks</h2></div></div></div>
+<p>SGCheck can check for invalid uses of heap pointers, including
+out of range accesses and accesses to freed memory. The mechanism is
+however completely different from Memcheck's, and the checking is more
+powerful.</p>
+<p>For each pointer in the program, SGCheck keeps track of which
+heap block (if any) it was derived from. Then, when an access is made
+through that pointer, SGCheck compares the access address with the
+bounds of the associated block, and reports an error if the address is
+out of bounds, or if the block has been freed.</p>
+<p>Of course it is rarely the case that one wants to access a block
+only at the exact address returned by <code class="function">malloc</code> et al.
+SGCheck understands that adding or subtracting offsets from a pointer to a
+block results in a pointer to the same block.</p>
+<p>At a fundamental level, this scheme works because a correct
+program cannot make assumptions about the addresses returned by
+<code class="function">malloc</code> et al. In particular it cannot make any
+assumptions about the differences in addresses returned by subsequent calls
+to <code class="function">malloc</code> et al. Hence there are very few ways to take
+an address returned by <code class="function">malloc</code>, modify it, and still
+have a valid address. In short, the only allowable operations are adding
+and subtracting other non-pointer values. Almost all other operations
+produce a value which cannot possibly be a valid pointer.</p>
+</div>
+<div class="sect1" title="11.4.How SGCheck Works: Stack and Global Checks">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.how-works.sg-checks"></a>11.4.How SGCheck Works: Stack and Global Checks</h2></div></div></div>
+<p>When a source file is compiled
+with <code class="option">-g</code>, the compiler attaches DWARF3
+debugging information which describes the location of all stack and
+global arrays in the file.</p>
+<p>Checking of accesses to such arrays would then be relatively
+simple, if the compiler could also tell us which array (if any) each
+memory referencing instruction was supposed to access. Unfortunately
+the DWARF3 debugging format does not provide a way to represent such
+information, so we have to resort to a heuristic technique to
+approximate the same information. The key observation is that
+ <span class="emphasis"><em>
+ if a memory referencing instruction accesses inside a stack or
+ global array once, then it is highly likely to always access that
+ same array</em></span>.</p>
+<p>To see how this might be useful, consider the following buggy
+fragment:</p>
+<pre class="programlisting">
+ { int i, a[10]; // both are auto vars
+ for (i = 0; i <= 10; i++)
+ a[i] = 42;
+ }
+</pre>
+<p>At run time we will know the precise address
+of <code class="computeroutput">a[]</code> on the stack, and so we can
+observe that the first store resulting from <code class="computeroutput">a[i] =
+42</code> writes <code class="computeroutput">a[]</code>, and
+we will (correctly) assume that that instruction is intended always to
+access <code class="computeroutput">a[]</code>. Then, on the 11th
+iteration, it accesses somewhere else, possibly a different local,
+possibly an un-accounted for area of the stack (eg, spill slot), so
+SGCheck reports an error.</p>
+<p>There is an important caveat.</p>
+<p>Imagine a function such as <code class="function">memcpy</code>, which is used
+to read and write many different areas of memory over the lifetime of the
+program. If we insist that the read and write instructions in its memory
+copying loop only ever access one particular stack or global variable, we
+will be flooded with errors resulting from calls to
+<code class="function">memcpy</code>.</p>
+<p>To avoid this problem, SGCheck instantiates fresh likely-target
+records for each entry to a function, and discards them on exit. This
+allows detection of cases where (e.g.) <code class="function">memcpy</code> overflows
+its source or destination buffers for any specific call, but does not carry
+any restriction from one call to the next. Indeed, multiple threads may be
+multiple simultaneous calls to (e.g.) <code class="function">memcpy</code> without
+mutual interference.</p>
+</div>
+<div class="sect1" title="11.5.Comparison with Memcheck">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.cmp-w-memcheck"></a>11.5.Comparison with Memcheck</h2></div></div></div>
+<p>Memcheck does not do any access checks for stack or global arrays, so
+the presence of those in SGCheck is a straight win. (But see
+"Limitations" below).</p>
+<p>Memcheck and SGCheck use different approaches for checking heap
+accesses. Memcheck maintains bitmaps telling it which areas of memory
+are accessible and which are not. If a memory access falls in an
+unaccessible area, it reports an error. By marking the 16 bytes
+before and after an allocated block unaccessible, Memcheck is able to
+detect small over- and underruns of the block. Similarly, by marking
+freed memory as unaccessible, Memcheck can detect all accesses to
+freed memory.</p>
+<p>Memcheck's approach is simple. But it's also weak. It can't
+catch block overruns beyond 16 bytes. And, more generally, because it
+focusses only on the question "is the target address accessible", it
+fails to detect invalid accesses which just happen to fall within some
+other valid area. This is not improbable, especially in crowded areas
+of the process' address space.</p>
+<p>SGCheck's approach is to keep track of pointers derived from
+heap blocks. It tracks pointers which are derived directly from calls
+to <code class="function">malloc</code> et al, but also ones derived indirectly, by
+adding or subtracting offsets from the directly-derived pointers. When a
+pointer is finally used to access memory, SGCheck compares the access
+address with that of the block it was originally derived from, and
+reports an error if the access address is not within the block
+bounds.</p>
+<p>Consequently SGCheck can detect any out of bounds access
+through a heap-derived pointer, no matter how far from the original
+block it is.</p>
+<p>A second advantage is that SGCheck is better at detecting
+accesses to blocks freed very far in the past. Memcheck can detect
+these too, but only for blocks freed relatively recently. To detect
+accesses to a freed block, Memcheck must make it inaccessible, hence
+requiring a space overhead proportional to the size of the block. If
+the blocks are large, Memcheck will have to make them available for
+re-allocation relatively quickly, thereby losing the ability to detect
+invalid accesses to them.</p>
+<p>By contrast, SGCheck has a constant per-block space requirement
+of four machine words, for detection of accesses to freed blocks. A
+freed block can be reallocated immediately, yet SGCheck can still
+detect all invalid accesses through any pointers derived from the old
+allocation, providing only that the four-word descriptor for the old
+allocation is stored. For example, on a 64-bit machine, to detect
+accesses in any of the most recently freed 10 million blocks, SGCheck
+will require only 320MB of extra storage. Achieving the same level of
+detection with Memcheck is close to impossible and would likely
+involve several gigabytes of extra storage.</p>
+<p>Having said all that, remember that Memcheck performs uninitialised
+value checking, invalid and mismatched free checking, overlap checking, and
+leak checking, none of which SGCheck do. Memcheck has also benefitted from
+years of refinement, tuning, and experience with production-level usage, and
+so is much faster than SGCheck as it currently stands.
+</p>
+<p>Consequently we recommend you first make your programs run Memcheck
+clean. Once that's done, try SGCheck to see if you can shake out any
+further heap, global or stack errors.</p>
+</div>
+<div class="sect1" title="11.6.Limitations">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.limitations"></a>11.6.Limitations</h2></div></div></div>
+<p>This is an experimental tool, which relies rather too heavily on some
+not-as-robust-as-I-would-like assumptions on the behaviour of correct
+programs. There are a number of limitations which you should be aware
+of.</p>
+<div class="itemizedlist"><ul class="itemizedlist" type="disc">
+<li class="listitem"><p>Heap checks: SGCheck can occasionally lose track of, or
+ become confused about, which heap block a given pointer has been
+ derived from. This can cause it to falsely report errors, or to
+ miss some errors. This is not believed to be a serious
+ problem.</p></li>
+<li class="listitem"><p>Heap checks: SGCheck only tracks pointers that are stored
+ properly aligned in memory. If a pointer is stored at a misaligned
+ address, and then later read again, SGCheck will lose track of
+ what it points at. Similar problem if a pointer is split into
+ pieces and later reconsitituted.</p></li>
+<li class="listitem"><p>Heap checks: SGCheck needs to "understand" which system
+ calls return pointers and which don't. Many, but not all system
+ calls are handled. If an unhandled one is encountered, SGCheck will
+ abort. Fortunately, adding support for a new syscall is very
+ easy.</p></li>
+<li class="listitem"><p>Stack checks: It follows from the description above (<a class="xref" href="pc-manual.html#pc-manual.how-works.sg-checks" title="11.4.How SGCheck Works: Stack and Global Checks">How SGCheck Works: Stack and Global Checks</a>) that the first access by a
+ memory referencing instruction to a stack or global array creates an
+ association between that instruction and the array, which is checked on
+ subsequent accesses by that instruction, until the containing function
+ exits. Hence, the first access by an instruction to an array (in any
+ given function instantiation) is not checked for overrun, since SGCheck
+ uses that as the "example" of how subsequent accesses should
+ behave.</p></li>
+<li class="listitem">
+<p>Stack checks: Similarly, and more serious, it is clearly
+ possible to write legitimate pieces of code which break the basic
+ assumption upon which the stack/global checking rests. For
+ example:</p>
+<pre class="programlisting">
+ { int a[10], b[10], *p, i;
+ for (i = 0; i < 10; i++) {
+ p = /* arbitrary condition */ ? &a[i] : &b[i];
+ *p = 42;
+ }
+ }
+</pre>
+<p>In this case the store sometimes
+ accesses <code class="computeroutput">a[]</code> and
+ sometimes <code class="computeroutput">b[]</code>, but in no cases is
+ the addressed array overrun. Nevertheless the change in target
+ will cause an error to be reported.</p>
+<p>It is hard to see how to get around this problem. The only
+ mitigating factor is that such constructions appear very rare, at
+ least judging from the results using the tool so far. Such a
+ construction appears only once in the Valgrind sources (running
+ Valgrind on Valgrind) and perhaps two or three times for a start
+ and exit of Firefox. The best that can be done is to suppress the
+ errors.</p>
+</li>
+<li class="listitem"><p>Performance: the stack/global checks require reading all of
+ the DWARF3 type and variable information on the executable and its
+ shared objects. This is computationally expensive and makes
+ startup quite slow. You can expect debuginfo reading time to be in
+ the region of a minute for an OpenOffice sized application, on a
+ 2.4 GHz Core 2 machine. Reading this information also requires a
+ lot of memory. To make it viable, SGCheck goes to considerable
+ trouble to compress the in-memory representation of the DWARF3
+ data, which is why the process of reading it appears slow.</p></li>
+<li class="listitem"><p>Performance: SGCheck runs slower than Memcheck. This is
+ partly due to a lack of tuning, but partly due to algorithmic
+ difficulties. The heap-check side is potentially quite fast. The
+ stack and global checks can sometimes require a number of range
+ checks per memory access, and these are difficult to short-circuit
+ (despite considerable efforts having been made).
+ </p></li>
+<li class="listitem">
+<p>Coverage: the heap checking is relatively robust, requiring
+ only that SGCheck can see calls to <code class="function">malloc</code> et al.
+ In that sense it has debug-info requirements comparable with Memcheck,
+ and is able to heap-check programs even with no debugging information
+ attached.</p>
+<p>Stack/global checking is much more fragile. If a shared
+ object does not have debug information attached, then SGCheck will
+ not be able to determine the bounds of any stack or global arrays
+ defined within that shared object, and so will not be able to check
+ accesses to them. This is true even when those arrays are accessed
+ from some other shared object which was compiled with debug
+ info.</p>
+<p>At the moment SGCheck accepts objects lacking debuginfo
+ without comment. This is dangerous as it causes SGCheck to
+ silently skip stack and global checking for such objects. It would
+ be better to print a warning in such circumstances.</p>
+</li>
+<li class="listitem"><p>Coverage: SGCheck checks that the areas read or written by
+ system calls do not overrun heap blocks. But it doesn't currently
+ check them for overruns stack and global arrays. This would be
+ easy to add.</p></li>
+<li class="listitem"><p>Platforms: the stack/global checks won't work properly on any
+ PowerPC platforms, only on x86 and amd64 targets. That's because
+ the stack and global checking requires tracking function calls and
+ exits reliably, and there's no obvious way to do it with the PPC
+ ABIs. (In comparison, with the x86 and amd64 ABIs this is relatively
+ straightforward.)</p></li>
+<li class="listitem"><p>Robustness: related to the previous point. Function
+ call/exit tracking for x86/amd64 is believed to work properly even
+ in the presence of longjmps within the same stack (although this
+ has not been tested). However, code which switches stacks is
+ likely to cause breakage/chaos.</p></li>
+</ul></div>
+</div>
+<div class="sect1" title="11.7.Still To Do: User-visible Functionality">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.todo-user-visible"></a>11.7.Still To Do: User-visible Functionality</h2></div></div></div>
+<div class="itemizedlist"><ul class="itemizedlist" type="disc">
+<li class="listitem"><p>Extend system call checking to work on stack and global arrays.</p></li>
+<li class="listitem"><p>Print a warning if a shared object does not have debug info
+ attached, or if, for whatever reason, debug info could not be
+ found, or read.</p></li>
+</ul></div>
+</div>
+<div class="sect1" title="11.8.Still To Do: Implementation Tidying">
+<div class="titlepage"><div><div><h2 class="title" style="clear: both">
+<a name="pc-manual.todo-implementation"></a>11.8.Still To Do: Implementation Tidying</h2></div></div></div>
+<p>Items marked CRITICAL are considered important for correctness:
+non-fixage of them is liable to lead to crashes or assertion failures
+in real use.</p>
+<div class="itemizedlist"><ul class="itemizedlist" type="disc">
+<li class="listitem"><p>h_main.c: make N_FREED_SEGS command-line configurable.</p></li>
+<li class="listitem"><p> sg_main.c: Improve the performance of the stack / global
+ checks by doing some up-front filtering to ignore references in
+ areas which "obviously" can't be stack or globals. This will
+ require using information that m_aspacemgr knows about the address
+ space layout.</p></li>
+<li class="listitem"><p>h_main.c: get rid of the last_seg_added hack; add suitable
+ plumbing to the core/tool interface to do this cleanly.</p></li>
+<li class="listitem"><p>h_main.c: move vast amounts of arch-dependent ugliness
+ (get_IntRegInfo et al) to its own source file, a la
+ mc_machine.c.</p></li>
+<li class="listitem"><p>h_main.c: make the lossage-check stuff work again, as a way
+ of doing quality assurance on the implementation.</p></li>
+<li class="listitem"><p>h_main.c: schemeEw_Atom: don't generate a call to
+ nonptr_or_unknown, this is really stupid, since it could be done at
+ translation time instead.</p></li>
+<li class="listitem"><p>CRITICAL: h_main.c: h_instrument (main instrumentation fn):
+ generate shadows for word-sized temps defined in the block's
+ preamble. (Why does this work at all, as it stands?)</p></li>
+<li class="listitem"><p>sg_main.c: fix compute_II_hash to make it a bit more sensible
+ for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
+ targets, so this is a bit academic at the moment).</p></li>
+</ul></div>
+</div>
+</div>
+<div>
+<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
+<tr>
+<td rowspan="2" width="40%" align="left">
+<a accesskey="p" href="dh-manual.html"><<10.DHAT: a dynamic heap analysis tool</a></td>
+<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
+<td rowspan="2" width="40%" align="right"><a accesskey="n" href="bbv-manual.html">12.BBV: an experimental basic block vector generation tool>></a>
+</td>
+</tr>
+<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
+</table>
+</div>
+</body>
+</html>
|