From: <sv...@va...> - 2007-11-22 01:21:55
Author: sewardj
Date: 2007-11-22 01:21:56 +0000 (Thu, 22 Nov 2007)
New Revision: 7199
Log:
Update documents in preparation for 3.3.0, and restructure them
somewhat to move less relevant material out of the way to some extent.
The main changes are:
* Update date and version info
* Mention other tools in the quick-start guide
* Document --child-silent-after-fork
* Rearrange order of sections in the Valgrind Core chapter, to move
advanced stuff (client requests) to the end, and compact stuff
relevant to the majority of users towards the front
* Move MPI debugging stuff from the Core manual (a nonsensical place
for it) to the Memcheck chapter
* Update the manual's introductory chapter a bit
* Connect up new tech docs summary page, and disconnect old and
very out of date valgrind/memcheck tech docs
* Add section tags to the Cachegrind manual, to stop xsltproc
complaining about their absence
Modified:
trunk/ACKNOWLEDGEMENTS
trunk/AUTHORS
trunk/cachegrind/docs/cg-manual.xml
trunk/docs/xml/Makefile.am
trunk/docs/xml/manual-core.xml
trunk/docs/xml/manual-intro.xml
trunk/docs/xml/quick-start-guide.xml
trunk/docs/xml/tech-docs.xml
trunk/docs/xml/vg-entities.xml
trunk/memcheck/docs/mc-manual.xml
Modified: trunk/ACKNOWLEDGEMENTS
===================================================================
--- trunk/ACKNOWLEDGEMENTS 2007-11-22 01:07:57 UTC (rev 7198)
+++ trunk/ACKNOWLEDGEMENTS 2007-11-22 01:21:56 UTC (rev 7199)
@@ -6,8 +6,9 @@
Jeremy Fitzhardinge, je...@va...
-Jeremy wrote Helgrind and totally overhauled low-level syscall/signal
-and address space layout stuff, among many other improvements.
+Jeremy wrote Helgrind (in the 2.X line) and totally overhauled
+low-level syscall/signal and address space layout stuff, among many
+other improvements.
Tom Hughes, to...@va...
Modified: trunk/AUTHORS
===================================================================
--- trunk/AUTHORS 2007-11-22 01:07:57 UTC (rev 7198)
+++ trunk/AUTHORS 2007-11-22 01:21:56 UTC (rev 7199)
@@ -2,8 +2,9 @@
Cerion Armour-Brown worked on PowerPC instruction set support using
the Vex dynamic-translation framework.
-Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level
-syscall/signal and address space layout stuff, among many other things.
+Jeremy Fitzhardinge wrote Helgrind (in the 2.X line) and totally
+overhauled low-level syscall/signal and address space layout stuff,
+among many other things.
Tom Hughes did a vast number of bug fixes, and helped out with support
for more recent Linux/glibc versions.
Modified: trunk/cachegrind/docs/cg-manual.xml
===================================================================
--- trunk/cachegrind/docs/cg-manual.xml 2007-11-22 01:07:57 UTC (rev 7198)
+++ trunk/cachegrind/docs/cg-manual.xml 2007-11-22 01:21:56 UTC (rev 7199)
@@ -937,7 +937,7 @@
-<sect2>
+<sect2 id="cg-manual.annopts.warnings" xreflabel="Warnings">
<title>Warnings</title>
<para>There are a couple of situations in which
@@ -969,7 +969,8 @@
-<sect2>
+<sect2 id="cg-manual.annopts.things-to-watch-out-for"
+ xreflabel="Things to watch out for">
<title>Things to watch out for</title>
<para>Some odd things that can occur during annotation:</para>
@@ -1084,7 +1085,7 @@
-<sect2>
+<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
<title>Accuracy</title>
<para>Valgrind's cache profiling has a number of
@@ -1221,7 +1222,8 @@
</sect1>
-<sect1>
+<sect1 id="cg-manual.acting-on"
+ xreflabel="Acting on Cachegrind's information">
<title>Acting on Cachegrind's information</title>
<para>
So, you've managed to profile your program with Cachegrind. Now what?
@@ -1260,14 +1262,16 @@
</sect1>
-<sect1>
+<sect1 id="cg-manual.impl-details"
+ xreflabel="Implementation details">
<title>Implementation details</title>
<para>
This section talks about details you don't need to know about in order to
use Cachegrind, but may be of interest to some people.
</para>
-<sect2>
+<sect2 id="cg-manual.impl-details.how-cg-works"
+ xreflabel="How Cachegrind works">
<title>How Cachegrind works</title>
<para>The best reference for understanding how Cachegrind works is chapter 3 of
"Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It
@@ -1275,7 +1279,8 @@
page</ulink>.</para>
</sect2>
-<sect2>
+<sect2 id="cg-manual.impl-details.file-format"
+ xreflabel="Cachegrind output file format">
<title>Cachegrind output file format</title>
<para>The file format is fairly straightforward, basically giving the
cost centre for every line, grouped by files and
Modified: trunk/docs/xml/Makefile.am
===================================================================
--- trunk/docs/xml/Makefile.am 2007-11-22 01:07:57 UTC (rev 7198)
+++ trunk/docs/xml/Makefile.am 2007-11-22 01:21:56 UTC (rev 7199)
@@ -7,5 +7,6 @@
manual-writing-tools.xml\
quick-start-guide.xml \
tech-docs.xml \
+ new-tech-docs.xml \
vg-entities.xml \
xml_help.txt
Modified: trunk/docs/xml/manual-core.xml
===================================================================
--- trunk/docs/xml/manual-core.xml 2007-11-22 01:07:57 UTC (rev 7198)
+++ trunk/docs/xml/manual-core.xml 2007-11-22 01:21:56 UTC (rev 7199)
@@ -119,7 +119,7 @@
chances of false positives or false negatives from Memcheck. Also, you
should compile your code with <computeroutput>-Wall</computeroutput> because
it can identify some or all of the problems that Valgrind can miss at the
-higher optimisations levels. (Using <computeroutput>-Wall</computeroutput>
+higher optimisation levels. (Using <computeroutput>-Wall</computeroutput>
is also a good idea in general.) All other tools (as far as we know) are
unaffected by optimisation level.</para>
@@ -657,6 +657,25 @@
</listitem>
</varlistentry>
+ <varlistentry id="opt.child-silent-after-fork"
+ xreflabel="--child-silent-after-fork">
+ <term>
+ <option><![CDATA[--child-silent-after-fork=<yes|no> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>When enabled, Valgrind will not show any debugging or
+ logging output for the child process resulting from
+ a <varname>fork</varname> call. This can make the output less
+ confusing (although more misleading) when dealing with processes
+ that create children. It is particularly useful in conjunction
+ with <varname>--trace-children=</varname>. Use of this flag is also
+ strongly recommended if you are requesting XML output
+ (<varname>--xml=yes</varname>), since otherwise the XML from child and
+ parent may become mixed up, which usually makes it useless.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="opt.track-fds" xreflabel="--track-fds">
<term>
<option><![CDATA[--track-fds=<yes|no> [default: no] ]]></option>
@@ -988,6 +1007,10 @@
process to be debugged and each instance of <literal>%f</literal>
expands to the path to the executable for the process to be
debugged.</para>
+
+ <para>Since <computeroutput>&lt;command&gt;</computeroutput> is likely
+ to contain spaces, you will need to put this entire flag in
+ quotes to ensure it is correctly handled by the shell.</para>
</listitem>
</varlistentry>
@@ -1273,6 +1296,517 @@
</sect1>
+
+
+<sect1 id="manual-core.pthreads" xreflabel="Support for Threads">
+<title>Support for Threads</title>
+
+<para>Valgrind supports programs which use POSIX pthreads.
+Getting this to work was technically challenging but it now works
+well enough for significant threaded applications to run.</para>
+
+<para>The main thing to point out is that although Valgrind works
+with the standard Linux threads library (eg. NPTL or LinuxThreads), it
+serialises execution so that only one thread is running at a time. This
+approach avoids the horrible implementation problems of implementing a
+truly multiprocessor version of Valgrind, but it does mean that threaded
+apps run only on one CPU, even if you have a multiprocessor
+machine.</para>
+
+<para>Valgrind schedules your program's threads in a round-robin fashion,
+with all threads having equal priority. It switches threads
+every 100000 basic blocks (on x86, typically around 600000
+instructions), which means you'll get a much finer interleaving
+of thread executions than when run natively. This in itself may
+cause your program to behave differently if you have some kind of
+concurrency, critical race, locking, or similar, bugs. In that case
+you might consider using Valgrind's Helgrind tool to track them down.</para>
+
+<para>Your program will use the native
+<computeroutput>libpthread</computeroutput>, but not all of its facilities
+will work. In particular, synchronisation of processes via shared-memory
+segments will not work. This relies on special atomic instruction sequences
+which Valgrind does not emulate in a way which works between processes.
+Unfortunately there's no way for Valgrind to warn when this is happening,
+and such calls will mostly work. Only when there's a race will
+it fail.
+</para>
+
+<para>Valgrind also supports direct use of the
+<computeroutput>clone()</computeroutput> system call,
+<computeroutput>futex()</computeroutput> and so on.
+<computeroutput>clone()</computeroutput> is supported where either
+everything is shared (a thread) or nothing is shared (fork-like); partial
+sharing will fail. Again, any use of atomic instruction sequences in shared
+memory between processes will not work reliably.
+</para>
+
+
+</sect1>
+
+<sect1 id="manual-core.signals" xreflabel="Handling of Signals">
+<title>Handling of Signals</title>
+
+<para>Valgrind has a fairly complete signal implementation. It should be
+able to cope with any POSIX-compliant use of signals.</para>
+
+<para>If you're using signals in clever ways (for example, catching
+SIGSEGV, modifying page state and restarting the instruction), you're
+probably relying on precise exceptions. In this case, you will need
+to use <computeroutput>--vex-iropt-precise-memory-exns=yes</computeroutput>.
+</para>
+
+<para>If your program dies as a result of a fatal core-dumping signal,
+Valgrind will generate its own core file
+(<computeroutput>vgcore.NNNNN</computeroutput>) containing your program's
+state. You may use this core file for post-mortem debugging with gdb or
+similar. (Note: it will not generate a core if your core dump size limit is
+0.) At the time of writing the core dumps do not include all the floating
+point register information.</para>
+
+<para>In the unlikely event that Valgrind itself crashes, the operating system
+will create a core dump in the usual way.</para>
+
+</sect1>
+
+
+
+
+
+
+
+
+<sect1 id="manual-core.install" xreflabel="Building and Installing">
+<title>Building and Installing Valgrind</title>
+
+<para>We use the standard Unix
+<computeroutput>./configure</computeroutput>,
+<computeroutput>make</computeroutput>, <computeroutput>make
+install</computeroutput> mechanism, and we have attempted to
+ensure that it works on machines with kernel 2.4 or 2.6 and glibc
+2.2.X to 2.5.X. Once you have completed
+<computeroutput>make install</computeroutput> you may then want
+to run the regression tests
+with <computeroutput>make regtest</computeroutput>.
+</para>
+
+<para>There are five options (in addition to the usual
+<option>--prefix=</option>) which affect how Valgrind is built:
+<itemizedlist>
+
+ <listitem>
+ <para><option>--enable-inner</option></para>
+ <para>This builds Valgrind with some special magic hacks which make
+ it possible to run it on a standard build of Valgrind (what the
+ developers call "self-hosting"). Ordinarily you should not use
+ this flag as various kinds of safety checks are disabled.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para><option>--enable-tls</option></para>
+ <para>TLS (Thread Local Storage) is a relatively new mechanism which
+ requires compiler, linker and kernel support. Valgrind tries to
+ automatically test if TLS is supported and if so enables this option.
+ Sometimes it cannot test for TLS, so this option allows you to
+ override the automatic test.</para>
+ </listitem>
+
+ <listitem>
+ <para><option>--with-vex=</option></para>
+ <para>Specifies the path to the underlying VEX dynamic-translation
+ library. By default this is taken to be in the VEX directory off
+ the root of the source tree.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para><option>--enable-only64bit</option></para>
+ <para><option>--enable-only32bit</option></para>
+ <para>On 64-bit
+ platforms (amd64-linux, ppc64-linux), Valgrind is by default built
+ in such a way that both 32-bit and 64-bit executables can be run.
+ Sometimes this cleverness is a problem for a variety of reasons.
+ These two flags allow for single-target builds in this situation.
+ If you issue both, the configure script will complain. Note they
+ are ignored on 32-bit-only platforms (x86-linux, ppc32-linux).
+ </para>
+ </listitem>
+
+</itemizedlist>
+</para>
+
+<para>The <computeroutput>configure</computeroutput> script tests
+the version of the X server currently indicated by the current
+<computeroutput>$DISPLAY</computeroutput>. This is a known bug.
+The intention was to detect the version of the current X
+client libraries, so that correct suppressions could be selected
+for them, but instead the test checks the server version. This
+is just plain wrong.</para>
+
+<para>If you are building a binary package of Valgrind for
+distribution, please read <literal>README_PACKAGERS</literal>
+<xref linkend="dist.readme-packagers"/>. It contains some
+important information.</para>
+
+<para>Apart from that, there's not much excitement here. Let us
+know if you have build problems.</para>
+
+</sect1>
+
+
+
+<sect1 id="manual-core.problems" xreflabel="If You Have Problems">
+<title>If You Have Problems</title>
+
+<para>Contact us at <ulink url="&vg-url;">&vg-url;</ulink>.</para>
+
+<para>See <xref linkend="manual-core.limits"/> for the known
+limitations of Valgrind, and for a list of programs which are
+known not to work on it.</para>
+
+<para>All parts of the system make heavy use of assertions and
+internal self-checks. They are permanently enabled, and we have no
+plans to disable them. If one of them breaks, please mail us!</para>
+
+<para>If you get an assertion failure
+in <filename>m_mallocfree.c</filename>, this may have happened because
+your program wrote off the end of a malloc'd block, or before its
+beginning. Valgrind hopefully will have emitted a proper message to that
+effect before dying in this way. This is a known problem which
+we should fix.</para>
+
+<para>Read the <xref linkend="FAQ"/> for more advice about common problems,
+crashes, etc.</para>
+
+</sect1>
+
+
+
+<sect1 id="manual-core.limits" xreflabel="Limitations">
+<title>Limitations</title>
+
+<para>The following list of limitations seems long. However, most
+programs actually work fine.</para>
+
+<para>Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X
+system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the
+following constraints:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>On x86 and amd64, there is no support for 3DNow! instructions.
+ If the translator encounters these, Valgrind will generate a SIGILL
+ when the instruction is executed. Apart from that, on x86 and amd64,
+ essentially all instructions are supported, up to and including SSE3.
+ </para>
+
+ <para>On ppc32 and ppc64, almost all integer, floating point and Altivec
+ instructions are supported. Specifically: integer and FP insns that are
+ mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts,
+ stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and
+ the Altivec (also known as VMX) SIMD instruction set, are supported.</para>
+ </listitem>
+
+ <listitem>
+ <para>Atomic instruction sequences are not properly supported, in the
+ sense that their atomicity is not preserved. This will affect any
+ use of synchronization via memory shared between processes. They
+ will appear to work, but fail sporadically.</para>
+ </listitem>
+
+ <listitem>
+ <para>If your program does its own memory management, rather than
+ using malloc/new/free/delete, it should still work, but Memcheck's
+ error checking won't be so effective. If you describe your program's
+ memory management scheme using "client requests"
+ (see <xref linkend="manual-core.clientreq"/>), Memcheck can do
+ better. Nevertheless, using malloc/new and free/delete is still the
+ best approach.</para>
+ </listitem>
+
+ <listitem>
+ <para>Valgrind's signal simulation is not as robust as it could be.
+ Basic POSIX-compliant sigaction and sigprocmask functionality is
+ supplied, but it's conceivable that things could go badly awry if you
+ do weird things with signals. Workaround: don't. Programs that do
+ non-POSIX signal tricks are in any case inherently unportable, so
+ should be avoided if possible.</para>
+ </listitem>
+
+ <listitem>
+ <para>Machine instructions, and system calls, have been implemented
+ on demand. So it's possible, although unlikely, that a program will
+ fall over with a message to that effect. If this happens, please
+ report all the details printed out, so we can try and implement the
+ missing feature.</para>
+ </listitem>
+
+ <listitem>
+ <para>Memory consumption of your program is greatly increased whilst
+ running under Valgrind. This is due to the large amount of
+ administrative information maintained behind the scenes. Another
+ cause is that Valgrind dynamically translates the original
+ executable. Translated, instrumented code is 12-18 times larger than
+ the original so you can easily end up with 50+ MB of translations
+ when running (eg) a web browser.</para>
+ </listitem>
+
+ <listitem>
+ <para>Valgrind can handle dynamically-generated code just fine. If
+ you regenerate code over the top of old code (ie. at the same memory
+ addresses), if the code is on the stack Valgrind will realise the
+ code has changed, and work correctly. This is necessary to handle
+ the trampolines GCC uses to implement nested functions. If you
+ regenerate code somewhere other than the stack, you will need to use
+ the <option>--smc-check=all</option> flag, and Valgrind will run more
+ slowly than normal.</para>
+ </listitem>
+
+ <listitem>
+ <para>As of version 3.0.0, Valgrind has the following limitations
+ in its implementation of x86/AMD64 floating point relative to
+ IEEE754.</para>
+
+ <para>Precision: There is no support for 80 bit arithmetic.
+ Internally, Valgrind represents all such "long double" numbers in 64
+ bits, and so there may be some differences in results. Whether or
+ not this is critical remains to be seen. Note, the x86/amd64
+ fldt/fstpt instructions (read/write 80-bit numbers) are correctly
+ simulated, using conversions to/from 64 bits, so that in-memory
+ images of 80-bit numbers look correct if anyone wants to see.</para>
+
+ <para>The impression observed from many FP regression tests is that
+ the accuracy differences aren't significant. Generally speaking, if
+ a program relies on 80-bit precision, there may be difficulties
+ porting it to non x86/amd64 platforms which only support 64-bit FP
+ precision. Even on x86/amd64, the program may get different results
+ depending on whether it is compiled to use SSE2 instructions (64-bits
+ only), or x87 instructions (80-bit). The net effect is to make FP
+ programs behave as if they had been run on a machine with 64-bit IEEE
+ floats, for example PowerPC. On amd64 FP arithmetic is done by
+ default on SSE2, so amd64 looks more like PowerPC than x86 from an FP
+ perspective, and there are far fewer noticeable accuracy differences
+ than with x86.</para>
+
+ <para>Rounding: Valgrind does observe the 4 IEEE-mandated rounding
+ modes (to nearest, to +infinity, to -infinity, to zero) for the
+ following conversions: float to integer, integer to float where
+ there is a possibility of loss of precision, and float-to-float
+ rounding. For all other FP operations, only the IEEE default mode
+ (round to nearest) is supported.</para>
+
+ <para>Numeric exceptions in FP code: IEEE754 defines five types of
+ numeric exception that can happen: invalid operation (sqrt of
+ negative number, etc), division by zero, overflow, underflow,
+ inexact (loss of precision).</para>
+
+ <para>For each exception, two courses of action are defined by IEEE754:
+ either (1) a user-defined exception handler may be called, or (2) a
+ default action is defined, which "fixes things up" and allows the
+ computation to proceed without throwing an exception.</para>
+
+ <para>Currently Valgrind only supports the default fixup actions.
+ Again, feedback on the importance of exception support would be
+ appreciated.</para>
+
+ <para>When Valgrind detects that the program is trying to exceed any
+ of these limitations (setting exception handlers, rounding mode, or
+ precision control), it can print a message giving a traceback of
+ where this has happened, and continue execution. This behaviour used
+ to be the default, but the messages are annoying and so showing them
+ is now disabled by default. Use <option>--show-emwarns=yes</option> to see
+ them.</para>
+
+ <para>The above limitations define precisely the IEEE754 'default'
+ behaviour: default fixup on all exceptions, round-to-nearest
+ operations, and 64-bit precision.</para>
+ </listitem>
+
+ <listitem>
+ <para>As of version 3.0.0, Valgrind has the following limitations in
+ its implementation of x86/AMD64 SSE2 FP arithmetic, relative to
+ IEEE754.</para>
+
+ <para>Essentially the same: no exceptions, and limited observance of
+ rounding mode. Also, SSE2 has control bits which make it treat
+ denormalised numbers as zero (DAZ) and a related action, flush
+ denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be
+ less accurate than IEEE requires. Valgrind detects, ignores, and can
+ warn about, attempts to enable either mode.</para>
+ </listitem>
+
+ <listitem>
+ <para>As of version 3.2.0, Valgrind has the following limitations
+ in its implementation of PPC32 and PPC64 floating point
+ arithmetic, relative to IEEE754.</para>
+
+ <para>Scalar (non-Altivec): Valgrind provides a bit-exact emulation of
+ all floating point instructions, except for "fre" and "fres", which are
+ done more precisely than required by the PowerPC architecture specification.
+ All floating point operations observe the current rounding mode.
+ </para>
+
+ <para>However, fpscr[FPRF] is not set after each operation. That could
+ be done but would give measurable performance overheads, and so far
+ no need for it has been found.</para>
+
+ <para>As on x86/AMD64, IEEE754 exceptions are not supported: all floating
+ point exceptions are handled using the default IEEE fixup actions.
+ Valgrind detects, ignores, and can warn about, attempts to unmask
+ the 5 IEEE FP exception kinds by writing to the floating-point status
+ and control register (fpscr).
+ </para>
+
+ <para>Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2:
+ no exceptions, and limited observance of rounding mode.
+ For Altivec, FP arithmetic
+ is done in IEEE/Java mode, which is more accurate than the Linux default
+ setting. "More accurate" means that denormals are handled properly,
+ rather than simply being flushed to zero.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Programs which are known not to work are:</para>
+ <itemizedlist>
+ <listitem>
+ <para>emacs starts up but immediately concludes it is out of
+ memory and aborts. It may be that Memcheck does not provide
+ a good enough emulation of the
+ <computeroutput>mallinfo</computeroutput> function.
+ Emacs works fine if you build it to use
+ the standard malloc/free routines.</para>
+ </listitem>
+ </itemizedlist>
+
+</sect1>
+
+
+<sect1 id="manual-core.example" xreflabel="An Example Run">
+<title>An Example Run</title>
+
+<para>This is the log for a run of a small program using Memcheck.
+The program is in fact correct, and the reported error is the
+result of a potentially serious code generation bug in GNU g++
+(snapshot 20010527).</para>
+
+<programlisting><![CDATA[
+sewardj@phoenix:~/newmat10$ ~/Valgrind-6/valgrind -v ./bogon
+==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
+==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
+==25832== Startup, with flags:
+==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
+==25832== reading syms from /lib/ld-linux.so.2
+==25832== reading syms from /lib/libc.so.6
+==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
+==25832== reading syms from /lib/libm.so.6
+==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
+==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
+==25832== reading syms from /proc/self/exe
+==25832==
+==25832== Invalid read of size 4
+==25832== at 0x8048724: BandMatrix::ReSize(int,int,int) (bogon.cpp:45)
+==25832== by 0x80487AF: main (bogon.cpp:66)
+==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
+==25832==
+==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
+==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
+==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
+==25832== For a detailed leak analysis, rerun with: --leak-check=yes
+]]></programlisting>
+
+<para>The GCC folks fixed this about a week before gcc-3.0
+shipped.</para>
+
+</sect1>
+
+
+<sect1 id="manual-core.warnings" xreflabel="Warning Messages">
+<title>Warning Messages You Might See</title>
+
+<para>Most of these only appear if you run in verbose mode
+(enabled by <computeroutput>-v</computeroutput>):</para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para><computeroutput>More than 100 errors detected. Subsequent
+ errors will still be recorded, but in less detail than
+ before.</computeroutput></para>
+
+ <para>After 100 different errors have been shown, Valgrind becomes
+ more conservative about collecting them. It then requires only the
+ program counters in the top two stack frames to match when deciding
+ whether or not two errors are really the same one. Prior to this
+ point, the PCs in the top four frames are required to match. This
+ hack has the effect of slowing down the appearance of new errors
+ after the first 100. The 100 constant can be changed by recompiling
+ Valgrind.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>More than 1000 errors detected. I'm not
+ reporting any more. Final error counts may be inaccurate. Go fix
+ your program!</computeroutput></para>
+
+ <para>After 1000 different errors have been detected, Valgrind
+ ignores any more. It seems unlikely that collecting even more
+ different ones would be of practical help to anybody, and it avoids
+ the danger that Valgrind spends more and more of its time comparing
+ new errors against an ever-growing collection. As above, the 1000
+ number is a compile-time constant.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>Warning: client switching stacks?</computeroutput></para>
+
+ <para>Valgrind spotted such a large change in the stack pointer
+ that it guesses the client is switching to
+ a different stack. At this point it makes a kludgey guess where the
+ base of the new stack is, and sets memory permissions accordingly.
+ You may get many bogus error messages following this, if Valgrind
+ guesses wrong. At the moment "large change" is defined as a change
+ of more than 2000000 in the value of the
+ stack pointer register.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>Warning: client attempted to close Valgrind's
+ logfile fd &lt;number&gt;</computeroutput></para>
+
+ <para>Valgrind doesn't allow the client to close the logfile,
+ because you'd never see any diagnostic information after that point.
+ If you see this message, you may want to use the
+ <option>--log-fd=&lt;number&gt;</option> option to specify a
+ different logfile file-descriptor number.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>Warning: noted but unhandled ioctl
+ &lt;number&gt;</computeroutput></para>
+
+ <para>Valgrind observed a call to one of the vast family of
+ <computeroutput>ioctl</computeroutput> system calls, but did not
+ modify its memory status info (because nobody has yet written a
+ suitable wrapper). The call will still have gone through, but you may get
+ spurious errors after this as a result of the non-update of the
+ memory info.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>Warning: set address range perms: large range
+ &lt;number&gt;</computeroutput></para>
+
+ <para>Diagnostic message, mostly for benefit of the Valgrind
+ developers, to do with memory permissions.</para>
+ </listitem>
+
+ </itemizedlist>
+
+</sect1>
+
+
+
<sect1 id="manual-core.clientreq"
xreflabel="The Client Request mechanism">
<title>The Client Request mechanism</title>
@@ -1523,78 +2057,8 @@
-<sect1 id="manual-core.pthreads" xreflabel="Support for Threads">
-<title>Support for Threads</title>
-<para>Valgrind supports programs which use POSIX pthreads.
-Getting this to work was technically challenging but it now works
-well enough for significant threaded applications to work.</para>
-<para>The main thing to point out is that although Valgrind works
-with the standard Linux threads library (eg. NPTL or LinuxThreads), it
-serialises execution so that only one thread is running at a time. This
-approach avoids the horrible implementation problems of implementing a
-truly multiprocessor version of Valgrind, but it does mean that threaded
-apps run only on one CPU, even if you have a multiprocessor
-machine.</para>
-
-<para>Valgrind schedules your program's threads in a round-robin fashion,
-with all threads having equal priority. It switches threads
-every 100000 basic blocks (on x86, typically around 600000
-instructions), which means you'll get a much finer interleaving
-of thread executions than when run natively. This in itself may
-cause your program to behave differently if you have some kind of
-concurrency, critical race, locking, or similar, bugs.</para>
-
-<para>Your program will use the native
-<computeroutput>libpthread</computeroutput>, but not all of its facilities
-will work. In particular, synchronisation of processes via shared-memory
-segments will not work. This relies on special atomic instruction sequences
-which Valgrind does not emulate in a way which works between processes.
-Unfortunately there's no way for Valgrind to warn when this is happening,
-and such calls will mostly work. Only when there's a race will
-it fail.
-</para>
-
-<para>Valgrind also supports direct use of the
-<computeroutput>clone()</computeroutput> system call,
-<computeroutput>futex()</computeroutput> and so on.
-<computeroutput>clone()</computeroutput> is supported where either
-everything is shared (a thread) or nothing is shared (fork-like); partial
-sharing will fail. Again, any use of atomic instruction sequences in shared
-memory between processes will not work reliably.
-</para>
-
-
-</sect1>
-
-<sect1 id="manual-core.signals" xreflabel="Handling of Signals">
-<title>Handling of Signals</title>
-
-<para>Valgrind has a fairly complete signal implementation. It should be
-able to cope with any POSIX-compliant use of signals.</para>
-
-<para>If you're using signals in clever ways (for example, catching
-SIGSEGV, modifying page state and restarting the instruction), you're
-probably relying on precise exceptions. In this case, you will need
-to use <computeroutput>--vex-iropt-precise-memory-exns=yes</computeroutput>.
-</para>
-
-<para>If your program dies as a result of a fatal core-dumping signal,
-Valgrind will generate its own core file
-(<computeroutput>vgcore.NNNNN</computeroutput>) containing your program's
-state. You may use this core file for post-mortem debugging with gdb or
-similar. (Note: it will not generate a core if your core dump size limit is
-0.) At the time of writing the core dumps do not include all the floating
-point register information.</para>
-
-<para>In the unlikely event that Valgrind itself crashes, the operating system
-will create a core dump in the usual way.</para>
-
-</sect1>
-
-
-
<sect1 id="manual-core.wrapping" xreflabel="Function Wrapping">
<title>Function wrapping</title>
@@ -1987,811 +2451,5 @@
-<sect1 id="manual-core.install" xreflabel="Building and Installing">
-<title>Building and Installing Valgrind</title>
-<para>We use the standard Unix
-<computeroutput>./configure</computeroutput>,
-<computeroutput>make</computeroutput>, <computeroutput>make
-install</computeroutput> mechanism, and we have attempted to
-ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X to 2.5.X. Once you have completed
-<computeroutput>make install</computeroutput> you may then want
-to run the regression tests
-with <computeroutput>make regtest</computeroutput>.
-</para>
-
-<para>There are five options (in addition to the usual
-<option>--prefix=</option>) which affect how Valgrind is built:
-<itemizedlist>
-
- <listitem>
- <para><option>--enable-inner</option></para>
- <para>This builds Valgrind with some special magic hacks which make
- it possible to run it on a standard build of Valgrind (what the
- developers call "self-hosting"). Ordinarily you should not use
- this flag as various kinds of safety checks are disabled.
- </para>
- </listitem>
-
- <listitem>
- <para><option>--enable-tls</option></para>
- <para>TLS (Thread Local Storage) is a relatively new mechanism which
- requires compiler, linker and kernel support. Valgrind tries to
- automatically test if TLS is supported and if so enables this option.
- Sometimes it cannot test for TLS, so this option allows you to
- override the automatic test.</para>
- </listitem>
-
- <listitem>
- <para><option>--with-vex=</option></para>
- <para>Specifies the path to the underlying VEX dynamic-translation
- library. By default this is taken to be in the VEX directory off
- the root of the source tree.
- </para>
- </listitem>
-
- <listitem>
- <para><option>--enable-only64bit</option></para>
- <para><option>--enable-only32bit</option></para>
- <para>On 64-bit
- platforms (amd64-linux, ppc64-linux), Valgrind is by default built
- in such a way that both 32-bit and 64-bit executables can be run.
- Sometimes this cleverness is a problem for a variety of reasons.
- These two flags allow for single-target builds in this situation.
- If you issue both, the configure script will complain. Note they
- are ignored on 32-bit-only platforms (x86-linux, ppc32-linux).
- </para>
- </listitem>
-
-</itemizedlist>
-</para>
-
-<para>The <computeroutput>configure</computeroutput> script tests
-the version of the X server currently indicated by the current
-<computeroutput>$DISPLAY</computeroutput>. This is a known bug.
-The intention was to detect the version of the current X
-client libraries, so that correct suppressions could be selected
-for them, but instead the test checks the server version. This
-is just plain wrong.</para>
-
-<para>If you are building a binary package of Valgrind for
-distribution, please read <literal>README_PACKAGERS</literal>
-<xref linkend="dist.readme-packagers"/>. It contains some
-important information.</para>
-
-<para>Apart from that, there's not much excitement here. Let us
-know if you have build problems.</para>
-
-</sect1>
-
-
-
-<sect1 id="manual-core.problems" xreflabel="If You Have Problems">
-<title>If You Have Problems</title>
-
-<para>Contact us at <ulink url="&vg-url;">&vg-url;</ulink>.</para>
-
-<para>See <xref linkend="manual-core.limits"/> for the known
-limitations of Valgrind, and for a list of programs which are
-known not to work on it.</para>
-
-<para>All parts of the system make heavy use of assertions and
-internal self-checks. They are permanently enabled, and we have no
-plans to disable them. If one of them breaks, please mail us!</para>
-
-<para>If you get an assertion failure
-in <filename>m_mallocfree.c</filename>, this may have happened because
-your program wrote off the end of a malloc'd block, or before its
-beginning. Valgrind hopefully will have emitted a proper message to that
-effect before dying in this way. This is a known problem which
-we should fix.</para>
-
-<para>Read the <xref linkend="FAQ"/> for more advice about common problems,
-crashes, etc.</para>
-
-</sect1>
-
-
-
-<sect1 id="manual-core.limits" xreflabel="Limitations">
-<title>Limitations</title>
-
-<para>The following list of limitations seems long. However, most
-programs actually work fine.</para>
-
-<para>Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X
-system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the
-following constraints:</para>
-
- <itemizedlist>
- <listitem>
- <para>On x86 and amd64, there is no support for 3DNow! instructions.
- If the translator encounters these, Valgrind will generate a SIGILL
- when the instruction is executed. Apart from that, on x86 and amd64,
- essentially all instructions are supported, up to and including SSE3.
- </para>
-
- <para>On ppc32 and ppc64, almost all integer, floating point and Altivec
- instructions are supported. Specifically: integer and FP insns that are
- mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts,
- stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and
- the Altivec (also known as VMX) SIMD instruction set, are supported.</para>
- </listitem>
-
- <listitem>
- <para>Atomic instruction sequences are not properly supported, in the
- sense that their atomicity is not preserved. This will affect any
- use of synchronization via memory shared between processes. They
- will appear to work, but fail sporadically.</para>
- </listitem>
-
- <listitem>
- <para>If your program does its own memory management, rather than
- using malloc/new/free/delete, it should still work, but Valgrind's
- error checking won't be so effective. If you describe your program's
- memory management scheme using "client requests"
- (see <xref linkend="manual-core.clientreq"/>), Memcheck can do
- better. Nevertheless, using malloc/new and free/delete is still the
- best approach.</para>
- </listitem>
-
- <listitem>
- <para>Valgrind's signal simulation is not as robust as it could be.
- Basic POSIX-compliant sigaction and sigprocmask functionality is
- supplied, but it's conceivable that things could go badly awry if you
- do weird things with signals. Workaround: don't. Programs that do
- non-POSIX signal tricks are in any case inherently unportable, so
- should be avoided if possible.</para>
- </listitem>
-
- <listitem>
- <para>Machine instructions, and system calls, have been implemented
- on demand. So it's possible, although unlikely, that a program will
- fall over with a message to that effect. If this happens, please
- report all the details printed out, so we can try and implement the
- missing feature.</para>
- </listitem>
-
- <listitem>
- <para>Memory consumption of your program is greatly increased whilst
- running under Valgrind. This is due to the large amount of
- administrative information maintained behind the scenes. Another
- cause is that Valgrind dynamically translates the original
- executable. Translated, instrumented code is 12-18 times larger than
- the original so you can easily end up with 50+ MB of translations
- when running (eg) a web browser.</para>
- </listitem>
-
- <listitem>
- <para>Valgrind can handle dynamically-generated code just fine. If
- you regenerate code over the top of old code (ie. at the same memory
- addresses), if the code is on the stack Valgrind will realise the
- code has changed, and work correctly. This is necessary to handle
- the trampolines GCC uses to implement nested functions. If you
- regenerate code somewhere other than the stack, you will need to use
- the <option>--smc-check=all</option> flag, and Valgrind will run more
- slowly than normal.</para>
- </listitem>
-
- <listitem>
- <para>As of version 3.0.0, Valgrind has the following limitations
- in its implementation of x86/AMD64 floating point relative to
- IEEE754.</para>
-
- <para>Precision: There is no support for 80 bit arithmetic.
- Internally, Valgrind represents all such "long double" numbers in 64
- bits, and so there may be some differences in results. Whether or
- not this is critical remains to be seen. Note, the x86/amd64
- fldt/fstpt instructions (read/write 80-bit numbers) are correctly
- simulated, using conversions to/from 64 bits, so that in-memory
- images of 80-bit numbers look correct if anyone wants to see.</para>
-
- <para>The impression observed from many FP regression tests is that
- the accuracy differences aren't significant. Generally speaking, if
- a program relies on 80-bit precision, there may be difficulties
- porting it to non-x86/amd64 platforms which only support 64-bit FP
- precision. Even on x86/amd64, the program may get different results
- depending on whether it is compiled to use SSE2 instructions (64-bits
- only), or x87 instructions (80-bit). The net effect is to make FP
- programs behave as if they had been run on a machine with 64-bit IEEE
- floats, for example PowerPC. On amd64 FP arithmetic is done by
- default on SSE2, so amd64 looks more like PowerPC than x86 from an FP
- perspective, and there are far fewer noticeable accuracy differences
- than with x86.</para>
-
- <para>Rounding: Valgrind does observe the 4 IEEE-mandated rounding
- modes (to nearest, to +infinity, to -infinity, to zero) for the
- following conversions: float to integer, integer to float where
- there is a possibility of loss of precision, and float-to-float
- rounding. For all other FP operations, only the IEEE default mode
- (round to nearest) is supported.</para>
-
- <para>Numeric exceptions in FP code: IEEE754 defines five types of
- numeric exception that can happen: invalid operation (sqrt of
- negative number, etc), division by zero, overflow, underflow,
- inexact (loss of precision).</para>
-
- <para>For each exception, two courses of action are defined by IEEE754:
- either (1) a user-defined exception handler may be called, or (2) a
- default action is defined, which "fixes things up" and allows the
- computation to proceed without throwing an exception.</para>
-
- <para>Currently Valgrind only supports the default fixup actions.
- Again, feedback on the importance of exception support would be
- appreciated.</para>
-
- <para>When Valgrind detects that the program is trying to exceed any
- of these limitations (setting exception handlers, rounding mode, or
- precision control), it can print a message giving a traceback of
- where this has happened, and continue execution. This behaviour used
- to be the default, but the messages are annoying and so showing them
- is now disabled by default. Use <option>--show-emwarns=yes</option> to see
- them.</para>
-
- <para>The above limitations define precisely the IEEE754 'default'
- behaviour: default fixup on all exceptions, round-to-nearest
- operations, and 64-bit precision.</para>
- </listitem>
-
- <listitem>
- <para>As of version 3.0.0, Valgrind has the following limitations in
- its implementation of x86/AMD64 SSE2 FP arithmetic, relative to
- IEEE754.</para>
-
- <para>Essentially the same: no exceptions, and limited observance of
- rounding mode. Also, SSE2 has control bits which make it treat
- denormalised numbers as zero (DAZ) and a related action, flush
- denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be
- less accurate than IEEE requires. Valgrind detects, ignores, and can
- warn about, attempts to enable either mode.</para>
- </listitem>
-
- <listitem>
- <para>As of version 3.2.0, Valgrind has the following limitations
- in its implementation of PPC32 and PPC64 floating point
- arithmetic, relative to IEEE754.</para>
-
- <para>Scalar (non-Altivec): Valgrind provides a bit-exact emulation of
- all floating point instructions, except for "fre" and "fres", which are
- done more precisely than required by the PowerPC architecture specification.
- All floating point operations observe the current rounding mode.
- </para>
-
- <para>However, fpscr[FPRF] is not set after each operation. That could
- be done but would give measurable performance overheads, and so far
- no need for it has been found.</para>
-
- <para>As on x86/AMD64, IEEE754 exceptions are not supported: all floating
- point exceptions are handled using the default IEEE fixup actions.
- Valgrind detects, ignores, and can warn about, attempts to unmask
- the 5 IEEE FP exception kinds by writing to the floating-point status
- and control register (fpscr).
- </para>
-
- <para>Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2:
- no exceptions, and limited observance of rounding mode.
- For Altivec, FP arithmetic
- is done in IEEE/Java mode, which is more accurate than the Linux default
- setting. "More accurate" means that denormals are handled properly,
- rather than simply being flushed to zero.</para>
- </listitem>
- </itemizedlist>
-
- <para>Programs which are known not to work are:</para>
- <itemizedlist>
- <listitem>
- <para>emacs starts up but immediately concludes it is out of
- memory and aborts. It may be that Memcheck does not provide
- a good enough emulation of the
- <computeroutput>mallinfo</computeroutput> function.
- Emacs works fine if you build it to use
- the standard malloc/free routines.</para>
- </listitem>
- </itemizedlist>
-
-</sect1>
-
-
-<sect1 id="manual-core.example" xreflabel="An Example Run">
-<title>An Example Run</title>
-
-<para>This is the log for a run of a small program using Memcheck.
-The program is in fact correct, and the reported error is the
-result of a potentially serious code generation bug in GNU g++
-(snapshot 20010527).</para>
-
-<programlisting><![CDATA[
-sewardj@phoenix:~/newmat10$ ~/Valgrind-6/valgrind -v ./bogon
-==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
-==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
-==25832== Startup, with flags:
-==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
-==25832== reading syms from /lib/ld-linux.so.2
-==25832== reading syms from /lib/libc.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
-==25832== reading syms from /lib/libm.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
-==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
-==25832== reading syms from /proc/self/exe
-==25832==
-==25832== Invalid read of size 4
-==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
-==25832== by 0x80487AF: main (bogon.cpp:66)
-==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
-==25832==
-==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
-==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
-==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
-==25832== For a detailed leak analysis, rerun with: --leak-check=yes
-==25832==
-==25832== exiting, did 1881 basic blocks, 0 misses.
-==25832== 223 translations, 3626 bytes in, 56801 bytes out.]]></programlisting>
-
-<para>The GCC folks fixed this about a week before gcc-3.0
-shipped.</para>
-
-</sect1>
-
-
-<sect1 id="manual-core.warnings" xreflabel="Warning Messages">
-<title>Warning Messages You Might See</title>
-
-<para>Most of these only appear if you run in verbose mode
-(enabled by <computeroutput>-v</computeroutput>):</para>
-
- <itemizedlist>
-
- <listitem>
- <para><computeroutput>More than 100 errors detected. Subsequent
- errors will still be recorded, but in less detail than
- before.</computeroutput></para>
-
- <para>After 100 different errors have been shown, Valgrind becomes
- more conservative about collecting them. It then requires only the
- program counters in the top two stack frames to match when deciding
- whether or not two errors are really the same one. Prior to this
- point, the PCs in the top four frames are required to match. This
- hack has the effect of slowing down the appearance of new errors
- after the first 100. The 100 constant can be changed by recompiling
- Valgrind.</para>
- </listitem>
-
- <listitem>
- <para><computeroutput>More than 1000 errors detected. I'm not
- reporting any more. Final error counts may be inaccurate. Go fix
- your program!</computeroutput></para>
-
- <para>After 1000 different errors have been detected, Valgrind
- ignores any more. It seems unlikely that collecting even more
- different ones would be of practical help to anybody, and it avoids
- the danger that Valgrind spends more and more of its time comparing
- new errors against an ever-growing collection. As above, the 1000
- number is a compile-time constant.</para>
- </listitem>
-
- <listitem>
- <para><computeroutput>Warning: client switching stacks?</computeroutput></para>
-
- <para>Valgrind spotted such a large change in the stack pointer
- that it guesses the client is switching to
- a different stack. At this point it makes a kludgey guess where the
- base of the new stack is, and sets memory permissions accordingly.
- You may get many bogus error messages following this, if Valgrind
- guesses wrong. At the moment "large change" is defined as a change
- of more than 2000000 in the value of the
- stack pointer register.</para>
- </listitem>
-
- <listitem>
- <para><computeroutput>Warning: client attempted to close Valgrind's
- logfile fd &lt;number&gt;</computeroutput></para>
-
- <para>Valgrind doesn't allow the client to close the logfile,
- because you'd never see any diagnostic information after that point.
- If you see this message, you may want to use the
- <option>--log-fd=&lt;number&gt;</option> option to specify a
- different logfile file-descriptor number.</para>
- </listitem>
-
- <listitem>
- <para><computeroutput>Warning: noted but unhandled ioctl
- &lt;number&gt;</computeroutput></para>
-
- <para>Valgrind observed a call to one of the vast family of
- <computeroutput>ioctl</computeroutput> system calls, but did not
- modify its memory status info (because nobody has yet written a
- suitable wrapper). The call will still have gone through, but you may get
- spurious errors after this as a result of the non-update of the
- memory info.</para>
- </listitem>
-
- <listitem>
- <para><computeroutput>Warning: set address range perms: large range
- &lt;number&gt;</computeroutput></para>
-
- <para>Diagnostic message, mostly for benefit of the Valgrind
- developers, to do with memory permissions.</para>
- </listitem>
-
- </itemizedlist>
-
-</sect1>
-
-
-<sect1 id="manual-core.mpiwrap" xreflabel="MPI Wrappers">
-<title>Debugging MPI Parallel Programs with Valgrind</title>
-
-<para> Valgrind supports debugging of distributed-memory applications
-which use the MPI message passing standard. This support consists of a
-library of wrapper functions for the
-<computeroutput>PMPI_*</computeroutput> interface. When incorporated
-into the application's address space, either by direct linking or by
-<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
-calls to <computeroutput>PMPI_Send</computeroutput>,
-<computeroutput>PMPI_Recv</computeroutput>, etc. They then
-use client requests to inform Valgrind of memory state changes caused
-by the function being wrapped. This reduces the number of false
-positives that Memcheck otherwise typically reports for MPI
-applications.</para>
-
-<para>The wrappers also take the opportunity to carefully check
-size and definedness of buffers passed as arguments to MPI functions, hence
-detecting errors such as passing undefined data to
-<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
-buffer which is too small.</para>
-
-<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
-BSD-style license, so you can link it into any code base you like.
-See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
-for license details.</para>
-
-
-<sect2 id="manual-core.mpiwrap.build" xreflabel="Building MPI Wrappers">
-<title>Building and installing the wrappers</title>
-
-<para> The wrapper library will be built automatically if possible.
-Valgrind's configure script will look for a suitable
-<computeroutput>mpicc</computeroutput> to build it with. This must be
-the same <computeroutput>mpicc</computeroutput> you use to build the
-MPI application you want to debug. By default, Valgrind tries
-<computeroutput>mpicc</computeroutput>, but you can specify a
-different one by using the configure-time flag
-<computeroutput>--with-mpicc=</computeroutput>. Currently the
-wrappers are only buildable with
-<computeroutput>mpicc</computeroutput>s which are based on GNU
-<computeroutput>gcc</computeroutput> or Intel's
-<computeroutput>icc</computeroutput>.</para>
-
-<para>Check that the configure script prints a line like this:</para>
-
-<programlisting><![CDATA[
-checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
-]]></programlisting>
-
-<para>If it says <computeroutput>... no</computeroutput>, your
-<computeroutput>mpicc</computeroutput> has failed to compile and link
-a test MPI2 program.</para>
-
-<para>If the configure test succeeds, continue in the usual way with
-<computeroutput>make</computeroutput> and <computeroutput>make
-install</computeroutput>. The final install tree should then contain
-<computeroutput>libmpiwrap.so</computeroutput>.
-</para>
-
-<para>Compile up a test MPI program (eg, MPI hello-world) and try
-this:</para>
-
-<programlisting><![CDATA[
-LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
- mpirun [args] $prefix/bin/valgrind ./hello
-]]></programlisting>
-
-<para>You should see something similar to the following:</para>
-
-<programlisting><![CDATA[
-valgrind MPI wrappers 31901: Active for pid 31901
-valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
-]]></programlisting>
-
-<para>repeated for every process in the group. If you do not see
-these, there is a build/installation problem of some kind.</para>
-
-<para> The MPI functions to be wrapped are assumed to be in an ELF
-shared object with soname matching
-<computeroutput>libmpi.so*</computeroutput>. This is known to be
-correct at least for Open MPI and Quadrics MPI, and can easily be
-changed if required.</para>
-</sect2>
-
-
-<sect2 id="manual-core.mpiwrap.gettingstarted"
- xreflabel="Getting started with MPI Wrappers">
-<title>Getting started</title>
-
-<para>Compile your MPI application as usual, taking care to link it
-using the same <computeroutput>mpicc</computeroutput> that your
-Valgrind build was configured with.</para>
-
-<para>
-Use the following basic scheme to run your application on Valgrind with
-the wrappers engaged:</para>
-
-<programlisting><![CDATA[
-MPIWRAP_DEBUG=[wrapper-args] \
- LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
- mpirun [mpirun-args] \
- $prefix/bin/valgrind [valgrind-args] \
- [application] [app-args]
-]]></programlisting>
-
-<para>As an alternative to
-<computeroutput>LD_PRELOAD</computeroutput>ing
-<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
-to your application if desired. This should not disturb native
-behaviour of your application in any way.</para>
-</sect2>
-
-
-<sect2 id="manual-core.mpiwrap.controlling"
- xreflabel="Controlling the MPI Wrappers">
-<title>Controlling the wrapper library</title>
-
-<para>Environment variable
-<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
-startup. The default behaviour is to print a starting banner</para>
-
-<programlisting><![CDATA[
-valgrind MPI wrappers 16386: Active for pid 16386
-valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
-]]></programlisting>
-
-<para> and then be relatively quiet.</para>
-
-<para>You can give a list of comma-separated options in
-<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
-
-<itemizedlist>
- <listitem>
- <para><computeroutput>verbose</computeroutput>:
- show entries/exits of all wrappers. Also show extra
- debugging info, such as the status of outstanding
- <computeroutput>MPI_Request</computeroutput>s resulting
- from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
- </listitem>
- <listitem>
- <para><computeroutput>quiet</computeroutput>:
- opposite of <computeroutput>verbose</computeroutput>, only print
- anything when the wrappers want
- to report a detected programming error, or in case of catastrophic
- failure of the wrappers.</para>
- </listitem>
- <listitem>
- <para><computeroutput>warn</computeroutput>:
- by default, functions which lack proper wrappers
- are not commented on, just silently
- ignored. This causes a warning to be printed for each unwrapped
- function used, up to a maximum of three warnings per function.</para>
- </listitem>
- <listitem>
- <para><computeroutput>strict</computeroutput>:
- print an error message and abort the program if
- a function lacking a wrapper is used.</para>
- </listitem>
-</itemizedlist>
-
-<para> If you want to use Valgrind's XML output facility
-(<computeroutput>--xml=yes</computeroutput>), you should pass
-<computeroutput>quiet</computeroutput> in
-<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
-extraneous printing from the wrappers.</para>
-
-</sect2>
-
-
-<sect2 id="manual-core.mpiwrap.limitations"
- xreflabel="Abilities and Limitations of MPI Wrappers">
-<title>Abilities and limitations</title>
-
-<sect3>
-<title>Functions</title>
-
-<para>All MPI2 functions except
-<computeroutput>MPI_Wtick</computeroutput>,
-<computeroutput>MPI_Wtime</computeroutput> and
-<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
-first two are not wrapped because they return a
-<computeroutput>double</computeroutput>, and Valgrind's
-function-wrap mechanism cannot handle that (it could easily enough be
-extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
-wrapped as it has variable arity:
-<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
-
-<para>Most functions are wrapped with a default wrapper which does
-nothing except complain or abort if it is called, depending on
-settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
-above. The following functions have "real", do-something-useful
-wrappers:</para>
-
-<programlisting><![CDATA[
-PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
-
-PMPI_Recv PMPI_Get_count
-
-PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
-
-PMPI_Irecv
-PMPI_Wait PMPI_Waitall
-PMPI_Test PMPI_Testall
-
-PMPI_Iprobe PMPI_Probe
-
-PMPI_Cancel
-
-PMPI_Sendrecv
-
-PMPI_Type_commit PMPI_Type_free
-
-PMPI_Pack PMPI_Unpack
-
-PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
-PMPI_Reduce PMPI_Allreduce PMPI_Op_create
-
-PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
-
-PMPI_Error_string
-PMPI_Init PMPI_Initialized PMPI_Finalize
-]]></programlisting>
-
-<para> A few functions such as
-<computeroutput>PMPI_Address</computeroutput> are listed ...
[truncated message content] |