[Valgrind-developers] Heads up: Full Virtualization

Hi all,

You may have seen some references to "full virtualization" (or FV). This
is the work I've been doing to restructure the way that Valgrind relates
to its client.

Currently, Valgrind relies on using LD_PRELOAD to hook itself into the
client early in its execution.  This has several problems:
      * It doesn't get in early enough, so some unknown amount of code
        has run before Valgrind starts
      * It relies on the dynamic linker, so it doesn't work for static
        binaries
      * Valgrind and the client share a dynamic linker, which means that
        Valgrind can't use any standard libraries
      * Client state and Valgrind state are intermingled in memory, so a
        stray client memory write can cause Valgrind to crash or
        misbehave

The FV changes remove all these limitations.  Valgrind no longer relies
on the dynamic linker - it actually loads the client's ELF executable
itself, and starts it from the very first instruction under Valgrind
control.  

It also means that there's a much stricter barrier between Valgrind and
the client, akin to the barrier between the kernel and user-space. 
Valgrind's use of the dynamic linker, libraries, etc is completely
independent of the client's.  In addition, the client address space is
self-contained, with Valgrind's memory near the top of the address
space.  Valgrind takes advantage of that by generating bounds checks on
client memory accesses, so they are prohibited from going outside the
client address space (this is done with x86 segmentation, and so adds
little or no extra overhead).

All the client's address-space manipulation syscalls (mmap, shmat, brk,
etc) are vetted to make sure that they are constrained within the client
address space.  Some special ioctls which can create mappings (like the
DRI ioctls) can create mappings outside the client address space, which
can cause problems (at the very least it will cause the client to take
an exception with --pointercheck=yes).
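
As a rough illustration of the vetting described above, the check
amounts to a range test on each requested mapping.  This is only a
sketch, not Valgrind's actual code: the AddrSpace struct, its fields
and client_range_ok() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the client owns [client_base, client_end). */
typedef struct {
    uintptr_t client_base;
    uintptr_t client_end;
} AddrSpace;

/* Returns true if [addr, addr+len) lies entirely inside the client
   address space.  Written so that addr+len is never computed and so
   cannot wrap around. */
bool client_range_ok(const AddrSpace *as, uintptr_t addr, size_t len)
{
    assert(as->client_base <= as->client_end);
    if (addr < as->client_base || addr > as->client_end)
        return false;
    return len <= as->client_end - addr;
}
```

A syscall wrapper for mmap, shmat or brk would call something like this
on the requested range and fail the syscall if it returns false.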

For users, there shouldn't be much in the way of visible changes.  Apart
from static executables working, the main visible difference is in the
valgrind command, which is now an executable rather than a script.  It
supports all the same options, as well as the VALGRIND_OPTS environment
variable.

The changes are bigger for Valgrind developers.  Over time, we can start
dissolving vg_mylibc.c, and use standard library functions instead.  We
can also consider using new libraries and languages.  There are now
fewer restrictions on programming within Valgrind; the main one is that
direct use of mmap/munmap/mprotect() is discouraged, because we still
need to be careful about where things are placed in the address
space, and track where things are.  VG_(mmap) does its own memory
placement algorithm so that Valgrind's mmaps don't accidentally appear
in the client address space.  There are also a few extra VKI_MAP_* flags
to control various Valgrind-specific mmap behaviours.

For tool (aka skin) authors, there have been a few changes in the tool
interface.  Within the core, the tool interface functions are all
defined in coregrind/toolfuncs.def, which is used to generate vg_skin.h
and other files.  This cuts down on a lot of tedious typing whenever a
new tool interface is added or changed; the cost is that perl is now
required for building (as opposed to just running the regression tests).

The tool interface itself has been changed a bit:

        The VG_DETERMINE_INTERFACE_VERSION macro now takes two
        arguments: a pointer to the tool's pre_clo_init function (which
        need not have any particular name, and may be static); and the
        tool's requirements for shadow memory, expressed as a floating
        point number which is the shadow:client memory ratio (so
        addrcheck uses 1/8th the client memory in shadow memory, since
        it uses one bit per byte; memcheck uses 9/8ths the client
        memory, because it has 8 V bits and 1 A bit per byte).
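
The ratio arithmetic above can be sketched as follows.  shadow_ratio()
and shadow_bytes() are hypothetical helpers written for illustration;
they are not part of the tool interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shadow:client ratio for a tool that keeps the given number of
   shadow bits per client byte (1 for addrcheck, 8 V + 1 A = 9 for
   memcheck). */
double shadow_ratio(unsigned shadow_bits_per_byte)
{
    return shadow_bits_per_byte / 8.0;
}

/* Shadow bytes needed to cover client_bytes, rounded up to a whole
   byte. */
size_t shadow_bytes(size_t client_bytes, unsigned shadow_bits_per_byte)
{
    assert(shadow_bits_per_byte > 0);
    assert(client_bytes <= (SIZE_MAX - 7) / shadow_bits_per_byte);
    return (client_bytes * shadow_bits_per_byte + 7) / 8;
}
```

So a 4096-byte client page needs 512 bytes of shadow at one bit per
byte, and 4608 bytes at nine bits per byte.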

        All the SK_(track_*) functions have been renamed to
        SK_(init_*).  This is because all tool entrypoints can be
        explicitly set with a corresponding SK_(init_*) function, rather
        than relying on functions with special names (though the special
        names still work).  The intention of this is to move away from
        special names, since it can be a bit fragile if the names
        change (if you rename the function in the core without updating
        the tool, then the tool may silently fail to work, rather than
        alerting you to the rename at compile time).

        There's a special area of memory for shadow data.  As I
        mentioned above, the tool's init now has to tell the core how
        much shadow memory it wants to use.  There are now two ways of
        using shadow memory:
             1. You can allocate page-sized chunks of the shadow memory
                with VG_(shadow_alloc)(size).  This just returns a
                pointer to the next piece of free shadow memory.  If it
                runs out (ie, you ask for more shadow memory than you
                said you would), it panics.
             2. You can also treat all shadow memory as a big array. 
                This array is incrementally initialized as you touch
                it.  The first time you touch a particular shadow page,
                it calls your SK_(init_shadow_page) function to
                initialize that page.  This is basically called from a
                signal handler, so you have to be careful to keep this
                function as simple as possible.
        Shadow memory is from VG_(shadow_base)() to VG_(shadow_end)().
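
To make the first option concrete, here is a minimal sketch of a
page-granular "next free piece" allocator over a fixed shadow range.
It is not the real VG_(shadow_alloc): the ShadowArea struct, the 4096
byte page size and the function name are assumptions for the example,
and it returns 0 on exhaustion where the real function panics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_PAGE_SIZE ((size_t)4096)  /* assumed page size */

/* Hypothetical bookkeeping for the shadow area. */
typedef struct {
    uintptr_t base;  /* stand-in for VG_(shadow_base)() */
    uintptr_t end;   /* stand-in for VG_(shadow_end)()  */
    uintptr_t next;  /* next unallocated shadow address */
} ShadowArea;

/* Hands out the next free piece of shadow memory, rounded up to whole
   pages.  Returns 0 if the request does not fit in what is left. */
uintptr_t shadow_alloc(ShadowArea *sa, size_t size)
{
    size_t rounded = (size + SHADOW_PAGE_SIZE - 1)
                     & ~(SHADOW_PAGE_SIZE - 1);

    assert(sa->base <= sa->next && sa->next <= sa->end);
    if (rounded < size || rounded > sa->end - sa->next)
        return 0;  /* overflow, or more than the tool asked for */

    uintptr_t p = sa->next;
    sa->next += rounded;
    return p;
}
```

Nothing is ever freed here, which matches the "next piece of free
shadow memory" description; whether the real allocator behaves the
same way in every detail is not something this sketch claims.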

        The tools are running in a very different context from the
        client code.  This means that if you want to override some
        client functions, you can't just declare them and expect your
        code to be run by the client.  You need to create a separate .so
        file for your tool, called vgpreload_TOOLNAME.so.  If the core
        sees this when it loads your tool, it also sets the client
        environment up to LD_PRELOAD this into the client address
        space.  If you want to replace the malloc calls, you can also
        link coregrind/vg_replace_malloc.o into your vgpreload_*.so
        file.
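
For illustration, building the LD_PRELOAD value for the client might
look roughly like this.  Only the vgpreload_TOOLNAME.so naming comes
from the description above; build_ld_preload(), the libdir argument
and the way an existing LD_PRELOAD value is kept are assumptions, not
the core's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "LIBDIR/vgpreload_TOOL.so", followed by ":OLD" if the client
   already had an LD_PRELOAD value, into out.  Returns 0 on success or
   -1 if the result did not fit in outlen bytes. */
int build_ld_preload(char *out, size_t outlen, const char *libdir,
                     const char *tool, const char *old_preload)
{
    int n;

    assert(out != NULL && libdir != NULL && tool != NULL);
    if (old_preload != NULL && strlen(old_preload) > 0)
        n = snprintf(out, outlen, "%s/vgpreload_%s.so:%s",
                     libdir, tool, old_preload);
    else
        n = snprintf(out, outlen, "%s/vgpreload_%s.so", libdir, tool);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string would then go into the client's environment as
LD_PRELOAD before the client starts running.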

        Similarly, if you want to allocate something in the client
        address space, you need to use VG_(cli_malloc/free).  If you
        pass a pointer to the client which is in the Valgrind address
        space, it won't be able to dereference it.  Similarly, you
        cannot call code which is in the Valgrind core or tool from the
        client - you must arrange for the code to be in the client
        address space.

You can look at the changes to memcheck and addrcheck to see all of
these being put to use (well, not shadow memory as a virtual array).  In
general it only takes a few minutes to update a tool to the new
interface.

I've updated all the standard tools, and they all seem to mostly work. 
Cachegrind is having trouble with dlclose(), which I haven't
investigated yet.

Oh, and the --in-place=<path> command-line option has gone.  Its
replacement is the VALGRINDLIB environment variable.  The build process
creates a $topdir/.in_place directory which is populated with symlinks
to the newly built core and tools, so you can use 'VALGRINDLIB=.in_place
coregrind/valgrind ...' to run it in place.

I've been testing this pretty solidly for a while, so I think it should
work OK.  No doubt you'll find some problems, but that's why I'm
checking it in (and why we did the 2.1.0 release *before* checking it in
- so that there's something semi-stable for people to play with).

	J