Re: [BProc] Valgrind and BProc (again)

> I've never tried to use it in this manner.  I did take a quick peek at
> this once a while back.  At the time it looked to me like starting a
> process with valgrind was (essentially) setting LD_PRELOAD to load the
> valgrind .so files and maybe a few other environment variables.  Is
> this still more or less what valgrind is doing?  /usr/bin/valgrind I
> found on my system here is just a binary.

Erik

The LD_PRELOAD mechanism went away some time back because it relied too
heavily on glibc and libpthread specifics.  Startup is tricky because
we need to load both Valgrind and the application to be debugged into
the same address space (same process), while the application has no
idea this is happening.  How it works now is:

* User runs  /usr/bin/valgrind prog args-for-prog

* /usr/bin/valgrind is not the "real" valgrind executable.
  That is /usr/lib/valgrind/stage2.  /usr/bin/valgrind
  loads stage2 high in the address space and hands control to it.

* stage2 unmaps /usr/bin/valgrind.  It is now alone in the
  address space, and in particular there is a hole at the
  standard load address (0x8048000 on x86, or wherever).

* stage2 has its own implementation of exec() (sort of).
  It uses this to load prog (+ dependent .so's) and start it.

* From prog's point of view it is started just as it would be 
  normally.

  In reality it is running on a virtual CPU provided by stage2.
  stage2 intercepts and messes with all mmap() etc. done by prog
  to ensure prog doesn't screw up valgrind.

So ..

> 1. Valgrind has to have some reasonably nice mechanism to tell us what
> exactly needs to be set.  I'm not sure exactly what that should look
> like but I figure there's lots of possibilities

No env vars are needed for startup, I think.  There's some env var
trickery if a valgrinded process wants to start a valgrinded
child process, but we can ignore that for now.

> 3. The valgrind libraries need to be available on the slave nodes.
> This is just a system configuration issue.  I did some work
> experimenting with migration after linking so this requirement could
> potentially go away.

Well, not just the .so's.  stage2 assumes it can grab any of the stuff
in PREFIX/lib/valgrind as/when it likes.  There are various .so's forced
into the address space at startup, but there are also a bunch of text
files (*.supp) which are important.
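One way to see exactly what would have to be present on a slave is to
list everything under PREFIX/lib/valgrind, not only the .so's.  A minimal
sketch, assuming a POSIX shell with find and sort; the function name
list_vg_support is made up for this example and is not part of Valgrind
or BProc:

```shell
#!/bin/sh
# list_vg_support PREFIX
# Print every regular file under PREFIX/lib/valgrind (relative paths, sorted).
# Hypothetical helper for illustration; not part of Valgrind or BProc.
list_vg_support() {
    ( cd "$1/lib/valgrind" && find . -type f | sort )
}

# Example: list the tree for an install in /opt/valgrind, if there is one.
if [ -d /opt/valgrind/lib/valgrind ]; then
    list_vg_support /opt/valgrind
fi
```

Saving this listing on the master and comparing it with the same listing
taken on a slave (with diff, say) would show any missing .supp or .so
files.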

So far I have managed to get it to work as follows:

* on the master node, install into /opt/valgrind

* bpcp -r /opt/valgrind to the slave nodes

* bpcp 'prog' to somewhere on the slave nodes, say /opt/prog

* on the master do

   bpsh <nodeid> /opt/valgrind/bin/valgrind /opt/prog args-for-prog

  This is pretty ugly.  As I understand it, bpsh takes
  /opt/valgrind/bin/valgrind from the master, migrates it to the slave(s),
  starts it there, and it just happens to work because the valgrind
  install trees on the master and slaves are identical.
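The recipe above can be written down as a small script.  This is only a
sketch: it prints the commands rather than running them, it assumes bpcp
accepts rcp-style "node:path" destinations, and it assumes the program
lives at the same path on the master and the slave.  The name vg_plan
and node number 0 are made up for the example.

```shell
#!/bin/sh
# vg_plan NODE VG_PREFIX PROG [ARGS...]
# Print, without running them, the commands from the manual recipe above.
# Assumptions (not checked here): bpcp takes rcp-style "node:path"
# destinations, and VG_PREFIX and PROG are the same paths on master and slave.
# vg_plan is a hypothetical name made up for this example.
vg_plan() {
    node=$1 vg_prefix=$2 prog=$3
    shift 3
    # Copy the whole install tree into the parent directory on the slave.
    echo "bpcp -r $vg_prefix $node:$(dirname "$vg_prefix")"
    # Copy the program to be debugged to the same path on the slave.
    echo "bpcp $prog $node:$prog"
    # Run valgrind on the slave via bpsh, passing the program and its args.
    echo "bpsh $node $vg_prefix/bin/valgrind $prog $*"
}

vg_plan 0 /opt/valgrind /opt/prog args-for-prog
```

Printing first makes it easy to check the three steps before piping the
output to sh on the master.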

I don't have any better ideas.  Fundamentally it seems difficult because
bpsh is only prepared to migrate one executable and that has to be 
/opt/valgrind/bin/valgrind, so you have to have a different way to get
the executable-to-be-debugged to the slaves.

If the entire Valgrind install tree could be pre-installed on the slaves
that would help.  I guess one option is for all slaves to refer to a
global NFS mount.  But there would still be the problem of moving the
executable to be debugged.

Ideas?

J