Re: [BProc] Valgrind and BProc (again)

On 6/17/05, Julian Seward <ju...@va...> wrote:

[snip]

> * User runs  /usr/bin/valgrind prog args-for-prog
>
> * /usr/bin/valgrind is not the "real" valgrind executable.
>   That is /usr/lib/valgrind/stage2.  /usr/bin/valgrind
>   loads stage2 high in the address space and hands control to it.
>
> * stage2 unmaps /usr/bin/valgrind.  It is now alone in the
>   address space, and in particular there is a hole at the
>   standard load address (0x8040000, or wherever).
>
> * stage2 has its own implementation of exec() (sort of).
>   It uses this to load prog (+ dependent .so's) and start it.
>
> * From prog's point of view it is started just as it would be
>   normally.
>
>   In reality it is running on a virtual CPU provided by stage2.
>   stage2 intercepts and messes with all mmap() etc done by prog
>   to ensure it doesn't screw up valgrind.

Interesting.  If I read this right, valgrind is acting as an ELF loader.
Does it do linking stuff too?  Does the target program effectively
have its own dynamic linker or is it shared with valgrind?  Does it
share instances of the libraries?  It appears that stage2 is
dynamically linked as well.
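
For reference on the dynamic linker question: an ELF executable names
the dynamic linker it wants in its PT_INTERP program header, so a
loader can at least find out what the target expects.  Here is a
minimal sketch of reading that field.  The function name
elf_interpreter is made up for illustration, and this says nothing
about what stage2 actually does with the information.

```python
import struct

PT_INTERP = 3  # program header type that names the requested dynamic linker


def elf_interpreter(path):
    """Return the PT_INTERP path of an ELF file, or None if it has none."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file: %s" % path)
        is64 = ident[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"   # EI_DATA: 1 = little-endian

        # Remainder of the ELF header, following the 16-byte e_ident.
        hdr_fmt = endian + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
        hdr = struct.unpack(hdr_fmt, f.read(struct.calcsize(hdr_fmt)))
        e_phoff, e_phentsize, e_phnum = hdr[4], hdr[8], hdr[9]

        # Walk the program headers looking for PT_INTERP.
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            if is64:
                # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
                p_type, _, p_offset, _, _, p_filesz = struct.unpack(
                    endian + "IIQQQQ", f.read(40))
            else:
                # p_type, p_offset, p_vaddr, p_paddr, p_filesz
                p_type, p_offset, _, _, p_filesz = struct.unpack(
                    endian + "IIIII", f.read(20))
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None
```

Calling elf_interpreter() on a dynamically linked binary returns the
path of the linker it asks for; a statically linked one returns None.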

[snip]

>   This is pretty ugly.  As I understand it, bpsh takes
>   /opt/valgrind/bin/valgrind from the master, migrates it to the slave(s),
>   starts it there, and it just happens to work because the valgrind
>   install trees on the master and slaves are identical.

Yup.  That's right.

> I don't have any better ideas.  Fundamentally it seems difficult because
> bpsh is only prepared to migrate one executable and that has to be
> /opt/valgrind/bin/valgrind, so you have to have a different way to get
> the executable-to-be-debugged to the slaves.

It's true that it's only one executable but that could be something
pretty weird.  I'm kinda just thinking out loud here but what about
the following:

- valgrind starts up and gets through loading the program to be debugged.
- valgrind stops and dumps itself w/ vmadump (bproc_dump()).
- bpsh/mpirun migrates THAT process image instead of some fresh executable.
- half started process w/ valgrind + other executable wakes up and
runs on the slave node.

The nasty bit here is that valgrind would have to be linked w/ bproc.
I did some weird stuff w/ editing freshly loaded ELF binaries to add a
preinit section that called bproc.  That basically allowed the kernel
to take over again after dynamic linking was done but before the
program ran.  I don't know if some similar hack could work here.
Just a thought.

This would be pretty easy to test, I think. If you added the
bproc_dump call and just dumped to a plain file, you can execve that
file directly to reload the dump.  That would allow bpsh to do its
thing.  A real solution would probably look more like dumping into a
pipe or something.

That still leaves the problem of valgrind getting at files when it
pleases.  Would it be possible/reasonable for valgrind to pre-load
everything it *might* need down the line?  That could be optional.
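
To make the pre-load idea concrete, here is a rough sketch of the
pattern: read every file that might be needed into memory up front, so
the contents would travel with the dumped process image, and serve
later reads from that cache.  The names preload and read_cached are
invented for illustration.  Whether valgrind can enumerate its files
ahead of time is an assumption, not something I know.

```python
def preload(paths):
    """Read each readable file fully into memory; return {path: bytes}."""
    cache = {}
    for path in paths:
        try:
            with open(path, "rb") as f:
                cache[path] = f.read()
        except OSError:
            # Missing or unreadable files are skipped: they only *might*
            # be needed, so failing here would be too strict.
            continue
    return cache


def read_cached(cache, path):
    """Serve a file from the preloaded cache, else fall back to the filesystem."""
    if path in cache:
        return cache[path]
    with open(path, "rb") as f:
        return f.read()
```

The fallback in read_cached is what would keep the pre-load optional:
with an empty cache everything still goes to the filesystem as before.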

- Erik