From: Erik H. <eah...@gm...> - 2005-06-17 23:58:57
On 6/17/05, Julian Seward <ju...@va...> wrote:
[snip]
> * User runs /usr/bin/valgrind prog args-for-prog
>
> * /usr/bin/valgrind is not the "real" valgrind executable.
>   That is /usr/lib/valgrind/stage2. /usr/bin/valgrind
>   loads stage2 high in the address space and hands control to it.
>
> * stage2 unmaps /usr/bin/valgrind. It is now alone in the
>   address space, and in particular there is a hole at the
>   standard load address (0x8040000, or wherever).
>
> * stage2 has its own implementation of exec() (sort of).
>   It uses this to load prog (+ dependent .so's) and start it.
>
> * From prog's point of view it is started just as it would be
>   normally.
>
> In reality it is running on a virtual CPU provided by stage2.
> stage2 intercepts and messes with all mmap() etc done by prog
> to ensure it doesn't screw up valgrind.

Interesting. If I read this right, valgrind is acting as an ELF loader.
Does it do the linking stuff too? Does the target program effectively
have its own dynamic linker, or is that shared with valgrind? Do they
share instances of the libraries? It appears that stage2 is dynamically
linked as well.

[snip]
> This is pretty ugly. As I understand it, bpsh takes
> /opt/valgrind/bin/valgrind from the master, migrates it to the slave(s),
> starts it there, and it just happens to work because the valgrind
> install trees on the master and slaves are identical.

Yup. That's right.

> I don't have any better ideas. Fundamentally it seems difficult because
> bpsh is only prepared to migrate one executable and that has to be
> /opt/valgrind/bin/valgrind, so you have to have a different way to get
> the executable-to-be-debugged to the slaves.

It's true that it's only one executable, but that could be something
pretty weird. I'm kinda just thinking out loud here, but what about the
following:

- valgrind starts up and gets through loading the program to be
  debugged.
- valgrind stops and dumps itself w/ vmadump (bproc_dump()).
- bpsh/mpirun migrates THAT process image instead of some fresh
  executable.
- half-started process w/ valgrind + other executable wakes up and
  runs on the slave node.

The nasty bit here is that valgrind would have to be linked w/ bproc.
I did some weird stuff w/ editing freshly loaded ELF binaries to add a
preinit section that called bproc. That basically allowed the kernel to
take over again after dynamic linking was done but before the program
ran. I don't know if some similar hack could work here. I don't know -
just a thought.

This would be pretty easy to test, I think. If you added the bproc_dump
call and just dumped to a plain file, you could execve that file
directly to reload the dump. That would allow bpsh to do its thing. A
real solution would probably look more like dumping into a pipe or
something.

That still leaves the problem of valgrind getting at files when it
pleases. Would it be possible/reasonable for valgrind to pre-load
everything it *might* need down the line? That could be optional.

- Erik
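
A rough sketch of the pre-load question above, in Python for brevity.
This is not valgrind or bproc code, and it says nothing about how
either one actually handles files. The names PreloadedFiles and demo
are made up for illustration. The assumption is only that the set of
files that *might* be needed is known up front, so they can be read
into memory before the process image is dumped and migrated.

```python
import os
import tempfile


class PreloadedFiles:
    """In-memory copies of files, read before the process is migrated.

    Hypothetical helper: it illustrates the idea only.
    """

    def __init__(self, paths):
        self._data = {}
        self.missing = []
        for path in paths:
            try:
                with open(path, "rb") as f:
                    self._data[path] = f.read()
            except OSError:
                # Pre-loading is best effort: remember what could not be read.
                self.missing.append(path)

    def read(self, path):
        """Return the cached bytes without touching the filesystem."""
        return self._data[path]


def demo():
    # Stand-in for a file that might be needed later (hypothetical content).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"suppressions go here\n")

    cache = PreloadedFiles([tmp.name, tmp.name + ".does-not-exist"])

    # Simulate "the file is not present on the slave node".
    os.unlink(tmp.name)

    # The cached copy is still readable after the file is gone.
    return cache.read(tmp.name), cache.missing
```

The cost is memory for files that may never be read, which is one
reason to keep such pre-loading optional.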