From: Jeremy F. <je...@go...> - 2004-12-07 17:18:45
On Fri, 2004-12-03 at 18:09 -0800, Roland McGrath wrote:
> > I'm not so worried about page table management.  Valgrind already needs
> > to support user-mode programs using mmap, that's just a high-level
> > interface to the paging hardware.
>
> Uh, sort of.  I am less worried about implementation difficulties than
> about being clear in my head and in our discussion what we are actually
> talking about. :-)  (It probably doesn't help that just in what I am
> myself clear on, I have at least two opposing plans of attack I'm
> describing.)  There are two areas of complication here.
>
> First, you can switch page tables (a virtualized %cr3 change).  This
> changes the universe of what each flat address means.  To a first
> approximation, this requires that valgrind have a big hook on which to
> hang its tables of memory state and translation cache lhs values, and
> swap different whole worlds onto that hook in response to the
> appropriate hypercall.  But slightly deeper contemplation reveals this
> approximation is way, way off.
>
> The page tables are like user programs' use of mmap in that writing page
> table entries constitutes a page-granularity allocation.  But in user
> programs you are used to thinking about this as an extra allocation
> mechanism you don't do a lot with, and focus more on the malloc
> allocations where you know object boundaries from intercepting those
> known calls.  In a xen guest kernel, the page table allocations are the
> basic thing underlying all the object allocators.  It is necessary and
> useful just to track the use of the pages and the initializedness of
> their bytes, but not useful enough.  I want to teach valgrind to
> understand calls to my xen guest kernel's object allocators so it knows
> the object boundaries of the memory used in the guest kernel just as it
> knows the boundaries of malloc allocations in a user program.  This is
> what I was getting at when I said page tables are intertwined with the
> allocation tracking plan.  I'd like to explore how you think about this.

Oh, no, that's what I was thinking.  mmap is not really used in the
definedness tracking, because mmap always returns pages with defined
contents.  Pages in the Xen universe are generally undefined, but the OS
scrubs them if it cares.  But the kernel's internal allocators are no
different from malloc and friends; we can use all the mechanisms Valgrind
already has for a client to annotate its allocators so that Valgrind
understands what's going on.  One particularly helpful thing is that Xen
has a hypercall for indicating stack switches, which removes one piece of
guesswork for Valgrind.

> To get into the deeper contemplation I mentioned, I again diverge into
> two disparate implementation styles.  I think both have merit for
> different purposes, and ultimately would like to see a flexible hybrid
> approach.  I have yet to settle on which I think is easier to get going
> first.
>
> First, there is the "pure virtual" approach.  That is, valgrind's task
> is to virtualize the entire xen/x86 virtual machine fully, just as
> vanilla valgrind virtualizes the entire linux-user/x86 virtual machine
> provided in each individual user process address space by a linux
> kernel.  This is like unto moving towards the whole-machine emulation
> ability of something like vmware or qemu, but still a whole lot simpler
> than that, because the xen/x86 virtual machine is substantially simpler
> than the x86 privileged hardware.  What this means is that valgrind
> directly simulates in software all the behavior of privilege modes and
> page tables that xen/x86 guests can see.  valgrind runs the kernel,
> valgrind runs the user.  For the memory handling, valgrind emulates the
> work of the MMU by translating virtual addresses according to the page
> tables, and indexes its shadow memory, translation caches, and whatnot,
> on physical addresses that come out of the MMU module.
> The translation hopefully can optimize this, analogous to how it's
> possible to optimize the memory checks done for a translated block
> containing multiple accesses to a single object; i.e., figure out that
> some blocks can access a given address range and do the MMU translation
> once at the start of a translated block.

Well, given that the (Xen-virtualized) MMU hardware is available to
Valgrind, surely it could be used to do any of the client's V->P
translations?  The page tables would be a superset of the client's PTE
setup, no?  Besides that, all the tools which use shadow memory already
use a pagetable-like structure for mapping between client addresses and
shadow addresses.  This is slow, but it isn't desperately slow.  The
address-space switching isn't a big problem, since it happens at
well-defined places, and, like the kernel, Valgrind doesn't touch client
memory willy-nilly (though leak-checking might be interesting).

> I probably said before this is straightforward.  I didn't say it
> wouldn't be dog slow.  This is the completist approach, and it has some
> benefits.  You get a system where one user process can copy around some
> uninitialized data, write it into a pipe, have it read and written by
> three other user processes through three other pipes, and come out
> someplace where valgrind says "this uninitialized garbage you see before
> you came from over here".  That is pretty cool.  Of more everyday use
> will be the ability you mentioned earlier to notice kernel code
> examining uninitialized data copied in from user memory.  There are many
> worthwhile kinds of analysis that become possible when you are tracking
> the entire use of the machine with no boundary lines (such as process
> address space, or user/kernel mode) on your knowledge of the details.

Yep, I think that would be very useful for a lot of types of analyses.  It
would also be the simplest to implement, since working out how to draw
boundaries is generally the tricky bit.
Also, if Valgrind/Xen presents a complete Xen interface, we can run it on
itself, which is something we haven't been able to do so far.

> The other approach is in fact what I had in mind at the genesis of the
> discussion.  That is, take advantage of some knowledge of the guest
> kernels we're interested in running under valgrind, and only try to
> instrument kernel code, not user code...

That sounds like a rather more complex stage-2 effort.  (We could
intercept the various get/put user calls so that kernel->user accesses
are more efficient than trying to derive them from looking at random
memory accesses.)

    J
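As a postscript, a toy sketch of why intercepting get/put-user style calls
pays off: one range check and one range update per call, instead of
deriving the kernel->user transfer from individual instrumented loads.
The shadow arrays and helper names below are stand-ins for whatever
interface the tool would export, not a real Valgrind API:

```c
/* Toy model: separate "user" and "kernel" memory windows, each with a
   one-byte-per-byte definedness shadow (1 = defined). */
#include <stddef.h>
#include <string.h>

#define WINDOW 4096
static unsigned char user_mem[WINDOW],    kern_mem[WINDOW];
static unsigned char user_shadow[WINDOW], kern_shadow[WINDOW];

static void mark_defined(unsigned char *sh, size_t off, size_t n)
{
    memset(&sh[off], 1, n);
}

static int range_defined(const unsigned char *sh, size_t off, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!sh[off + i])
            return 0;
    return 1;
}

/* Intercepted copy_from_user: check the whole source range once, copy,
   then mark the whole destination range defined, rather than doing
   per-load checks inside an instrumented memcpy. */
static int copy_from_user_checked(size_t kern_off, size_t user_off, size_t n)
{
    if (!range_defined(user_shadow, user_off, n))
        return -1;   /* would report: kernel reading undefined user bytes */
    memcpy(&kern_mem[kern_off], &user_mem[user_off], n);
    mark_defined(kern_shadow, kern_off, n);
    return 0;
}
```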