|
From: Jeremy F. <je...@go...> - 2004-12-03 04:24:52
|
[ Some background for valgrind developers: Xen is a hypervisor that allows multiple virtual machines to run on one real machine. It requires kernels to be ported to its environment, but the benefit is that it has much higher performance than things like VMware. Xen is GPLd, and available from http://www.cl.cam.ac.uk/netos/xen/ One nice aspect is that Xen doesn't allow guest OS's to run privileged instructions; instead they must use "hypercalls" to change privileged state. That gives us a nice hook to mediate those changes, and means we avoid having to emulate the whole privileged side of the x86 architecture. ]
Well, after a bit of a read of the Xen docs, I think I agree with you. I don't think there's any fundamental reason we couldn't adapt Valgrind to be a Xen guest kernel, and have it present a virtualized Xen interface to its own client. The Xen hypervisor call interface seems to have everything we'd need to get started.
There are a few tricky points:
The main one is probably working out how to handle the page tables and segments. Need to look at this in more detail. Hm, will we need to understand multiple code segments?
Valgrind will need to know a little bit about CPU privilege modes. It should only allow int $0x82 from ring 1. We'll need to emulate the syscall instructions too (or pretend they don't exist).
Device memory looks like it might be a bit tricky, since we'd like to be able to do special things with reads and writes to it. In particular, it probably doesn't make sense to keep shadow memory for it; for memcheck we'd just assume that reads are all valid and writes need to be checked for validity. There isn't currently a good mechanism for doing this; Robert has his watchpoint patch, but it is surprisingly expensive, and we wouldn't need byte-level granularity.
Hm, exception handling will also be interesting to deal with. It will definitely need precise exception state information from the VCPU.
It is nice that Xen already makes it the guest OS's job to map between machine physical pages and virtual physical pages; it makes Valgrind's job much easier, since it can allocate memory anywhere. Hm, but having the machine->phys mapping readable at a fixed virtual address will make it tricky to virtualize.
It will take a bit of work to adapt Valgrind to running in a (mostly) unhosted environment. Lots of details to sort out, like how to get output from Valgrind. (console_io would be a good first step, but I think a network connection is probably the best long-term solution.)
Ironically, it might also be easier to get Valgrind to run on itself in this environment than in a normal Linux environment, because the "OS" layer is so much simpler, and therefore easier to completely virtualize.
So, definitely interesting, doable, but non-trivial.
J
|
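A minimal sketch of the device-memory policy suggested above for memcheck (no shadow bits kept for device space; reads always defined, writes only range-checked). The window address, size, and function names are invented for illustration and are not real Xen or Valgrind values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device-memory window; addresses are illustrative only. */
#define DEV_BASE 0xF0000000u
#define DEV_SIZE 0x00100000u

static bool in_device_range(uint32_t addr) {
    /* unsigned wrap makes this a single-compare range check */
    return addr - DEV_BASE < DEV_SIZE;
}

/* Reads from device memory carry no shadow state: treat them as
 * always defined rather than tracking V bits for that range. */
static bool read_is_defined(uint32_t addr) {
    if (in_device_range(addr))
        return true;
    return false;  /* placeholder for the ordinary V-bit lookup */
}

/* Writes only need an addressability check against the window. */
static bool write_is_valid(uint32_t addr) {
    if (in_device_range(addr))
        return true;
    return false;  /* placeholder for the ordinary A-bit lookup */
}
```

This avoids byte-level watchpoint machinery entirely, matching the observation that page or window granularity is enough for device space.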
|
From: Roland M. <ro...@re...> - 2004-12-03 10:22:29
|
Let me preface this by saying that I know zilch about valgrind internals,
and also do not really have anything to do with Xen. I haven't usefully
worked on or even used it, though I have read some of its code and am
generally clueful about this sort of thing in the abstract. Mostly I'm
just someone who thought these two great tastes would taste great together.
Surely more intelligent comments about the details are to be had from
actual Xen developers than my baseless pontifications can provide.
From an implementation perspective, Xen is attractive exactly for the
reasons you cite, and that's a significant reason I suggested it. It
indeed does not allow guest OS code to use privileged instructions, and the
interface its hypercalls provide is substantially simpler to understand and
emulate thoroughly than is the x86 privileged hardware. At the same time,
part of the beauty of Xen is that it does so little--the arrangement is
sufficiently straightforward (from the perspective of understanding the
hardware features) that it seems that it may wind up presenting fewer
idiosyncratic implementation issues than UML does. Xen guest OS's are
already adapted to being in an environment that is not quite real but close
enough, so valgrind isn't adding new kinds of constraints.
> The main one is probably working out how to handle the page tables and
> segments. Need to look at this in more detail.
Page table issues and segmentation issues are really quite separate and I'd
prefer to address their details independently. The page table handling is
intertwined with the essential allocation tracking. This is I think where
the real meat is that determines what several of the other decisions might
mean. I would like to discuss this in detail before getting too mired in
the other areas of detail.
> Hm, will we need to understand multiple code segments?
In the general case, yes. In practice, not really. (And anyway, there
really isn't all that much to it.) The yes is because the Xen hypercall
interface is pretty free in what it lets a guest OS do with its segments.
Loading and using segment registers (including setting %cs with jumps and
such) are not privileged instructions. There is a hypercall interface to
load segment descriptor tables of the guest OS's choice. Separate code
segments can be used both directly in jumps, and in the hypercall interface
when specifying trap handlers.
The not really is because in extant guest OS implementations anyone wants
to use (such as xen/linux), the kernel code that you want instrumented only
ever uses the single standard flat code segment.
> Valgrind will need to know a little bit about CPU privilege modes.
This is intertwined with code segments in the x86 world. In fact, for
practical purposes the only need there will be to support code segment
changes is when they are used to change privilege mode.
> It should only allow int $0x82 from ring 1. We'll need to emulate the
> syscall instructions too (or pretend they don't exist).
I'm not clear on where you're headed here. For at least the first crack
plan, you only want the kernel code itself (ring 1) to be instrumented, and
when instrumented code switches rings, that should bust you out of
valgrindland to run user mode normally on the hardware.
> Device memory looks like it might be a bit tricky [...]
This is something I'm interested in thinking about eventually. But one of
the benefits of using Xen is that you can punt it for later. A very useful
Xen guest OS can run without any kind of direct device access, just using
the provided virtual disk and network devices.
> Hm, exception handling will also be interesting to deal with. It will
> definitely need precise exception state information from the VCPU.
I don't know what issues you are anticipating here. The situation seems
straightforward to me.
> It will take a bit of work to adapt Valgrind to running in a (mostly)
> unhosted environment.
There are two different sets of issues to attack. One is valgrind as a Xen
client, which has these issues. The other is valgrind as a Xen hypervisor
implementation from the perspective of the guest OS running under
translation. These are separable. After all, one could very well have
valgrind run as a normal user program on whatever kind of host, and run a
Xen guest OS binary purely through translation and internal bookkeeping
implementing what the hypervisor interface claims are segments and page
tables. I actually haven't decided at the moment whether that's just a
rhetorical notion or might be a worthwhile prototyping direction to stave
off coping with the actual hypervisor environment while implementing the
virtual hypervisor interface support.
> Lots of details to sort out, like how to get output from Valgrind.
The hypervisor provides a general mechanism for guest domains to
communicate, which is in fact what underlies the virtual devices. It might
be most useful to do something special with this stuff directly, extending
the control stuff that runs on the domain0 ("host" roughly) OS to provide
valgrind log channels directly rather than kludging something with network
packets. Anyway, details indeed.
> So, definitely interesting, doable, but non-trivial.
As it should be. :-) I would like to pursue it, at least as a thought
experiment for the moment. I think this is a very doable direction to
start in, and it can lead in some even more interesting places once it
starts to gel.
Thanks,
Roland
|
|
From: Jeremy F. <je...@go...> - 2004-12-03 21:40:49
|
On Fri, 2004-12-03 at 02:21 -0800, Roland McGrath wrote:
> > The main one is probably working out how to handle the page tables and
> > segments. Need to look at this in more detail.
>
> Page table issues and segmentation issues are really quite separate and I'd
> prefer to address their details independently. The page table handling is
> intertwined with the essential allocation tracking. This is I think where
> the real meat is that determines what several of the other decisions might
> mean. I would like to discuss this in detail before getting too mired in
> the other areas of detail.
I'm not so worried about page table management. Valgrind already needs
to support user-mode programs using mmap; that's just a high-level
interface to the paging hardware. If the client wants to operate on
page tables, then that's OK. The slightly tricky bit will be in dealing
with "writable page table" mode, but it probably isn't any more complex
than the signal virtualization we have now.
Segments are a bit trickier, because Valgrind takes short-cuts. We
assume that ds, es and ss are all flat segments which are identical to
each other, and are never changed. This means that normal memory access
instructions just ignore segmentation, and assume an identity mapping.
We only do segment handling for instructions with explicit segment
overrides, which typically means fs and gs. If a client/guest OS wants
to use segmentation in more complex ways, then it will cost a fair
amount of performance.
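A minimal sketch of that flat-segment shortcut, with plain accesses on the identity-mapped fast path and only explicit fs/gs overrides paying for a base addition. The struct, the enum, and the two base fields are invented for illustration, not Valgrind's actual internals:

```c
#include <assert.h>
#include <stdint.h>

/* Which segment override, if any, an instruction carried. */
enum seg { SEG_NONE, SEG_FS, SEG_GS };

/* Illustrative slice of a virtual CPU: just the two segment bases
 * that the translated code ever consults. */
struct vcpu {
    uint32_t fs_base;
    uint32_t gs_base;
};

static uint32_t linear_addr(const struct vcpu *v, enum seg override,
                            uint32_t ea) {
    switch (override) {
    case SEG_FS: return v->fs_base + ea;  /* rare, slower path */
    case SEG_GS: return v->gs_base + ea;
    default:     return ea;  /* ds/es/ss assumed flat: identity map */
    }
}
```

A guest OS that loaded non-flat ds/es/ss would force every memory access onto something like the override path, which is the performance cost mentioned above.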
> > Hm, will we need to understand multiple code segments?
>
> In the general case, yes. In practice, not really. (And anyway, there
> really isn't all that much to it.) The yes is because the Xen hypercall
> interface is pretty free in what it lets a guest OS do with its segments.
> Loading and using segment registers (including setting %cs with jumps and
> such) are not privileged instructions. There is a hypercall interface to
> load segment descriptor tables of the guest OS's choice. Separate code
> segments can be used both directly in jumps, and in hypercall interface
> when specifying trap handlers.
>
> The not really is because in extant guest OS implementations anyone wants
> to use (such as xen/linux), the kernel code that you want instrumented only
> ever uses the single standard flat code segment.
That's what we assume at the moment. Valgrind maintains a cache of
translated code, which is indexed by eip; like segments for data access,
cs is assumed to be flat and unchanging. If cs changes around, we'd
need to make sure that the translation cache is maintained
appropriately. One wart is that the mode of the CPU (16/32/64?) is a
property of the current cs, and that affects how the instruction bytes
are actually interpreted; if we wanted to support the different sizing
modes, we'd need to make sure that we maintain the translation cache
properly.
At least changes to cs are pretty obvious; when translating any kind of
trap/long jump/call/ret we generate code to observe how cs is changed.
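A toy version of the translation cache being described: direct-mapped, keyed by guest eip, with a whole-cache flush as the conservative response to a cs change. Sizes, the hash, and all names are illustrative, not Valgrind's real data structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TC_ENTRIES 256

struct tc_entry {
    uint32_t eip;   /* guest eip this translation belongs to */
    void    *code;  /* host code for the translated block */
};

static struct tc_entry tc[TC_ENTRIES];

/* Conservative reaction to an observed cs change: drop everything. */
static void tc_flush(void) {
    for (size_t i = 0; i < TC_ENTRIES; i++)
        tc[i] = (struct tc_entry){0, NULL};
}

static void tc_insert(uint32_t eip, void *code) {
    tc[eip % TC_ENTRIES] = (struct tc_entry){eip, code};
}

static void *tc_lookup(uint32_t eip) {
    struct tc_entry *e = &tc[eip % TC_ENTRIES];
    return (e->code && e->eip == eip) ? e->code : NULL;
}
```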
> > Valgrind will need to know a little bit about CPU privilege modes.
>
> This is intertwined with code segments in the x86 world. In fact, for
> practical purposes the only need there will be to support code segment
> changes is when they are used to change privilege mode.
Yes, and fortunately the privilege modes don't affect how the
instructions are interpreted (only whether they trap or not).
> > It should only allow int $0x82 from ring 1. We'll need to emulate the
> > syscall instructions too (or pretend they don't exist).
>
> I'm not clear on where you're headed here. For at least the first crack
> plan, you only want the kernel code itself (ring 1) to be instrumented, and
> when instrumented code switches rings, that should bust you out of
> valgrindland to run user mode normally on the hardware.
That could be tricky. It's generally pretty hard to do that kind of switch
between real and virtual CPU, because the virtual CPU has a lot more
state than the real one. I would assume, at least initially, that
running a kernel in this environment would have a very small
special-purpose user-mode component to exercise the kernel. Besides, by
doing the whole system, you can trace how values from user-mode are
copied through the kernel and how they are used (taint checking, as well
as definedness checking).
> > Device memory looks like it might be a bit tricky [...]
>
> This is something I'm interested in thinking about eventually. But one of
> the benefits of using Xen is that you can punt it for later. A very useful
> Xen guest OS can run without any kind of direct device access, just using
> the provided virtual disk and network devices.
Yep, good point.
> > Hm, exception handling will also be interesting to deal with. It will
> > definitely need precise exception state information from the VCPU.
>
> I don't know what issues you are anticipating here. The situation seems
> straightforward to me.
This is more a Valgrind issue. The current VCPU machinery does
everything at the basic-block level, and defers handling async signals
(interrupts) to between basic blocks; this works well, because the
virtual CPU state is well defined then. Within a basic block, the
state is not well defined, since the mapping of virtual registers to
real ones is non-deterministic (well, it's deterministic, but nothing records
the mapping of virtual->real as a function of eip), so getting precise
machine state if an instruction faults is not possible. Mostly this
doesn't matter, since not many user-mode programs really care about the
precise machine state when dealing with a SIGSEGV. The kernel clearly
will.
But Julian is working on new VCPU machinery which will support recovery
of precise machine state at any point.
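One way to picture what such machinery has to record: a per-block side table giving, at each guest instruction boundary, which host register (or spill slot) holds each virtual register, so guest state can be rebuilt at a faulting eip. The structure and all names here are invented for illustration; they are not Julian's actual design:

```c
#include <assert.h>
#include <stdint.h>

#define N_VREGS 4   /* pretend the VCPU has 4 virtual registers */

struct regmap {
    uint32_t guest_eip;              /* boundary this mapping is valid at */
    int8_t   vreg_to_host[N_VREGS];  /* host register index; -1 = spilled */
};

/* Find the register mapping for a faulting eip in a block's side
 * table; without a matching record, precise state is unrecoverable
 * at that point (today's situation, per the text above). */
static const struct regmap *find_map(const struct regmap *maps, int n,
                                     uint32_t fault_eip) {
    for (int i = 0; i < n; i++)
        if (maps[i].guest_eip == fault_eip)
            return &maps[i];
    return 0;
}
```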
> > It will take a bit of work to adapt Valgrind to running in a (mostly)
> > unhosted environment.
>
> There are two different sets of issues to attack. One is valgrind as a Xen
> client, which has these issues. The other is valgrind as a Xen hypervisor
> implementation from the perspective of the guest OS running under
> translation. These are separable. After all, one could very well have
> valgrind run as a normal user program on whatever kind of host, and run a
> Xen guest OS binary purely through translation and internal bookkeeping
> implementing what the hypervisor interface claims are segments and page
> tables. I actually haven't decided at the moment whether that's just a
> rhetorical notion or might be a worthwhile prototyping direction to stave
> off coping with the actual hypervisor environment while implementing the
> virtual hypervisor interface support.
Yes, that's an interesting line of thought. Even without the Valgrind
component, user-mode Xen would subsume UML. Might get very expensive
emulating the paging hardware with mmap though.
> The hypervisor provides a general mechanism for guest domains to
> communicate, which is in fact what underlies the virtual devices. It might
> be most useful to do something special with this stuff directly, extending
> the control stuff that runs on the domain0 ("host" roughly) OS to provide
> valgrind log channels directly rather than kludging something with network
> packets. Anyway, details indeed.
Well, Valgrind already supports sending all its output through a socket,
and there's a long-standing wishlist item to produce more structured
output for programmatic consumption, so it might all come together
naturally.
There's also the question of getting input from the user (since Valgrind
will go interactive sometimes if you want), and attaching a debugger.
J
|
|
From: Roland M. <ro...@re...> - 2004-12-04 02:09:27
|
> I'm not so worried about page table management. Valgrind already needs
> to support user-mode programs using mmap, that's just a high-level
> interface to the paging hardware.
Uh, sort of. I am less worried about implementation difficulties than about being clear in my head and in our discussion what we are actually talking about. :-) (It probably doesn't help that just in what I am myself clear on, I have at least two opposing plans of attack I'm describing.)
There are two areas of complication here.
First, you can switch page tables (a virtualized %cr3 change). This changes the universe of what each flat address means. To a first approximation, this requires that valgrind have a big hook on which to hang its tables of memory state and translation cache lhs values, and swap different whole worlds onto that hook in response to the appropriate hypercall. But slightly deeper contemplation reveals this approximation is way, way off.
The page tables are like user programs' use of mmap in that writing page table entries constitutes a page-granularity allocation. But in user programs you are used to thinking about this as an extra allocation mechanism you don't do a lot with, and focus more on the malloc allocations where you know object boundaries from intercepting those known calls. In a xen guest kernel, the page table allocations are the basic thing underlying all the object allocators. It is necessary and useful just to track the use of the pages and the initializedness of their bytes, but not useful enough. I want to teach valgrind to understand calls to my xen guest kernel's object allocators so it knows the object boundaries of the memory used in the guest kernel just as it knows the boundaries of malloc allocations in a user program. This is what I was getting at when I said page tables are intertwined with the allocation tracking plan. I'd like to explore how you think about this.
To get into the deeper contemplation I mentioned, I again diverge into two disparate implementation styles. I think both have merit for different purposes, and ultimately would like to see a flexible hybrid approach. I have yet to settle on which I think is easier to get going first.
First, there is the "pure virtual" approach. That is, valgrind's task is to virtualize the entire xen/x86 virtual machine fully, just as vanilla valgrind virtualizes the entire linux-user/x86 virtual machine provided in each individual user process address space by a linux kernel. This is like unto moving towards the whole-machine emulation ability of something like vmware or qemu, but still a whole lot simpler than that because the xen/x86 virtual machine is substantially simpler than the x86 privileged hardware.
What this means is that valgrind directly simulates in software all the behavior of privilege modes and page tables that xen/x86 guests can see. valgrind runs the kernel, valgrind runs the user. For the memory handling, valgrind emulates the work of the MMU by translating virtual addresses according to the page tables, and indexes its shadow memory, translation caches, and whatnot, on physical addresses that come out of the MMU module. The translation hopefully can optimize this, analogous to how it's possible to optimize the memory checks done for a translated block containing multiple accesses to a single object; i.e., figure out that some blocks can access a given address range and do the MMU translation once at the start of a translated block.
I probably said before this is straightforward. I didn't say it wouldn't be dog slow. This is the completist approach, and it has some benefits.
You get a system where one user process can copy around some uninitialized data, write it into a pipe, have it read and written by three other user processes through three other pipes, and come out someplace where valgrind says "this uninitialized garbage you see before you came from over here". That is pretty cool. Of more everyday use will be the ability you mentioned earlier to notice kernel code examining uninitialized data copied in from user memory. There are many worthwhile kinds of analysis that become possible when you are tracking the entire use of the machine with no boundary lines (such as process address space, or user/kernel mode) on your knowledge of the details.
The other approach is in fact what I had in mind at the genesis of the discussion. That is, take advantage of some knowledge of the guest kernels we're interested in running under valgrind, and only try to instrument kernel code, not user code.
First, we decide that user-mode execution is a black box--we don't translate code in user mode, we just run it natively (more in another message about how this is done). When a page table entry is written with user-mode write permissions enabled, and then we've gone to user mode, we just assume something broad about what happened to all the bytes in those pages. (Well actually we can optimize this to know whether the page was touched at all or not.)
Second, we observe that the guest kernels of interest actually use their page tables such that the kernel-mode-only pages are all the same in all the address spaces they ever switch among.
These two assumptions allow us to go back to the innocent world we know, where there is just one single address space used by a single program: the guest kernel, in the kernel-only subset of the address space set up by its page tables.
(Furthermore, kernels draw a fixed line between the two kinds of pages in the page tables; so in fact, the valgrind implementation of the guest hypercalls to install page table entries need only enforce that user-accessible pages are on one side of a threshold address and that kernel-only pages are on the other side, and then it need only keep track of this threshold for identifying user addresses on use rather than consult the page tables.)
Now, the kernel does actually access user memory, and when it does so, the results depend on the address space switches that we just decided to pretend weren't happening. So, the translated code needs to identify reads from user addresses as getting random bits from the black box. Basically, any load from user memory is equivalent to getting those bytes from a `read' system call in userland valgrind's perspective.
Usefully, in practice each block that loads from user memory always loads from user memory and each block that stores to user memory always stores to user memory. So you can notice at the first execution of a block that it will refer to user memory, and translate that block to quickly check against the threshold address and then do the memory-tracking machinery appropriate for the black box loads/stores. (In practice the guest kernels never call these blocks with an address not in the user side of the address space, so that quick check is just for wild bugs.)
Thanks,
Roland
|
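The threshold scheme described above can be sketched in a few lines: one comparison classifies an address as user (the black box) or kernel (normally tracked), and a load's treatment follows from that. The threshold value and all names are illustrative, chosen to echo the 3GB-ish split common in x86 kernels of the period:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed user/kernel split; the value is illustrative. */
static const uint32_t kernel_threshold = 0xC0000000u;

static bool is_user_addr(uint32_t addr) {
    return addr < kernel_threshold;  /* no page-table walk needed */
}

/* A load from user memory is treated like bytes arriving from a
 * read() system call: contents unknown but not an error to use. */
enum origin { FROM_KERNEL_SHADOW, FROM_BLACK_BOX };

static enum origin classify_load(uint32_t addr) {
    return is_user_addr(addr) ? FROM_BLACK_BOX : FROM_KERNEL_SHADOW;
}
```

Per the text, blocks that touch user memory always do, so the translated fast path only carries this single compare as a guard against wild bugs.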
|
From: Jeremy F. <je...@go...> - 2004-12-07 17:18:45
|
On Fri, 2004-12-03 at 18:09 -0800, Roland McGrath wrote:
> > I'm not so worried about page table management. Valgrind already needs
> > to support user-mode programs using mmap, that's just a high-level
> > interface to the paging hardware.
>
> Uh, sort of. I am less worried about implementation difficulties than
> about being clear in my head and in our discussion what we are actually
> talking about. :-) (It probably doesn't help that just in what I am myself
> clear on, I have at least two opposing plans of attack I'm describing.)
> There are two areas of complication here.
>
> First, you can switch page tables (a virtualized %cr3 change). This
> changes the universe of what each flat address means. To a first
> approximation, this requires that valgrind have a big hook on which to hang its
> tables of memory state and translation cache lhs values, and swap different
> whole worlds onto that hook in response to the appropriate hypercall.
> But slightly deeper contemplation reveals this approximation is way, way off.
>
> The page tables are like user programs' use of mmap in that writing page
> table entries constitutes a page-granularity allocation. But in user
> programs you are used to thinking about this as an extra allocation
> mechanism you don't do a lot with, and focus more on the malloc allocations
> where you know object boundaries from intercepting those known calls. In a
> xen guest kernel, the page table allocations are the basic thing underlying
> all the object allocators. It is necessary and useful just to track the
> use of the pages and the initializedness of their bytes, but not useful
> enough. I want to teach valgrind to understand calls to my xen guest
> kernel's object allocators so it knows the object boundaries of the memory
> used in the guest kernel just as it knows the boundaries of malloc
> allocations in a user program. This is what I was getting at when I said
> page tables are intertwined with the allocation tracking plan.
> I'd like to
> explore how you think about this.
Oh, no, that's what I was thinking. mmap is not really used in the definedness tracking, because mmap always returns pages with defined contents. Pages in the Xen universe are generally undefined, but the OS scrubs them if it cares. But the kernel's internal allocators are no different from malloc and friends; we can already use all the mechanisms Valgrind already has for a client to annotate its allocators so that Valgrind understands what's going on.
One particularly helpful thing is that Xen has a hypercall for indicating stack switches, which removes one piece of guesswork for Valgrind.
> To get into the deeper contemplation I mentioned, I again diverge into two
> disparate implementation styles. I think both have merit for different
> purposes, and ultimately would like to see a flexible hybrid approach.
> I have yet to settle on which I think is easier to get going first.
>
> First, there is the "pure virtual" approach. That is, valgrind's task is
> to virtualize the entire xen/x86 virtual machine fully, just as vanilla
> valgrind virtualizes the entire linux-user/x86 virtual machine provided in
> each individual user process address space by a linux kernel. This is like
> unto moving towards the whole-machine emulation ability of something like
> vmware or qemu, but still a whole lot simpler than that because the xen/x86
> virtual machine is substantially simpler than the x86 privileged hardware.
> What this means is that valgrind directly simulates in software all the
> behavior of privilege modes and page tables that xen/x86 guests can see.
> valgrind runs the kernel, valgrind runs the user. For the memory handling,
> valgrind emulates the work of the MMU by translating virtual addresses
> according to the page tables, and indexes its shadow memory, translation
> caches, and whatnot, on physical addresses that come out of the MMU module.
> The translation hopefully can optimize this, analogous to how it's possible
> to optimize the memory checks done for a translated block containing
> multiple accesses to a single object; i.e., figure out that some blocks can
> access a given address range and do the MMU translation once at the start
> of a translated block.
Well, given that the (Xen-virtualized) MMU hardware is available to Valgrind, surely it could be used to do any of the client's V->P translations? The page tables would be a superset of the client's PTE setup, no?
Besides that, all the tools which use shadow memory already use a pagetable-like structure for mapping between client addresses and shadow addresses. This is slow, but it isn't desperately slow. The address-space switching isn't a big problem, since it happens at well-defined places, and like the kernel, Valgrind doesn't touch client memory willy-nilly (though leak-checking might be interesting).
> I probably said before this is straightforward. I didn't say it wouldn't
> be dog slow. This is the completist approach, and it has some benefits.
> You get a system where one user process can copy around some uninitialized
> data, write it into a pipe, have it read and written by three other user
> processes through three other pipes, and come out someplace where valgrind
> says "this uninitialized garbage you see before you came from over here".
> That is pretty cool. Of more everyday use will be the ability you
> mentioned earlier to notice kernel code examining uninitialized data copied
> in from user memory. There are many worthwhile kinds of analysis that
> become possible when you are tracking the entire use of the machine with no
> boundary lines (such as process address space, or user/kernel mode) on your
> knowledge of the details.
Yep, I think that would be very useful for a lot of types of analyses. It would also be the simplest to implement, since working out how to draw boundaries is generally the tricky bit.
Also, if Valgrind/Xen presents a complete Xen interface, we can run it on itself, which is something we haven't been able to do so far.
> The other approach is in fact what I had in mind at the genesis of the
> discussion. That is, take advantage of some knowledge of the guest kernels
> we're interested in running under valgrind, and only try to instrument
> kernel code, not user code...
That sounds like a rather more complex stage-2 effort. (We could intercept the various get/put user calls so that kernel->user accesses are more efficient than trying to derive them from looking at random memory accesses.)
J
|
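The allocator-annotation idea in this exchange can be illustrated without Valgrind itself. Memcheck's real mechanism is client requests such as VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_FREELIKE_BLOCK from memcheck.h; this stand-alone sketch just records the (start, size) pairs such annotations would communicate for a guest kernel's object allocator. All names here are invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS 64

struct obj { uintptr_t start; size_t size; };
static struct obj objs[MAX_OBJS];
static int n_objs;

/* What a MALLOCLIKE-style annotation would tell the tool: a new
 * object of `size` bytes now lives at `p`. */
static void note_alloc(void *p, size_t size) {
    if (n_objs < MAX_OBJS)
        objs[n_objs++] = (struct obj){(uintptr_t)p, size};
}

/* Does addr fall inside a known object? This is the boundary
 * knowledge that lets a memcheck-like tool blame the right block. */
static int find_obj(uintptr_t addr) {
    for (int i = 0; i < n_objs; i++)
        if (addr - objs[i].start < objs[i].size)
            return i;
    return -1;
}
```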
|
From: Roland M. <ro...@re...> - 2004-12-04 02:31:09
|
> Segments are a bit trickier, because Valgrind takes short-cuts.
As I said, in practice these assumptions are valid for what interesting guest kernels actually do. If you do want to cope with it, it's a) not hard to identify the case, and b) not all that complex to support. That is, when you hit a segment register load instruction or an intersegment jump/call instruction, stop the world, possibly throw away all your cached translations, and switch to the slower plan where the translated code does the segmented->linear translation work.
I said it's not too complex to support, but that doesn't mean it wouldn't be slow. That's why it's a feature to support on demand that will never be demanded. Even the current hardware presumes flat segmentation will be used and, relatively speaking, is dog slow if a kernel tries to use it any other way.
> Valgrind maintains a cache of translated code, which is indexed by eip;
> like segments for data access, cs is assumed to be flat and unchanging.
> If cs changes around, we'd need to make sure that the translation cache
> is maintained appropriately.
You just want the translation cache to remain in terms of linear eip values, and apply segmentation transformations to virtual CPU eip values before going to the code translation step.
> One wart is that the mode of the CPU (16/32/64?) is a property of the
> current cs, and that affects how the instruction bytes are actually
> interpreted; if we wanted to support the different sizing modes, we'd
> need to make sure that we maintain the translation cache properly.
This is another thing that is straightforward and that in practice probably no one will ever ask us to do. That is, 16-bit mode. If we have a pure-virtual valgrind for xen/x86-64, it should support going into 32-bit user mode. Keeping the translations straight is simple; you just need to include some mode bits from the segmentation universe along with the linear address in what constitutes the lhs of the translation cache. More of the work is making the translator understand the 64-bit (16-bit) instruction set you get in 64-bit mode.
> At least changes to cs are pretty obvious; when translating any kind of
> trap/long jump/call/ret we generate code to observe how cs is changed.
All segment register moves are as easily noted in translation.
Anyway, this is pretty much hypothetical. There is no call to support code that makes use of segmentation much, because none around does. It's sufficient to have checks on segment register changes (rare, easily-identified instructions) and punt in flames when any happen that might mean something.
Thanks,
Roland
|
|
From: Jeremy F. <je...@go...> - 2004-12-07 17:18:45
|
On Fri, 2004-12-03 at 18:31 -0800, Roland McGrath wrote:
> That is, when you hit a segment register load instruction or an
> intersegment jump/call instruction, stop the world, possibly throw
> away all your cached translations, and switch to the slower plan where
> the translated code does the segmented->linear translation work.

Yep.

> You just want the translation cache to remain in terms of linear eip

Yep.

> This is another thing that is straightforward and that in practice
> probably no one will ever ask us to do. That is, 16-bit mode. If we
> have a pure-virtual valgrind for xen/x86-64, it should support going
> into 32-bit user mode. Keeping the translations straight is simple;
> you just need to include some mode bits from the segmentation universe
> along with the linear address in what constitutes the lhs of the
> translation cache. More of the work is making the translator
> understand the 64-bit (16-bit) instruction set you get in 64-bit mode.

With x86-64, you could use some of the address bits to tag the code type
(either by assuming that we're only using 48 address bits, or by using a
cache tag wider than 64 bits).

J |
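[Editorial note: Jeremy's address-bit tagging idea could look something like the sketch below, under his stated assumption that only 48 address bits are significant, so the upper bits of a 64-bit tag are free to carry the code type. The names (`make_tag` and friends) are hypothetical, and the sketch assumes addresses are first truncated to their 48 significant bits.]

```c
#include <stdint.h>

/* Illustrative only: pack a decode-mode tag into the otherwise-unused
 * upper bits of a 64-bit translation-cache tag.  Assumes linear
 * addresses fit in 48 bits (canonical kernel addresses would need
 * truncating to those 48 bits first). */

#define ADDR_BITS 48
#define ADDR_MASK ((1ull << ADDR_BITS) - 1)

static inline uint64_t make_tag(uint64_t linear_eip, unsigned mode)
{
    return (linear_eip & ADDR_MASK) | ((uint64_t)mode << ADDR_BITS);
}

static inline uint64_t tag_addr(uint64_t tag)
{
    return tag & ADDR_MASK;
}

static inline unsigned tag_mode(uint64_t tag)
{
    return (unsigned)(tag >> ADDR_BITS);
}
```

This keeps the cache tag a single machine word, which is the attraction over widening the tag beyond 64 bits.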
|
From: Roland M. <ro...@re...> - 2004-12-04 02:46:39
|
> That could be tricky. It's generally pretty hard to do that kind of
> switch between real and virtual CPU, because the virtual CPU has a lot
> more state than the real one.

I don't think that matters, though I'm also not quite sure what it
means. You don't have to handle "going real" in a general way. There
are only a tiny number of ways a guest kernel can go into user mode, and
in practice there is only one that you'll need to handle. It will be a
block that pops every register off the stack and ends with an `iret'
instruction. Incidentally, inside that block will be the only place you
encounter an instruction in a xen/x86 guest kernel that sets a segment
register.

You just have to identify a block that does this, and translate it into
a special thing that escapes valgrindland into the implementation glue
code and provides it all the register values the translated block wanted
to load. Those values are going into a black box, just as if they were
copied into a buffer passed to a `write' system call, from the userland
valgrind's perspective. You check whether they are uninitialized data.
But after that, you don't care what they are, or whether they are going
to be segment register values or whatever, because it's not happening in
your world.

The black box comes back at some point through some glue code that
reenters valgrindland, telling you that the virtual CPU's kernel-mode
eip and esp have these here values and that the other registers have
just been filled in with random bits from the black box (again, as if
loaded from the buffer filled by a `read' system call from the other
side of the looking glass).

Thanks,

Roland |
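[Editorial note: the virtual->real hand-off Roland describes amounts to a definedness check on the register values popped by the guest's iret block, treating them exactly like a `write' buffer. A minimal sketch, assuming memcheck-style V bits where a set bit marks an undefined value bit; all names and the register count are hypothetical.]

```c
#include <stdint.h>

/* Illustrative sketch of the kernel->user "black box" hand-off: check
 * the popped register values for definedness, then hand them to glue
 * code outside valgrindland without caring what they mean. */

#define NREGS 8   /* illustrative: eax..esp as one flat array */

typedef struct {
    uint32_t value[NREGS];
    uint32_t vbits[NREGS];  /* shadow: 1-bits mark UNDEFINED value bits */
} guest_regs;

/* Returns the number of registers containing undefined bits; a real
 * memcheck-like tool would emit an error report rather than a count. */
int check_regs_defined(const guest_regs *r)
{
    int bad = 0;
    for (int i = 0; i < NREGS; i++)
        if (r->vbits[i] != 0)
            bad++;
    return bad;
}

/* After the check, the values enter the black box: whether they become
 * segment selectors, flags, or anything else is no longer our problem
 * until control reenters valgrindland. */
void enter_black_box(const guest_regs *r)
{
    (void)r;  /* hand off to hypervisor/glue code; not modeled here */
}
```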
|
From: Jeremy F. <je...@go...> - 2004-12-07 17:18:45
|
On Fri, 2004-12-03 at 18:46 -0800, Roland McGrath wrote:
> > That could be tricky. It's generally pretty hard to do that kind of
> > switch between real and virtual CPU, because the virtual CPU has a
> > lot more state than the real one.
>
> I don't think that matters, though I'm also not quite sure what it
> means. You don't have to handle "going real" in a general way.

Virtual->real is easy, since you just discard all the shadow state.
With real->virtual you need to make up a pile of shadow state. I guess
you'd just assume that all registers are fully defined, and all memory
pointed to is fully defined.

This split kernel/user thing is what Jeff did with UML under Valgrind,
and it was relatively straightforward since the break was pretty clearly
present in the UML process architecture. When you're at a lower
abstraction level, it clearly gets more complex.

J |
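[Editorial note: the real->virtual direction Jeremy describes, making up shadow state on reentry by declaring everything fully defined, can be sketched as below. The layout and names are hypothetical; the convention of all-zero V bits meaning "fully defined" follows memcheck.]

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: reconstructing shadow state when control comes
 * back from user mode.  There is no real shadow state to import, so we
 * invent one: every register is treated as fully defined. */

#define NREGS 8

typedef struct {
    uint32_t value[NREGS];
    uint32_t vbits[NREGS];   /* 1-bits mark UNDEFINED value bits */
} guest_regs;

void reenter_virtual_cpu(guest_regs *r, const uint32_t fresh_values[NREGS])
{
    /* The values arrive as opaque bits from the black box... */
    memcpy(r->value, fresh_values, sizeof r->value);
    /* ...and we assume them fully defined (all V bits clear), as if a
     * read() had just filled a buffer from the other side. */
    memset(r->vbits, 0, sizeof r->vbits);
}
```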