|
From: Stephen M.
|
The shadow guest state is a part of the Valgrind machine state used by tools to record shadow information associated with registers: for instance, Memcheck uses it to store the V bits of values in registers. At the moment, the size of the shadow guest state is hard-coded to be equal in size to the regular guest state it shadows, which happens to work well for Memcheck because (uncompressed) V bits use one bit for each data bit. However, I've now worked on a couple of other shadow-value-style tools where this limitation was a problem, since I wanted to store more shadow state with each register. I ended up finding the right set of places to change to hard-code a larger limit, but I think a better fix would be to make this configurable on a per-tool basis. The most obvious interface to me would be for the tool to specify an integer multiplier; then the shadow guest state size would be that multiple of the regular guest state size. Memcheck would supply 1 as the multiplier, and tools that don't need shadow register values could use 0 (which also seems like the best default). Does this sound like a good idea? If so, I'd be willing to work on a patch. -- Stephen |
|
From: Stephen M.
|
>>>>> "SMcC" == Stephen McCamant <smcc@CSAIL.MIT.EDU> writes: SMcC> The shadow guest state is a part of the Valgrind machine state SMcC> used by tools to record shadow information associated with SMcC> registers: for instance, Memcheck uses it to store the V bits of SMcC> values in registers. At the moment, the size of the shadow guest SMcC> state is hard-coded to be equal in size to the regular guest SMcC> state it shadows, which happens to work well for Memcheck SMcC> because (uncompressed) V bits use one bit for each data bit. SMcC> However, I've now worked on a couple of other shadow-value-style SMcC> tools where this limitation was a problem, since I wanted to SMcC> store more shadow state with each register. I ended up finding SMcC> the right set of places to change to hard-code a larger limit, SMcC> but I think a better fix would be to make this configurable on a SMcC> per-tool basis. The most obvious interface to me would be for SMcC> the tool to specify an integer multiplier; then the shadow guest SMcC> state size would be that multiple of the regular guest state SMcC> size. Memcheck would supply 1 as the multiplier, and tools that SMcC> don't need shadow register values could use 0 (which also seems SMcC> like the best default). SMcC> Does this sound like a good idea? If so, I'd be willing to work SMcC> on a patch. I did some working on this today, and found that the changes required (at least for the approach I tried first) were more extensive than I'd hoped. I've got a set of changes that pass the regression tests, but perhaps some more design discussion is in order. Right now, the ThreadArchState structure that holds the guest state is inlined into the ThreadState object. This has to change if ThreadArchState is going to be variable length, and that's the largest reason for code changes: "arch." has to change to "arch->" in a whole lot of places. If the vex_shadow field is going to be variable-length, my first impulse was that it should move to the end of the ThreadArchState structure. In retrospect, though, I think it might be less disruptive to leave it in its current position, in between the regular guest state (the "vex" field) and the VEX spill area. Leaving it in place means the tool's view of the guest state stays the same: they can still start allocating shadow value right after the regular guest state. The disadvantage is that putting a variable length structure before it makes the start of the spill area unpredictable, but that would just require a change to the VEX interface; nothing on the Valgrind side really cares where the spills go. A slightly crazier idea would be to make the guest state indexing bidirectional, say using positive offsets for guest state and shadow guest state, and negative offsets for spills. That would let both the shadow state size and the number of spills vary more flexibly, but it doesn't make the view of the structure from C code any cleaner. Another set of fixed-length structures that currently hold a copy of the shadow information is the sigframes. Since these are allocated specially, it wasn't immediately clear to me whether it would be safe to add a level of indirection to them. I'm not sure now whether the better default for the multiplier would be 0 or 1. Most tools don't seem to use shadow registers, so 0 would save space for them, but 1 would be more backwards-compatible. Given the complexities of adding indirection, it seems somewhat in the spirit of the rest of Valgrind to just hard-code a larger maximum; in that case, I'd propose a multiplier of 10. But 300 threads * 300 bytes guest space * 10 copies = 900k is not a trivial amount of memory to waste. Any thoughts on the above issues? If you'd like to look at the patches I've been working on so far, they're at: http://people.csail.mit.edu/smcc/valgrind-patches/guest-resize-try1-vg.patch http://people.csail.mit.edu/smcc/valgrind-patches/guest-resize-try1-vex.patch (svn diffs for Valgrind proper and VEX respectively). -- Stephen |
|
From: Nicholas N. <nj...@cs...> - 2007-08-25 01:02:35
|
On Fri, 24 Aug 2007, Stephen McCamant wrote: > If the vex_shadow field is going to be variable-length, my first > impulse was that it should move to the end of the ThreadArchState > structure. In retrospect, though, I think it might be less disruptive > to leave it in its current position, in between the regular guest > state (the "vex" field) and the VEX spill area. Leaving it in place > means the tool's view of the guest state stays the same: they can > still start allocating shadow value right after the regular guest > state. The disadvantage is that putting a variable length structure > before it makes the start of the spill area unpredictable, but that > would just require a change to the VEX interface; nothing on the > Valgrind side really cares where the spills go. This sounds, at first thought, the best way to go. > Another set of fixed-length structures that currently hold a copy of > the shadow information is the sigframes. Since these are allocated > specially, it wasn't immediately clear to me whether it would be safe > to add a level of indirection to them. I don't know about them... > I'm not sure now whether the better default for the multiplier would > be 0 or 1. Most tools don't seem to use shadow registers, so 0 would > save space for them, but 1 would be more backwards-compatible. Don't worry about backwards compatibility, since you'll probably be breaking that anyway :) No point wasting space if you don't have to. > Given the complexities of adding indirection, it seems somewhat in the > spirit of the rest of Valgrind to just hard-code a larger maximum; in > that case, I'd propose a multiplier of 10. But 300 threads * 300 bytes > guest space * 10 copies = 900k is not a trivial amount of memory to > waste. I think specifying the multiplier would be better. But one issue that worries me: even if the shadow size is only double the normal size, will all the register stuff work in the IR? For example, if the code is dealing with registers of type F64 or V128, there's no way to talk about shadow registers of twice that size, is there? It's probably ok for the integer registers, but that's only part of the story. Nick |
|
From: Stephen M.
|
SMcC> Another set of fixed-length structures that currently hold a SMcC> copy of the shadow information is the sigframes. Since these are SMcC> allocated specially, it wasn't immediately clear to me whether SMcC> it would be safe to add a level of indirection to them. >>>>> "NJN" == Nicholas Nethercote <nj...@cs...> writes: NJN> I don't know about them... Probably the easiest thing to do is just to try it and see if anything breaks. There are two things that seem like they could be potential problems: first, if sigframe_destroy() doesn't get called, the allocated space would leak, but it doesn't look like the sort of thing that could get bypassed. Second, in user programs you might worry about calling a non-reentrant malloc() from a signal that was delivered while malloc() was running, but I'd presume Valgrind serializes things so that that wouldn't be a problem. SMcC> I'm not sure now whether the better default for the multiplier SMcC> would be 0 or 1. Most tools don't seem to use shadow registers, SMcC> so 0 would save space for them, but 1 would be more SMcC> backwards-compatible. NJN> Don't worry about backwards compatibility, since you'll probably NJN> be breaking that anyway :) No point wasting space if you don't NJN> have to. Reasonable enough. I am actually expecting this will be pretty backwards-compatible, at least with the more compatible layout, but in any case it's just a one-line change. NJN> But one issue that worries me: even if the shadow size is only NJN> double the normal size, will all the register stuff work in the NJN> IR? For example, if the code is dealing with registers of type NJN> F64 or V128, there's no way to talk about shadow registers of NJN> twice that size, is there? It's probably ok for the integer NJN> registers, but that's only part of the story. Yes, this gets to the point that what my tools are doing is in fact a little more different. The notion of a multiplier isn't really the most natural one for them, since it isn't that they want an n bits-1 bit shadow area in the way Memcheck wants a 1 bit-1 bit one. Instead, they want to have a fixed-size tag for each separate register (e.g., 32 bits per register for the abstract type inference tool, at least on 32 bit machines). So the most space efficient thing would actually be to have a quite different guest-space layout, though depending on how many registers of different sizes there are it might still have to be bigger overall. But since the IR representation of guest state locations is just integer offsets, this would require a somewhat ugly lookup table. The easier scheme that we've hit on is just to multiply all the offsets by a constant when computing the shadow offset. If you pick the constant to be the largest factor by which anything expands (e.g., 4 if there are byte registers that have 32-bit shadows), then nothing overlaps, though there's unused space around the shadows for larger registers. (This scheme would also get in trouble if there were registers that were sometimes accessed with GET/PUT and other times with GETI/PUTI, but that doesn't seem to happen with the current VEX on x86 or amd64.) -- Stephen |
|
From: Nicholas N. <nj...@cs...> - 2007-08-28 22:12:07
|
On Tue, 28 Aug 2007, Stephen McCamant wrote: > NJN> But one issue that worries me: even if the shadow size is only > NJN> double the normal size, will all the register stuff work in the > NJN> IR? For example, if the code is dealing with registers of type > NJN> F64 or V128, there's no way to talk about shadow registers of > NJN> twice that size, is there? It's probably ok for the integer > NJN> registers, but that's only part of the story. > > Yes, this gets to the point that what my tools are doing is in fact a > little more different. The notion of a multiplier isn't really the > most natural one for them, since it isn't that they want an n bits-1 > bit shadow area in the way Memcheck wants a 1 bit-1 bit one. Instead, > they want to have a fixed-size tag for each separate register (e.g., > 32 bits per register for the abstract type inference tool, at least on > 32 bit machines). So the most space efficient thing would actually be > to have a quite different guest-space layout, though depending on how > many registers of different sizes there are it might still have to be > bigger overall. But since the IR representation of guest state > locations is just integer offsets, this would require a somewhat ugly > lookup table. So every register has a 32-bit shadow, regardless of its size? In that case, specifying the absolute shadow guest state size might be better than specifying a multiplier. (And you don't have the problem I suggested because all your shadow values are normal Vex types. I guess if you wanted a shadow value that wasn't a Vex type (eg. 256 bits) you'd have to model it via a word-sized pointer to a 256-bit region. That 256-bit region could be in the shadow guest state or elsewhere. Hmm, I'm just thinking aloud.) What do you do about the overlapping registers, eg. eax/ax/ah/al? Nick |
|
From: Stephen M.
|
>>>>> "NJN" == Nicholas Nethercote <nj...@cs...> writes: NJN> So every register has a 32-bit shadow, regardless of its size? NJN> In that case, specifying the absolute shadow guest state size NJN> might be better than specifying a multiplier. Specifying an absolute size is more flexible: it lets you save more space when the amount you need isn't an exact multiple of the original guest state size, and it would be more natural if a tool needed an amount of extra guest space that wasn't proportional to the original guest area. The reason I'd proposed a multiplier is that it seems easier to use for the cases I've seen actually implemented so far. To allow specifying an absolute size, I think you'd want a two-function interface: either one to set the size absolutely and one to set it as a multiple, or one to set the size absolutely and one to get the original guest area size (IIRC, that is also available in the VexGuestLayout structure, but we'd like to set this earlier than that is available.) NJN> What do you do about the overlapping registers, eg. eax/ax/ah/al? Heh, an insightful question. Those represent the same kind of potential problem as GETs overlapping with GETIs, but I don't think we'd previously considered them. So I think we just silently do the wrong thing. %eax, %ax, and %al are at the same offset on a little-endian host, so they're treated as if they were all the same register, and %ah is treated as if it were completely unconnected. Given what we've implemented so far, I think the easiest way to handle this properly would be to add a pre-pass that translated the GET and PUT operations on sub-registers into bit operations; i.e., the IR version of changing "%ah = t42" into "%eax = (%eax & 0xffff00ff) | (t42 << 8)". We're probably not worried enough about performance to care about making the instruction selector recognize such patterns to translate back. The other reasonable possibility would be to associate a tag value with every register byte separately, and treat larger than byte-sized operations as working on all of the bytes. This is what we do for memory, but the extra precision seems more important there because it's quite common for two bytes in a memory word to have different meanings (e.g., adjacent but unrelated char fields in a structure). By contrast, I suspect gcc at least never uses %al and %ah for unrelated byte variables, and since register operations are more common we'd like to optimize the common case. -- Stephen |
|
From: Stephen M.
|
>>>>> "NJN" == Nicholas Nethercote <nj...@cs...> writes:
NJN> So every register has a 32-bit shadow, regardless of its size?
NJN> In that case, specifying the absolute shadow guest state size
NJN> might be better than specifying a multiplier.
I've now put together a revised patch including the changes we
discussed. The tool interface now looks like:
/* How much space, in bytes, should be allocated for shadow data
describing the guest program's registers? */
extern void VG_(details_shadow_guest_sizeB)( Int size );
/* How much space, in bytes, does it take to store the registers of
the guest program's architecture? Often, the argument to
details_shadow_guest_sizeB is a multiple of this. */
extern Int VG_(get_guest_sizeB)(void);
The default is an empty shadow guest area, so the standard tools other
than Memcheck don't need to be changed at all. To get the same amount
of space it's used in the past, Memcheck does:
VG_(details_shadow_guest_sizeB)(VG_(get_guest_sizeB)());
Notes on the interface:
* I put the order of the ThreadArchState structure back to the way it
was, so that the shadow guest area still starts right after the
regular guest area. This means that the spill are is now no longer
at a fixed offset, and the size of the shadow area needs to get
passed to Vex.
* To implement get_guest_sizeB(), I added an interface to Vex to make
the GuestLayout available at any time, not just as a parameter to
the translate function.
* Since Valgrind delivers signals synchronously, I think it's safe to
call VG_(arena_malloc/free) in the signal frame setup code, so I
used that to allocate the saved shadow guest state.
This patch compiles and passes "make regtest" on the two platforms I
can test on here, Debian Etch/x86 and Debian Etch/amd64. I've also
made the corresponding changes to the PPC code, but can't test them.
Besides the standard tools, I've also tested the code with one of mine
that uses
VG_(details_shadow_guest_sizeB)(9 * VG_(get_guest_sizeB)());
and it also works as expected.
As before, I've split the patch into Valgrind-proper and Vex-specific
parts:
http://people.csail.mit.edu/smcc/valgrind-patches/guest-resize-try2-vg.patch
http://people.csail.mit.edu/smcc/valgrind-patches/guest-resize-try2-vex.patch
-- Stephen
|