|
From: Nicholas N. <nj...@ca...> - 2004-07-15 22:20:17
|
Hi,
I'm looking at how new shadow memory pages are allocated when necessary.
It's weird, there seem to be two mechanisms for this.
First, every tool that uses shadow memory (memcheck, addrcheck, helgrind)
is very careful to call their respective ENSURE_MAPPABLE macros before
accessing shadow memory, which checks if there is a shadow page for the
address and allocates a new one if not.
That would seem to be enough. But in vg_signals.c, there's a bit in the
SEGV handler with this comment:
/* If there's a fault within the shadow memory range, and it
is a permissions fault, then it means that the client is
using some memory which had not previously been used.
This catches those faults, makes the memory accessible,
and calls the tool to initialize that page.
*/
This calls VG_(init_shadow_range)(), which calls the init_shadow_page
trackable event, telling the tool to make itself a shadow page. But none
of the tools actually provide the necessary init_shadow_page callback.
This 2nd mechanism is simply not being used; I tried removing it and the
entire regression test suite ran fine.
So I see two options:
1. Just rely on the currently used ENSURE_MAPPABLE macros in the tools.
This would allow a couple of functions to be removed or simplified, saving
50 lines of code.
2. Just rely on the SEGV handling bit. This would require adding the
init_shadow_page callback to each of the tools. The advantage with this
option is that it might make things a bit faster, since we wouldn't have
to do ENSURE_MAPPABLE (which is a comparison like "x == y[z >> 16]") for
every shadow memory access. (But it might make no discernible difference,
since this is just arithmetic which might be swamped by the associated
memory accesses. In which case option (1) is clearly better.)
Any opinions? Jeremy, do you know why both these mechanisms are present?
Thanks.
N
|
|
From: Tom H. <th...@cy...> - 2004-07-15 22:44:06
|
In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> That would seem to be enough. But in vg_signals.c, there's a bit in the
> SEGV handler with this comment:
>
> /* If there's a fault within the shadow memory range, and it
> is a permissions fault, then it means that the client is
> using some memory which had not previously been used.
> This catches those faults, makes the memory accessible,
> and calls the tool to initialize that page.
> */
>
> This calls VG_(init_shadow_range)(), which calls the init_shadow_page
> trackable event, telling the tool to make itself a shadow page. But none
> of the tools actually provide the necessary init_shadow_page callback.
Somebody did manage to reach that code however, although I'm not
quite sure how. See bug 80932 where I was also trying to work out
what that bit in the SEGV handler was for.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2004-07-15 23:33:38
|
On Thu, 2004-07-15 at 23:44 +0100, Tom Hughes wrote:
> Somebody did manage to reach that code however, although I'm not
> quite sure how. See bug 80932 where I was also trying to work out
> what that bit in the SEGV handler was for.

Hm. The code should probably check to see if the tool defines
init_shadow_page, and just pass it through like a normal SIGSEGV if not.
But the client could only get here using --pointercheck=no anyway, so it
means that all of Valgrind is vulnerable to wild pointers.

J
|
|
From: Jeremy F. <je...@go...> - 2004-07-15 23:27:05
|
On Thu, 2004-07-15 at 23:20 +0100, Nicholas Nethercote wrote:
> Hi,
>
> I'm looking at how new shadow memory pages are allocated when necessary.
> It's weird, there seem to be two mechanisms for this.
>
> First, every tool that uses shadow memory (memcheck, addrcheck, helgrind)
> is very careful to call their respective ENSURE_MAPPABLE macros before
> accessing shadow memory, which checks if there is a shadow page for the
> address and allocates a new one if not.
>
> That would seem to be enough. But in vg_signals.c, there's a bit in the
> SEGV handler with this comment:
>
> /* If there's a fault within the shadow memory range, and it
>    is a permissions fault, then it means that the client is
>    using some memory which had not previously been used.
>    This catches those faults, makes the memory accessible,
>    and calls the tool to initialize that page.
> */
>
> This calls VG_(init_shadow_range)(), which calls the init_shadow_page
> trackable event, telling the tool to make itself a shadow page. But none
> of the tools actually provide the necessary init_shadow_page callback.
>
> This 2nd mechanism is simply not being used; I tried removing it and the
> entire regression test suite ran fine.
>
> So I see two options:
>
> 1. Just rely on the currently used ENSURE_MAPPABLE macros in the tools.
> This would allow a couple of functions to be removed or simplified, saving
> 50 lines of code.
>
> 2. Just rely on the SEGV handling bit. This would require adding the
> init_shadow_page callback to each of the tools. The advantage with this
> option is that it might make things a bit faster, since we wouldn't have
> to do ENSURE_MAPPABLE (which is a comparison like "x == y[z >> 16]") for
> every shadow memory access. (But it might make no discernible difference,
> since this is just arithmetic which might be swamped by the associated
> memory accesses. In which case option (1) is clearly better.)
>
> Any opinions? Jeremy, do you know why both these mechanisms are present?
Yup, I put them there.

The SIGSEGV mechanism is a result of a discussion we had ages ago about
direct-mapping the shadow memory from the client address - basically so
that there's a simple addr*scale+offset to map from a client address to
a shadow address.

The theory was that this computation is simple enough to do inline, so a
number of the shadow memory accesses wouldn't require calls to helpers.
It would also be, in theory, faster because it removes a memory
reference indirecting through the page table.

There are a couple of problems in practice. The direct-mapped approach
also needs a bounds check to make sure that the pointer is actually a
user-address-space pointer - if it isn't, the computed shadow pointer
could end up in the middle of the Valgrind address space. This bounds
check approximately doubles the size of the address calculation, and
probably undermines any performance improvement.

Which leads to the other problem: I never actually measured a clear
performance improvement. I'm not really sure why. It wasn't because of
the overhead of the SIGSEGV handler, since the fault rate dropped a lot
once the working set was established. But I was never really sure of
what it did to cache access patterns, and the address calculation often
ended up being quite fiddly.

So, no, nothing is using this at the moment. It looks like it should be
a useful mechanism, but it isn't clear that any of the existing tools
can easily use it. On the other hand, there's no huge performance hit,
so if the code turns out to be simpler with direct mapping, it might be
the way to go (ie, it isn't worth changing existing tools, but it might
make sense for new ones).

J
|
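The direct-mapped scheme with its bounds check can be sketched as follows (the layout constants and names are invented for illustration; the real layout would depend on where Valgrind itself is mapped):

```c
#include <stdint.h>

/* Hypothetical layout: client addresses live below CLIENT_END, and one
   contiguous shadow segment starts at SHADOW_OFFSET. */
#define CLIENT_END    0x40000000u
#define SHADOW_OFFSET 0x40000000u
#define SHADOW_SCALE  1u   /* one shadow byte per client byte */

/* addr*scale + offset, guarded by the bounds check described above:
   without it, a wild pointer's "shadow" could land inside Valgrind's
   own mappings. The extra compare is what bloats the fast path. */
static uint32_t shadow_of(uint32_t addr, int *ok)
{
    if (addr >= CLIENT_END) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return addr * SHADOW_SCALE + SHADOW_OFFSET;
}
```

With this scheme a fault on the computed shadow address means the page has never been touched, which is what the SEGV handler catches.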
|
From: Nicholas N. <nj...@ca...> - 2004-07-16 13:01:44
|
On Thu, 15 Jul 2004, Jeremy Fitzhardinge wrote:
> The SIGSEGV mechanism is a result of a discussion we had ages ago about
> direct-mapping the shadow memory from the client address - basically so
> that there's a simple addr*scale+offset to map from a client address to
> a shadow address.
>
> The theory was that this computation is simple enough to do inline, so a
> number of the shadow memory accesses wouldn't require calls to helpers.
> It would also be, in theory, faster because it removes a memory
> reference indirecting through the page.
>
> There are a couple of problems in practice. The direct-mapped approach
> also needs a bounds check to make sure that the pointer is actually a
> user-address-space pointer - if it isn't the computed shadow pointer
> could end up in the middle of the Valgrind address space. This bounds
> check approximately doubles the size of the address calculation, and
> probably undermines any performance improvement.
>
> Which leads the other problem: I never actually measured a clear
> performance improvement. I'm not really sure why. It wasn't because of
> the overhead of the SIGSEGV handler, since the fault rate dropped a lot
> once the working set was established. But I was never really sure of
> what it did to cache access patterns, and the address calculation often
> ended up being quite fiddly.
>
> So, no, nothing is using this at the moment. It looks like it should be
> a useful mechanism, but it isn't clear that any of the existing tools
> can easily use it. On the other hand, there's no huge performance hit,
> so if the code turns out to be simpler with direct mapping, it might be
> the way to go (ie, it isn't worth changing existing tools, but it might
> make sense for new ones).

Here are my thoughts:

* I was an early advocate of the addr*scale+offset direct-mapping
approach, but I've now changed my mind. Partly because it didn't, as you
say, help performance very much. And partly because direct-mapping
requires shadow memory to be in a single block, which constrains memory
layout too much. With the memory layout changes I'm considering, Valgrind
memory, tool memory and shadow memory will all be intermingled and not in
any particular order.

* As for the ENSURE_MAPPABLE vs. allocate-on-SEGV approaches for shadow
page allocation, I'm not too fussed either way. But I'd like to choose
one, and remove the other. I don't like having two ways of doing
something. It's confusing and increases code size. Having code that is
never used is a liability. There was a paper written a couple of years
ago about why GCC is so hard to maintain, and having more than one way of
doing things was identified as a major factor. If pressed, I'd go for
ENSURE_MAPPABLE because I think it's simpler and has less code and won't
be noticeably slower.

How does that sound? What does everyone think?

N
|
|
From: Julian S. <js...@ac...> - 2004-07-16 13:20:18
|
> * I was an early advocate of the addr*scale+offset direct-mapping
> approach, but I've now changed my mind. Partly because it didn't, as you
> say, help performance very much. And partly because direct-mapping
> requires shadow memory to be in a single block, which constrains memory
> layout too much. With the memory layout changes I'm considering, Valgrind
> memory, tool memory and shadow memory will all be intermingled and not in
> any particular order.
>
> * As for the ENSURE_MAPPABLE vs. allocate-on-SEGV approaches for shadow
> page allocation, I'm not too fussed either way. But I'd like to choose
> one, and remove the other. I don't like having two ways of doing
> something. It's confusing and increases code size. Having code that is
> never used is a liability. There was a paper written a couple of years
> ago about why GCC is so hard to maintain, and having more than one way of
> doing things was identified as a major factor. If pressed, I'd go for
> ENSURE_MAPPABLE because I think it's simpler and has less code and won't
> be noticeably slower.

I agree. A single mechanism is much better, and I would prefer the
ENSURE_MAPPABLE scheme, as it doesn't make any assumptions about the
underlying OS. But: isn't the allocate-on-SEGV approach required for
tools which don't do shadow memory? Apologies if this is a dumb
question, I didn't follow all this thread in detail.

> How does that sound? What does everyone think?

I think a major coup would be to arrive at a design which
(a) reduces, perhaps, or localises, any Linux-specific
assumptions about layout, to make porting to other OSs easier,
and more (b) works uniformly well on both 32- and 64-bit
platforms. Achieving both would be a majorly Good Thing.

J
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-16 13:46:18
|
On Fri, 16 Jul 2004, Julian Seward wrote:
> I agree. A single mechanism is much better, and I would prefer the
> ENSURE_MAPPABLE scheme, as it doesn't make any assumptions about the
> underlying OS. But: isn't the allocate-on-SEGV approach required for
> tools which don't do shadow memory?

No, this is all about allocating shadow pages; you can either get the
tools to do it themselves (ENSURE_MAPPABLE) or just let them assume the
pages exist and let the core allocate-on-SEGV.

N
|
|
From: Julian S. <js...@ac...> - 2004-07-16 13:50:09
|
On Friday 16 July 2004 14:46, Nicholas Nethercote wrote:
> On Fri, 16 Jul 2004, Julian Seward wrote:
> > I agree. A single mechanism is much better, and I would prefer the
> > ENSURE_MAPPABLE scheme, as it doesn't make any assumptions about the
> > underlying OS. But: isn't the allocate-on-SEGV approach required for
> > tools which don't do shadow memory?
>
> No, this is all about allocating shadow pages; you can either get the
> tools to do it themselves (ENSURE_MAPPABLE) or just let them assume the
> pages exist and let the core allocate-on-SEGV.

Ah, ok, thanks.

J
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-16 13:51:50
|
On Fri, 16 Jul 2004, Julian Seward wrote:
> I think a major coup would be to arrive at a design which
> (a) reduces, perhaps, or localises, any Linux-specific
> assumptions about layout, to make porting to other OSs easier,
> and more (b) works uniformly well on both 32- and 64-bit
> platforms. Achieving both would be a majorly Good Thing.

Yeah, I'm concerned about how shadow pages are going to be stored on
64-bit architectures, because moving from a 2-level page table to a
4-level one sounds bad.

N
|
|
From: Julian S. <js...@ac...> - 2004-07-16 14:26:44
|
On Friday 16 July 2004 14:51, Nicholas Nethercote wrote:
> On Fri, 16 Jul 2004, Julian Seward wrote:
> > I think a major coup would be to arrive at a design which
> > (a) reduces, perhaps, or localises, any Linux-specific
> > assumptions about layout, to make porting to other OSs easier,
> > and more (b) works uniformly well on both 32- and 64-bit
> > platforms. Achieving both would be a majorly Good Thing.
>
> Yeah, I'm concerned about how shadow pages are going to be stored on
> 64-bit architectures, because moving from a 2-level page table to a
> 4-level one sounds bad.

It is. But do we need 4 levels? What about a 3-level scheme
where the top level table has a 22-bit index, then level 2
also has a 22-bit index, and the final level has a 20-bit
index -- (22,22,20) so to speak. On 32-bit machines we could
stick with (16,16) or move to (12,20) so the final-level code
is shared with the 64-bit case.

Idea #2. Is there a portable/sane way to restrict the addresses
generated to (say) 40 bits, so that the top 24 are zero? Then
we could use a (20,20) scheme. 40 bits is 1 TB, which is a big
step up from 4GB. Probably not doable, I guess.

Idea #3. Have a two-level scheme. The bottom-level tables are as
now, covering power-of-two sized bits of address space. The top-level
table is (conceptually) a list of (start, size,
pointer-to-low-level-table) triples. An access first searches the top
table to find the bottom table. This could work equally well/badly on
both 32- and 64-bit platforms.

If the top-level table is dynamically rearranged to bring frequently
used entries to the front, and the bottom-level pages are big enough,
then you may wind up with the first 3 entries being for the currently
active stack, heap and data-area sub-tables. In which case the
accesses would be cheap in the common case.

Alignment checking in the common case could be made free by making
each chunk stated in the top-level table marginally smaller than the
low-level table it points at. So an access which straddles a low-level
table boundary would simply fail to match any top-level entries. A
slow-case handler which understood this trickery would then deal with
it.

Another comment is that it's probably worth considering a scheme which
minimises the cache miss rate rather than the dynamic instruction
count.

Euro 0.02, etc.

J
|
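The (22,22,20) split of a 64-bit address works out as below (a sketch of just the index arithmetic; the level names are made up, and each table level would be allocated lazily in practice):

```c
#include <stdint.h>

/* Hypothetical (22,22,20) split: 22 + 22 + 20 = 64 bits, so each
   bottom-level table covers 2^20 = 1MB of address space. */

static uint64_t l1_index(uint64_t a) { return a >> (22 + 20); }
static uint64_t l2_index(uint64_t a) { return (a >> 20) & ((1ull << 22) - 1); }
static uint64_t l3_index(uint64_t a) { return a & ((1ull << 20) - 1); }
```

Note that a top-level table of 2^22 pointers is 32MB on its own, which is part of why a shadow chunk table is awkward for 64-bit addresses.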
|
From: Julian S. <js...@ac...> - 2004-07-16 14:38:05
|
> If the top-level table is dynamically rearranged to bring frequently
> used entries to the front, and the bottom level pages are big enough,
> then you may wind up with the first 3 entries being for the currently
> active stack, heap and data-area sub-tables. In which case the
> accesses would be cheap in the common case.

witter witter witter ...

Alternatively, the top-level table could have an associated small
direct-mapped cache, which would naturally come to hold the popular
entries, falling back to the main table on a miss. One problem is that
a new scheme would then be needed to do the free alignment checking,
and I'm not sure what that would be.

J
|
|
From: Tom H. <th...@cy...> - 2004-07-16 14:39:38
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Idea #2. Is there a portable/sane way to restrict the addresses
> generated to (say) 40 bits, so that the top 24 are zero? Then
> we could use a (20,20) scheme. 40 bits is 256 GB, which is a big
> step up from 4GB. Probably not doable, I guess.
Well currently Athlon 64's actually use a 48 bit virtual address
space and a 40 bit physical address space although I believe that
can change with different chip models.
It is reported in /proc/cpuinfo so is presumably read with cpuid
or something - here's what /proc/cpuinfo says on our Athlon 64:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 8
cpu MHz : 1994.895
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 4089.44
TLB size : 1088 4K pages
clflush size : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2004-07-16 18:04:56
|
On Fri, 2004-07-16 at 15:28 +0100, Julian Seward wrote:
> On Friday 16 July 2004 14:51, Nicholas Nethercote wrote:
> > On Fri, 16 Jul 2004, Julian Seward wrote:
> > > I think a major coup would be to arrive at a design which
> > > (a) reduces, perhaps, or localises, any Linux-specific
> > > assumptions about layout, to make porting to other OSs easier,
> > > and more (b) works uniformly well on both 32- and 64-bit
> > > platforms. Achieving both would be a majorly Good Thing.
> >
> > Yeah, I'm concerned about how shadow pages are going to be stored on
> > 64-bit architectures, because moving from a 2-level page table to a
> > 4-level one sounds bad.
>
> It is. But do we need 4 levels? What about a 3-level scheme
> where the top level table has a 22-bit index, then level 2
> also has a 22-bit index, and the final level has a 20-bit
> index -- (22,22,20) so to speak. On 32-bit machines we could
> stick with (16,16) or move to (12,20) so the final-level code
> is shared with the 64-bit case.
>
> Idea #2. Is there a portable/sane way to restrict the addresses
> generated to (say) 40 bits, so that the top 24 are zero? Then
> we could use a (20,20) scheme. 40 bits is 1 TB, which is a big
> step up from 4GB. Probably not doable, I guess.

In both cases, a 1MByte page size sounds pretty expensive. Maybe not.
Not sure.

My feeling is that once you add all this extra complexity, the direct
mapped SEGV scheme looks a lot more attractive. The performance
difference vs the current scheme was around +/- 1%. In the 64-bit
environment, the SEGV scheme would cost the same as it does now, whereas
adding extra levels/cache lookups/etc are going to cost more.

J
|
|
From: Jeremy F. <je...@go...> - 2004-07-16 18:01:11
|
On Fri, 2004-07-16 at 14:21 +0100, Julian Seward wrote:
> I agree. A single mechanism is much better, and I would prefer the
> ENSURE_MAPPABLE scheme, as it doesn't make any assumptions about the
> underlying OS. But: isn't the allocate-on-SEGV approach required for
> tools which don't do shadow memory? Apologies if this is a dumb
> question, I didn't follow all this thread in detail.

If they don't use shadow, they don't need either mechanism.

> I think a major coup would be to arrive at a design which
> (a) reduces, perhaps, or localises, any Linux-specific
> assumptions about layout, to make porting to other OSs easier,
> and more (b) works uniformly well on both 32- and 64-bit
> platforms. Achieving both would be a majorly Good Thing.

The other motivation for the SEGV direct map scheme is that it does work
for 64-bit targets. The current scheme doesn't scale, and would
probably need another level of page table to make it work on 64-bit
targets, which affects all the Tools which use it. The SEGV scheme can
hide those details from the Tools.

J
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-16 18:30:32
|
On Fri, 16 Jul 2004, Jeremy Fitzhardinge wrote:
> On Fri, 2004-07-16 at 14:21 +0100, Julian Seward wrote:
>
> The other motivation for the SEGV direct map scheme is that it does work
> for 64-bit targets. The current scheme doesn't scale, and would
> probably need another level of page table to make it work on 64-bit
> targets, which affects all the Tools which use it. The SEGV scheme can
> hide those details from the Tools.
>
> My feeling is that once you add all this extra complexity, the direct
> mapped SEGV scheme looks a lot more attractive. The performance
> difference vs the current scheme was around +/- 1%. In the 64-bit
> environment, the SEGV scheme would cost the same as it does now, whereas
> adding extra levels/cache lookups/etc are going to cost more.

We should be careful with terminology; two different things are being
mixed up here. There are two dimensions, giving three possible
approaches:

1. Current: ENSURE_MAPPABLE with shadow chunk table
2. allocate-on-SEGV with shadow chunk table
3. allocate-on-SEGV with direct mapping

(1) and (2) are better for 32-bit machines, where address space is
cramped, because direct mapping is an address space hog, and we only need
a 2-level table for 32-bit addresses.

(3) is better for 64-bit machines because address space is plentiful and a
shadow chunk table is difficult to do well with 64-bit addresses.

Hmm.

N
|
|
From: Jeremy F. <je...@go...> - 2004-07-17 23:15:23
|
On Fri, 2004-07-16 at 19:30 +0100, Nicholas Nethercote wrote:
> (1) and (2) are better for 32-bit machines, where address space is
> cramped, because direct mapping is an address space hog, and we only need
> a 2-level table for 32-bit addresses.
>
> (3) is better for 64-bit machines because address space is plentiful and a
> shadow chunk table is difficult to do well with 64-bit addresses.

Um, (2) seems completely pointless to me, so I was only considering (1)
and (3). We can get the kernel to do allocate-on-write by making the
shadow segment mapped with PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE - the kernel will only allocate real pages
once you write to it. The problem with this is that it allows a buggy
Tool to use memory it wasn't expecting to, or something. I think it's
valuable to make the skin explicitly allocate the address space in
case (1).

J
|