From: Jeremy F. <je...@go...> - 2005-02-14 17:33:53
On Mon, 2005-02-14 at 11:18 +0000, Julian Seward wrote:

> I had a run-in last night with mmap. This, recent amd64
> hackery, and contemplation of ppc and MacOSX, got me thinking
> again about address space layout.
>
> Currently on x86 we divide the address space at some point
> (eg, 0x52000000), force the client to exist below that boundary
> and give all above it to V. This scheme allows pointercheck to
> be implemented for free using x86 segmentation.
>
> On x86 that works well, although it does seem to have given
> various kinds of brittleness when running large processes
> and/or for people with unusual kernel/user address boundaries.

Using -fpie has relieved a lot of that brittleness. We still get
constrained by address space if the user boundary is too low, but at
least it runs.

> On amd64 that scheme isn't going to work, since all
> client code has to be below 2G. A more flexible layout is
> needed.

That only applies to the main executable. All the shared libraries,
ld.so, etc are much higher (~0x3f3b'0000'0000 on my machine). If we can
keep generated code within 2G of Valgrind itself, then we can generate
fairly compact code; otherwise it will need to do everything with
absolute 64-bit pointers.

> On ppc32, and other architectures, which I'm sure we'll
> eventually get to, we can't implement pointercheck the way we
> do now anyway.

PPC has its own interesting segmentation magic which might be usable.

> To support MacOSX and other OSs, a more flexible approach
> wouldn't hurt.
>
> I've been thinking about a low-overhead, portable, software
> implementation of pointercheck which would allow more flexibility.
>
> One observation (unless I am wrong) is that pointercheck
> is valueless when running Memcheck, since Memcheck should be
> able to catch all illegal accesses (if this isn't true then
> Memcheck is broken and we should fix it).
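The "within 2G of Valgrind itself" point is about amd64 direct branches: call/jmp take a signed 32-bit displacement, so a target is only reachable with compact code if it lies within +/-2G of the end of the instruction. A minimal C sketch of that reachability test (the function name and the 5-byte `call rel32` length are my own illustrative assumptions, not anything from the mail):

```c
#include <stdint.h>
#include <limits.h>

/* Can `target` be reached from a call rel32 at `insn_addr`?
   The rel32 displacement is relative to the byte after the
   instruction; call rel32 (e8 xx xx xx xx) is 5 bytes long. */
static int reachable_rel32(uint64_t insn_addr, uint64_t target)
{
    int64_t delta = (int64_t)(target - (insn_addr + 5));
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With the shared libraries up around 0x3f3b'0000'0000 and the executable below 2G, this test fails between the two regions, which is why code not placed near Valgrind would have to fall back to absolute 64-bit pointers.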
> That means we can omit pointercheck when doing Memcheck and so any
> software-based checking scheme will have zero overhead for
> our most-used tool.

As Tom said, I think this is a bit over-simplified. Besides, I suspect
the cost of a pointer check is tiny compared to all the other extra work
memcheck does - particularly if the codegen can schedule the check in
advance of the actual memory access (and even CSE it for multiple
accesses).

> If the table is accessible at a fixed offset from the guest state
> pointer, this becomes four insns on x86:
>
>       movl  %addr, %tmp
>       shrl  $n, %tmp
>       testb $1, offset(%ebp, %tmp, 1)
>       jz    OK
>       <handle failure somehow>
>    OK:
>
> There are a couple of options for <handle failure somehow>, which need
> to be investigated for their impact on code size and the extent to
> which they inhibit IR optimisation. Either way, they can be expressed
> purely in IR, so no portability questions there either.

Could we generate the <handle failure> code out of line (after or before
the BB's main generated code), and change it to

    jnz,pn failure

In either case, on ia32 we could use an interrupt instruction to raise a
fault (2-byte instruction).

> Why 64M-sized blocks? 64M is a coarse but just-about-manageable
> granularity. On a 32-bit target, with 64M superblocks, the check
> table has only 64 entries -- that is, 64 bytes -- so accesses to it
> are unlikely to cause significant cache pollution (iow I hope all of
> it will end up permanently in D1).
>
> On a 64-bit target, we clearly cannot have a check table covering
> the entire address space since that would require 2^38 entries.
> However, even a 4096-entry table would cover 256GB of address space,
> which is more than enough for the foreseeable future.
> The access test would become a little more expensive because it would
> also have to test that the upper bits of the address which are not
> covered by the table are all zero -- assuming the table maps the
> address space of 0 - 256GB and not some other part.
>
> So, what am I missing?

It looks reasonable. For ia32 I would suggest sticking to the current
layout, so there would just be a series of fixed, preallocated
superblocks. In fact, for any 32-bit implementation I think that's the
best way to go. Since there are only 64 superblocks, we can pretty
easily determine in advance whether they'll end up being client- or
Valgrind-usable (maybe a few of them would be unallocated).

For 64-bit, there's no reason we couldn't use, say, a 16Gbyte super-page
size and have a 16k-entry table for a 48-bit address space. At the
moment Linux constrains user-space to 256Gbytes, but after 2.6.1[12]
that will be unconstrained (ie, up to 2^47 or so, I think) because it
will use a 4-level pagetable structure. At that point, we could put
Valgrind way up high and leave the client with a very clear address
space.

J
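The 64-bit variant discussed above can be sketched the same way: 64M superblocks and a 4096-entry table cover 4096 * 64M = 2^38 bytes = 256GB, so the extra cost is one test that no bits above bit 37 are set. A minimal C sketch under the same illustrative zero-means-OK assumption (not Valgrind code):

```c
#include <stdint.h>

#define SB_SHIFT     26                          /* 64M superblocks   */
#define TABLE_BITS   12                          /* 4096 entries      */
#define COVERED_BITS (SB_SHIFT + TABLE_BITS)     /* 38 -> 256GB total */

static uint8_t check_table64[1u << TABLE_BITS];  /* 4K bytes */

static int client_access_ok64(uint64_t addr)
{
    if (addr >> COVERED_BITS)   /* upper bits set: outside 0 - 256GB */
        return 0;
    return check_table64[addr >> SB_SHIFT] == 0;
}
```

The same shape works for the 16Gbyte/16k-entry variant by setting SB_SHIFT to 34 and TABLE_BITS to 14 (covering 2^48); only the constants change, not the two-test structure.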