From: Jeremy F. <je...@go...> - 2004-09-01 08:36:08
On Tue, 2004-08-31 at 23:50 +0100, Nicholas Nethercote wrote:
> P2. For kernels with "overcommit" mmapping off -- which prevents a process
> from allocating more address space than the available swap space -- you
> need at least 1.5GB of swap for Memcheck to run, because swap must be at
> least as large as any individual segment. (And I think users with ulimit -v
> set suffer the same problem.)
>
> S2. Avoiding this requires not using the big-bang shadow allocation
> method, and that shadow memory instead be done incrementally. (More about
> that below.)

Does MAP_NORESERVE work? I'm not sure that incremental mmaping is
enough in all circumstances, because you're trying to work around
heuristics the kernel applies to allocations. MAP_NORESERVE simply
tells the kernel that it shouldn't apply any heuristics to this
allocation.

If you have strict overcommit enabled, then there's no choice but to
have enough swap - it assumes you're going to use every byte you map.

> -----------------------------------------------------------------------------
> P3. Machines with small user spaces (eg. 2G:2G machines) cannot run
> Memcheck, because the big shadow memory region covers 0x40000000, which
> is where normal programs want to put their shadow maps.

Shadow maps? Do you mean something else? Client mapbase? We can put
that somewhere else (put it high, and allocate mappings growing down,
for example).

> S3. Fixing this requires that the boundary between client and Valgrind
> is not fixed, which requires incremental shadow memory allocation.
>
> -----------------------------------------------------------------------------
> P4. Tools sometimes run out of address space when there is still address
> space in other regions free.
>
> S4. The rigidity of the client/shadow-mem/valgrind division must be
> reduced to fix this.
>
> -----------------------------------------------------------------------------
> P5. Large executables (eg. 200MB+) cannot be loaded in memory by
> Valgrind in order to read their debug info.
>
> S5. Two possible fixes:
> - make client/shadow-mem/valgrind divisions less rigid
> - incremental debug info reading (but that's impossible for stabs)

Well, even for stabs there's no need to mmap the whole thing in; if we
just read (or mmaped) chunks at a time and processed them serially, we
can extract everything needed.

> -----------------------------------------------------------------------------
> Discussion
> -----------------------------------------------------------------------------
> P1 can be solved independently of the others, and uncontroversially.
>
> P2--P5 are all related. For 32-bit machines, big-bang shadow memory
> allocation does not seem appropriate -- the size of the map required
> causes P2. The resulting rigidity of the address space causes P3--P5.
>
> The downside of switching to incremental shadow memory is that it makes
> direct-offset shadow addressing impossible, at least on 32-bit.
> Direct-offset seems much more plausible on 64-bit, where we have more
> address to play with. But the benefits of direct-offset still are not
> clear (Jeremy's experiments didn't show a speed improvement), and we don't
> have any 64-bit ports working yet.

The trouble is that memcheck&co do have a fixed ratio of shadow memory
to real memory used. If the client uses its address space sparsely then
it causes sparse (wasteful) use of the shadow memory, but since we get
to place all the mmaps, we needn't make it have sparse memory use. The
exception is if the client explicitly places its mappings, but I don't
think that's common.

So I know that people are running into memory problems, but it isn't
clear to me that we can't solve them by using the address space more
densely.

Tools which don't have a fixed ratio (cachegrind) are another issue.
They're not, technically, using shadow memory (since there isn't the 1:1
relationship between client addresses and shadow addresses), but
Valgrind heap.

>
> -----------------------------------------------------------------------------
> Solution
> -----------------------------------------------------------------------------
> A couple of steps:
>
> - Don't use big-bang allocation for shadow memory. Make shadow memory
>   maps allocated just like any other Valgrind/tool memory. Thus
>   Valgrind would have a single region for itself, instead of two
>   separate ones. This would solve P2, P4 and P5.
>
> - Make the client/valgrind division movable. Client memory would grow
>   up, Valgrind memory would grow down. This would largely solve P3.
>
>   Only question here is: where does the stack go? If the stack size is
>   ulimited (eg. to 8MB), there's little problem. Otherwise, perhaps
>   below client_mapbase, so it grows down towards the upward-growing
>   heap. [nb: what happens if they collide? undefined?]

We could put the stack below the executable. The x86 ABI allows this
(and Solaris x86 does this). It could break some programs which assume
the stack is high, but most won't care.

	J
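
For reference, here is a minimal sketch of the MAP_NORESERVE mapping asked
about above, assuming Linux with glibc. The helper names
reserve_noreserve() and release_region() are hypothetical and are not
Valgrind code. Under strict overcommit (vm.overcommit_memory=2) the kernel
ignores MAP_NORESERVE, so the flag cannot help in that mode.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not Valgrind code): map `len` bytes of zero-filled
 * anonymous memory without asking the kernel to reserve swap for it.
 * Returns NULL on failure. */
static void *reserve_noreserve(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from reserve_noreserve().
 * Returns 0 on success, -1 on failure (as munmap does). */
static int release_region(void *p, size_t len)
{
    return munmap(p, len);
}
```

Pages in such a region are only backed by memory once they are touched, so
a large, sparsely used shadow area costs address space rather than swap,
as long as the kernel is not in strict overcommit mode.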