|
From: Julian S. <js...@ac...> - 2005-09-19 17:58:10
|
As you may know, Valgrind's low-level address space manager has been
completely rewritten over the past couple of weeks. The aims are:

- to allow more flexible memory layout, which will help portability
  and remove the reliance on VM overcommit in kernels

- to allow use of more than 2G of address space on 64-bit platforms.
  Currently it is set up to limit the process size to about 16G, but
  that is trivially changed.

- on 64-bit targets, to keep as many client mappings as possible below
  16G/32G/whatever we choose, so that they are handled by the
  fast-case paths in memcheck

- to be initialised extremely early in the startup sequence, and to
  rely on practically nothing (specifically, no dynamic memory
  allocation) so as to get rid of some longstanding circular
  dependencies and resulting awkward startup-time problems.

As of r4681 it works well enough for hardened hacker types to try it
out. It is still a work in progress, so expect breakage. I plan to
clean up and stabilise it during this week and, if nothing adverse
happens, then move on to merging it into the trunk. The plan is to
ship the new address space manager in 3.1.0. You can get hold of a
copy using

   svn co svn://svn.valgrind.org/branches/ASPACEM

Detaching the old manager and reconnecting the new one has been a
massively intrusive process, so lots of little bits and pieces have
changed all over the place, and various things are broken that weren't
before. As of now I have 19 stderr fails and 6 stdout fails on SuSE
9.1 (x86). Here are some notes about stuff still outstanding:

* The old manager code is all still there, but disabled, so as to make
  bringing up the new one easier.

* The root thread's stack is limited to 8M, regardless of what
  rlimit.stack_size says. Ditto its data segment.

* m_libcmman is basically dead. All requests that have anything to do
  with address space (mmap, etc) have to go direct to aspacem (see
  pub_core_aspacemgr.h, which has extensive comments on the new entry
  points).

* Architectural issue: tools that ask for malloc replacements (massif,
  memcheck) cannot currently set a non-default redzone size. In order
  to make memcheck work I have currently kludged it by setting the
  default RZ size to 16; however, that makes massif totally not work.
  Nick to consider?

* Tom: you had some layout issues re amd64? Can you outline them
  again? You said you had to change map_base so that the dataseg
  placement didn't fail. I need to make the dataseg placement more
  robust.

* none/tests/cmdline* mostly fail. I don't understand how the command
  line parsing works. Anyhow, there is some bad interaction with the
  new startup sequence.

* Architectural issues:

  - get_seg_starts() is duplicated in m_main.c and mc_main.c. They
    should be commoned up and put somewhere suitable, but I am not
    sure where.

  - m_mallocfree has a new function VG_(out_of_memory_NORETURN), which
    really should be given a proper home.

There is probably more breakage, but that'll do for now :-)

J
|
From: Nicholas N. <nj...@cs...> - 2005-09-19 19:30:33
|
On Mon, 19 Sep 2005, Julian Seward wrote:

> * Architectural issue: tools that ask for malloc replacements
>   (massif, memcheck) cannot currently set a non-default redzone
>   size. In order to make memcheck work I have currently kludged
>   it by setting the default RZ size to 16; however that makes massif
>   totally not work. Nick to consider?

I just fixed this by making the client arena get initialised later
than the others, so that the tool has a chance to set the redzone size
beforehand.

Nick
|
From: Julian S. <js...@ac...> - 2005-09-19 23:18:44
|
> > * Architectural issue: tools that ask for malloc replacements
> >   (massif, memcheck) cannot currently set a non-default redzone
> >   size. In order to make memcheck work I have currently kludged
> >   it by setting the default RZ size to 16; however that makes massif
> >   totally not work. Nick to consider?
>
> I just fixed this by making the client arena get initialised later than
> the others, so that the tool has a chance to set the redzone size
> beforehand.

Great, thanks. Also for the command line stuff.

J
|
From: Nicholas N. <nj...@cs...> - 2005-09-19 20:17:33
|
On Mon, 19 Sep 2005, Julian Seward wrote:

> * none/tests/cmdline* mostly fail. I don't understand how the command
>   line parsing works. Anyhow, there is some bad interaction with the
>   new startup sequence.

I've fixed this.

Nick
|
From: Tom H. <to...@co...> - 2005-09-20 07:55:26
Attachments:
aspacem.patch
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> * Tom: you had some layout issues re amd64? Can you outline them
> again? You said you had to change map_base in order that the
> dataseg placement didn't fail. I need to make the dataseg
> placement more robust.
You seem to have addressed most of these last night, although in
different ways to me.
There are (or were) two show stoppers. The first was the fact that
memory below the 64Mb point was reserved, which I fixed by only
reserving 4Mb and you have now fixed by allowing fixed maps in
reserved areas.
The second was that the interpreter tended to get mapped immediately
above the client which then caused mapping of the data segment to
fail. I fixed that by moving map_base and you fixed it by moving the
data segment if the first mapping failed. My patch to map the
interpreter at its preferred address can also help with this by
stopping it being mapped immediately above the client so I have
just committed that - it is good anyway as it makes the memory layout
more like the non-valgrind case.
One outstanding patch I have, which I mentioned before, is to adjust
the memory layout a bit so valgrind's allocations are properly above
the 16Gb level and the client stack is below it. That patch is
attached - it also reduces the reserved chunk at address zero
to 4Mb but you might want to drop that bit now.
The major outstanding problem I'm seeing on x86 is the redirection
of _dl_sysinfo_int80 to a routine inside valgrind as valgrind now
refuses to do a translation from code in an SkFileV segment. There
is a similar problem with the vsyscall routines on amd64.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2005-09-20 09:50:07
|
> The major outstanding problem I'm seeing on x86 is the redirection
> of _dl_sysinfo_int80 to a routine inside valgrind as valgrind now
> refuses to do a translation from code in an SkFileV segment. There
> is a similar problem with the vsyscall routines on amd64.

One possibility is to use the same solution Jeremy devised for 2.2.0,
which was to copy that code to a page in the initial client stack and
use a system of offsets to figure out where the entry points went.
That's a bit awkward because of the offsets.

Another possibility is to find the one page which these routines
occupy and change its ownership from V to C. In order to ensure that
the client didn't inadvertently acquire execute permission for any
other bits of V which happened to lie on that one page, we could put
4096 bytes worth of ud2s immediately before and after the routines.
That would guarantee that the only useful stuff in the page is the
routines themselves.

What do you reckon? Insane hack or plausible?

J
|
From: Tom H. <to...@co...> - 2005-09-20 10:17:26
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
>> The major outstanding problem I'm seeing on x86 is the redirection
>> of _dl_sysinfo_int80 to a routine inside valgrind as valgrind now
>> refuses to do a translation from code in an SkFileV segment. There
>> is a similar problem with the vsyscall routines on amd64.
>
> One possibility is to use the same solution Jeremy devised for 2.2.0,
> which was to copy that code to a page in the initial client stack and
> use a system of offsets to figure out where the entry points went.
That was one of the two solutions I came up with - it doesn't have
to be on the stack of course, we could just allocate a client page
and copy the code to that.
The other one was to move those routines to vgpreload_core so that
the intercepts are done in the normal way - it would mean adding a
symbol encoding for addr->addr intercepts for amd64.
There might be an issue with the intercepts not being in place
early enough, but if that was a problem then moving to the scheme
that we discussed briefly a while ago of doing the preloads
ourselves rather than using LD_PRELOAD might work.
> That's a bit awkward because of the offsets. Another possibility
> is to find the one page which these routines occupy and change its
> ownership from V to C. In order to ensure that the client didn't
> inadvertently acquire execute permission for any other bits of V
> which happened to lie on that one page, we could put 4096 bytes
> worth of ud2s immediately before and after the routines. That
> would guarantee that the only useful stuff in the page is the
> routines themselves.
You would also have to make sure the trampoline code was at the
start of a page or put 4096 bytes of ud2s before it as well...
Putting that code in a separate ELF section or something might make
it easier to make it a separate part that had its own page.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2005-09-20 12:33:11
|
> > That's a bit awkward because of the offsets. Another possibility
> > is to find the one page which these routines occupy and change its
> > ownership from V to C. In order to ensure that the client didn't
> > inadvertently acquire execute permission for any other bits of V
> > which happened to lie on that one page, we could put 4096 bytes
> > worth of ud2s immediately before and after the routines. That
> > would guarantee that the only useful stuff in the page is the
> > routines themselves.

Easy enough to do (r4699). This gets me down to

== 160 tests, 15 stderr failures, 5 stdout failures =================

on amd64, which is not bad.

> You would also have to make sure the trampoline code was at the
> start of a page or put 4096 bytes of ud2s before it as well...

I just put 4k of ud2s before and after it, so the alignment then
doesn't matter (anything for a simple life :-). Resulting maps then
are:

 ( 0) /home/sewardj/VgASPACEM/aspacem/Inst/lib/valgrind/memcheck
  11: FILE 70000000-7001EFFF  126976 r-x- d=0x302 i=555480 o=0 (0)
  12: file 7001F000-7001FFFF    4096 r-x- d=0x302 i=555480 o=126976 (0)
  13: FILE 70020000-7012DFFF 1105920 r-x- d=0x302 i=555480 o=131072 (0)

(where "FILE" == SkFileV, "file" == SkFileC), which is as expected.

Does it work for you?

J
|
From: Tom H. <to...@co...> - 2005-09-20 13:10:42
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Easy enough to do (r4699). This gets me down to
> == 160 tests, 15 stderr failures, 5 stdout failures =================
> on amd64, which is not bad.
Seems to work for me too. After adding the same for x86 I now
get this on amd64:
== 160 tests, 16 stderr failures, 6 stdout failures =================
and this on x86:
== 182 tests, 17 stderr failures, 5 stdout failures =================
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Nicholas N. <nj...@cs...> - 2005-09-20 13:21:23
|
On Tue, 20 Sep 2005, Tom Hughes wrote:

>> Easy enough to do (r4699). This gets me down to
>> == 160 tests, 15 stderr failures, 5 stdout failures =================
>> on amd64, which is not bad.
>
> Seems to work for me too. After adding the same for x86 I now
> get this on amd64:
>
> == 160 tests, 16 stderr failures, 6 stdout failures =================
>
> and this on x86:
>
> == 182 tests, 17 stderr failures, 5 stdout failures =================

I have this on x86:

== 181 tests, 15 stderr failures, 6 stdout failures =================

memcheck/tests/execve2 (stderr)
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/leakotron (stdout)
memcheck/tests/mempool (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_changes (stderr)
memcheck/tests/vgtest_ume (stderr)
memcheck/tests/x86/scalar (stderr)
none/tests/as_mmap (stderr)
none/tests/as_shm (stdout)
none/tests/as_shm (stderr)
none/tests/faultstatus (stderr)
none/tests/map_unmap (stdout)
none/tests/map_unmap (stderr)
none/tests/mremap2 (stdout)
none/tests/sigstackgrowth (stdout)
none/tests/sigstackgrowth (stderr)
none/tests/stackgrowth (stdout)
none/tests/stackgrowth (stderr)
none/tests/x86/int (stderr)

Nick
|
From: Julian S. <js...@ac...> - 2005-09-20 13:30:27
|
> Seems to work for me too. After adding the same for x86 I now
> get this on amd64:
>
> == 160 tests, 16 stderr failures, 6 stdout failures =================
>
> and this on x86:
>
> == 182 tests, 17 stderr failures, 5 stdout failures =================

Cool! That looks like progress to me!

J
|
From: Tom H. <to...@co...> - 2005-09-20 13:29:05
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> I just put 4k of ud2s before and after it, so the alignment then
> doesn't matter (anything for a simple life :-). Resulting maps then
> are:
>
> ( 0) /home/sewardj/VgASPACEM/aspacem/Inst/lib/valgrind/memcheck
> 11: FILE 70000000-7001EFFF 126976 r-x- d=0x302 i=555480 o=0 (0)
> 12: file 7001F000-7001FFFF 4096 r-x- d=0x302 i=555480 o=126976 (0)
> 13: FILE 70020000-7012DFFF 1105920 r-x- d=0x302 i=555480 o=131072 (0)
>
> (where "FILE" == SkAnonV, "file" == SkAnonC), which is as expected.
The problem with that is that the segment list no longer matches
the kernel mapping list, so running with --sanity-level=3 gives:
--9783:0:aspacem sync_check_callback: segment mismatch: V's seg:
--9783:0:aspacem NSegment{FILE, start=0x70000000, end=0x70020FFF, smode=SmFixed, dev=64768, ino=3965434, offset=0, fnIdx=0, hasR=1, hasW=0, hasX=1, hasT=0, mark=0}
--9783:0:aspacem sync_check_callback: segment mismatch: kernel's seg:
--9783:0:aspacem start=0x70000000 end=0x70136FFF dev=64768 ino=3965434 offset=0
--9783:0:aspacem sync check at aspacemgr.c:1537 (vgPlain_am_get_advisory): FAILED
--9783:0:aspacem
--9783:0:aspacem Valgrind: FATAL: aspacem assertion failed:
--9783:0:aspacem do_sync_check(__PRETTY_FUNCTION__, __FILE__,__LINE__)
--9783:0:aspacem at aspacemgr.c:1537 (vgPlain_am_get_advisory)
--9783:0:aspacem Exiting now.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2005-09-20 13:50:50
|
> > ( 0) /home/sewardj/VgASPACEM/aspacem/Inst/lib/valgrind/memcheck
> > 11: FILE 70000000-7001EFFF  126976 r-x- d=0x302 i=555480 o=0 (0)
> > 12: file 7001F000-7001FFFF    4096 r-x- d=0x302 i=555480 o=126976 (0)
> > 13: FILE 70020000-7012DFFF 1105920 r-x- d=0x302 i=555480 o=131072 (0)
> >
> > (where "FILE" == SkFileV, "file" == SkFileC), which is as expected.
>
> The problem with that is that the segment list no longer matches
> the kernel mapping list, so running with --sanity-level=3 gives:

This is true. But I noticed a couple of days back that the comparison
scheme is flawed anyway: after doing a suitable series of mprotects,
the kernel's map winds up with two adjacent mappings which (afaics)
could be merged but aren't, whereas the segment list does merge them.
So the comparer needs a redesign anyway, and I'm hoping that that will
fix this problem too.

The (vague) idea I had in mind was to change VG_(read_procselfmaps) so
that it not only passes to the callback all the segments it reads, but
also all the spaces in between them. The result is that the callback
would see the kernel's account of the entire address space. For each
presented kernel segment-or-hole, the callback snoops around the
segment array to find not just one, but possibly a set of NSegments
which cover the presented range with the correct permissions. So it's
more robust.

I don't think I have the details right yet .. the comparer will have
to cope with both multiple kernel segs associated with one aspacem
seg, and the other way around too. Perhaps a better way to say what
we're looking for is: go look for places where the kernel and V
disagree about whether a given page is mapped or not. (Even though
the segment boundaries may disagree, V and K should agree on a
page-by-page basis both about mappedness and about the permissions,
and the identity of and offset into any associated file.)

It would be great to have the sanity checker working robustly again.
It did help me find some bugs in mremap handling but I rapidly
abandoned it because of these problems. If you have enthusiasm ..

J
|
From: Nicholas N. <nj...@cs...> - 2005-09-20 15:07:21
|
On Tue, 20 Sep 2005, Julian Seward wrote:

> I don't think I have the details right yet .. the comparer will have to
> cope with both multiple kernel segs associated with one aspacem seg,
> and the other way around too.

During comparison, shouldn't you just merge any mergeable, adjacent
segments in both lists? Then they should match.

Nick
|
From: Greg P. <gp...@us...> - 2005-09-20 18:55:22
|
Julian Seward writes:

> Perhaps a better way to say what we're looking for is: go look for
> places where the kernel and V disagree about whether a given page is
> mapped or not. (even though the segment boundaries may disagree, V
> and K should agree on a page-by-page basis both about mappedness and
> about the permissions, and the identity and offset into any
> associated file).

I did exactly this for a quick checker on Darwin. It simply iterated
through address space a page at a time, and asked both Valgrind and
the kernel what was at that address. (It was actually a little
smarter than that, to avoid making 4 million requests, but it never
tried to match Valgrind segments to kernel regions.) Running the
checker after syscalls is a great way to find RPC calls that change
the memory map behind Valgrind's back.

I only checked mappedness. It looks like the kernel's rwx permissions
don't tell the whole story (have to work on that later), and I don't
have a region->file mechanism yet.

-- 
Greg Parker     gp...@us...