From: Jeremy F. <je...@go...> - 2005-02-24 17:33:56
Julian Seward wrote:
>Yes, that's the kludge we have at present. It's almost plausible, but
>I have been unable to figure out the boundary conditions that have to
>be observed to *guarantee* there will never be an infinite recursion.
>I don't much like the idea of a fixed-sized table either, but I have
>yet to be convinced the "early enough" scheme can be made reliable.
>
>
Well, the current code waits until the existing superblock has no space
at all for the new allocation, and then it allocates a new superblock.
If each superblock had some reserved space, enough to describe its
successor, there wouldn't be a problem.
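To make the reserved-space idea concrete, here is a minimal sketch (not Valgrind's actual allocator — the descriptor layout, the toy sizes, and the use of malloc in place of mmap are all invented for illustration): each superblock holds back enough bytes at its tail to hold the descriptor of its successor, so chaining on a new superblock never itself requires a fresh metadata allocation, and the recursion can't happen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch only -- not Valgrind's real data structures. */
typedef struct SBDesc SBDesc;
struct SBDesc {
    void   *base;            /* start of the superblock's memory */
    size_t  used, size;
    SBDesc *next;
};

#define SB_SIZE 1024         /* toy size; real superblocks are 64M */
#define RESERVE sizeof(SBDesc)

/* Bump-allocate from the current superblock.  When it cannot satisfy
   the request, carve the NEW superblock's descriptor out of the old
   block's reserved tail, so describing the new space never needs a
   separate allocation.  malloc() stands in for mmap() here. */
static void *sb_alloc(SBDesc **cur, size_t n)
{
    SBDesc *sb = *cur;
    if (sb->used + n > sb->size - RESERVE) {
        SBDesc *fresh = (SBDesc *)((char *)sb->base + sb->size - RESERVE);
        fresh->base = malloc(SB_SIZE);
        fresh->used = 0;
        fresh->size = SB_SIZE;
        fresh->next = NULL;
        sb->next = fresh;
        *cur = sb = fresh;
    }
    void *p = (char *)sb->base + sb->used;
    sb->used += n;
    return p;
}
```

The invariant is simply that `RESERVE` bytes are never handed out to ordinary allocations, so the successor's descriptor always has a home.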
>> What problems are there, and how are they avoided by this scheme?
>>
>>
>
>* big-bang shadow allocation causes problems on kernels that don't
> do overcommitment
>
>* a fixed partitioning scheme is less appropriate if we move towards
> compressed representations of shadow memory, since that compression
> ratio could be variable
>
>
OK.
>I think what bothers me is -- on a 64-bit platform we will have to
>generate two temporary mappings that together cover almost all
>of the 2^64 bytes of address space. So are we assured that the kernel
>will not barf at this point? Particularly given that on RH8 we had
>kernels barfing on allocating a 1.5GB shadow area all in one go.
>
>
Fortunately, the 64-bit PC world is pretty new, and people are using
recent kernels and are more likely to upgrade (because the existing
kernels are relatively buggy). I've been talking to the kernel people
to make sure we can get the kind of functionality we need, like being
able to create terabyte mappings if we promise not to touch them. (Oh,
and it's *only* 2^48ish bytes.)
>IOW - padding is fine if it's reliable and portable(ish). So, exactly
>how reliable and portable is it? On non-Linux kernels?
>That I don't know.
>
Yep, it's a concern. We might need to do things like construct the
padding incrementally to sneak under the kernel's radar. If the kernel
has an absolute prohibition on overcommit, we're pretty screwed.
J
From: Jeremy F. <je...@go...> - 2005-02-24 17:06:58
Julian Seward wrote:
>The existing scheme is at least simple, in that you can simply say
>that a block is leaked/potentially leaked if no pointer to it is
>found/only a pointer to its interior is found. And from the kind of
>questions that have arisen in the past, it already confuses users.
>
>
That's possible. But I think a lot of people (including me) are being
misled into thinking "oh, it's only a small leak, no need to worry". A
lot of the code I've pointed this at (which isn't very much yet) has had
a much larger leakage problem than previously reported.
>Once you get into cycle detection, we need to explain about root sets
>and chasing pointers from there. Could you circulate a proposed
>documentation update?
>
>
OK. Hm, there doesn't seem to be any description of what the existing
leak checker does.
>* Clique is the wrong terminology. My understanding is that a clique
> is a completely-connected set of nodes in an undirected graph: each
> node-pair in the clique has a connecting edge. What you mean is
> Strongly Connected Components (SCCs), which are always associated with
> finding cycles in directed graphs.
>
>
Hm, it is a directed graph, and it is just looking for all the nodes it
contains. It isn't trying to find SCCs. The algorithm is very simple:
    for each Unreached block:
        push it onto the mark stack
        mark everything traced from it as being IndirectLeak
The "clique leader" is the block which is left Unreached, and it
contains a summary of the size of all the blocks it points to.
The only slightly tricky part is when, while marking, we encounter a
block which was previously considered to be the root of a lost graph; it
gets merged into the current graph.
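That marking pass can be sketched as follows — a toy rendition only, with an invented Block type and a fixed-size mark stack, not memcheck's real code: the leader stays Unreached, everything traced from it becomes IndirectLeak, and a previously-found leader that gets swept up has its summary folded into the new clique.

```c
#include <assert.h>
#include <stddef.h>

enum Reachedness { Unreached, IndirectLeak };

typedef struct Block {
    enum Reachedness state;
    size_t size;
    size_t indirect_bytes;   /* summary accumulated on the clique leader */
    struct Block **ptrs;     /* heap blocks this block points at */
    int n_ptrs;
} Block;

/* Mark everything traced from 'leader' as IndirectLeak, folding each
   block's size (plus any summary it carried as an earlier leader)
   into the leader's total.  The leader itself stays Unreached. */
static void mark_clique(Block *leader)
{
    Block *stack[64];        /* toy fixed-size mark stack */
    int sp = 0;
    for (int i = 0; i < leader->n_ptrs; i++)
        stack[sp++] = leader->ptrs[i];

    while (sp > 0) {
        Block *b = stack[--sp];
        if (b == leader || b->state == IndirectLeak)
            continue;        /* cycle back to leader, or already marked */
        b->state = IndirectLeak;
        leader->indirect_bytes += b->size + b->indirect_bytes;
        b->indirect_bytes = 0;   /* merged into the new leader */
        for (int i = 0; i < b->n_ptrs; i++)
            stack[sp++] = b->ptrs[i];
    }
}
```

After the pass, reporting "leader->size + leader->indirect_bytes" gives the "128+4014"-style numbers shown later in the thread.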
>* How much extra storage will the SCC detector require in the
> worst case?
>
>
None. It just reuses the marker machinery. The marker uses 2 words per
allocated block, which is only allocated during leak checking.
>In short, chase from all register sources,
>including vector registers.
>
Fair enough. It should probably skip registers/parts of registers which
the tool says are not defined, though.
J
From: Julian S. <js...@ac...> - 2005-02-24 11:16:27
|
[My ISP seems to have lost the last 24 hours of inbound mail. So am
constructing a reply from the list archive at SF. Sigh.]

Taking Jeremy and Greg's comments together, it certainly seems like
flexibility over stage2 load addresses, overall layout, and perhaps
superblock size is important.

> >* The 64M superblocks have 4 possible ownership states:
> >    Unallocated
> >    Valgrind's -- V's text, stack, static data, dynamic data
> >    Shadow
> >    Client
> >
> Is there any difference between Shadow and Valgrind? Do they behave
> differently in any respect, or is it just that their contents mean
> different things?

No real difference; they could be merged. I just thought it might be
nice for debugging & accounting purposes to distinguish the two kinds.

> > That's not wonderful, but even a 1 Mb static area should hold
> > enough info to track several thousand segments.
>
> Well, you could still allocate them dynamically, so long as you do it
> early enough (ie, before you're so short of space that there's no room
> to describe the new space you're allocating).

Yes, that's the kludge we have at present. It's almost plausible, but
I have been unable to figure out the boundary conditions that have to
be observed to *guarantee* there will never be an infinite recursion.
I don't much like the idea of a fixed-sized table either, but I have
yet to be convinced the "early enough" scheme can be made reliable.

> > - In general, on a 32-bit machine, because memory is allocated
> >   in 64M superblocks to either shadow, client or V-internal, we get
> >   rid of all problems associated with the current hard partitioning
> >   scheme between client and shadow memory. Big-bang allocation is
> >   done away with. We know we can still protect V from wild writes
> >   by the client at fairly minimal expense.
>
> What problems are there, and how are they avoided by this scheme?

* big-bang shadow allocation causes problems on kernels that don't
  do overcommitment

* a fixed partitioning scheme is less appropriate if we move towards
  compressed representations of shadow memory, since that compression
  ratio could be variable

> > - On a 64-bit machine, all code is to be mapped in below 1 G, but
> >   apart from that ASpaceMgr can be fairly relaxed about fragmentation
> >   in the area above 1 G.
>
> Er, why? We have terabytes of address space to play with. Why make
> holes in it? There's no technical reason we need to put code down
> that low.

Fair enough. I was just trying to come up with a single load address
which would work OK for all Linuxes, but perhaps being flexible is
better. See comment at top of this msg.

> Also, choosing a fixed address prevents Valgrind from running under
> itself. I think we should keep that.

Agreed. I realised my initial proposal would have killed that and
hadn't yet figured out how to fix it.

> No, the main disadvantages are that this is fantastically complex,
> makes us very dependent on the toolchain (particularly all the little
> side-channels between gcc and binutils), and is architecture and OS
> dependent (since not everyone uses ELF). By comparison, padding the
> address space and using libc carefully are very simple and portable.
>
> I'm still in favour of 1) use PIE where available (otherwise choose a
> static loading address), 2) using a flexible address space
> configuration which allows Valgrind to run under itself, 3) try not
> to get too involved with the object file formats.

My current incarnation of the linker is 1400 lines of code, and that
does x86 amd64 ppc32 sparc32 and arm (all ELF). Let me say I am less
than convinced by this idea myself -- it's just a possibility. The
dependence on object file formats bothers me, as does the lack of
debug info support. It's just that I find the idea of
address-space-padding kind-of unappealing, but otoh (1) it does appear
to work, and (2) it's simple. So perhaps it's the least-worst option.

I think what bothers me is -- on a 64-bit platform we will have to
generate two temporary mappings that together cover almost all
of the 2^64 bytes of address space. So are we assured that the kernel
will not barf at this point? Particularly given that on RH8 we had
kernels barfing on allocating a 1.5GB shadow area all in one go.

IOW - padding is fine if it's reliable and portable(ish). So, exactly
how reliable and portable is it? On non-Linux kernels?
That I don't know.

J
From: Julian S. <js...@ac...> - 2005-02-24 10:32:03
[My ISP seems to have lost the last 24 hours of inbound mail. So am
constructing a reply from the list archive at SF.]

> What do people think?

Overall, it sounds like a good thing. I have some questions tho (in no
particular order)

> * It under-reports lost memory, by only pointing out completely
>   undereferenced allocations. This means that apparently small
>   leaks are actually large, if they refer to a lot of other
>   memory.
> * It completely fails to report leaked cycles. We have software
>   which uses refcounting, which also loses cycles; we were hoping
>   that Valgrind would point out deficiencies in the refcount
>   management.
> * It doesn't trace from registers, so it can report blocks as leaked
>   even if there's a register reference (not common, I admit).
>
> To fix all this, I changed the leak checker to use a standard
> mark-sweep algorithm. It does a pass from the root set to find
> unleaked memory, and then makes a pass over the leaked memory to
> group it into cliques (connected graph of allocations); each clique
> is reported as a leak, rather than each individual allocation.

The real problem is not to implement algorithm X, Y or Z. It is to
explain to users what it is that V is really measuring for them -
build them a useful mental model of what's happening - so they can
reason about what's going on.

The existing scheme is at least simple, in that you can simply say
that a block is leaked/potentially leaked if no pointer to it is
found/only a pointer to its interior is found. And from the kind of
questions that have arisen in the past, it already confuses users.

Once you get into cycle detection, we need to explain about root sets
and chasing pointers from there. Could you circulate a proposed
documentation update?

----------

Other comments.

* Clique is the wrong terminology. My understanding is that a clique
  is a completely-connected set of nodes in an undirected graph: each
  node-pair in the clique has a connecting edge. What you mean is
  Strongly Connected Components (SCCs), which are always associated
  with finding cycles in directed graphs.

* How much extra storage will the SCC detector require in the
  worst case?

> >On some architectures (e.g. Mac OS X on PowerPC) you need to
> >scan registers other than the general-purpose ones. If you catch
> >optimized, unrolled memmove() at the wrong point, a floating-point
> >or vector register could contain the only extant copy of a pointer
> >value. You'd have to be pretty unlucky to hit this while using
> >Valgrind's leak check, so it may not be worth worrying about.
>
> Sounds a bit unlikely. It would be tricky because you'd have to chop
> the vector up into pointer-sized chunks and inspect each of them.

Jeremy is right in that we should also chase roots from registers.
The discussion about which regs do and don't contain pointers is
spurious, when viewed in a wider context: we are prepared to consider
any aligned, accessible, defined word in memory as a potential pointer
with no further questions asked. So why make a distinction for
register-held values? In short, chase from all register sources,
including vector registers. Note that advanced compilers (icc et al)
generate vector loops for all kinds of code and you can't assume that
vector registers won't contain the-only-copies-of-pointers-in-transit
at some point.

Chopping up registers etc is not a big deal. Valgrind already does
not know the layout of guest-state (in Vex-world) and has to ask Vex
even basic questions (where/how big is the PC? SP? etc). So it's not
much of a leap to ask Vex to enumerate all the word-sized guest-state
offsets pertaining to int/vector registers.

J
From: Tom H. <th...@cy...> - 2005-02-24 09:54:01
Nightly build on ginetta ( Red Hat 8.0 ) started at 2005-02-24 03:10:02 GMT

Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done

Last 20 lines of log.verbose follow

as_mmap: valgrind ./as_mmap
as_shm: valgrind ./as_shm
erringfds: valgrind ./erringfds
fdleak_cmsg: valgrind --track-fds=yes ./fdleak_cmsg < /dev/null
fdleak_creat: valgrind --track-fds=yes ./fdleak_creat < /dev/null
fdleak_dup: valgrind --track-fds=yes ./fdleak_dup < /dev/null
fdleak_dup2: valgrind --track-fds=yes ./fdleak_dup2 < /dev/null
fdleak_fcntl: valgrind --track-fds=yes ./fdleak_fcntl < /dev/null
fdleak_ipv4: valgrind --track-fds=yes ./fdleak_ipv4 < /dev/null
fdleak_open: valgrind --track-fds=yes ./fdleak_open < /dev/null
fdleak_pipe: valgrind --track-fds=yes ./fdleak_pipe < /dev/null
fdleak_socketpair: valgrind --track-fds=yes ./fdleak_socketpair < /dev/null
pth_atfork1: valgrind ./pth_atfork1
pth_cancel1: valgrind ./pth_cancel1
pth_cancel2: valgrind ./pth_cancel2
pth_cvsimple: valgrind ./pth_cvsimple
pth_empty: valgrind ./pth_empty
pth_exit: valgrind ./pth_exit
Could not read `pth_exit.stderr.exp'
make: *** [regtest] Error 2
From: Tom H. <th...@cy...> - 2005-02-24 09:53:52
Nightly build on alvis ( Red Hat 7.3 ) started at 2005-02-24 03:05:03 GMT

Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done

Last 20 lines of log.verbose follow

as_mmap: valgrind ./as_mmap
as_shm: valgrind ./as_shm
erringfds: valgrind ./erringfds
fdleak_cmsg: valgrind --track-fds=yes ./fdleak_cmsg < /dev/null
fdleak_creat: valgrind --track-fds=yes ./fdleak_creat < /dev/null
fdleak_dup: valgrind --track-fds=yes ./fdleak_dup < /dev/null
fdleak_dup2: valgrind --track-fds=yes ./fdleak_dup2 < /dev/null
fdleak_fcntl: valgrind --track-fds=yes ./fdleak_fcntl < /dev/null
fdleak_ipv4: valgrind --track-fds=yes ./fdleak_ipv4 < /dev/null
fdleak_open: valgrind --track-fds=yes ./fdleak_open < /dev/null
fdleak_pipe: valgrind --track-fds=yes ./fdleak_pipe < /dev/null
fdleak_socketpair: valgrind --track-fds=yes ./fdleak_socketpair < /dev/null
pth_atfork1: valgrind ./pth_atfork1
pth_cancel1: valgrind ./pth_cancel1
pth_cancel2: valgrind ./pth_cancel2
pth_cvsimple: valgrind ./pth_cvsimple
pth_empty: valgrind ./pth_empty
pth_exit: valgrind ./pth_exit
Could not read `pth_exit.stderr.exp'
make: *** [regtest] Error 2
From: Tom H. <th...@cy...> - 2005-02-24 09:53:52
Nightly build on standard ( Red Hat 7.2 ) started at 2005-02-24 03:00:02 GMT

Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done

Last 20 lines of log.verbose follow

as_mmap: valgrind ./as_mmap
as_shm: valgrind ./as_shm
erringfds: valgrind ./erringfds
fdleak_cmsg: valgrind --track-fds=yes ./fdleak_cmsg < /dev/null
fdleak_creat: valgrind --track-fds=yes ./fdleak_creat < /dev/null
fdleak_dup: valgrind --track-fds=yes ./fdleak_dup < /dev/null
fdleak_dup2: valgrind --track-fds=yes ./fdleak_dup2 < /dev/null
fdleak_fcntl: valgrind --track-fds=yes ./fdleak_fcntl < /dev/null
fdleak_ipv4: valgrind --track-fds=yes ./fdleak_ipv4 < /dev/null
fdleak_open: valgrind --track-fds=yes ./fdleak_open < /dev/null
fdleak_pipe: valgrind --track-fds=yes ./fdleak_pipe < /dev/null
fdleak_socketpair: valgrind --track-fds=yes ./fdleak_socketpair < /dev/null
pth_atfork1: valgrind ./pth_atfork1
pth_cancel1: valgrind ./pth_cancel1
pth_cancel2: valgrind ./pth_cancel2
pth_cvsimple: valgrind ./pth_cvsimple
pth_empty: valgrind ./pth_empty
pth_exit: valgrind ./pth_exit
Could not read `pth_exit.stderr.exp'
make: *** [regtest] Error 2
From: Jeremy F. <je...@go...> - 2005-02-24 09:42:01
In this test, I'm starting Qt designer, hitting "cancel" on the initial
dialogue, then closing the window. It is invoked with "valgrind
--tool=memcheck --leak-check=yes designer".
The current leak checker reports:
[...]
==18142== 3536 bytes in 68 blocks are definitely lost in loss record 425 of 445
==18142== at 0x1B904284: malloc (in /usr/lib/valgrind/vgpreload_memcheck.so)
==18142== by 0x1C0F8132: XftDrawCreate (in /usr/X11R6/lib/libXft.so.2.1.2)
==18142== by 0x1BB2092A: QPixmap::convertFromImage(QImage const&, int) (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18142== by 0x1BB799C2: QImageDrag::decode(QMimeSource const*, QPixmap&) (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18142== by 0x1BBDB68A: QPixmap::fromMimeSource(QString const&) (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18142== by 0x80AC271: (within /usr/lib/qt-3.3/bin/designer)
==18142== by 0x80ACFD8: (within /usr/lib/qt-3.3/bin/designer)
==18142== by 0x8088A77: (within /usr/lib/qt-3.3/bin/designer)
==18142== by 0x8086775: (within /usr/lib/qt-3.3/bin/designer)
==18142== by 0x1C3ABAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
==18142==
==18142== LEAK SUMMARY:
==18142== definitely lost: 10016 bytes in 202 blocks.
==18142== possibly lost: 272 bytes in 3 blocks.
==18142== still reachable: 371859 bytes in 9392 blocks.
==18142== suppressed: 0 bytes in 0 blocks.
The mark-sweep checker reports:
[...]
==18124== 128+4014 bytes in 1 blocks are definitely lost in loss record 461 of 483
==18124== at 0x1B9043F3: operator new(unsigned) (vg_replace_malloc.c:132)
==18124== by 0x1D2CC02B: KThemeStylePlugin::keys() const (in /usr/lib/kde3/plugins/styles/kthemestyle.so)
==18124== by 0x1BECFDFD: QStylePluginPrivate::featureList() const (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18124== by 0x1BE869A0: QGPluginManager::addLibrary(QLibrary*) (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18124== by 0x1BE85FCB: QGPluginManager::featureList() const (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18124== by 0x1BECF3A3: QStyleFactory::keys() (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18124== by 0x80B7EA1: (within /usr/lib/qt-3.3/bin/designer)
==18124== by 0x8088B65: (within /usr/lib/qt-3.3/bin/designer)
==18124== by 0x8086775: (within /usr/lib/qt-3.3/bin/designer)
==18124== by 0x1C3ABAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
==18124==
==18124== LEAK SUMMARY:
==18124== definitely lost: 15646 bytes in 243 blocks.
==18124== indirectly lost: 149304 bytes in 4977 blocks.
==18124== possibly lost: 14197 bytes in 33 blocks.
==18124== still reachable: 198076 bytes in 4291 blocks.
==18124== suppressed: 0 bytes in 0 blocks.
The summary is the interesting bit: it shows that the old algorithm
thinks there's only 10k of leakage, and that the heap has 371k of
reachable stuff. The mark-sweep algorithm thinks there's 15k of
unreferenced memory, which in turn references 149k of lost memory, and
the living heap is only 200k.
I included a loss record for comparison; the 128+4014 means that the
reported block is only 128 bytes, but it references a further 4014 bytes
of heap.
Here's a before-and-after example of the same leak:
Original:
==18212== 224 bytes in 7 blocks are definitely lost in loss record 330 of 443
==18212== at 0x1B9043F3: operator new(unsigned) (in /usr/lib/valgrind/vgpreload_memcheck.so)
==18212== by 0x1CB7BC7E: ???
==18212== by 0x1CB7CF3C: ???
==18212== by 0x1BE6D424: QComLibrary::createInstanceInternal() (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18212== by 0x1BE6DA07: QComLibrary::qtVersion() (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18212== by 0x1BE85F24: QGPluginManager::featureList() const (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18212== by 0x80A4CCD: (within /usr/lib/qt-3.3/bin/designer)
==18212== by 0x8088897: (within /usr/lib/qt-3.3/bin/designer)
==18212== by 0x8086775: (within /usr/lib/qt-3.3/bin/designer)
==18212== by 0x1C3ABAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
Mark-sweep:
==18202== 256+5724 bytes in 8 blocks are definitely lost in loss record 300 of 480
==18202== at 0x1B9043F3: operator new(unsigned) (vg_replace_malloc.c:132)
==18202== by 0x1CB7BC7E: ???
==18202== by 0x1CB7CF3C: ???
==18202== by 0x1BE6D424: QComLibrary::createInstanceInternal() (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18202== by 0x1BE6DA07: QComLibrary::qtVersion() (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18202== by 0x1BE85F24: QGPluginManager::featureList() const (in /usr/lib/qt-3.3/lib/libqt-mt.so.3.3.3)
==18202== by 0x80CE092: (within /usr/lib/qt-3.3/bin/designer)
==18202== by 0x80A4899: (within /usr/lib/qt-3.3/bin/designer)
==18202== by 0x8088897: (within /usr/lib/qt-3.3/bin/designer)
==18202== by 0x8086775: (within /usr/lib/qt-3.3/bin/designer)
Showing the leak is really losing ~750 bytes per occurrence rather than
the 32 bytes per block the first loss record suggests.
J
From: Jeremy F. <je...@go...> - 2005-02-24 08:09:30
Greg Parker wrote:
>On some architectures (e.g. Mac OS X on PowerPC) you need to
>scan registers other than the general-purpose ones. If you catch
>optimized, unrolled memmove() at the wrong point, a floating-point
>or vector register could contain the only extant copy of a pointer
>value. You'd have to be pretty unlucky to hit this while using
>Valgrind's leak check, so it may not be worth worrying about.
>
>
Sounds a bit unlikely. It would be tricky because you'd have to chop
the vector up into pointer-sized chunks and inspect each of them.
We also intercept mem*(), so they'd have to be using something fairly
specialized (or inlined).
>Something in Valgrind would need to know which architecture
>registers might hold pointer values. I don't know whether it
>should be the core or the tool, though.
>
>
It seems to me like it should be part of the core arch code. The
question "which registers might contain pointers?" doesn't seem like it
has a tool-specific answer.
J
From: Greg P. <gp...@us...> - 2005-02-24 03:45:25
Jeremy Fitzhardinge writes:
> I prefer the callback approach to passing around arrays, because it
> doesn't expose the representation of the arch registers (since it
> only needs to look at general-purpose registers which might possibly
> hold pointers, which might not be contiguous in memory for a
> particular architecture).

On some architectures (e.g. Mac OS X on PowerPC) you need to
scan registers other than the general-purpose ones. If you catch
optimized, unrolled memmove() at the wrong point, a floating-point
or vector register could contain the only extant copy of a pointer
value. You'd have to be pretty unlucky to hit this while using
Valgrind's leak check, so it may not be worth worrying about.

There's also the very special G5 register that holds pthread_self,
which is a heap block that can root other heap blocks via
thread-specific data. I'm not sure whether that register is ever the
only copy of any particular pthread_self, though.

Something in Valgrind would need to know which architecture
registers might hold pointer values. I don't know whether it
should be the core or the tool, though.

-- Greg Parker gp...@us...
From: Tom H. <to...@co...> - 2005-02-24 03:29:06
Nightly build on dunsmere ( Fedora Core 3 ) started at 2005-02-24 03:20:04 GMT

Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done

Last 20 lines of log.verbose follow

(cleanup operation failed: rm vgcore.pid*)
pushpopseg: valgrind ./pushpopseg
rcl_assert: valgrind ./rcl_assert
seg_override: valgrind ./seg_override
-- Finished tests in none/tests/x86 ------------------------------------
yield: valgrind ./yield
-- Finished tests in none/tests ----------------------------------------
== 206 tests, 8 stderr failures, 1 stdout failure =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/scalar_supp (stderr)
none/tests/sigcontext (stdout)
make: *** [regtest] Error 1
From: Tom H. <th...@cy...> - 2005-02-24 03:23:30
Nightly build on audi ( Red Hat 9 ) started at 2005-02-24 03:15:03 GMT

Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done

Last 20 lines of log.verbose follow

rm: cannot remove `vgcore.pid*': No such file or directory
(cleanup operation failed: rm vgcore.pid*)
pushpopseg: valgrind ./pushpopseg
rcl_assert: valgrind ./rcl_assert
seg_override: valgrind ./seg_override
-- Finished tests in none/tests/x86 ------------------------------------
yield: valgrind ./yield
-- Finished tests in none/tests ----------------------------------------
== 205 tests, 7 stderr failures, 1 stdout failure =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
none/tests/async-sigs (stderr)
none/tests/sigcontext (stdout)
make: *** [regtest] Error 1
From: Jeremy F. <je...@go...> - 2005-02-24 02:30:24
Nicholas Nethercote wrote:
> Seems reasonable, except that you've put lots of the code into the
> core. The leak checker is currently entirely within Memcheck
> (Addrcheck shares some of that code), and I think this distinction
> should be maintained.
The leak checker is entirely within memcheck, and is completely shared
with addrcheck. I added a couple of helper calls so that the
leakchecker can get information from the core about where mapped memory
is, and what the contents of each thread's registers are. Neither call
is arch-specific (though obviously fetching the register values has an
arch-specific component).
And I can't bring myself to see 44 lines of simple code as "lots".
> So eg. the code added in coregrind/x86/state.c should instead be in
> memcheck/x86/<something.c> (which currently doesn't exist). Or you
> could add an arch-specific function that returns some kind of GP
> register list, which would save you from having arch-specific code in
> Memcheck.
I guess. The code added to the core is pretty small and simple, and it
doesn't add any more arch-dependencies to memcheck. There's no existing
interface for a tool to get a thread's register state, so there would
need to be some extension to the tool interface anyway; I went for the
minimal change to get the job done. I prefer the callback approach to
passing around arrays, because it doesn't expose the representation of
the arch registers (since it only needs to look at general-purpose
registers which might possibly hold pointers, which might not be
contiguous in memory for a particular architecture).
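The callback shape being described might look something like the sketch below. The names, and the flat word array standing in for the thread's register file, are invented here; in the real core the enumerator would walk the arch-specific thread state (GPRs plus vector registers chopped into pointer-sized chunks) without exposing its layout to the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Tool-supplied callback: receives each pointer-sized register word. */
typedef void (*RegWordFn)(unsigned long word, void *user);

/* Core-side enumerator.  'regs'/'n_words' stand in for the thread's
   register file; the tool never sees how the registers are laid out,
   only a stream of candidate words. */
static void for_each_reg_word(const unsigned long *regs, int n_words,
                              RegWordFn fn, void *user)
{
    for (int i = 0; i < n_words; i++)
        fn(regs[i], user);
}

/* Example tool callback: count words that could plausibly be heap
   pointers (here just "nonzero and word-aligned", in the same spirit
   as the leak checker's treatment of words found in memory). */
static void count_candidates(unsigned long word, void *user)
{
    if (word != 0 && word % sizeof(void *) == 0)
        ++*(int *)user;
}
```

The point of the design is that adding, say, vector registers on a new architecture changes only the core-side enumerator, never the tool.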
J
From: Nicholas N. <nj...@cs...> - 2005-02-24 02:05:35
On Wed, 23 Feb 2005, Jeremy Fitzhardinge wrote:

> Nicholas Nethercote wrote:
>
>> Sounds good in principle. How complex is it, how much code has been
>> added?
>
> Very little. It isn't much more complex than the original checker.
> Patch attached; there's about 20 lines more code. This doesn't have
> the full clique-grouping algorithm, which will add a bit more code
> (probably <100 lines though).

Seems reasonable, except that you've put lots of the code into the
core. The leak checker is currently entirely within Memcheck
(Addrcheck shares some of that code), and I think this distinction
should be maintained.

So eg. the code added in coregrind/x86/state.c should instead be in
memcheck/x86/<something.c> (which currently doesn't exist). Or you
could add an arch-specific function that returns some kind of GP
register list, which would save you from having arch-specific code in
Memcheck.

N
From: Jeremy F. <je...@go...> - 2005-02-24 01:30:24
Nicholas Nethercote wrote:
> Sounds good in principle. How complex is it, how much code has been
> added?
Very little. It isn't much more complex than the original checker.
Patch attached; there's about 20 lines more code. This doesn't have the
full clique-grouping algorithm, which will add a bit more code (probably
<100 lines though).
J
From: Greg P. <gp...@us...> - 2005-02-24 01:28:26
Julian Seward writes:
> Well, let's say a block descriptor involves { a start address,
> a length, a word containing flags, an int index into a filename
> table, and perhaps a couple more words to make administration
> a bit faster }. That's 40ish bytes. That gives 25k descriptors
> per megabyte. How many do you need? Even if you need 200k
> descriptors I don't care if we have to statically allocate 8M
> to make that work.
25k should be plenty for ordinary apps; my web browser is currently
around 2k. I don't know what a database or sci/tech program might
look like, but in the worst case the answer is "recompile Valgrind"
which should be perfectly acceptable.
> > Mac OS X has some entertaining address space requirements and
> > restrictions.
>
> Can you summarise what they are, so I can muse on the consequences
> thereof?
The primary issues are ones of specialized use of various parts of
the address space for 32-bit programs. 64-bit promises to be much
cleaner.
0x00000000..0x00001000: protected page zero (might be more than one page)
0x00001000..0xWhatever: typical executable; typically non-relocatable
0x8fe00000..0x90000000: dyld (dynamic linker); relocatable
0x90000000..0xb0000000: system libraries; non-relocatable in practice
0xbf800000..0xc0000000: default main stack; executable can specify
a different address or size; might be relocatable with some fancy footwork
0xfffec000..0xffffffff: libc special purpose; non-relocatable
No address space is reserved for the kernel proper. There's no
distinction between malloc-memory and mmap-memory; allocators
always use mmap-style heaps rather than sbrk.
The system libraries in 0x9..0xc are theoretically relocatable,
but in practice the kernel just slams them into their default
locations, ignoring anything else that's there. There's a bug
report filed against this, but it's unlikely to be fixed any
time soon.
Other special-use memory like thread stacks and window server
buffers have preferred locations, but will happily live
elsewhere as necessary. There might be additional special uses
in the last 32MB of address space (which has useful properties
in the PowerPC instruction set); it's easy enough just to give
that entire area over to the client just in case.
> > My Mac OS X launcher has this problem. gdb can see Valgrind's
> > symbols, but can't set any breakpoints, and is easily confused
> > about other things. gdb knows nothing about the client or its
> > libraries. It's all rather clumsy to work with, but not the
> > end of the world.
>
> Not good tho; if V crashes we really want GDB to be able to say
> what's going on.
gdb can help, but it takes some extra work. Other tools can
be used to interrogate the location of loaded client libraries,
and then gdb's add-symbol-file command can insert the proper
knowledge into gdb's state. Clumsy, but not hopeless.
> What object file format does MacOSX use? ELF? How lucky are we?
Mach-O, which is unlike ELF in most of the aspects that Valgrind
cares about. On the other hand, Valgrind's handling of stabs worked
almost as-is for debug info, at least at the level I've tried so far.
--
Greg Parker gp...@us...
|
|
From: Nicholas N. <nj...@cs...> - 2005-02-24 01:05:41
|
On Wed, 23 Feb 2005, Jeremy Fitzhardinge wrote:

> To fix all this, I changed the leak checker to use a standard mark-sweep
> algorithm. It does a pass from the root set to find unleaked memory,
> and then makes a pass over the leaked memory to group it into cliques
> (connected graph of allocations); each clique is reported as a leak,
> rather than each individual allocation.
>
> I'm still running it through its paces, but it seems to be a big
> improvement over the existing checker.
>
> What do people think?

Sounds good in principle. How complex is it, how much code has been
added?

N
|
|
From: Jeremy F. <je...@go...> - 2005-02-24 00:38:08
|
I've been using the leak checker a bit lately, and it seems to have a
number of problems:
* It under-reports lost memory, by only pointing out completely
unreferenced allocations. This means that apparently small
leaks are actually large, if they refer to a lot of other memory.
* It completely fails to report leaked cycles. We have software
which uses refcounting, which also loses cycles; we were hoping
that Valgrind would point out deficiencies in the refcount management.
* It doesn't trace from registers, so it can report blocks as leaked
even if there's a register reference (not common, I admit).
To fix all this, I changed the leak checker to use a standard mark-sweep
algorithm. It does a pass from the root set to find unleaked memory,
and then makes a pass over the leaked memory to group it into cliques
(connected graph of allocations); each clique is reported as a leak,
rather than each individual allocation.
I'm still running it through its paces, but it seems to be a big
improvement over the existing checker.
What do people think?
J
|
|
From: Jeremy F. <je...@go...> - 2005-02-24 00:17:06
|
Nicholas Nethercote wrote:
> On Wed, 23 Feb 2005, Jeremy Fitzhardinge wrote:
>
>> A more general comment is that I'd really like to come up with a way of
>> dealing with ioctls which doesn't require adding more stuff into
>> vg_unsafe.h, and doesn't require duplicating structures into vki*.h.
>
>
> What version was the patch against? vg_unsafe.h doesn't exist in the
> CVS HEAD any more.
Oh, you're right. I guess the diff is against 2.2.
J
|
|
From: Jeremy F. <je...@go...> - 2005-02-24 00:16:12
|
Julian Seward wrote:

>> Mac OS X has some entertaining address space requirements and
>> restrictions.
>
> Can you summarise what they are, so I can muse on the consequences
> thereof?

There was a description in earlier mail. I think shared libraries are at
a fixed, unmovable address up high, and the application is down low, so
Valgrind has to sit in the middle. By the sounds of it, the superblock
scheme should be able to deal with it. The magically appearing mappings
are a bit problematic, but I presume that if we pad out all the Valgrind
superblocks they won't get molested.

> This is an amd64-linux issue. It seems that at least some
> AMD64 distros (SuSE) are built using a "small model" where
> offsets in literal jumps are signed 32-bits and so that appears
> to force all code below 2 G. Or at least into some contiguous
> 2 G range.

No, no, no, this only applies to non-PIC/PIE code. PIC/PIE code can be
anywhere (and routinely is). It just requires that each object file is
less than 2G, which I think we can manage. Each PIC/PIE module uses full
64-bit pointers to refer to other modules, so there are no constraints
on where they are relative to each other.

> Well, the work is already done - here's one I made earlier:
> http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/rts/Linker.c
> It works for ELF on x86, amd64, ppc32, used to work for arm,
> looks like it can be made to work for ia64, and (at least at
> some point in the past) could also load/link Windows PEi386
> object files :-) It's a modular linker and so can be extended
> to new formats (eg COFF) by writing the appropriate COFF-parsers.

Well, that helps, but colour me not particularly keen.

> What object file format does MacOSX use? ELF? How lucky are we?

Mach-O. Not much like ELF.

J
|
|
From: Jeremy F. <je...@go...> - 2005-02-24 00:07:56
|
Julian Seward wrote:
>* There's a fundamental circularity which has caused segfaults
> at least twice in the past. The segment list manager needs
> the malloc/free manager to be operating, but the malloc/free
> manager may cause segment list entries to be allocated.
>
> In effect we have two competing low level memory managers,
> a situation which is nonsensical and should be fixed. The
> segment-list-manager (which we should really call the
> Address Space Manager, ASpaceMgr) is fundamental and should be
> self-contained. The malloc/free manager should be built on
> top of ASpaceMgr. The point at which debug info reading is
> done should be moved upwards in the services hierarchy
> to enable this split to be made.
>
>* Abstraction boundaries in vg_mylibc have been muddied. Once
> upon a time, VG_(mmap) and VG_(mprotect) simply passed requests
> through to the kernel. Now they are part of the segment-mapping
> game and make enquiries against the segment list. That functionality
> needs to exist somewhere, but it's confusing that it happens
> at that low a level.
>
>* I found the code hard to understand (== maintain) and there is
> no comprehensive statement of what it is and is not trying to
> achieve.
>
>
Yeah. One conceptual difference between what you're describing and what
exists is that the Segment list is intended to document what exists, but
it isn't actually responsible for managing it. Segments can be backed
by many different kinds of VM object (mmap, shared memory, etc), and the
Segment code doesn't really care about what backs each virtual address
range, and it certainly doesn't do anything to cause those ranges to
appear/disappear. It expects to be told about what does exist and when
it changes, and it can be queried by code which makes changes to
discover what exists.
You're describing something a bit more active, which knows how to create
and destroy VM objects. That's a wider mandate, and more complex
because there are so many ways that can happen.
Certainly we need to fix the Segment manager's cyclic dependency.
>I've also been considering how to rework address space management to
>support a 64-bit world. The following proposal builds on my proposal
>of a couple of weeks ago, which proposed chopping the address space
>up into 64M chunks and doing permissions checks at that granularity.
>
>* The 64M superblocks have 4 possible ownership states:
> Unallocated
> Valgrind's -- V's text, stack, static data, dynamic data
> Shadow
> Client
>
>
Is there any difference between Shadow and Valgrind? Do they behave
differently in any respect, or is it just that their contents mean
different things?
>* As mentioned above, the Address Space Manager is fundamental
> and self-contained. It is decoupled from the malloc/free manager.
> It no longer deals with debug info loading/unloading. It does
> nothing that requires dynamic memory allocation. The segment list
> is to be held in statically allocated storage to make that possible.
> That's not wonderful, but even a 1 Mb static area should hold
> enough info to track several thousand segments.
>
>
Well, you could still allocate them dynamically, so long as you do it
early enough (ie, before you're so short of space that there's no room
to describe the new space you're allocating).
>* At least for Linux, stage2 is loaded into a 64M superblock
> just below 0x4000'0000 (1 G). ASpaceMgr allocates superblocks
> on demand, above 1G if it can for shadow memory and for Valgrind's
> own use, and below 1G for the client, if possible. In particular
> it tries hard not to put Valgrind or Shadow data in the area
> below 1G.
>
> Why 0x4000'0000 ?
>
> - On a 32-bit machine, this gives the client nearly 1G of
> contiguous space, should it want to do large mmaps. If
> clients want to mmap more than 1G at a time, that's tough
> -- use 64-bit Valgrind instead.
>
> - On a 32-bit machine, even if the top 3/4 of the address space
> is given over to the kernel, we don't have to deal with
> different load addresses -- it will work as-is. Under those
> conditions ASpaceMgr will have to make inroads into the top
> of the 1G area, but that's unavoidable.
>
>
I'm not sure I follow you here.
> - In general, on a 32-bit machine, because memory is allocated
> in 64M superblocks to either shadow, client or V-internal, we get
> rid of all problems associated with the current hard partitioning
> scheme between client and shadow memory. Big-bang allocation is
> done away with. We know we can still protect V from wild writes
> by the client at fairly minimal expense.
>
>
What problems are there, and how are they avoided by this scheme?
> - On a 64-bit machine, all code is to be mapped in below 1 G, but
> apart from that ASpaceMgr can be fairly relaxed about fragmentation
> in the area above 1 G.
>
>
Er, why? We have terabytes of address space to play with. Why make
holes in it? There's no technical reason we need to put code down that low.
More generally, I don't see why these parameters need to be fixed
between different targets/platforms. Is there any reason the superblock
must be 64M, and the stage2 address is at 1G for everyone? It seems
pretty easy to make these target-specific parameters and still have all
the generic code cope.
Also, choosing a fixed address prevents Valgrind from running under
itself. I think we should keep that.
>Startup then looks like this:
>
>* Load stage2 at 0x4000'0000 - 64M
>
>* Copy command-line/env data into this area somewhere
>
>* Switch stacks and start stage2
>
>* stage2: initialise ASpaceMgr, read initial segments from
> /proc/self/maps
>
>* Initialise logging, so we can print debugging info
> early on. Note this means the logging mechanism cannot
> do dynamic memory allocation.
>
>* Nuke all segments except this one -- this gets the address
> space in a known starting state
>
>* Initialise the malloc/free manager
>
>* Initialise scheduler
>
>* Get signals in a known state; initialise signals subsystem
>
>* Initialise any other subsystems (Vex?)
>
>* Make the ume mechanism load the client
>
>* Run the client
>
>
>The only part of any difficulty is to get stage2 to a specific
>address. Three possibilities:
>
>(1) Link it to load at that address at build-time.
>
>(2) Build it as a PIE.
>
>(3) Use a standalone ELF .o loader/linker to load all the .o's,
> link and start them.
>
>At first (3) sounds insane, but it has a couple of advantages:
>
>- we don't need to screw around padding the space with mmap in
> stage1 to ensure stage2 and all its bits & pieces end up in
> the designated 64 M superblock
>
>- it gives us 100% control over V's linking and makes it easy to
> ensure we don't inadvertently depend on anything from glibc.
> I like that.
>
>The main disadvantage is that gdb would not have a clue what it
>was looking at unless we found a way to convey debug info to it.
>
>
No, the main disadvantages are that this is fantastically complex, makes
us very dependent on the toolchain (particularly all the little
side-channels between gcc and binutils), and is architecture and OS
dependent (since not everyone uses ELF). By comparison, padding the
address space and using libc carefully are very simple and portable.
I'm still in favour of 1) using PIE where available (otherwise choosing
a static load address), 2) using a flexible address space configuration
which allows Valgrind to run under itself, and 3) trying not to get too
involved with the object file formats.
J
|