|
From: Graydon H. <gr...@po...> - 2006-07-13 23:08:33
Attachments:
leak-check-fixes.patch
|
Hi,
Attached is a patch that (I think) corrects two bugs in the leak checker
and provides leak-checking functionality to memory pools. Several people
have provided this latter functionality in the past, but their patches
have rotted against your SVN head. I say "I think" because I'm not sure
exactly what the correct leak-checking behavior for memory pools *is*,
but more on that below.
First, the bugs. I believe the leak checker's binary search (and backup
linear search) algorithms are both incorrect. They both assume that the
memory extents being searched are of the form [data, data+size], when I
believe their actual form is [data, data+size). That is: not including
the end-point byte at data+size. I'm hesitant to say these are true bugs
because if they are, they're quite serious, and I'm surprised they
haven't been identified already.
So I changed them to what I think is correct: [data, data+size). If I
left them as they were, I could not convince any of my tests to produce
the correct results. I'm pretty sure these were real bugs.
Next, pools. The issue of how to leak-check a memory pool is sort of
complicated. I'll use the following terminology below:
- A typical memory pool contains an administrative structure and
a set of superblocks.
- The administrative structure contains counts and pointers to
superblocks.
- The superblocks are carved up by pool functions to serve user
allocations.
- The user marks the administrative structure with
VALGRIND_CREATE_MEMPOOL(pool, 0, 0).
- After malloc'ing a superblock, the user marks it with
VALGRIND_MAKE_NOACCESS(sb, sb_size).
- After allocating a chunk in a superblock, the user marks it with
VALGRIND_MEMPOOL_ALLOC(pool, chunk, chunk_sz). This puts the
chunk in a hashtable associated with the pool and marks the chunk's
bytes addressable.
- After returning a chunk to the superblock, the user marks it with
VALGRIND_MEMPOOL_FREE(pool, chunk). This removes the entry from
the internal hashtable and marks the chunk NOACCESS again.
- When done with the pool, the user calls
VALGRIND_DESTROY_MEMPOOL(pool). This frees any chunks still live
and deletes the hashtable.
- An important invariant is that the unused parts of superblocks
are NOACCESS. This keeps the pointer-scanning logic from touching
garbage / recycled memory inside a superblock.
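[The annotation sequence above can be illustrated with a minimal bump-allocating pool. This is a sketch: the pool layout and the pool_* helper names are hypothetical; only the VALGRIND_* client requests come from the list above, and they are stubbed out here so the sketch compiles without the Valgrind headers (under Valgrind you would include <valgrind/memcheck.h> instead).]

```c
#include <stdlib.h>

/* Stubs so this compiles without Valgrind installed; the real client
   requests are provided by <valgrind/memcheck.h>. */
#define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed) ((void)0)
#define VALGRIND_DESTROY_MEMPOOL(pool)                ((void)0)
#define VALGRIND_MAKE_NOACCESS(addr, len)             ((void)0)
#define VALGRIND_MEMPOOL_ALLOC(pool, addr, len)       ((void)0)
#define VALGRIND_MEMPOOL_FREE(pool, addr)             ((void)0)

/* Hypothetical pool: one superblock, bump-allocated, no recycling. */
typedef struct {
    char  *sb;        /* the single superblock */
    size_t sb_size;
    size_t used;
} pool_t;

pool_t *pool_create(size_t sb_size) {
    pool_t *p = malloc(sizeof *p);
    p->sb = malloc(sb_size);
    p->sb_size = sb_size;
    p->used = 0;
    VALGRIND_CREATE_MEMPOOL(p, 0, 0);        /* mark the admin structure */
    VALGRIND_MAKE_NOACCESS(p->sb, sb_size);  /* unused superblock is NOACCESS */
    return p;
}

void *pool_alloc(pool_t *p, size_t n) {
    if (p->used + n > p->sb_size) return NULL;
    void *chunk = p->sb + p->used;
    p->used += n;
    VALGRIND_MEMPOOL_ALLOC(p, chunk, n);     /* chunk is now addressable */
    return chunk;
}

void pool_free(pool_t *p, void *chunk) {
    VALGRIND_MEMPOOL_FREE(p, chunk);         /* chunk is NOACCESS again */
}

void pool_destroy(pool_t *p) {
    VALGRIND_DESTROY_MEMPOOL(p);             /* frees any still-live chunks */
    free(p->sb);
    free(p);
}
```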
There are several leak-related things a user might simultaneously be
interested in:
- How the chunks of the pool connect up to the larger pointer graph.
This is of primary importance: if the whole pool is considered a single
extent, then any pointer into the pool connects to all pointers out
of the pool, and leak checking in general is defeated.
- Whether "the pool" (the administrative structure plus all its
superblocks) has been lost. This is relevant in user code that
explicitly manages pools. They might leak the whole pool.
- Whether some number of chunks within each pool's superblocks are
allocated-but-not-pointed-to. These chunks might then be considered
"leaked". They might also *not* be considered leaked if the
surrounding pool is still reachable: some pools are designed to be
freed all at once.
While I am sympathetic to the idea of building a complicated structure
that relates each pool to its chunks using some sort of "virtual edge"
during leak checking, I am also pressed for time and wanted to see if
there was a "90%" solution that solves the important case: tracking the
pool chunks individually in the pointer graph.
I believe I've accomplished this with a relatively minor change that
only affects the initial stage of the algorithm: selecting chunks to
scan. If we let:
M = {malloc chunks}
P = {mempool chunks}
X = {c | c in M such that forall p in P, p intersect c == null}
Y = X union P
Then where previously we ran the algorithm on M, we now run it on Y.
More intuitively, this is "the set of mempool chunks" plus "the set of
malloc chunks that don't intersect mempool chunks". It's reasonably easy
to calculate this set, and it produces what I think are "more desirable"
results. Specifically:
- Each chunk in a pool is independently considered as a pointer-graph
member. A pointer into a chunk connects only to the outgoing
pointers held in the chunk.
- The leak-checker independently detects leakage of each chunk in
the pool, except the first chunk in a superblock, for which the
reference from administrative structure to superblock is considered
a reference to the chunk as well.
- The proper chunk-allocation context is reported when leaks are
found involving a chunk; not the outer pool-allocation.
- If the administrative structure is leaked from user code, it is
still reported. The count of indirect bytes leaked along with it
includes the superblocks if they are empty / unallocated. If the
superblocks contain allocated chunks, a smaller number is reported
(the size of the administrative structure alone), but there is still
a leak report.
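[The chunk-selection step can be sketched as follows, treating each chunk as a half-open (start, size) extent. This is illustrative code, not from the patch; the Extent struct and function names are mine.]

```c
#include <stddef.h>

/* Half-open extent [start, start+size). */
typedef struct { size_t start; size_t size; } Extent;

/* Two half-open extents intersect iff each starts before the other ends. */
static int extents_intersect(Extent a, Extent b) {
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Build Y = P union { c in M : c intersects no p in P }.
   M = malloc chunks, P = mempool chunks; out must have room for
   n_m + n_p entries. Returns |Y|. */
size_t select_chunks(const Extent *M, size_t n_m,
                     const Extent *P, size_t n_p,
                     Extent *out) {
    size_t n = 0, i, j;
    for (i = 0; i < n_m; i++) {
        int covered = 0;
        for (j = 0; j < n_p; j++)
            if (extents_intersect(M[i], P[j])) { covered = 1; break; }
        if (!covered)
            out[n++] = M[i];          /* c is in X: keep it */
    }
    for (j = 0; j < n_p; j++)
        out[n++] = P[j];              /* all mempool chunks go in */
    return n;
}
```

[Note how a malloc'd superblock containing mempool chunks is excluded from Y, so its chunks are scanned individually rather than as one big extent.]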
Anyone interested in merging this? I *think* the only effects I'm seeing
on the regression testsuite are either improvements in leak detection or
idiosyncrasies of my OS.
-graydon
|
|
From: Julian S. <js...@ac...> - 2006-07-17 11:41:06
|
Hi Graydon. Nice work.

> memory extents being searched are of the form [data, data+size], when I
> believe their actual form is [data, data+size). That is: not including
> the end-point byte at data+size. I'm hesitant to say these are true bugs
> because if they are, they're quite serious, and I'm surprised they
> haven't been identified already.

It does seem wrong. My guess as to why this hasn't been noticed before
is that the leak checker only looks for pointers in word-aligned
locations. Typically, if blocks have a word-aligned size, then looking
at one byte too many means that last byte will not be the start of a
naturally-aligned word.

> Next, pools. The issue of how to leak-check a memory pool is sort of
> complicated. I'll use the following terminology below:

Useful side effect is, we've never had any documentation on mempools
before, afaics, which is really bad. Just the description itself is
useful.

> Anyone interested in merging this? I *think* the only effects I'm seeing
> on the regression testsuite are either improvements in leak detection or
> idiosyncrasies of my OS.

I'm in favour of merging it.

One question. I was just looking to see what documentation of mem pools
does currently exist. I fell across VALGRIND_MALLOCLIKE_BLOCK and
VALGRIND_FREELIKE_BLOCK. Do you have any idea how these interact with
leak checking, if at all (or even what they do?)

Did you find any interesting leaks in Mozilla once you had this lot
working?

J
|
|
From: Nicholas N. <nj...@cs...> - 2006-07-17 13:29:52
|
On Mon, 17 Jul 2006, Julian Seward wrote:

> One question. I was just looking to see what documentation of
> mem pools does currently exist. I fell across VALGRIND_MALLOCLIKE_BLOCK
> and VALGRIND_FREELIKE_BLOCK. Do you have any idea how these interact
> with leak checking, if at all (or even what they do?)

Any block marked as a heap block with VALGRIND_MALLOCLIKE_BLOCK gets
leak-checked just like any other.

Nick
|
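[As a side note on the client requests discussed above: a minimal sketch of how VALGRIND_MALLOCLIKE_BLOCK / VALGRIND_FREELIKE_BLOCK are typically used. The arena allocator here is hypothetical, and the macros are stubbed so the sketch compiles without Valgrind; the real ones live in <valgrind/valgrind.h>.]

```c
#include <stddef.h>

/* Stubs so this compiles without Valgrind installed. */
#define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) ((void)0)
#define VALGRIND_FREELIKE_BLOCK(addr, rzB)                     ((void)0)

/* Hypothetical custom allocator carving from a static arena. */
static char   arena[4096];
static size_t arena_used = 0;

void *my_alloc(size_t n) {
    if (arena_used + n > sizeof arena) return NULL;
    void *p = arena + arena_used;
    arena_used += n;
    /* Tell memcheck to treat this region as a heap block: it then
       participates in leak checking like a malloc'd block. */
    VALGRIND_MALLOCLIKE_BLOCK(p, n, /*rzB=*/0, /*is_zeroed=*/0);
    return p;
}

void my_free(void *p) {
    /* Mark the block freed; accesses after this point are errors. */
    VALGRIND_FREELIKE_BLOCK(p, /*rzB=*/0);
}
```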
|
From: Ashley P. <as...@qu...> - 2006-07-19 16:13:26
|
On Monday, Jul 17, 2006, at 12:40 Europe/London, Julian Seward wrote:

> Hi Graydon. Nice work.
>
>> memory extents being searched are of the form [data, data+size], when I
>> believe their actual form is [data, data+size). That is: not including
>> the end-point byte at data+size. I'm hesitant to say these are true bugs
>> because if they are, they're quite serious, and I'm surprised they
>> haven't been identified already.
>
> It does seem wrong. My guess as to why this hasn't been noticed before
> is that the leak checker only looks for pointers in word-aligned
> locations. Typically, if blocks have a word-aligned size, then looking
> at one byte too many means that last byte will not be the start of a
> naturally-aligned word.

I was wondering if the reason was more to do with red-zones? I found a
couple of places where it fell over if these were non-zero in size.

Ashley
|
|
From: Graydon H. <gr...@po...> - 2006-07-19 18:06:12
|
Ashley Pittman wrote:
> I was wondering if the reason was more to do with red-zones? I found a
> couple of places where it fell over if these were non-zero in size.
A couple of cases with the leak-checker before my patch, or after?
I'm pretty sure it's just a broken binary search. The problem is pretty
well defined: you have a sorted list of non-overlapping half-open [...)
extents, and you want to get the extent that contains a point:
   [aaaaaaaa)  [bbbbbbbbbbb)[cccccccccccc)  [dddddddd)
        |                   |
        p1                  p2
The previous code would correctly identify p1 as being in extent a; but
it would incorrectly identify p2 as being in extent b: p2 is at the
beginning point of extent c, so p2 is *in* extent c. When you give an
extent in "begin, size" form, you're explicitly not including the
terminal point.
I think the reason it didn't trigger so often before is that it used to
be more rare for "exactly abutting" extents to exist in the malloc heap
(especially if they had redzones). Using mempools, you get abutting
extents all the time: every chunk in the mempool abuts another. So the
search goes "more obviously wrong" in that case.
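[The corrected lookup can be sketched as a standalone binary search over sorted, non-overlapping half-open extents. This is a sketch, not code from the patch; the struct and function names are mine.]

```c
#include <stddef.h>

/* Half-open extent [start, start+size). */
typedef struct { size_t start; size_t size; } Extent;

/* Return the index of the extent containing point p, or -1 if none.
   ext[] is sorted by start and non-overlapping. The bound test uses
   half-open semantics: a point exactly at start+size is NOT in this
   extent, so with abutting extents it falls into the next one. */
long find_extent(const Extent *ext, size_t n, size_t p) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p < ext[mid].start)
            hi = mid;
        else if (p >= ext[mid].start + ext[mid].size)  /* >=, not >: half-open */
            lo = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}
```

[The buggy version effectively tested `p > start + size`, so a point on the boundary of two abutting extents matched the earlier one.]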
I didn't even notice it until I constructed a testcase that was supposed
to leak a specific number of objects, and it only leaked a
random-seeming subset of them.
-graydon
|
|
From: Julian S. <js...@ac...> - 2006-07-28 10:58:48
|
> I'm pretty sure it's just a broken binary search. The problem is pretty
> well defined: you have a sorted list of non-overlapping half-open [...)
> extents, and you want to get the extent that contains a point:
>
>    [aaaaaaaa)  [bbbbbbbbbbb)[cccccccccccc)  [dddddddd)
>         |                   |
>         p1                  p2
>
> The previous code would correctly identify p1 as being in extent a; but
> it would incorrectly identify p2 as being in extent b: p2 is at the
> beginning point of extent c, so p2 is *in* extent c. When you give an
> extent in "begin, size" form, you're explicitly not including the
> terminal point.

Euh, ok. There's a complexity shown up by memcheck/tests/leak-0.vgtest
though - this change causes it to fail. Problem is if a block is zero
sized, it is now regarded as covering zero bytes of address space, so no
pointers are ever regarded as pointing at it, so blocks of size zero are
now incorrectly reported as definitely lost. Not sure how to fix this.
Maybe treat zero-sized blocks as if they have size 1 - special-case
them, iow.

> I didn't even notice it until I constructed a testcase that was supposed
> to leak a specific number of objects, and it only leaked a
> random-seeming subset of them.

Can you post this pls? If suitable I'd like to add it to the repo.

J
|
|
From: Julian S. <js...@ac...> - 2006-07-27 23:49:31
|
Thanks. Committed (svn r5991).
J
On Friday 14 July 2006 00:08, Graydon Hoare wrote:
> Hi,
>
> Attached is a patch that (I think) corrects two bugs in the leak checker
> and provides leak-checking functionality to memory pools.
> [...]
|