Re: [Valgrind-users] Using mempool client requests

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Edwin,

On Jul 20, 2010, at 10:32 AM CDT, Török Edwin wrote:

> While it is possible to teach valgrind about ClamAV's pools (see
> attached mpool.patch for example), I think valgrind will be more
> effective if I just disable ClamAV's pools (and let it use malloc).

This sounds like a practical solution to me.  There is a RUNNING_ON_VALGRIND client request so that you can do this at runtime instead of having to reconfigure/recompile.

> The problem is that ClamAV's allocator doesn't have redzones, and
> sometimes it allocates more memory than strictly requested.
> All this makes it hard for valgrind to detect off-by-one errors, even
> if I teach it about the pools.
> 
> Here is a rough idea of how ClamAV's pools look like:
> mmap chunk_0 = | pool header | allocatable space |
> mmap chunk_1 .. chunk_N = | block header | allocatable space |
> 
> Each allocation block looks like this:
> | padding1| information about allocation | allocated data | padding2 |
> 
> Freed data is reusable, so freed blocks look like:
> | ptr next free block | undefined data |
> 
> Also reallocs will reuse the allocation block
> (when shrinking).
> 
> What I did was mark the entire allocation block with
> VALGRIND_MEMPOOL_ALLOC, however this has several shortcomings.
> 1. Code can corrupt the allocation information (located just before the
>   allocated data), I could fix this by marking them as NOACCESS (ideal
>   would be readonly but valgrind doesn't have that notion).

I do wish that memcheck had a readonly status, but that doesn't seem to be what you want here.  It should be erroneous for the client code to read or write the bookkeeping information.  NOACCESS is exactly what you want, even if it does make the allocator code a bit more complicated.

> 2. If I fix 1), then I get warnings in the allocator itself (when it
> wants to reuse freed data, or when it wants to realloc), I can fix by
> marking the memory zones I want to access as UNDEFINED, however this
> leads to problem 3

This is a little tedious, but not really that bad.  Depending on what you want to do, marking it DEFINED might make more sense.

> 3. If I get passed an invalid pointer (one that the pool allocator
> didn't allocate for example), then if I mark my allocation information
> block as UNDEFINED I actually mark random memory and valgrind can't
> help me

Can't you tell if the given pointer is valid based on the address value?  Don't you track the valid address ranges anyway in order to do cleanup at the end?

Otherwise, how about "magic number" in the allocation header that you can assert() is correct after you mark it DEFINED?  Do you actually need the code to keep operating after the client code passes the allocator an invalid pointer?

> 4. Realloc doesn't know the size of original allocation, it only knows 
> size of original allocation + padding2. So again imprecision and lack
> of redzone at end of allocation. Realloc sometimes needs to memcpy() so
> that would need me to temporarely mark the data as accessible

Again, tedious but not particularly difficult.  Just mark (allocation+padding2) as DEFINED before reuse.  You don't need to restore the NOACCESS status on the original region of padding2 during a realloc, right?  Then the new region of padding2 (post-resize) can be marked NOACCESS.

> All this is complicated and error prone (I might call valgrind with the
> wrong information).
> 
> I think a simple client request addition would solve this:
> VALGRIND_MEMPOOL_ALLOCATIONSIZE(pool, addr) -> size of allocation if
> addr was declared with VALGRIND_MEMPOOL_ALLOC(pool, addr, size), or 0
> if addr belongs to a different pool, or no pool at all (it can be
> noaccess/defined/undefined doesn't matter)
> 
> Then I could implement this in ClamAV like so:
> - mark the paddings and allocator's allocation information as NOACCESS
>   after I fill it, and mark as allocated just exactly what
>   my mpool_malloc() call requested
> - when I free/realloc I query valgrind whether the pointer is an
>   allocation base for the specified pool. If it is, I temporarely mark
>   the private allocation information readable and defined, I
>   read/modify it, then mark it as NOACCESS again. For realloc I would
>   know the exact allocation size and I would memcpy() only up to that
>   amount
> 
> Actually I could implement this already by abusing the meaning of
> MPOOL_CREATE. 
> I could pretend each allocation is a pool in itself, mark only the
> real allocation with MEMPOOL_ALLOC(), and use MEMPOOL_EXISTS() to check
> whether the pointer really belongs to my pool, and is the start of an
> allocation. However this might create millions of pool, which I guess
> is not good for valgrind's performance.

I don't know what the performance implications of this strategy would be.  Perhaps someone else on the list can comment.  However, it's not clear that this approach requires any less work than the more conventional approach without the new client request.

-Dave