From: Ruurd B. <Ruu...@in...> - 2016-08-05 14:39:31
Hi,
I am a software developer at Infor, where we maintain a complex application (30+ years old, many millions of lines), most of it written in C/C++.
I have used valgrind with memcheck to find and fix memory-related issues and have become a great fan of the product.
However, we use a custom allocator that caused me considerable problems because it has memory pool features not supported by the "loose model" of valgrind.
1. Specifically, it allows me to create a memory pool, allocate many items from that pool and then destroy the pool.
The applications know that all pool items are automatically freed when the pool is destroyed, so it saves time and code by not doing so explicitly.
Valgrind reports all items in such a pool as memory leaks, because that is the model it assumes.
I understand that this is a design choice: Either such application memory pools are considered "auto-freed" or not, and when not, they are considered leaks.
2. Another problem is that our allocator uses itself to allocate large chunks for the memory pools.
Those chunks are used to dole out smaller pieces for the applications.
Valgrind sees that as an error (overlapping memory blocks), because both types of blocks (memory pool and allocations from the pool) are marked with VALGRIND_MALLOCLIKE_BLOCK.
That triggers an error:
Block 0x%lx..0x%lx overlaps with block 0x%lx..0x%lx, this is usually caused by using VALGRIND_MALLOCLIKE_BLOCK in an inappropriate way
plus an assert in memcheck.
3. Our (admittedly ancient) allocator uses sbrk() to get the memory (and not mmap).
Valgrind (on linux) limits this to 8MB. That is not enough for our applications. The 8MB is hardcoded in valgrind.
4. We use Oracle as a database, which executes as setuid-to-oracle on Linux (we have our own database wrapper software layers for Oracle, DB2, MySQL, Microsoft SQL and so on).
To be able to valgrind such executables, I've created a setuid-oracle copy of Valgrind.
That works, but the reports valgrind creates are owned by Oracle in such cases and our test framework got "Permission denied" when it wanted to analyze and modify the valgrind reports.
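To make the pool model from points 1 and 2 concrete, here is a minimal sketch in C. It is not our allocator: every name in it (pool_create, pool_alloc, pool_destroy, CHUNK_SIZE) is made up for illustration, and it gets its chunks from malloc() where the real allocator uses itself on top of sbrk(). It only shows the pattern: items are carved out of large chunks and are never freed one by one, because destroying the pool releases them all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool allocator, for illustration only. */
enum { CHUNK_SIZE = 64 * 1024 };      /* 64 KB chunks */

typedef struct chunk {
    struct chunk *next;               /* previously filled chunk, if any */
    size_t used;                      /* bytes handed out from data[] so far */
    unsigned char data[CHUNK_SIZE];
} chunk;

typedef struct pool {
    chunk *chunks;                    /* newest chunk first */
    size_t live_items;                /* items handed out, never freed one by one */
} pool;

pool *pool_create(void) {
    return calloc(1, sizeof(pool));
}

/* Carve `size` bytes out of the current chunk, adding a chunk when needed. */
void *pool_alloc(pool *p, size_t size) {
    assert(p != NULL);
    size = (size + 15u) & ~(size_t)15u;          /* round up to 16 bytes */
    if (size == 0 || size > CHUNK_SIZE) return NULL;
    if (p->chunks == NULL || CHUNK_SIZE - p->chunks->used < size) {
        chunk *c = malloc(sizeof *c);            /* stand-in for the sbrk()-backed chunk */
        if (c == NULL) return NULL;
        c->next = p->chunks;
        c->used = 0;
        p->chunks = c;
    }
    void *item = p->chunks->data + p->chunks->used;
    p->chunks->used += size;
    p->live_items++;
    return item;
}

/* Destroying the pool releases every item at once.
   Returns the number of items that were still live. */
size_t pool_destroy(pool *p) {
    size_t released = p->live_items;
    while (p->chunks != NULL) {
        chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    free(p);
    return released;
}
```

From valgrind's point of view, each item returned by pool_alloc() is never explicitly freed (hence the leak reports), and each item lies inside a chunk that is itself an allocation (hence the overlap error).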
So I have modified valgrind to support our model, address the problems, and tried not to break anything in the process:
1. Added a VALGRIND_CREATE_META_MEMPOOL macro in valgrind.h, modelled after VALGRIND_CREATE_MEMPOOL.
It takes a flag parameter, with 2 (or-able) options: MEMPOOL_AUTO_FREE and MEMPOOL_METABLOCKS.
When AUTO_FREE is set when the pool is created, valgrind will free all allocations in a memory pool block when MEMPOOL_FREE is used on a block.
For a non-auto-free pool, everything is as before. This prevents the false memory-leak reports.
2. When METABLOCKS is used, it will not complain about overlapping blocks as long as the overlap is with a memory-pool chunk from a METABLOCKS pool.
Also, when reporting the location of a problem, the "describe_addr" function favored custom memory pool blocks (our 64 KB chunks for the pool) over all else.
That caused almost all reports to say "Address XXX is many bytes in a block of 64K alloc'd", and the alloc location-stack would be the place where the pool was extended.
Not very useful.
So I've modified the describe_addr function to take the "meta-blocks" into account and report the underlying smaller allocation.
When no such meta-blocks exist, everything is as before.
3. For the sbrk problem, I've added a new command line option, --main-sbrksize, patterned after --main-stacksize. The default is the old (hard-coded) 8MB.
In the initimg modules for Linux and Solaris I have changed the code to use the command line value.
So the behavior is modified only when the new command line option is used. Our test framework passes 1GB and that works well.
That change caused a few of valgrind's regression tests to fail (the ones that check the "help" output of valgrind); I've fixed those as well.
4. I've added group-write permissions to the default file-creation mask for the valgrind reports.
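As a rough sketch of how an allocator might issue the client requests described in points 1 and 2: the code below builds without valgrind by defining counting stand-ins for the macros. VALGRIND_MEMPOOL_ALLOC, VALGRIND_MEMPOOL_FREE, VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_DESTROY_MEMPOOL are existing valgrind.h client requests; VALGRIND_CREATE_META_MEMPOOL and the two flags are the ones named above, but the argument order and flag values used here are assumptions for illustration, not copied from the patch. The helper names (chunk_new, item_new, chunk_release) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins so this sketch compiles without valgrind installed.
   In a real build these would come from <valgrind/valgrind.h>.
   ASSUMED: argument order of VALGRIND_CREATE_META_MEMPOOL and the
   flag values; the mail names them but does not spell them out. */
static int requests_issued;
static void note_request(const void *where) { (void)where; requests_issued++; }

#define MEMPOOL_AUTO_FREE  1
#define MEMPOOL_METABLOCKS 2
#define VALGRIND_CREATE_META_MEMPOOL(pool, rzB, is_zeroed, flags) note_request(pool)
#define VALGRIND_MEMPOOL_ALLOC(pool, addr, size)                  note_request(addr)
#define VALGRIND_MEMPOOL_FREE(pool, addr)                         note_request(addr)
#define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed)    note_request(addr)
#define VALGRIND_DESTROY_MEMPOOL(pool)                            note_request(pool)

/* Register a large chunk as belonging to the pool. */
void *chunk_new(void *pool, size_t size) {
    void *chunk = malloc(size);
    if (chunk != NULL)
        VALGRIND_MEMPOOL_ALLOC(pool, chunk, size);
    return chunk;
}

/* Hand out a small item that lives inside a chunk. With METABLOCKS set,
   this overlap with the chunk is meant to be accepted. */
void *item_new(char *chunk, size_t offset, size_t size) {
    VALGRIND_MALLOCLIKE_BLOCK(chunk + offset, size, 0, 0);
    return chunk + offset;
}

/* Release a whole chunk. With AUTO_FREE set, the items inside it are
   meant to be treated as freed too, so no leak is reported for them. */
void chunk_release(void *pool, void *chunk) {
    VALGRIND_MEMPOOL_FREE(pool, chunk);
    free(chunk);
}
```

A caller would create the pool once with VALGRIND_CREATE_META_MEMPOOL(pool, 0, 0, MEMPOOL_AUTO_FREE | MEMPOOL_METABLOCKS), then use the three helpers, and finish with VALGRIND_DESTROY_MEMPOOL(pool).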
BTW: Those reports are altered because I could not figure out how to write suppression rules that are based only on the allocation stack of a problem.
For example, we link against OpenSSL crypto libraries which (intentionally) do all sorts of things with uninitialized memory (for randomness).
Valgrind spots that, but I want to suppress those messages.
The numerous different error-locations all have the same allocation spot, but suppressions insist on using the location of the error (use of uninitialized memory).
Those are far too many and often change when a new OpenSSL version is released.
So I've written a Perl post-processor to delete (suppress) the OpenSSL stuff based on arbitrary patterns in a valgrind error message.
The "permission denied" occurred when it wanted to write the edited report back.
Suggestion: It would be nice to be able to write suppression rules for this kind of problem, with regular expressions on the complete valgrind message.
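To illustrate the kind of filtering meant here, below is a small sketch in C using POSIX regular expressions (the actual post-processor is written in Perl and is not shown). It assumes the report has already been split into one string per error message; the function name drop_matching is hypothetical, and so is any text passed to it.

```c
#include <regex.h>
#include <stddef.h>

/* Remove every message that matches `pattern` (POSIX extended regex).
   Kept messages are compacted to the front of `blocks`; the return
   value is the number kept. An invalid pattern keeps everything. */
size_t drop_matching(const char **blocks, size_t count, const char *pattern) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return count;
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (regexec(&re, blocks[i], 0, NULL, 0) != 0)   /* no match: keep it */
            blocks[kept++] = blocks[i];
    }
    regfree(&re);
    return kept;
}
```

Because the pattern is applied to the complete message, it can match on the allocation stack rather than on the location of the error, which is what the existing suppression rules do not allow.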
I've created a new version of valgrind for this: 3.11.1.
I've attached a patch file to alter a 3.11.0 tree to a 3.11.1. Apply it to a 3.11.0 tree by going to the root of the tree and running "patch -p0 < metamempool.patch".
I've tried to do all the changes in the style of the existing code.
I've run all the regression tests of valgrind and the results of 3.11.0 and 3.11.1 are identical.
The patch also includes altered manual pages. I've been unable to build those on my system, so I hope they're OK.
I've installed this altered valgrind on various development and test systems at Infor and used it for a few months to make sure I have not broken anything.
This version 3.11.1 is used on both normal programs and ones using our custom allocator (or both). Everything works the way it should.
I'd appreciate it if this patch could be applied to the standard distribution so I will not have to maintain a separate version of valgrind/memcheck for Infor.
Comment / feedback appreciated,
Regards,
Ruurd Beerstra
[Infor]<http://www.infor.com/>
Ruurd Beerstra | Software Engineer, Sr.
office: 0342-427289 | mobile: +31 22 42 7478 | Ruu...@in... | http://www.infor.com