From: Ruurd B. <Ruu...@in...> - 2016-08-05 14:39:31
Attachments:
metamempool.patch
image001.gif
Hi,
I am a software developer at Infor, where we maintain a complex application (30+ years old, many millions of lines), most of it written in C/C++.
I have used valgrind with memcheck to find and fix memory-related issues and have become a great fan of the product.
However, we use a custom allocator that caused me considerable problems because it has memory pool features not supported by the "loose model" of valgrind.
1. Specifically, it allows me to create a memory pool, allocate many items from that pool and then destroy the pool.
The applications know that all pool items are automatically freed when the pool is destroyed, so they save time and code by not freeing the items explicitly.
Valgrind reports all items in such a pool as memory leaks, because that is the model it assumes.
I understand that this is a design choice: Either such application memory pools are considered "auto-freed" or not, and when not, they are considered leaks.
2. Another problem is that our allocator uses itself to allocate large chunks for the memory pools.
Those chunks are used to dole out smaller pieces for the applications.
Valgrind sees that as an error: Overlapping memory blocks because both types of blocks (memory pool and allocations from the pool) are marked as VALGRIND_MALLOCLIKE_BLOCK.
That triggers an error:
Block 0x%lx..0x%lx overlaps with block 0x%lx..0x%lx, this is usually caused by using VALGRIND_MALLOCLIKE_BLOCK in an inappropriate way
plus an assert in memcheck.
3. Our (admittedly ancient) allocator uses sbrk() to get the memory (and not mmap).
Valgrind (on linux) limits this to 8MB. That is not enough for our applications. The 8MB is hardcoded in valgrind.
4. We use Oracle as a database, which executes as setuid-to-oracle on Linux (we have our own database wrapper software layers for Oracle, DB2, MySql, Microsoft SQL and so on).
To be able to valgrind such executables, I've created a setuid-oracle copy of Valgrind.
That works, but the reports valgrind creates are owned by Oracle in such cases and our test framework got "Permission denied" when it wanted to analyze and modify the valgrind reports.
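For illustration only, the pool model from points 1 and 2 can be sketched as below. This is a minimal sketch and not the Infor allocator: the names pool, pool_create, pool_alloc and pool_destroy are hypothetical, and the chunk comes from malloc() rather than from the allocator itself. The comments mark where an instrumented allocator would issue VALGRIND_MALLOCLIKE_BLOCK for both the chunk and the items carved out of it, which is the overlap described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified pool: one large chunk, bump-pointer allocation. */
typedef struct pool {
    unsigned char *chunk;  /* large block obtained from the underlying allocator */
    size_t capacity;       /* size of the chunk in bytes */
    size_t used;           /* bytes already handed out */
    size_t live_items;     /* items handed out and never freed individually */
} pool;

pool *pool_create(size_t capacity) {
    pool *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    /* An instrumented allocator would mark this chunk as a malloc-like block. */
    p->chunk = malloc(capacity);
    if (p->chunk == NULL) { free(p); return NULL; }
    p->capacity = capacity;
    p->used = 0;
    p->live_items = 0;
    return p;
}

void *pool_alloc(pool *p, size_t size) {
    /* Round up to 16 bytes; reject on overflow or when the chunk is full. */
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > p->capacity - p->used) return NULL;
    /* The item lies inside the chunk, so marking it as a malloc-like block
       as well produces two overlapping blocks. */
    void *item = p->chunk + p->used;
    p->used += aligned;
    p->live_items++;
    return item;
}

/* Destroying the pool releases every item at once; no per-item free exists.
   Returns the number of items that were implicitly released. */
size_t pool_destroy(pool *p) {
    size_t released = p->live_items;
    free(p->chunk);
    free(p);
    return released;
}
```

A caller creates a pool, allocates freely, and calls pool_destroy() once; a checker that expects one free per allocation would count every item as leaked.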
So I have modified valgrind to support our model, address the problems, and tried not to break anything in the process:
1. Added a VALGRIND_CREATE_META_MEMPOOL macro in valgrind.h, modelled after VALGRIND_CREATE_MEMPOOL.
It takes a flag parameter, with 2 (or-able) options: MEMPOOL_AUTO_FREE and MEMPOOL_METABLOCKS.
If AUTO_FREE is set at pool creation, valgrind will free all allocations in a memory pool block when MEMPOOL_FREE is used on that block.
For a non-auto-free pool, everything is as before. This prevents the false memory-leak reports.
2. When METABLOCKS is used, it will not complain about overlapping blocks as long as the overlap is with a memory-pool chunk from a METABLOCKS pool.
Also, when reporting the location of a problem, the "describe_addr" function favored custom memory pool blocks (our 64 KB chunks for the pool) over all else.
That caused almost all reports to say "Address XXX is many bytes in a block of 64K alloc'd", and the alloc location-stack would be the place where the pool was extended.
Not very useful.
So I've modified the describe_addr function to take the "meta-blocks" into account and report the underlying smaller allocation.
When no such meta-blocks exist, everything is as before.
3. For the sbrk problem, I've added a new command line option, --main-sbrksize, patterned after --main-stacksize. The default is the old (hard-coded) 8MB.
In the initimg modules for Linux and Solaris I have changed the code to use the command line value.
So the behavior is modified only when the new command line option is used. Our test framework passes 1GB and that works well.
That change caused a few of valgrind's regression tests, the ones that check the "help" output, to fail. I've fixed those as well.
4. I've added group-write permissions to the default file-creation mask for the valgrind reports.
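The address-description change in item 2 can be pictured with the sketch below. It is an assumption about the idea, not the code in the patch: the block struct and find_innermost_block are hypothetical names, and real memcheck keeps its blocks in its own data structures. The sketch prefers an ordinary allocation over a meta-block (pool chunk) that contains it, and falls back to the chunk only when nothing smaller matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one tracked block. */
typedef struct block {
    uintptr_t start;  /* first address of the block */
    size_t size;      /* length in bytes */
    int is_meta;      /* nonzero for a pool chunk that contains other blocks */
} block;

/* Return the index of the best block to report for addr, or -1 if none
   contains it. Non-meta blocks win over meta-blocks; within the same
   class the smaller block wins. */
int find_innermost_block(const block *blocks, size_t n, uintptr_t addr) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        const block *b = &blocks[i];
        if (addr < b->start || addr - b->start >= b->size)
            continue;  /* addr is outside this block */
        if (best < 0) {
            best = (int)i;
            continue;
        }
        int best_meta = blocks[best].is_meta != 0;
        int cur_meta = b->is_meta != 0;
        if ((best_meta && !cur_meta) ||
            (best_meta == cur_meta && b->size < blocks[best].size))
            best = (int)i;
    }
    return best;
}
```

With a 64 KB chunk and a 32-byte item inside it, an address within the item resolves to the item rather than to the chunk, which is the more useful report.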
BTW: Those reports are altered because I could not figure out how to write suppression rules that are based only on the allocation stack of a problem.
For example, we link against OpenSSL crypto libraries which (intentionally) do all sorts of things with uninitialized memory (for randomness).
Valgrind spots that, but I want to suppress those messages.
The numerous different error-locations all have the same allocation spot, but suppressions insist on using the location of the error (use of uninitialized memory).
Those are far too many and often change when a new OpenSSL version is released.
So I've written a Perl post-processor to delete (suppress) the OpenSSL stuff based on arbitrary patterns in a valgrind error message.
The "permission denied" occurred when it wanted to write the edited report back.
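As a rough illustration of the permission issue, the sketch below creates a file that both owner and group can rewrite. It is not the change made in the patch: create_group_writable_report is a hypothetical helper, and it uses plain POSIX calls rather than valgrind's internal file API. Because open() applies the process umask, the sketch sets the mode explicitly with fchmod().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a report file that is
   readable and writable by owner and group. Returns an open file
   descriptor, or -1 on failure. */
int create_group_writable_report(const char *path) {
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP;  /* 0660 */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
    if (fd < 0)
        return -1;
    /* The umask may have cleared the group-write bit; set it explicitly. */
    if (fchmod(fd, mode) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A test framework running under a different user in the same group can then open the report for writing, provided the directory permissions allow it.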
Suggestion: It would be nice to be able to write suppression rules for this kind of problem, with regular expressions on the complete valgrind message.
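To make the suggestion concrete, here is a minimal sketch of message-level matching with POSIX extended regular expressions. It is written in C rather than Perl, it is not part of the patch or of valgrind, and message_matches is a hypothetical name; a real filter would also have to split the report into individual error records first.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the message matches the POSIX extended regex, 0 if it
   does not, and -1 if the pattern fails to compile. */
int message_matches(const char *message, const char *pattern) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, message, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A post-processor could drop every error record for which message_matches() returns 1 for any configured pattern, for example a pattern that names the allocation function seen in the allocation stack.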
I've created a new version of valgrind for this: 3.11.1.
I've attached a patch file to alter a 3.11.0 tree to a 3.11.1. Apply it to a 3.11.0 tree by going to the root of the tree and running "patch -p0 < metamempool.patch".
I've tried to do all the changes in the style of the existing code.
I've run all the regression tests of valgrind and the results of 3.11.0 and 3.11.1 are identical.
The patch also includes altered manual pages. I've been unable to build those on my system, so I hope they're OK.
I've installed this altered valgrind on various development and test systems at Infor and used it for a few months to make sure I have not broken anything.
This version 3.11.1 is used on both normal programs and ones using our custom allocator (or both). Everything works the way it should.
I'd appreciate it if this patch could be applied to the standard distribution so I will not have to maintain a separate version of valgrind/memcheck for Infor.
Comments and feedback appreciated,
Regards,
Ruurd Beerstra
[Infor]<http://www.infor.com/>
Ruurd Beerstra | Software Engineer, Sr.
office: 0342-427289 | mobile: +31 22 42 7478 | Ruu...@in... | http://www.infor.com
From: Ivo R. <iv...@iv...> - 2016-08-10 03:25:07
Ruurd,

Thank you for describing your problems and attaching the patch.

Please split the patch into several pieces by functionality so they can be more easily handled and reviewed. Then submit a bug report for each of them as per:
http://valgrind.org/support/bug_reports.html

I am in no way able to tell you whether any of these will make it in the repository. But I have a few comments whose addressing could make that more likely to happen:

1. Please drop version 3.11.1. This is reserved for a maintenance release.
2. Include test case(s) which can be integrated in the repository.
3. Specify which systems (OS/arch) you tested on.
4. As regards sbrk limit, Linux and Solaris differ here as they use different virtual address space management strategy. I cannot speak about Darwin but I guess it follows Linux here.

As you determined, Linux reserves very small data segment for sbrk growth. Subsequent segments for anonymous mappings (typically for shared libraries) are placed next to it, preventing further growth.

Solaris uses different strategy - client space for anonymous mappings is allocated top down, like Solaris kernel does. This means that data segment has plenty of address space to grow because there is nothing preventing it.

See comments in coregrind/m_aspacemgr/aspacemgr-linux.c.

Examples:

Situation on Linux with /bin/true:

--17954:1: aspacem 0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--17954:1: aspacem 1: file 0000400000-0000405fff 24576 r-xT- d=0x801 i=17301669 o=0 (1,56)
--17954:1: aspacem 2: RSVN 0000406000-0000604fff 2093056 ----- SmFixed
--17954:1: aspacem 3: file 0000605000-0000605fff 4096 r---- d=0x801 i=17301669 o=20480 (1,56)
--17954:1: aspacem 4: file 0000606000-0000606fff 4096 rw--- d=0x801 i=17301669 o=24576 (1,56)
--17954:1: aspacem 5: RSVN 0000607000-0003ffffff 57m ----- SmFixed
--17954:1: aspacem 6: file 0004000000-0004025fff 155648 r-xT- d=0x801 i=5505210 o=0 (2,70)
--17954:1: aspacem 7: anon 0004026000-0004027fff 8192 rw---
--17954:1: aspacem 8: 0004028000-0004046fff 126976
--17954:1: aspacem 9: anon 0004047000-0004048fff 8192 rw---
--17954:1: aspacem 10: 0004049000-0004224fff 1949696
--17954:1: aspacem 11: file 0004225000-0004225fff 4096 r---- d=0x801 i=5505210 o=151552 (2,70)
--17954:1: aspacem 12: file 0004226000-0004226fff 4096 rw--- d=0x801 i=5505210 o=155648 (2,70)
--17954:1: aspacem 13: anon 0004227000-0004227fff 4096 rw---
--17954:1: aspacem 14: anon 0004228000-0004228fff 4096 rwx--
--17954:1: aspacem 15: RSVN 0004229000-0004a27fff 8384512 ----- SmLower
--17954:1: aspacem 16: file 0004a28000-0004a28fff 4096 r-xT- d=0x801 i=23464128 o=0 (4,165)
--17954:1: aspacem 17: file 0004a29000-0004c27fff 2093056 ----- d=0x801 i=23464128 o=4096 (4,165)
--17954:1: aspacem 18: file 0004c28000-0004c28fff 4096 r---- d=0x801 i=23464128 o=0 (4,165)
--17954:1: aspacem 19: file 0004c29000-0004c29fff 4096 rw--- d=0x801 i=23464128 o=4096 (4,165)
...

Here segment 14 is data segment and 15 is its reservation, allowing for 8 MB growth. However segments 16+ for anonymous mappings (shared libraries) prevent any further growth.

Situation on Solaris with /bin/true:

--15522:1: aspacem 0: RSVN 0000000000-00000fffff 1048576 ----- SmFixed
--15522:1: aspacem 1: 0000100000-0000107fff 32768
--15522:1: aspacem 2: file 0000108000-0000108fff 4096 r-xT- d=0xe700010002 i=38842 o=0 (1,70)
--15522:1: aspacem 3: 0000109000-0000208fff 1048576
--15522:1: aspacem 4: file 0000209000-0000209fff 4096 rw--- d=0xe700010002 i=38842 o=4096 (1,70)
--15522:1: aspacem 5: anon 000020a000-0000a09fff 8388608 rw---
--15522:1: aspacem 6: RSVN 0000a0a000-0000a0afff 4096 ----- SmLower
--15522:1: aspacem 7: 0000a0b000-00377effff 877m
--15522:1: aspacem 8: RSVN 00377f0000-0037fecfff 8376320 ----- SmUpper
--15522:1: aspacem 9: anon 0037fed000-0037feffff 12288 rwx--
--15522:1: aspacem 10: 0037ff0000-0037ffffff 65536
--15522:1: aspacem 11: FILE 0038000000-00380ebfff 966656 r-x-- d=0xe70001000a i=65450 o=4096 (0,4)
--15522:1: aspacem 12: file 00380ec000-00380ecfff 4096 r-xT- d=0xe70001000a i=65450 o=970752 (0,4)
...

Here segment 5 is data segment (with initial size of 8 MB) and segment 6 is a reservation segment to separate it from other anonymous mappings. Segment 7 is a free virtual address space where data segment can grow. Starting at segment 11, valgrind executable segments follow.

Therefore I'd say that to properly fix a problem with data segment on Linux, it will have to follow Solaris lead here. Introducing another command line option "main-sbrksize" does not help much because you'd have to know it exists and how much memory a priori to specify. Valgrind should give you that functionality automatically.

Kind regards,
I.