From: Eric E. <eri...@fr...> - 2004-08-11 22:12:32
Tom Hughes wrote:

> We have enough address space problems as it is without keeping all
> the debug information mapped all the time.

Indeed, keeping them mapped can be painful for that obvious reason, and has never seemed necessary to me. It was just a potential memory optimization.

> It isn't quite that simple because if the library is opened again
> it may be at a different address so the symbols need a bit of fixing
> up.

Which can be done through a simple address translation, no? There is no relocation to be done, AFAIK, for DWARF2, and likely not for other formats either.

> The main point I'm trying to make is that rather than constructing
> one giant project it is better to try and separate things out into
> separate tasks which don't depend on each other.

Constructing tasks step by step and making them modular is one thing; architecting the whole so that the integration will be painless is another. The 'giant' project is just a few design ideas I had during a sleepless night, which seemed coherent to me. Rather than filing it away with thousands of other ideas I'll never have time to use, I decided to share it because I felt it could be useful in this precise case, and could provide other interesting benefits.

> Yes, lazy symbol loading would be a benefit. Yes, handling dlclose
> better would be a benefit. Each can be tackled without the other
> however so there is little in trying to join them together into
> one large change - stepwise refinement is generally better than
> large leaps.

Such ideas need to mature a little. So far nobody has typed a line of code, and that's better ;-) Anyway, the tasks can always be properly separated. If you want, I could have a try at the implementation one day or another.

Sorry if I bothered you; I didn't mean to. Just trying to help a bit.

Cheers
--
Eric
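The "simple address translation" Eric has in mind can be sketched in a few lines: if cached symbols are stored as offsets relative to the library's old load base, reusing them when the library reappears at a new base is one addition per lookup, with the reverse subtraction to map a runtime address back into the cached table. The types and names below (`CachedSym`, `rebase_sym`, ...) are purely illustrative, not Valgrind APIs.

```c
#include <assert.h>
#include <stddef.h>

/* A cached symbol, recorded relative to the load base so it survives
   the library being remapped at a different address. */
typedef struct {
    size_t      offset;  /* symbol address minus the old load base */
    const char* name;
} CachedSym;

/* Translate a cached symbol to its address under the new mapping. */
static size_t rebase_sym(const CachedSym* s, size_t new_base)
{
    return new_base + s->offset;
}

/* Reverse direction: turn a runtime address back into a base-relative
   offset so it can be looked up in the cached table. */
static size_t addr_to_offset(size_t addr, size_t load_base)
{
    assert(addr >= load_base);
    return addr - load_base;
}
```

No per-symbol relocation pass is needed under this assumption; only lookups pay the (trivial) translation cost.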
From: Tom H. <th...@cy...> - 2004-08-11 21:35:56
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> Tom Hughes wrote:
>
> > The current system would also work if we knew which module was
> > involved though, because we would re-read the debug info when we
> > opened it to print the trace.
>
> Why would you need to read debug infos twice ? To reduce memory usage ?
> I feel that once the debug infos are loaded into memory because
> they are needed, they are likely to be needed after and should not be
> discarded until prog exit... For memory usage, the best could be,
> instead of allocating strings in the dbginf arena, to keep the memory
> mappings for the debug infos and just have pointers to the interesting
> parts, like strings & co, which do not need computation.
We have enough address space problems as it is without keeping all
the debug information mapped all the time.
At least for the DWARF reader the idea is to drop memory mapping and
just read the bits we need when we need them, which won't be very much
at all generally.
> > The point is that although it might make lazy loading even more
> > desirable it certainly doesn't require it.
>
> Yes, we can just avoid discarding the symbols on dlclose...
> That's an obvious quick fix ;-) But it won't make startup time faster
> nor solve the potential issue with conflicting map ranges on
> consecutive dlopens....
It isn't quite that simple because if the library is opened again
it may be at a different address so the symbols need a bit of fixing
up.
The main point I'm trying to make is that rather than constructing
one giant project it is better to try and separate things out into
separate tasks which don't depend on each other.
Yes, lazy symbol loading would be a benefit. Yes, handling dlclose
better would be a benefit. Each can be tackled without the other
however so there is little in trying to join them together into
one large change - stepwise refinement is generally better than
large leaps.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
From: Eric E. <eri...@fr...> - 2004-08-11 21:10:19
Tom Hughes wrote:

> The current system would also work if we knew which module was
> involved though, because we would re-read the debug info when we
> opened it to print the trace.

Why would you need to read debug info twice? To reduce memory usage? I feel that once the debug info is loaded into memory because it is needed, it is likely to be needed again, and should not be discarded until program exit... For memory usage, the best approach could be, instead of allocating strings in the dbginf arena, to keep the memory mappings for the debug info and just hold pointers to the interesting parts, like strings & co., which do not need computation.

> The point is that although it might make lazy loading even more
> desirable it certainly doesn't require it.

Yes, we can just avoid discarding the symbols on dlclose... That's an obvious quick fix ;-) But it won't make startup faster, nor solve the potential issue of conflicting map ranges across consecutive dlopens...

Eric
From: Tom H. <th...@cy...> - 2004-08-11 17:31:52
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> If we do lazy-loading and allow loading the debug infos after a module
> has been unloaded, the side-effect is that we will find the syms
> for a module which has been dlclose'd, and users do not need to
> comment their dlclose calls for Valgrind to find info for normally
> unloaded modules.
The current system would also work if we knew which module was
involved though, because we would re-read the debug info when we
opened it to print the trace.
The point is that although it might make lazy loading even more
desirable it certainly doesn't require it.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
From: Eric E. <eri...@fr...> - 2004-08-11 16:57:32
Tom Hughes wrote:

>> Could we in the future delay debug infos loading until they are
>> really necessary ?
>
> I don't see what this has to do with the dlopen/dlclose issue? I would
> have thought it was orthogonal.
>
> Lazy reading of debug info will obviously help startup time and it's
> something we'd like to do. It has little to do with the dlclose
> problem however.

If we do lazy loading and allow loading the debug info after a module has been unloaded, the side effect is that we will find the symbols for a module which has been dlclose'd, and users will not need to comment out their dlclose calls for Valgrind to find info for normally unloaded modules.

> There was an extensive discussion on the developer list recently about
> possible ways of doing something like this but no firm conclusion was
> reached about the best approach - there are space/time tradeoffs to
> the various possibilities.

Sorry I missed it :-P; I will have a look in the archives. Time tradeoffs are fine if we can delay loading, even more so if we have a granularity finer than a whole module. The space issue is how the debug info is extracted and stored in memory, which is a separate point and needs optimizing independently.

> For stabs I'm not sure that lazy loading is possible beyond just not
> reading anything until we need it and then reading everything in one
> go. For DWARF a much more sophisticated approach is possible.

Well, at first I was just thinking of reading everything at once... For DWARF you can read each compile unit independently, so it could be really fast indeed. For stabs, I don't know the format, but if we read all the debug info at once it is still OK...

--
Eric
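The per-compile-unit scheme Eric describes for DWARF can be sketched as a lazy cache: keep only each CU's address range eagerly, and parse the heavy line/symbol data the first time an address in that range is queried. Everything below is illustrative — `parse_cu()` stands in for a real DWARF reader, and none of these names are Valgrind's.

```c
#include <assert.h>
#include <stddef.h>

/* One compile unit: its covered address range, plus a flag saying
   whether the expensive debug info has been parsed yet. */
typedef struct {
    size_t lo, hi;   /* [lo, hi) address range of this compile unit */
    int    loaded;   /* has the heavy debug info been parsed yet?   */
} CompileUnit;

static int parse_count = 0;  /* counts real parses, to show laziness */

static void parse_cu(CompileUnit* cu)   /* stand-in for the real work */
{
    cu->loaded = 1;
    parse_count++;
}

/* Return the CU covering 'addr', parsing its debug info on first use
   only; later queries in the same range hit the cache. */
static CompileUnit* lookup(CompileUnit* cus, int n, size_t addr)
{
    for (int i = 0; i < n; i++) {
        if (addr >= cus[i].lo && addr < cus[i].hi) {
            if (!cus[i].loaded)
                parse_cu(&cus[i]);
            return &cus[i];
        }
    }
    return NULL;
}
```

With this shape, startup pays only for the range table; a CU that is never mentioned in any error trace is never parsed at all.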
From: Tom H. <th...@cy...> - 2004-08-11 15:40:05
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> Which of course is ok if you have control on the code doing the
> dlclose and can comment it...
Well obviously. I was only offering it as a quick hack...
> Could we in the future delay debug infos loading until they are
> really necessary ?
I don't see what this has to do with the dlopen/dlclose issue? I would
have thought it was orthogonal.
> I feel that we could record dlopen/close events in a list, which has
> a sequential index, the mapping address(es) and the file which was
> mapped.
>
> When we save a stack record internally for further processing
> (e.g. allocators and freeers of memory blocks), we save the code
> addresses along with the current index in previous array.
>
> When there is a need to output a previously recorded stack, we walk
> the array to find the modules which were loaded at that time, and
> load the symbols for them if not already done.
There was an extensive discussion on the developer list recently about
possible ways of doing something like this but no firm conclusion was
reached about the best approach - there are space/time tradeoffs to
the various possibilities.
> This would solve definitely the issue raised, fix potential issues
> in case a new .so is loaded at an address overlapping a previous
> unloaded one, and potentially reduce startup time for people which
> are running valgrind with a lot of debug libraries. (I didn't time
> that, but I felt that running with debug libc + X11 + Qt is times
> slower...)
Well doing something about remembering which libraries were loaded
where and when will obviously resolve the dlopen/dlclose issues.
Lazy reading of debug info will obviously help startup time and it's
something we'd like to do. It has little to do with the dlclose
problem however.
For stabs I'm not sure that lazy loading is possible beyond just not
reading anything until we need it and then reading everything in one
go. For DWARF a much more sophisticated approach is possible.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
From: Eric E. <eri...@fr...> - 2004-08-11 15:29:13
Moving this discussion to valgrind-developers, since there are a few more things worth investigating...

>> If you can stop your program unloading the plugins (ie just don't
>> do the dlclose) then that should allow you to see where the problem
>> is coming from.

Which of course is OK if you have control over the code doing the dlclose and can comment it out...

Could we in the future delay debug info loading until it is really necessary?

I feel that we could record dlopen/dlclose events in a list, which has a sequential index, the mapping address(es) and the file which was mapped.

When we save a stack record internally for further processing (e.g. allocators and freers of memory blocks), we save the code addresses along with the current index into the previous array.

When there is a need to output a previously recorded stack, we walk the array to find the modules which were loaded at that time, and load the symbols for them if not already done.

This would definitively solve the issue raised, fix potential problems when a new .so is loaded at an address overlapping a previously unloaded one, and potentially reduce startup time for people running Valgrind with a lot of debug libraries. (I didn't time it, but running with debug libc + X11 + Qt felt many times slower...)

Should I file an ER so that these small ideas are not lost?

My 2 cents of euro
--
Eric
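The event list Eric proposes can be sketched in miniature: each dlopen/dlclose is recorded as an "epoch" event, saved stack records are stamped with the current epoch, and a later lookup replays events up to that stamp to find which module covered an address at that moment — even if the module has since been unloaded or its range reused. All names here (`ModEvent`, `resolve_module`, the fixed-size table) are hypothetical stand-ins, not Valgrind code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One dlopen/dlclose event: a sequential index, the mapping range,
   and the file which was mapped. */
typedef struct {
    int         epoch;      /* sequential index of this event */
    int         is_open;    /* 1 = dlopen, 0 = dlclose        */
    size_t      base, size; /* mapping range of the module    */
    const char* path;       /* file that was mapped           */
} ModEvent;

#define MAX_EVENTS 1024
static ModEvent events[MAX_EVENTS];
static int n_events  = 0;
static int cur_epoch = 0;   /* stamp this onto saved stack records */

static void record_event(int is_open, size_t base, size_t size,
                         const char* path)
{
    assert(n_events < MAX_EVENTS);
    events[n_events++] = (ModEvent){ ++cur_epoch, is_open, base, size, path };
}

/* Find which module covered 'addr' at the time of 'epoch': replay the
   events up to the stamp, keeping the most recent mapping (or unmapping)
   that touches the address. Returns NULL if nothing was mapped there. */
static const char* resolve_module(size_t addr, int epoch)
{
    const char* hit = NULL;
    for (int i = 0; i < n_events && events[i].epoch <= epoch; i++) {
        ModEvent* e = &events[i];
        if (addr >= e->base && addr < e->base + e->size)
            hit = e->is_open ? e->path : NULL;
    }
    return hit;
}
```

A stack record stamped before a dlclose therefore resolves against the old library even after another .so has been mapped over the same address range, which is exactly the overlap problem raised above.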
From: Nicholas N. <nj...@ca...> - 2004-08-11 09:41:23
CVS commit by nethercote:
Big overhaul of the allocator. Much of the structure is the same, but
lots of the details changed. Made the following generalisations:
- Recast everything to be entirely in terms of bytes, instead of a mixture
of (32-bit) words and bytes. This is a bit easier to understand, and
made the following generalisations possible...
- Almost 64-bit clean; no longer assuming 32-bit words/pointers. Only
(I think) non-64-bit clean part is that VG_(malloc)() et al take an
Int as the size arg, and size_t is 64-bits on 64-bit machines.
- Made the alignment of blocks returned by malloc() et al completely
controlled by a single value, VG_MIN_MALLOC_SZB. (Previously there
were various magic numbers and assumptions about block alignment
scattered throughout.) I tested this, all the regression tests pass
with VG_MIN_MALLOC_SZB of 4, 8, 16, 32, 64. One thing required for
this was to make redzones elastic; the asked-for redzone size is now
the minimum size; it will use bigger ones if necessary to get the
required alignment.
Some other specific changes:
- Made use of types a bit more; ie. actually using the type 'Block',
rather than just having everything as arrays of words, so that should
be a bit safer.
- Removed the a->rz_check field, which was redundant wrt. a->clientmem.
- Fixed up the decision about which list to use so the 4 lists which
weren't ever being used now are -- the problem was that this hadn't
been properly updated when alignment changed from 4 to 8 bytes.
- Added a regression test for memalign() and posix_memalign().
memalign() was aborting if passed a bad alignment argument.
- Added some high-level comments in various places, explaining how the
damn thing works.
A memcheck/tests/memalign2.c 1.1 [no copyright]
A memcheck/tests/memalign2.stderr.exp 1.1
A memcheck/tests/memalign2.vgtest 1.1
M +1 -1 coregrind/vg_default.c 1.23
M +3 -2 coregrind/vg_dwarf.c 1.4
M +8 -0 coregrind/vg_include.h 1.232
M +2 -1 coregrind/vg_ldt.c 1.15
M +752 -807 coregrind/vg_malloc2.c 1.31
M +7 -1 coregrind/vg_replace_malloc.c.base 1.4
M +1 -1 coregrind/demangle/cp-demangle.c 1.6
M +2 -1 coregrind/demangle/cplus-dem.c 1.6
M +1 -1 coregrind/demangle/dyn-string.c 1.5
M +1 -2 coregrind/docs/coregrind_core.html 1.32
M +1 -0 memcheck/tests/.cvsignore 1.14
M +4 -1 memcheck/tests/Makefile.am 1.40
--- valgrind/coregrind/vg_default.c #1.22:1.23
@@ -83,5 +83,5 @@ void* SK_(malloc)( Int size )
{
if (VG_(sk_malloc_called_by_scheduler))
- return VG_(cli_malloc)(4, size);
+ return VG_(cli_malloc)(VG_MIN_MALLOC_SZB, size);
else
malloc_panic(__PRETTY_FUNCTION__);
--- valgrind/coregrind/vg_dwarf.c #1.3:1.4
@@ -216,5 +216,5 @@ int process_extended_line_op( SegInfo *s
else
*fnames = VG_(arena_realloc)(
- VG_AR_SYMTAB, *fnames, /*alignment*/4,
+ VG_AR_SYMTAB, *fnames, VG_MIN_MALLOC_SZB,
sizeof(UInt)
* (state_machine_regs.last_file_entry + 1));
@@ -368,5 +368,6 @@ void VG_(read_debuginfo_dwarf2) ( SegInf
fnames = VG_(arena_malloc)(VG_AR_SYMTAB, sizeof (UInt) * 2);
else
- fnames = VG_(arena_realloc)(VG_AR_SYMTAB, fnames, /*alignment*/4,
+ fnames = VG_(arena_realloc)(VG_AR_SYMTAB, fnames,
+ VG_MIN_MALLOC_SZB,
sizeof(UInt)
* (state_machine_regs.last_file_entry + 1));
--- valgrind/coregrind/vg_include.h #1.231:1.232
@@ -366,4 +366,12 @@ typedef Int ArenaId;
#define VG_AR_TRANSIENT 8
+// This is both the minimum payload size of a malloc'd block, and its
+// minimum alignment. Must be a power of 2 greater than 4, and should be
+// greater than 8.
+#define VG_MIN_MALLOC_SZB 8
+
+// Round-up size for --sloppy-malloc=yes.
+#define VG_SLOPPY_MALLOC_SZB 4
+
extern void* VG_(arena_malloc) ( ArenaId arena, Int nbytes );
extern void VG_(arena_free) ( ArenaId arena, void* ptr );
--- valgrind/coregrind/vg_ldt.c #1.14:1.15
@@ -101,5 +101,6 @@ VgLdtEntry* VG_(allocate_LDT_for_thread)
if (parent_ldt == NULL) {
/* Allocate a new zeroed-out one. */
- ldt = (VgLdtEntry*)VG_(arena_calloc)(VG_AR_CORE, /*align*/4, nbytes, 1);
+ ldt = (VgLdtEntry*)VG_(arena_calloc)(VG_AR_CORE, VG_MIN_MALLOC_SZB,
+ nbytes, 1);
} else {
ldt = (VgLdtEntry*)VG_(arena_malloc)(VG_AR_CORE, nbytes);
--- valgrind/coregrind/vg_malloc2.c #1.30:1.31
@@ -33,99 +33,90 @@
#include "vg_include.h"
-/* Define to turn on (heavyweight) debugging machinery. */
-/* #define DEBUG_MALLOC */
+//#define DEBUG_MALLOC // turn on heavyweight debugging machinery
+//#define VERBOSE_MALLOC // make verbose, esp. in debugging machinery
/*------------------------------------------------------------*/
-/*--- Command line options ---*/
+/*--- Main types ---*/
/*------------------------------------------------------------*/
-/* Round malloc sizes upwards to integral number of words? default: NO */
-Bool VG_(clo_sloppy_malloc) = False;
-
-/* DEBUG: print malloc details? default: NO */
-Bool VG_(clo_trace_malloc) = False;
-
-/* Minimum alignment in functions that don't specify alignment explicitly.
- default: 0, i.e. use default of the machine (== 8) */
-Int VG_(clo_alignment) = 8;
-
-
-Bool VG_(replacement_malloc_process_cmd_line_option)(Char* arg)
-{
- if (VG_CLO_STREQN(12, arg, "--alignment=")) {
- VG_(clo_alignment) = (Int)VG_(atoll)(&arg[12]);
-
- if (VG_(clo_alignment) < 8
- || VG_(clo_alignment) > 4096
- || VG_(log2)( VG_(clo_alignment) ) == -1 /* not a power of 2 */) {
- VG_(message)(Vg_UserMsg, "");
- VG_(message)(Vg_UserMsg,
- "Invalid --alignment= setting. "
- "Should be a power of 2, >= 8, <= 4096.");
- VG_(bad_option)("--alignment");
- }
- }
-
- else VG_BOOL_CLO("--sloppy-malloc", VG_(clo_sloppy_malloc))
- else VG_BOOL_CLO("--trace-malloc", VG_(clo_trace_malloc))
- else
- return False;
-
- return True;
-}
-
-void VG_(replacement_malloc_print_usage)(void)
-{
- VG_(printf)(
-" --sloppy-malloc=no|yes round malloc sizes to next word? [no]\n"
-" --alignment=<number> set minimum alignment of allocations [8]\n"
- );
-}
+#define VG_N_MALLOC_LISTS 16 // do not change this
-void VG_(replacement_malloc_print_debug_usage)(void)
-{
- VG_(printf)(
-" --trace-malloc=no|yes show client malloc details? [no]\n"
- );
-}
+// On 64-bit systems size_t is 64-bits, so bigger than this is possible.
+// We can worry about that when it happens...
+#define MAX_PSZB 0x7ffffff0
+typedef UChar UByte;
-/*------------------------------------------------------------*/
-/*--- Structs n stuff ---*/
-/*------------------------------------------------------------*/
+/* Block layout:
-#define VG_REDZONE_LO_MASK 0x31415927
-#define VG_REDZONE_HI_MASK 0x14141356
+ this block total szB (sizeof(Int) bytes)
+ freelist previous ptr (sizeof(void*) bytes)
+ red zone bytes (depends on .rz_szB field of Arena)
+ (payload bytes)
+ red zone bytes (depends on .rz_szB field of Arena)
+ freelist next ptr (sizeof(void*) bytes)
+ this block total szB (sizeof(Int) bytes)
-#define VG_N_MALLOC_LISTS 16 /* do not change this */
+ Total size in bytes (bszB) and payload size in bytes (pszB)
+ are related by:
+ bszB == pszB + 2*sizeof(Int) + 2*sizeof(void*) + 2*a->rz_szB
-typedef UInt Word;
-typedef Word WordF;
-typedef Word WordL;
+ Furthermore, both size fields in the block are negative if it is
+ not in use, and positive if it is in use. A block size of zero
+ is not possible, because a block always has at least two Ints and two
+ pointers of overhead.
+ Nb: All Block payloads must be VG_MIN_MALLOC_SZB-aligned. This is
+ achieved by ensuring that Superblocks are VG_MIN_MALLOC_SZB-aligned
+ (see newSuperblock() for how), and that the lengths of the following
+ things are a multiple of VG_MIN_MALLOC_SZB:
+ - Superblock admin section lengths (due to elastic padding)
+ - Block admin section (low and high) lengths (due to elastic redzones)
+ - Block payload lengths (due to req_pszB rounding up)
+*/
+typedef
+ struct {
+ // No fields are actually used in this struct, because a Block has
+ // loads of variable sized fields and so can't be accessed
+ // meaningfully with normal fields. So we use access functions all
+ // the time. This struct gives us a type to use, though. Also, we
+ // make sizeof(Block) 1 byte so that we can do arithmetic with the
+ // Block* type in increments of 1!
+ UByte dummy;
+ }
+ Block;
-/* A superblock. */
+// A superblock. 'padding' is never used, it just ensures that if the
+// entire Superblock is aligned to VG_MIN_MALLOC_SZB, then payload_bytes[]
+// will be too. It can add small amounts of padding unnecessarily -- eg.
+// 8-bytes on 32-bit machines with an 8-byte VG_MIN_MALLOC_SZB -- because
+// it's too hard to make a constant expression that works perfectly in all
+// cases.
+// payload_bytes[] is made a single big Block when the Superblock is
+// created, and then can be split and the splittings remerged, but Blocks
+// always cover its entire length -- there's never any unused bytes at the
+// end, for example.
typedef
struct _Superblock {
struct _Superblock* next;
- /* number of payload words in this superblock. */
- Int n_payload_words;
- Word payload_words[0];
+ Int n_payload_bytes;
+ UByte padding[ VG_MIN_MALLOC_SZB -
+ ((sizeof(void*) + sizeof(Int)) % VG_MIN_MALLOC_SZB) ];
+ UByte payload_bytes[0];
}
Superblock;
-
-/* An arena. */
+// An arena. 'freelist' is a circular, doubly-linked list. 'rz_szB' is
+// elastic, in that it can be bigger than asked-for to ensure alignment.
typedef
struct {
Char* name;
- Bool clientmem; /* allocates in the client address space */
- Int rz_szW; /* Red zone size in words */
- Bool rz_check; /* Check red-zone on free? */
- Int min_sblockW; /* Minimum superblock size */
- WordF* freelist[VG_N_MALLOC_LISTS];
+ Bool clientmem; // Allocates in the client address space?
+ Int rz_szB; // Red zone size in bytes
+ Int min_sblock_szB; // Minimum superblock size in bytes
+ Block* freelist[VG_N_MALLOC_LISTS];
Superblock* sblocks;
- /* Stats only. */
+ // Stats only.
UInt bytes_on_loan;
UInt bytes_mmaped;
@@ -135,64 +126,208 @@ typedef
-/* Block layout:
+/*------------------------------------------------------------*/
+/*--- Low-level functions for working with Blocks. ---*/
+/*------------------------------------------------------------*/
- this block total sizeW (1 word)
- freelist previous ptr (1 word)
- red zone words (depends on .rz_szW field of Arena)
- (payload words)
- red zone words (depends on .rz_szW field of Arena)
- freelist next ptr (1 word)
- this block total sizeW (1 word)
+// Mark a bszB as in-use, and not in-use.
+static __inline__
+Int mk_inuse_bszB ( Int bszB )
+{
+ vg_assert(bszB != 0);
+ return (bszB < 0) ? -bszB : bszB;
+}
+static __inline__
+Int mk_free_bszB ( Int bszB )
+{
+ vg_assert(bszB != 0);
+ return (bszB < 0) ? bszB : -bszB;
+}
- Total size in words (bszW) and payload size in words (pszW)
- are related by
- bszW == pszW + 4 + 2 * a->rz_szW
+// Remove the in-use/not-in-use attribute from a bszB, leaving just
+// the size.
+static __inline__
+Int mk_plain_bszB ( Int bszB )
+{
+ vg_assert(bszB != 0);
+ return (bszB < 0) ? -bszB : bszB;
+}
- Furthermore, both size fields in the block are negative if it is
- not in use, and positive if it is in use. A block size of zero
- is not possible, because a block always has at least four words
- of overhead.
+// Does this bszB have the in-use attribute?
+static __inline__
+Bool is_inuse_bszB ( Int bszB )
+{
+ vg_assert(bszB != 0);
+ return (bszB < 0) ? False : True;
+}
- 8-byte payload alignment is ensured by requiring the number
- of words in the red zones and the number of payload words
- to both be even (% 2 == 0).
-*/
-typedef
- struct {
- Int bszW_lo;
- Word* prev;
- Word* next;
- Word redzone[0];
- }
- BlockHeader;
+// Set and get the lower size field of a block.
+static __inline__
+void set_bszB_lo ( Block* b, Int bszB )
+{
+ *(Int*)&b[0] = bszB;
+}
+static __inline__
+Int get_bszB_lo ( Block* b )
+{
+ return *(Int*)&b[0];
+}
-/*------------------------------------------------------------*/
-/*--- Forwardses ... and misc ... ---*/
-/*------------------------------------------------------------*/
+// Get the address of the last byte in a block
+static __inline__
+UByte* last_byte ( Block* b )
+{
+ UByte* b2 = (UByte*)b;
+ return &b2[mk_plain_bszB(get_bszB_lo(b)) - 1];
+}
-static Bool blockSane ( Arena* a, Word* b );
+// Set and get the upper size field of a block.
+static __inline__
+void set_bszB_hi ( Block* b, Int bszB )
+{
+ UByte* b2 = (UByte*)b;
+ UByte* lb = last_byte(b);
+ vg_assert(lb == &b2[mk_plain_bszB(bszB) - 1]);
+ *(Int*)&lb[-sizeof(Int) + 1] = bszB;
+}
+static __inline__
+Int get_bszB_hi ( Block* b )
+{
+ UByte* lb = last_byte(b);
+ return *(Int*)&lb[-sizeof(Int) + 1];
+}
-/* Align ptr p upwards to an align-sized boundary. */
-static
-void* align_upwards ( void* p, Int align )
+
+// Given the addr of a block, return the addr of its payload.
+static __inline__
+UByte* get_block_payload ( Arena* a, Block* b )
{
- Addr a = (Addr)p;
- if ((a % align) == 0) return (void*)a;
- return (void*)(a - (a % align) + align);
+ UByte* b2 = (UByte*)b;
+ return & b2[sizeof(Int) + sizeof(void*) + a->rz_szB];
+}
+// Given the addr of a block's payload, return the addr of the block itself.
+static __inline__
+Block* get_payload_block ( Arena* a, UByte* payload )
+{
+ return (Block*)&payload[-sizeof(Int) - sizeof(void*) - a->rz_szB];
+}
+
+
+// Set and get the next and previous link fields of a block.
+static __inline__
+void set_prev_b ( Block* b, Block* prev_p )
+{
+ UByte* b2 = (UByte*)b;
+ *(Block**)&b2[sizeof(Int)] = prev_p;
+}
+static __inline__
+void set_next_b ( Block* b, Block* next_p )
+{
+ UByte* lb = last_byte(b);
+ *(Block**)&lb[-sizeof(Int) - sizeof(void*) + 1] = next_p;
+}
+static __inline__
+Block* get_prev_b ( Block* b )
+{
+ UByte* b2 = (UByte*)b;
+ return *(Block**)&b2[sizeof(Int)];
+}
+static __inline__
+Block* get_next_b ( Block* b )
+{
+ UByte* lb = last_byte(b);
+ return *(Block**)&lb[-sizeof(Int) - sizeof(void*) + 1];
+}
+
+
+// Get the block immediately preceding this one in the Superblock.
+static __inline__
+Block* get_predecessor_block ( Block* b )
+{
+ UByte* b2 = (UByte*)b;
+ Int bszB = mk_plain_bszB( (*(Int*)&b2[-sizeof(Int)]) );
+ return (Block*)&b2[-bszB];
+}
+
+// Read and write the lower and upper red-zone bytes of a block.
+static __inline__
+void set_rz_lo_byte ( Arena* a, Block* b, Int rz_byteno, UByte v )
+{
+ UByte* b2 = (UByte*)b;
+ b2[sizeof(Int) + sizeof(void*) + rz_byteno] = v;
+}
+static __inline__
+void set_rz_hi_byte ( Arena* a, Block* b, Int rz_byteno, UByte v )
+{
+ UByte* lb = last_byte(b);
+ lb[-sizeof(Int) - sizeof(void*) - rz_byteno] = v;
+}
+static __inline__
+UByte get_rz_lo_byte ( Arena* a, Block* b, Int rz_byteno )
+{
+ UByte* b2 = (UByte*)b;
+ return b2[sizeof(Int) + sizeof(void*) + rz_byteno];
+}
+static __inline__
+UByte get_rz_hi_byte ( Arena* a, Block* b, Int rz_byteno )
+{
+ UByte* lb = last_byte(b);
+ return lb[-sizeof(Int) - sizeof(void*) - rz_byteno];
+}
+
+
+/* Return the lower, upper and total overhead in bytes for a block.
+ These are determined purely by which arena the block lives in. */
+static __inline__
+Int overhead_szB_lo ( Arena* a )
+{
+ return sizeof(Int) + sizeof(void*) + a->rz_szB;
+}
+static __inline__
+Int overhead_szB_hi ( Arena* a )
+{
+ return sizeof(void*) + sizeof(Int) + a->rz_szB;
+}
+static __inline__
+Int overhead_szB ( Arena* a )
+{
+ return overhead_szB_lo(a) + overhead_szB_hi(a);
+}
+
+// Return the minimum bszB for a block in this arena. Can have zero-length
+// payloads, so it's the size of the admin bytes.
+static __inline__
+Int min_useful_bszB ( Arena* a )
+{
+ return overhead_szB(a);
+}
+
+// Convert payload size <--> block size (both in bytes).
+static __inline__
+Int pszB_to_bszB ( Arena* a, Int pszB )
+{
+ vg_assert(pszB >= 0);
+ return pszB + overhead_szB(a);
+}
+static __inline__
+Int bszB_to_pszB ( Arena* a, Int bszB )
+{
+ Int pszB = bszB - overhead_szB(a);
+ vg_assert(pszB >= 0);
+ return pszB;
}
/*------------------------------------------------------------*/
-/*--- Arena management stuff ---*/
+/*--- Arena management ---*/
/*------------------------------------------------------------*/
-#define CORE_ARENA_MIN_SZW 262144
+#define CORE_ARENA_MIN_SZB 1048576
-/* The arena structures themselves. */
+// The arena structures themselves.
static Arena vg_arena[VG_N_ARENAS];
-/* Functions external to this module identify arenas using ArenaIds,
- not Arena*s. This fn converts the former to the latter. */
+// Functions external to this module identify arenas using ArenaIds,
+// not Arena*s. This fn converts the former to the latter.
static Arena* arenaId_to_ArenaP ( ArenaId arena )
{
@@ -201,19 +336,25 @@ static Arena* arenaId_to_ArenaP ( ArenaI
}
-
-/* Initialise an arena. */
+// Initialise an arena. rz_szB is the minimum redzone size; it might be
+// made bigger to ensure that VG_MIN_MALLOC_ALIGNMENT is observed.
static
-void arena_init ( Arena* a, Char* name,
- Int rz_szW, Bool rz_check, Int min_sblockW, Bool client )
+void arena_init ( ArenaId aid, Char* name, Int rz_szB, Int min_sblock_szB )
{
Int i;
- vg_assert(rz_szW >= 0);
- vg_assert(rz_szW % 2 == 0);
- vg_assert((min_sblockW % VKI_WORDS_PER_PAGE) == 0);
+ Arena* a = arenaId_to_ArenaP(aid);
+
+ vg_assert(rz_szB >= 0);
+ vg_assert((min_sblock_szB % VKI_BYTES_PER_PAGE) == 0);
a->name = name;
- a->clientmem = client;
- a->rz_szW = rz_szW;
- a->rz_check = rz_check;
- a->min_sblockW = min_sblockW;
+ a->clientmem = ( VG_AR_CLIENT == aid ? True : False );
+
+ // The size of the low and high admin sections in a block must be a
+ // multiple of VG_MIN_MALLOC_ALIGNMENT. So we round up the asked-for
+ // redzone size if necessary to achieve this.
+ a->rz_szB = rz_szB;
+ while (0 != overhead_szB_lo(a) % VG_MIN_MALLOC_SZB) a->rz_szB++;
+ vg_assert(overhead_szB_lo(a) == overhead_szB_hi(a));
+
+ a->min_sblock_szB = min_sblock_szB;
for (i = 0; i < VG_N_MALLOC_LISTS; i++) a->freelist[i] = NULL;
a->sblocks = NULL;
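The redzone-rounding loop in arena_init can be modelled in isolation. The admin layout below is a guess (a 4-byte size field plus one 4-byte pointer at the low end, with 8-byte minimum alignment, i.e. a 32-bit build); under those assumptions it reproduces the "4 becomes 8, 12 becomes 16" behaviour described later in this patch:

```c
#include <assert.h>

/* Hypothetical 32-bit layout: low admin = 4-byte size field + 4-byte
   prev pointer; minimum malloc alignment = 8 bytes. */
static int round_up_rz(int rz_szB)
{
   const int admin_lo       = 4 + 4;
   const int min_malloc_szb = 8;
   /* Bump the redzone until the whole low overhead is a multiple of
      the minimum alignment, exactly as arena_init does. */
   while ((admin_lo + rz_szB) % min_malloc_szb != 0)
      rz_szB++;
   return rz_szB;
}
```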
@@ -229,25 +369,20 @@ void VG_(print_all_arena_stats) ( void )
Int i;
for (i = 0; i < VG_N_ARENAS; i++) {
+ Arena* a = arenaId_to_ArenaP(i);
VG_(message)(Vg_DebugMsg,
"AR %8s: %8d mmap'd, %8d/%8d max/curr",
- vg_arena[i].name,
- vg_arena[i].bytes_mmaped,
- vg_arena[i].bytes_on_loan_max,
- vg_arena[i].bytes_on_loan
+ a->name, a->bytes_mmaped, a->bytes_on_loan_max, a->bytes_on_loan
);
}
}
-
-/* It is important that this library is self-initialising, because it
- may get called very early on -- as a result of C++ static
- constructor initialisations -- before Valgrind itself is
- initialised. Hence VG_(arena_malloc)() and VG_(arena_free)() below always
- call ensure_mm_init() to ensure things are correctly initialised. */
-
+/* This library is self-initialising, which makes it more self-contained
+   and less coupled to the outside world.  Hence VG_(arena_malloc)() and
+   VG_(arena_free)() below always call ensure_mm_init() to ensure things are
+   correctly initialised. */
static
void ensure_mm_init ( void )
{
- static Int client_rz_szW;
+ static Int client_rz_szB;
static Bool init_done = False;
@@ -256,8 +391,13 @@ void ensure_mm_init ( void )
// happen if VG_(arena_malloc) was called too early, ie. before the
// tool was loaded.
- vg_assert(client_rz_szW == VG_(vg_malloc_redzone_szB)/4);
+ vg_assert(client_rz_szB == VG_(vg_malloc_redzone_szB));
return;
}
+ /* No particular reason for this figure, it's just smallish */
+ sk_assert(VG_(vg_malloc_redzone_szB) < 128);
+ sk_assert(VG_(vg_malloc_redzone_szB) >= 0);
+ client_rz_szB = VG_(vg_malloc_redzone_szB);
+
/* Use checked red zones (of various sizes) for our internal stuff,
and an unchecked zone of arbitrary size for the client. Of
@@ -265,30 +405,20 @@ void ensure_mm_init ( void )
by using addressability maps, but not by the mechanism implemented
here, which merely checks at the time of freeing that the red
- zone words are unchanged. */
-
- arena_init ( &vg_arena[VG_AR_CORE], "core", 2, True, CORE_ARENA_MIN_SZW, False );
-
- arena_init ( &vg_arena[VG_AR_TOOL], "tool", 2, True, 262144, False );
-
- arena_init ( &vg_arena[VG_AR_SYMTAB], "symtab", 2, True, 262144, False );
-
- arena_init ( &vg_arena[VG_AR_JITTER], "JITter", 2, True, 8192, False );
-
- /* No particular reason for this figure, it's just smallish */
- sk_assert(VG_(vg_malloc_redzone_szB) < 128);
- sk_assert(VG_(vg_malloc_redzone_szB) >= 0);
- client_rz_szW = VG_(vg_malloc_redzone_szB)/4;
-
- arena_init ( &vg_arena[VG_AR_CLIENT], "client",
- client_rz_szW, False, 262144, True );
-
- arena_init ( &vg_arena[VG_AR_DEMANGLE], "demangle", 4 /*paranoid*/,
- True, 16384, False );
-
- arena_init ( &vg_arena[VG_AR_EXECTXT], "exectxt", 2, True, 16384, False );
-
- arena_init ( &vg_arena[VG_AR_ERRORS], "errors", 2, True, 16384, False );
+ zone bytes are unchanged.
- arena_init ( &vg_arena[VG_AR_TRANSIENT], "transien", 2, True, 16384, False );
+ Nb: redzone sizes are *minimums*; they could be made bigger to ensure
+ alignment. Eg. on 32-bit machines, 4 becomes 8, and 12 becomes 16;
+ but on 64-bit machines 4 stays as 4, and 12 stays as 12 --- the extra
+ 4 bytes in both are accounted for by the larger prev/next ptr.
+ */
+ arena_init ( VG_AR_CORE, "core", 4, CORE_ARENA_MIN_SZB );
+ arena_init ( VG_AR_TOOL, "tool", 4, 1048576 );
+ arena_init ( VG_AR_SYMTAB, "symtab", 4, 1048576 );
+ arena_init ( VG_AR_JITTER, "JITter", 4, 32768 );
+ arena_init ( VG_AR_CLIENT, "client", client_rz_szB, 1048576 );
+ arena_init ( VG_AR_DEMANGLE, "demangle", 12/*paranoid*/, 65536 );
+ arena_init ( VG_AR_EXECTXT, "exectxt", 4, 65536 );
+ arena_init ( VG_AR_ERRORS, "errors", 4, 65536 );
+ arena_init ( VG_AR_TRANSIENT, "transien", 4, 65536 );
init_done = True;
@@ -300,22 +430,31 @@ void ensure_mm_init ( void )
/*------------------------------------------------------------*/
-/*--- Superblock management stuff ---*/
+/*--- Superblock management ---*/
/*------------------------------------------------------------*/
+// Align ptr p upwards to an align-sized boundary.
+static
+void* align_upwards ( void* p, Int align )
+{
+ Addr a = (Addr)p;
+ if ((a % align) == 0) return (void*)a;
+ return (void*)(a - (a % align) + align);
+}
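The same rounding logic as align_upwards, restated on a plain integer address so it can be exercised without real pointers (a sketch, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

/* Round address a up to the next multiple of align (align > 0). */
static uintptr_t align_up_demo(uintptr_t a, uintptr_t align)
{
   if ((a % align) == 0) return a;
   return a - (a % align) + align;
}
```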
+
// If not enough memory available, either aborts (for non-client memory)
// or returns 0 (for client memory).
static
-Superblock* newSuperblock ( Arena* a, Int cszW )
+Superblock* newSuperblock ( Arena* a, Int cszB )
{
+ // The extra VG_MIN_MALLOC_SZB bytes are for possible alignment up.
+ static UByte bootstrap_superblock[CORE_ARENA_MIN_SZB+VG_MIN_MALLOC_SZB];
static Bool called_before = False;
- static Word bootstrap_superblock[CORE_ARENA_MIN_SZW];
- Int cszB;
Superblock* sb;
- cszW += 2; /* Take into account sb->next and sb->n_words fields */
- if (cszW < a->min_sblockW) cszW = a->min_sblockW;
- while ((cszW % VKI_WORDS_PER_PAGE) > 0) cszW++;
+ // Take into account admin bytes in the Superblock.
+ cszB += sizeof(Superblock);
- cszB = cszW * sizeof(Word);
+ if (cszB < a->min_sblock_szB) cszB = a->min_sblock_szB;
+ while ((cszB % VKI_BYTES_PER_PAGE) > 0) cszB++;
if (!called_before) {
@@ -323,8 +462,9 @@ Superblock* newSuperblock ( Arena* a, In
// superblock (see comment at top of main() for details).
called_before = True;
- vg_assert(a == &vg_arena[VG_AR_CORE]);
- vg_assert(CORE_ARENA_MIN_SZW*sizeof(Word) >= cszB);
- sb = (Superblock*)bootstrap_superblock;
-
+ vg_assert(a == arenaId_to_ArenaP(VG_AR_CORE));
+ vg_assert(CORE_ARENA_MIN_SZB >= cszB);
+ // Ensure sb is suitably aligned.
+ sb = (Superblock*)align_upwards( bootstrap_superblock,
+ VG_MIN_MALLOC_SZB );
} else if (a->clientmem) {
// client allocation -- return 0 to client if it fails
@@ -332,32 +472,31 @@ Superblock* newSuperblock ( Arena* a, In
VG_(client_alloc)(0, cszB,
VKI_PROT_READ|VKI_PROT_WRITE|VKI_PROT_EXEC, 0);
- if (NULL == sb) {
+ if (NULL == sb)
return 0;
- }
} else {
- // non-client allocation -- abort if it fails
+ // non-client allocation -- aborts if it fails
sb = VG_(get_memory_from_mmap) ( cszB, "newSuperblock" );
}
- sb->n_payload_words = cszW - 2;
+ vg_assert(NULL != sb);
+ vg_assert(0 == (Addr)sb % VG_MIN_MALLOC_SZB);
+ sb->n_payload_bytes = cszB - sizeof(Superblock);
a->bytes_mmaped += cszB;
if (0)
- VG_(message)(Vg_DebugMsg, "newSuperblock, %d payload words",
- sb->n_payload_words);
+ VG_(message)(Vg_DebugMsg, "newSuperblock, %d payload bytes",
+ sb->n_payload_bytes);
return sb;
}
-
-/* Find the superblock containing the given chunk. */
+// Find the superblock containing the given chunk.
static
-Superblock* findSb ( Arena* a, UInt* ch )
+Superblock* findSb ( Arena* a, Block* b )
{
Superblock* sb;
for (sb = a->sblocks; sb; sb = sb->next)
- if (&sb->payload_words[0] <= ch
- && ch < &sb->payload_words[sb->n_payload_words])
+ if ((Block*)&sb->payload_bytes[0] <= b
+ && b < (Block*)&sb->payload_bytes[sb->n_payload_bytes])
return sb;
- VG_(printf)("findSb: can't find pointer %p in arena `%s'\n",
- ch, a->name );
- VG_(core_panic)("findSb: vg_free() in wrong arena?");
+ VG_(printf)("findSb: can't find pointer %p in arena `%s'\n", b, a->name );
+ VG_(core_panic)("findSb: VG_(arena_free)() in wrong arena?");
return NULL; /*NOTREACHED*/
}
@@ -365,194 +504,61 @@ Superblock* findSb ( Arena* a, UInt* ch
/*------------------------------------------------------------*/
-/*--- Low-level functions for working with blocks. ---*/
+/*--- Command line options ---*/
/*------------------------------------------------------------*/
-/* Add the not-in-use attribute to a bszW. */
-static __inline__
-Int mk_free_bszW ( Int bszW )
-{
- vg_assert(bszW != 0);
- return (bszW < 0) ? bszW : -bszW;
-}
-
-/* Add the in-use attribute to a bszW. */
-static __inline__
-Int mk_inuse_bszW ( Int bszW )
-{
- vg_assert(bszW != 0);
- return (bszW < 0) ? -bszW : bszW;
-}
-
-/* Remove the in-use/not-in-use attribute from a bszW, leaving just
- the size. */
-static __inline__
-Int mk_plain_bszW ( Int bszW )
-{
- vg_assert(bszW != 0);
- return (bszW < 0) ? -bszW : bszW;
-}
-
-/* Does this bszW have the in-use attribute ? */
-static __inline__
-Bool is_inuse_bszW ( Int bszW )
-{
- vg_assert(bszW != 0);
- return (bszW < 0) ? False : True;
-}
-
-
-/* Given the addr of the first word of a block, return the addr of the
- last word. */
-static __inline__
-WordL* first_to_last ( WordF* fw )
-{
- return fw + mk_plain_bszW(fw[0]) - 1;
-}
-
-/* Given the addr of the last word of a block, return the addr of the
- first word. */
-static __inline__
-WordF* last_to_first ( WordL* lw )
-{
- return lw - mk_plain_bszW(lw[0]) + 1;
-}
+/* Round malloc sizes up to a multiple of VG_SLOPPY_MALLOC_SZB bytes?
+ default: NO
+ Nb: the allocator always rounds blocks up to a multiple of
+ VG_MIN_MALLOC_SZB. VG_(clo_sloppy_malloc) is relevant eg. for
+ Memcheck, which will be byte-precise with addressability maps on its
+ malloc allocations unless --sloppy-malloc=yes. */
+Bool VG_(clo_sloppy_malloc) = False;
+/* DEBUG: print malloc details? default: NO */
+Bool VG_(clo_trace_malloc) = False;
-/* Given the addr of the first word of a block, return the addr of the
- first word of its payload. */
-static __inline__
-Word* first_to_payload ( Arena* a, WordF* fw )
-{
- return & fw[2 + a->rz_szW];
-}
+/* Minimum alignment in functions that don't specify alignment explicitly.
+ default: VG_MIN_MALLOC_SZB. */
+Int VG_(clo_alignment) = VG_MIN_MALLOC_SZB;
-/* Given the addr of the first word of the payload of a block,
- return the addr of the first word of the block. */
-static __inline__
-Word* payload_to_first ( Arena* a, WordF* payload )
-{
- return & payload[- (2 + a->rz_szW)];
-}
-/* Set and get the lower size field of a block. */
-static __inline__
-void set_bszW_lo ( WordF* fw, Int bszW ) {
- fw[0] = bszW;
-}
-static __inline__
-Int get_bszW_lo ( WordF* fw )
+Bool VG_(replacement_malloc_process_cmd_line_option)(Char* arg)
{
- return fw[0];
-}
-
-
-/* Set and get the next and previous link fields of a block. */
-static __inline__
-void set_prev_p ( WordF* fw, Word* prev_p ) {
- fw[1] = (Word)prev_p;
-}
-static __inline__
-void set_next_p ( WordF* fw, Word* next_p ) {
- WordL* lw = first_to_last(fw);
- lw[-1] = (Word)next_p;
-}
-static __inline__
-Word* get_prev_p ( WordF* fw ) {
- return (Word*)(fw[1]);
-}
-static __inline__
-Word* get_next_p ( WordF* fw ) {
- WordL* lw = first_to_last(fw);
- return (Word*)(lw[-1]);
-}
-
-
-/* Set and get the upper size field of a block. */
-static __inline__
-void set_bszW_hi ( WordF* fw, Int bszW ) {
- WordL* lw = first_to_last(fw);
- vg_assert(lw == fw + mk_plain_bszW(bszW) - 1);
- lw[0] = bszW;
-}
-static __inline__
-Int get_bszW_hi ( WordF* fw ) {
- WordL* lw = first_to_last(fw);
- return lw[0];
-}
-
-/* Get the upper size field of a block, given a pointer to the last
- word of it. */
-static __inline__
-Int get_bszW_hi_from_last_word ( WordL* lw ) {
- WordF* fw = last_to_first(lw);
- return get_bszW_lo(fw);
-}
-
+ if (VG_CLO_STREQN(12, arg, "--alignment=")) {
+ VG_(clo_alignment) = (Int)VG_(atoll)(&arg[12]);
-/* Read and write the lower and upper red-zone words of a block. */
-static __inline__
-void set_rz_lo_word ( Arena* a, WordF* fw, Int rz_wordno, Word w )
-{
- fw[2 + rz_wordno] = w;
-}
-static __inline__
-void set_rz_hi_word ( Arena* a, WordF* fw, Int rz_wordno, Word w )
-{
- WordL* lw = first_to_last(fw);
- lw[-2-rz_wordno] = w;
-}
-static __inline__
-Word get_rz_lo_word ( Arena* a, WordF* fw, Int rz_wordno )
-{
- return fw[2 + rz_wordno];
-}
-static __inline__
-Word get_rz_hi_word ( Arena* a, WordF* fw, Int rz_wordno )
-{
- WordL* lw = first_to_last(fw);
- return lw[-2-rz_wordno];
-}
+ if (VG_(clo_alignment) < VG_MIN_MALLOC_SZB
+ || VG_(clo_alignment) > 4096
+ || VG_(log2)( VG_(clo_alignment) ) == -1 /* not a power of 2 */) {
+ VG_(message)(Vg_UserMsg, "");
+ VG_(message)(Vg_UserMsg,
+ "Invalid --alignment= setting. "
+ "Should be a power of 2, >= %d, <= 4096.", VG_MIN_MALLOC_SZB);
+ VG_(bad_option)("--alignment");
+ }
+ }
+ else VG_BOOL_CLO("--sloppy-malloc", VG_(clo_sloppy_malloc))
+ else VG_BOOL_CLO("--trace-malloc", VG_(clo_trace_malloc))
+ else
+ return False;
-/* Return the lower, upper and total overhead in words for a block.
- These are determined purely by which arena the block lives in. */
-static __inline__
-Int overhead_szW_lo ( Arena* a )
-{
- return 2 + a->rz_szW;
-}
-static __inline__
-Int overhead_szW_hi ( Arena* a )
-{
- return 2 + a->rz_szW;
-}
-static __inline__
-Int overhead_szW ( Arena* a )
-{
- return overhead_szW_lo(a) + overhead_szW_hi(a);
+ return True;
}
-
-/* Convert payload size in words to block size in words, and back. */
-static __inline__
-Int pszW_to_bszW ( Arena* a, Int pszW )
-{
- vg_assert(pszW >= 0);
- return pszW + overhead_szW(a);
-}
-static __inline__
-Int bszW_to_pszW ( Arena* a, Int bszW )
+void VG_(replacement_malloc_print_usage)(void)
{
- Int pszW = bszW - overhead_szW(a);
- vg_assert(pszW >= 0);
- return pszW;
+ VG_(printf)(
+" --sloppy-malloc=no|yes round malloc sizes to multiple of %d? [no]\n"
+" --alignment=<number> set minimum alignment of allocations [%d]\n",
+ VG_SLOPPY_MALLOC_SZB, VG_MIN_MALLOC_SZB
+ );
}
-Int VG_(arena_payload_szB) ( ArenaId aid, void* ptr )
+void VG_(replacement_malloc_print_debug_usage)(void)
{
- Arena* a = arenaId_to_ArenaP(aid);
- Word* fw = payload_to_first(a, (WordF*)ptr);
- Int pszW = bszW_to_pszW(a, get_bszW_lo(fw));
- return VKI_BYTES_PER_WORD * pszW;
+ VG_(printf)(
+" --trace-malloc=no|yes show client malloc details? [no]\n"
+ );
}
@@ -562,51 +568,51 @@ Int VG_(arena_payload_szB) ( ArenaId aid
/*------------------------------------------------------------*/
-/* Determination of which freelist a block lives on is based on the
- payload size, not block size, in words. */
-
-/* Convert a payload size in words to a freelist number. */
+// Nb: Determination of which freelist a block lives on is based on the
+// payload size, not block size.
+// Convert a payload size in bytes to a freelist number.
static
-Int pszW_to_listNo ( Int pszW )
+Int pszB_to_listNo ( Int pszB )
{
- vg_assert(pszW >= 0);
- if (pszW <= 3) return 0;
- if (pszW <= 4) return 1;
- if (pszW <= 5) return 2;
- if (pszW <= 6) return 3;
- if (pszW <= 7) return 4;
- if (pszW <= 8) return 5;
- if (pszW <= 9) return 6;
- if (pszW <= 10) return 7;
- if (pszW <= 11) return 8;
- if (pszW <= 12) return 9;
- if (pszW <= 16) return 10;
- if (pszW <= 32) return 11;
- if (pszW <= 64) return 12;
- if (pszW <= 128) return 13;
- if (pszW <= 256) return 14;
+ vg_assert(pszB >= 0);
+ vg_assert(0 == pszB % VG_MIN_MALLOC_SZB);
+ pszB /= VG_MIN_MALLOC_SZB;
+ if (pszB <= 2) return 0;
+ if (pszB <= 3) return 1;
+ if (pszB <= 4) return 2;
+ if (pszB <= 5) return 3;
+ if (pszB <= 6) return 4;
+ if (pszB <= 7) return 5;
+ if (pszB <= 8) return 6;
+ if (pszB <= 9) return 7;
+ if (pszB <= 10) return 8;
+ if (pszB <= 11) return 9;
+ if (pszB <= 12) return 10;
+ if (pszB <= 16) return 11;
+ if (pszB <= 32) return 12;
+ if (pszB <= 64) return 13;
+ if (pszB <= 128) return 14;
return 15;
}
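The if-ladder above can be condensed once you notice the pattern: after dividing by the unit size, counts up to 12 units get their own list (0..10) and larger sizes fall into coarser roughly power-of-two buckets. A standalone sketch, with MIN_SZB standing in for VG_MIN_MALLOC_SZB on a 32-bit build:

```c
#include <assert.h>

#define MIN_SZB 8   /* stand-in for VG_MIN_MALLOC_SZB (hypothetical value) */

/* Equivalent to pszB_to_listNo above, expressed compactly. */
static int pszB_to_listNo_demo(int pszB)
{
   int n;
   assert(pszB >= 0 && pszB % MIN_SZB == 0);
   n = pszB / MIN_SZB;
   if (n <=  12) return n <= 2 ? 0 : n - 2;  /* one unit per list, 0..10 */
   if (n <=  16) return 11;
   if (n <=  32) return 12;
   if (n <=  64) return 13;
   if (n <= 128) return 14;
   return 15;                                /* everything bigger */
}
```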
-
-/* What are the minimum and maximum payload sizes for a given list? */
-
+// What is the minimum payload size for a given list?
static
-Int listNo_to_pszW_min ( Int listNo )
+Int listNo_to_pszB_min ( Int listNo )
{
- Int pszW = 0;
+ Int pszB = 0;
vg_assert(listNo >= 0 && listNo <= VG_N_MALLOC_LISTS);
- while (pszW_to_listNo(pszW) < listNo) pszW++;
- return pszW;
+ while (pszB_to_listNo(pszB) < listNo) pszB += VG_MIN_MALLOC_SZB;
+ return pszB;
}
+// What is the maximum payload size for a given list?
static
-Int listNo_to_pszW_max ( Int listNo )
+Int listNo_to_pszB_max ( Int listNo )
{
vg_assert(listNo >= 0 && listNo <= VG_N_MALLOC_LISTS);
if (listNo == VG_N_MALLOC_LISTS-1) {
- return 999999999;
+ return MAX_PSZB;
} else {
- return listNo_to_pszW_min(listNo+1) - 1;
+ return listNo_to_pszB_min(listNo+1) - 1;
}
}
@@ -621,7 +626,7 @@ static
void swizzle ( Arena* a, Int lno )
{
- UInt* p_best;
- UInt* pp;
- UInt* pn;
+ Block* p_best;
+ Block* pp;
+ Block* pn;
Int i;
@@ -631,13 +636,12 @@ void swizzle ( Arena* a, Int lno )
pn = pp = p_best;
for (i = 0; i < 20; i++) {
- pn = get_next_p(pn);
- pp = get_prev_p(pp);
+ pn = get_next_b(pn);
+ pp = get_prev_b(pp);
if (pn < p_best) p_best = pn;
if (pp < p_best) p_best = pp;
}
if (p_best < a->freelist[lno]) {
-# ifdef DEBUG_MALLOC
- VG_(printf)("retreat by %d\n",
- ((Char*)(a->freelist[lno])) - ((Char*)p_best));
+# ifdef VERBOSE_MALLOC
+ VG_(printf)("retreat by %d\n", a->freelist[lno] - p_best);
# endif
a->freelist[lno] = p_best;
@@ -647,125 +651,25 @@ void swizzle ( Arena* a, Int lno )
/*------------------------------------------------------------*/
-/*--- Creating and deleting blocks. ---*/
-/*------------------------------------------------------------*/
-
-/* Mark the words at b .. b+bszW-1 as not in use, and add them to the
- relevant free list. */
-
-static
-void mkFreeBlock ( Arena* a, Word* b, Int bszW, Int b_lno )
-{
- Int pszW = bszW_to_pszW(a, bszW);
- vg_assert(pszW >= 0);
- vg_assert(b_lno == pszW_to_listNo(pszW));
- /* Set the size fields and indicate not-in-use. */
- set_bszW_lo(b, mk_free_bszW(bszW));
- set_bszW_hi(b, mk_free_bszW(bszW));
-
- /* Add to the relevant list. */
- if (a->freelist[b_lno] == NULL) {
- set_prev_p(b, b);
- set_next_p(b, b);
- a->freelist[b_lno] = b;
- } else {
- Word* b_prev = get_prev_p(a->freelist[b_lno]);
- Word* b_next = a->freelist[b_lno];
- set_next_p(b_prev, b);
- set_prev_p(b_next, b);
- set_next_p(b, b_next);
- set_prev_p(b, b_prev);
- }
-# ifdef DEBUG_MALLOC
- (void)blockSane(a,b);
-# endif
-}
-
-
-/* Mark the words at b .. b+bszW-1 as in use, and set up the block
- appropriately. */
-static
-void mkInuseBlock ( Arena* a, UInt* b, UInt bszW )
-{
- Int i;
- set_bszW_lo(b, mk_inuse_bszW(bszW));
- set_bszW_hi(b, mk_inuse_bszW(bszW));
- set_prev_p(b, NULL);
- set_next_p(b, NULL);
- if (a->rz_check) {
- for (i = 0; i < a->rz_szW; i++) {
- set_rz_lo_word(a, b, i, (UInt)b ^ VG_REDZONE_LO_MASK);
- set_rz_hi_word(a, b, i, (UInt)b ^ VG_REDZONE_HI_MASK);
- }
- }
-# ifdef DEBUG_MALLOC
- (void)blockSane(a,b);
-# endif
-}
-
-
-/* Remove a block from a given list. Does no sanity checking. */
-static
-void unlinkBlock ( Arena* a, UInt* b, Int listno )
-{
- vg_assert(listno >= 0 && listno < VG_N_MALLOC_LISTS);
- if (get_prev_p(b) == b) {
- /* Only one element in the list; treat it specially. */
- vg_assert(get_next_p(b) == b);
- a->freelist[listno] = NULL;
- } else {
- UInt* b_prev = get_prev_p(b);
- UInt* b_next = get_next_p(b);
- a->freelist[listno] = b_prev;
- set_next_p(b_prev, b_next);
- set_prev_p(b_next, b_prev);
- swizzle ( a, listno );
- }
- set_prev_p(b, NULL);
- set_next_p(b, NULL);
-}
-
-
-/* Split an existing free block into two pieces, and put the fragment
- (the second one along in memory) onto the relevant free list.
- req_bszW is the required size of the block which isn't the
- fragment. */
-static
-void splitChunk ( Arena* a, UInt* b, Int b_listno, UInt req_bszW )
-{
- UInt b_bszW;
- Int frag_bszW;
- b_bszW = (UInt)mk_plain_bszW(get_bszW_lo(b));
- vg_assert(req_bszW < b_bszW);
- frag_bszW = b_bszW - req_bszW;
- vg_assert(frag_bszW >= overhead_szW(a));
- /*
- printf( "split %d into %d and %d\n",
- b_bszW,req_bszW,frag_bszW );
- */
- vg_assert(bszW_to_pszW(a, frag_bszW) > 0);
- unlinkBlock(a, b, b_listno);
- mkInuseBlock(a, b, req_bszW);
- mkFreeBlock(a, &b[req_bszW], frag_bszW,
- pszW_to_listNo(bszW_to_pszW(a, frag_bszW)));
-}
-
-
-/*------------------------------------------------------------*/
/*--- Sanity-check/debugging machinery. ---*/
/*------------------------------------------------------------*/
-/* Do some crude sanity checks on a chunk. */
+#define VG_REDZONE_LO_MASK 0x31
+#define VG_REDZONE_HI_MASK 0x7c
+
+// Do some crude sanity checks on a chunk.
static
-Bool blockSane ( Arena* a, Word* b )
+Bool blockSane ( Arena* a, Block* b )
{
# define BLEAT(str) VG_(printf)("blockSane: fail -- %s\n",str)
Int i;
- if (get_bszW_lo(b) != get_bszW_hi(b))
+ if (get_bszB_lo(b) != get_bszB_hi(b))
{BLEAT("sizes");return False;}
- if (a->rz_check && is_inuse_bszW(get_bszW_lo(b))) {
- for (i = 0; i < a->rz_szW; i++) {
- if (get_rz_lo_word(a, b, i) != ((Word)b ^ VG_REDZONE_LO_MASK))
+ if (!a->clientmem && is_inuse_bszB(get_bszB_lo(b))) {
+ for (i = 0; i < a->rz_szB; i++) {
+ if (get_rz_lo_byte(a, b, i) !=
+ (UByte)(((Addr)b&0xff) ^ VG_REDZONE_LO_MASK))
{BLEAT("redzone-lo");return False;}
- if (get_rz_hi_word(a, b, i) != ((Word)b ^ VG_REDZONE_HI_MASK))
+ if (get_rz_hi_byte(a, b, i) !=
+ (UByte)(((Addr)b&0xff) ^ VG_REDZONE_HI_MASK))
{BLEAT("redzone-hi");return False;}
}
@@ -775,11 +679,10 @@ Bool blockSane ( Arena* a, Word* b )
}
-
-/* Print superblocks (only for debugging). */
+// Print superblocks (only for debugging).
static
void ppSuperblocks ( Arena* a )
{
- Int i, ch_bszW, blockno;
- UInt* ch;
+ Int i, b_bszB, blockno;
+ Block* b;
Superblock* sb = a->sblocks;
blockno = 1;
@@ -787,18 +690,14 @@ void ppSuperblocks ( Arena* a )
while (sb) {
VG_(printf)( "\n" );
- VG_(printf)( "superblock %d at %p, sb->n_pl_ws = %d, next = %p\n",
- blockno++, sb, sb->n_payload_words, sb->next );
- i = 0;
- while (True) {
- if (i >= sb->n_payload_words) break;
- ch = &sb->payload_words[i];
- ch_bszW = get_bszW_lo(ch);
- VG_(printf)( " block at %d, bszW %d: ", i, mk_plain_bszW(ch_bszW) );
- VG_(printf)( "%s, ", is_inuse_bszW(ch_bszW) ? "inuse" : "free" );
- VG_(printf)( "%s\n", blockSane(a,ch) ? "ok" : "BAD" );
- i += mk_plain_bszW(ch_bszW);
+ VG_(printf)( "superblock %d at %p, sb->n_pl_bs = %d, next = %p\n",
+ blockno++, sb, sb->n_payload_bytes, sb->next );
+ for (i = 0; i < sb->n_payload_bytes; i += mk_plain_bszB(b_bszB)) {
+ b = (Block*)&sb->payload_bytes[i];
+ b_bszB = get_bszB_lo(b);
+ VG_(printf)( " block at %d, bszB %d: ", i, mk_plain_bszB(b_bszB) );
+ VG_(printf)( "%s, ", is_inuse_bszB(b_bszB) ? "inuse" : "free");
+ VG_(printf)( "%s\n", blockSane(a, b) ? "ok" : "BAD" );
}
- if (i > sb->n_payload_words)
- VG_(printf)( " last block overshoots end of SB\n");
+ vg_assert(i == sb->n_payload_bytes); // no overshoot at end of Sb
sb = sb->next;
}
@@ -806,14 +705,13 @@ void ppSuperblocks ( Arena* a )
}
-
-/* Sanity check both the superblocks and the chains. */
+// Sanity check both the superblocks and the chains.
static void sanity_check_malloc_arena ( ArenaId aid )
{
- Int i, superblockctr, b_bszW, b_pszW, blockctr_sb, blockctr_li;
- Int blockctr_sb_free, listno, list_min_pszW, list_max_pszW;
+ Int i, superblockctr, b_bszB, b_pszB, blockctr_sb, blockctr_li;
+ Int blockctr_sb_free, listno, list_min_pszB, list_max_pszB;
Superblock* sb;
Bool thisFree, lastWasFree;
- Word* b;
- Word* b_prev;
+ Block* b;
+ Block* b_prev;
UInt arena_bytes_on_loan;
Arena* a;
@@ -823,5 +721,5 @@ static void sanity_check_malloc_arena (
a = arenaId_to_ArenaP(aid);
- /* First, traverse all the superblocks, inspecting the chunks in each. */
+ // First, traverse all the superblocks, inspecting the Blocks in each.
superblockctr = blockctr_sb = blockctr_sb_free = 0;
arena_bytes_on_loan = 0;
@@ -830,30 +728,26 @@ static void sanity_check_malloc_arena (
lastWasFree = False;
superblockctr++;
- i = 0;
- while (True) {
- if (i >= sb->n_payload_words) break;
+ for (i = 0; i < sb->n_payload_bytes; i += mk_plain_bszB(b_bszB)) {
blockctr_sb++;
- b = &sb->payload_words[i];
- b_bszW = get_bszW_lo(b);
+ b = (Block*)&sb->payload_bytes[i];
+ b_bszB = get_bszB_lo(b);
if (!blockSane(a, b)) {
- VG_(printf)("sanity_check_malloc_arena: sb %p, block %d (bszW %d): "
- " BAD\n",
- sb, i, b_bszW );
+ VG_(printf)("sanity_check_malloc_arena: sb %p, block %d (bszB %d): "
+ " BAD\n", sb, i, b_bszB );
BOMB;
}
- thisFree = !is_inuse_bszW(b_bszW);
+ thisFree = !is_inuse_bszB(b_bszB);
if (thisFree && lastWasFree) {
- VG_(printf)("sanity_check_malloc_arena: sb %p, block %d (bszW %d): "
+ VG_(printf)("sanity_check_malloc_arena: sb %p, block %d (bszB %d): "
"UNMERGED FREES\n",
- sb, i, b_bszW );
+ sb, i, b_bszB );
BOMB;
}
- lastWasFree = thisFree;
if (thisFree) blockctr_sb_free++;
if (!thisFree)
- arena_bytes_on_loan += sizeof(Word) * bszW_to_pszW(a, b_bszW);
- i += mk_plain_bszW(b_bszW);
+ arena_bytes_on_loan += bszB_to_pszB(a, b_bszB);
+ lastWasFree = thisFree;
}
- if (i > sb->n_payload_words) {
+ if (i > sb->n_payload_bytes) {
VG_(printf)( "sanity_check_malloc_arena: sb %p: last block "
"overshoots end\n", sb);
@@ -864,8 +758,9 @@ static void sanity_check_malloc_arena (
if (arena_bytes_on_loan != a->bytes_on_loan) {
- VG_(printf)(
- "sanity_check_malloc_arena: a->bytes_on_loan %d, "
+# ifdef VERBOSE_MALLOC
+ VG_(printf)( "sanity_check_malloc_arena: a->bytes_on_loan %d, "
"arena_bytes_on_loan %d: "
"MISMATCH\n", a->bytes_on_loan, arena_bytes_on_loan);
+# endif
ppSuperblocks(a);
BOMB;
@@ -877,12 +772,12 @@ static void sanity_check_malloc_arena (
blockctr_li = 0;
for (listno = 0; listno < VG_N_MALLOC_LISTS; listno++) {
- list_min_pszW = listNo_to_pszW_min(listno);
- list_max_pszW = listNo_to_pszW_max(listno);
+ list_min_pszB = listNo_to_pszB_min(listno);
+ list_max_pszB = listNo_to_pszB_max(listno);
b = a->freelist[listno];
if (b == NULL) continue;
while (True) {
b_prev = b;
- b = get_next_p(b);
- if (get_prev_p(b) != b_prev) {
+ b = get_next_b(b);
+ if (get_prev_b(b) != b_prev) {
VG_(printf)( "sanity_check_malloc_arena: list %d at %p: "
"BAD LINKAGE\n",
@@ -890,10 +785,10 @@ static void sanity_check_malloc_arena (
BOMB;
}
- b_pszW = bszW_to_pszW(a, mk_plain_bszW(get_bszW_lo(b)));
- if (b_pszW < list_min_pszW || b_pszW > list_max_pszW) {
+ b_pszB = bszB_to_pszB(a, mk_plain_bszB(get_bszB_lo(b)));
+ if (b_pszB < list_min_pszB || b_pszB > list_max_pszB) {
VG_(printf)(
"sanity_check_malloc_arena: list %d at %p: "
- "WRONG CHAIN SIZE %d (%d, %d)\n",
- listno, b, b_pszW, list_min_pszW, list_max_pszW );
+ "WRONG CHAIN SIZE %dB (%dB, %dB)\n",
+ listno, b, b_pszB, list_min_pszB, list_max_pszB );
BOMB;
}
@@ -904,8 +799,9 @@ static void sanity_check_malloc_arena (
if (blockctr_sb_free != blockctr_li) {
- VG_(printf)(
- "sanity_check_malloc_arena: BLOCK COUNT MISMATCH "
+# ifdef VERBOSE_MALLOC
+ VG_(printf)( "sanity_check_malloc_arena: BLOCK COUNT MISMATCH "
"(via sbs %d, via lists %d)\n",
blockctr_sb_free, blockctr_li );
+# endif
ppSuperblocks(a);
BOMB;
@@ -935,24 +830,23 @@ void VG_(sanity_check_malloc_all) ( void
out if an arena is empty -- currently has no bytes on loan. This
is useful for checking for memory leaks (of valgrind, not the
- client.)
-*/
+ client.) */
Bool VG_(is_empty_arena) ( ArenaId aid )
{
Arena* a;
Superblock* sb;
- WordF* b;
- Int b_bszW;
+ Block* b;
+ Int b_bszB;
ensure_mm_init();
a = arenaId_to_ArenaP(aid);
for (sb = a->sblocks; sb != NULL; sb = sb->next) {
- /* If the superblock is empty, it should contain a single free
- block, of the right size. */
- b = &(sb->payload_words[0]);
- b_bszW = get_bszW_lo(b);
- if (is_inuse_bszW(b_bszW)) return False;
- if (mk_plain_bszW(b_bszW) != sb->n_payload_words) return False;
- /* So this block is not in use and is of the right size. Keep
- going. */
+ // If the superblock is empty, it should contain a single free
+ // block, of the right size.
+ b = (Block*)&sb->payload_bytes[0];
+ b_bszB = get_bszB_lo(b);
+ if (is_inuse_bszB(b_bszB)) return False;
+ if (mk_plain_bszB(b_bszB) != sb->n_payload_bytes) return False;
+ // If we reach here, this block is not in use and is of the right
+ // size, so keep going around the loop...
}
return True;
@@ -960,12 +854,80 @@ Bool VG_(is_empty_arena) ( ArenaId aid )
-/* Turn a request size in bytes into a payload request size in
- words. This means 8-aligning the request size.
-*/
-static __inline__
-Int req_pszB_to_req_pszW ( Int req_pszB )
+/*------------------------------------------------------------*/
+/*--- Creating and deleting blocks. ---*/
+/*------------------------------------------------------------*/
+
+// Mark the bytes at b .. b+bszB-1 as not in use, and add them to the
+// relevant free list.
+
+static
+void mkFreeBlock ( Arena* a, Block* b, Int bszB, Int b_lno )
{
- return ((req_pszB + 7) / 8) /* # of 64-bit units */
- * 2; /* # of 32-bit units */
+ Int pszB = bszB_to_pszB(a, bszB);
+ vg_assert(pszB >= 0);
+ vg_assert(b_lno == pszB_to_listNo(pszB));
+ // Set the size fields and indicate not-in-use.
+ set_bszB_lo(b, mk_free_bszB(bszB));
+ set_bszB_hi(b, mk_free_bszB(bszB));
+
+ // Add to the relevant list.
+ if (a->freelist[b_lno] == NULL) {
+ set_prev_b(b, b);
+ set_next_b(b, b);
+ a->freelist[b_lno] = b;
+ } else {
+ Block* b_prev = get_prev_b(a->freelist[b_lno]);
+ Block* b_next = a->freelist[b_lno];
+ set_next_b(b_prev, b);
+ set_prev_b(b_next, b);
+ set_next_b(b, b_next);
+ set_prev_b(b, b_prev);
+ }
+# ifdef DEBUG_MALLOC
+ (void)blockSane(a,b);
+# endif
+}
+
+// Mark the bytes at b .. b+bszB-1 as in use, and set up the block
+// appropriately.
+static
+void mkInuseBlock ( Arena* a, Block* b, UInt bszB )
+{
+ Int i;
+ vg_assert(bszB >= min_useful_bszB(a));
+ set_bszB_lo(b, mk_inuse_bszB(bszB));
+ set_bszB_hi(b, mk_inuse_bszB(bszB));
+ set_prev_b(b, NULL); // Take off freelist
+ set_next_b(b, NULL); // ditto
+ if (!a->clientmem) {
+ for (i = 0; i < a->rz_szB; i++) {
+ set_rz_lo_byte(a, b, i, (UByte)(((Addr)b&0xff) ^ VG_REDZONE_LO_MASK));
+ set_rz_hi_byte(a, b, i, (UByte)(((Addr)b&0xff) ^ VG_REDZONE_HI_MASK));
+ }
+ }
+# ifdef DEBUG_MALLOC
+ (void)blockSane(a,b);
+# endif
+}
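The redzone fill value is derived from the block's own address, so a buggy write that copies one block's redzone bytes over another's still trips the check in blockSane. A minimal sketch using the mask constants this patch defines:

```c
#include <assert.h>
#include <stdint.h>

#define RZ_LO_MASK 0x31   /* VG_REDZONE_LO_MASK from the patch */
#define RZ_HI_MASK 0x7c   /* VG_REDZONE_HI_MASK from the patch */

/* Same expression as in mkInuseBlock/blockSane: the low byte of the
   block address XORed with a fixed mask. */
static uint8_t rz_lo_byte_demo(uintptr_t b)
{
   return (uint8_t)((b & 0xff) ^ RZ_LO_MASK);
}

static uint8_t rz_hi_byte_demo(uintptr_t b)
{
   return (uint8_t)((b & 0xff) ^ RZ_HI_MASK);
}
```

Because only the low address byte is mixed in, two blocks 256 bytes apart share a fill value; the check is crude by design, as the comment in ensure_mm_init says.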
+
+// Remove a block from a given list. Does no sanity checking.
+static
+void unlinkBlock ( Arena* a, Block* b, Int listno )
+{
+ vg_assert(listno >= 0 && listno < VG_N_MALLOC_LISTS);
+ if (get_prev_b(b) == b) {
+ // Only one element in the list; treat it specially.
+ vg_assert(get_next_b(b) == b);
+ a->freelist[listno] = NULL;
+ } else {
+ Block* b_prev = get_prev_b(b);
+ Block* b_next = get_next_b(b);
+ a->freelist[listno] = b_prev;
+ set_next_b(b_prev, b_next);
+ set_prev_b(b_next, b_prev);
+ swizzle ( a, listno );
+ }
+ set_prev_b(b, NULL);
+ set_next_b(b, NULL);
}
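mkFreeBlock and unlinkBlock together maintain a circular doubly-linked list per freelist; a one-element list points at itself, which is what forces the special case in unlinkBlock. A minimal standalone model (hypothetical names, and simplified: it only moves the head pointer when the head itself is removed):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { struct Node *prev, *next; } Node;

static Node* list_head = NULL;

/* Insert n at the "end" of the circular list, as mkFreeBlock does. */
static void link_node(Node* n)
{
   if (list_head == NULL) {
      n->prev = n->next = n;      /* one element: points at itself */
      list_head = n;
   } else {
      Node* p = list_head->prev;
      p->next = n;  list_head->prev = n;
      n->next = list_head;  n->prev = p;
   }
}

/* Remove n, special-casing the one-element list like unlinkBlock. */
static void unlink_node(Node* n)
{
   if (n->prev == n) {
      assert(n->next == n);
      list_head = NULL;
   } else {
      n->prev->next = n->next;
      n->next->prev = n->prev;
      if (list_head == n) list_head = n->prev;
   }
   n->prev = n->next = NULL;      /* taken off the freelist */
}

/* Exercise link/unlink; returns 1 if the invariants held. */
static int list_demo(void)
{
   Node a, b;
   int ok;
   link_node(&a);
   link_node(&b);
   unlink_node(&a);
   ok = (list_head == &b) && (b.next == &b) && (b.prev == &b);
   unlink_node(&b);
   return ok && (list_head == NULL);
}
```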
@@ -975,9 +937,17 @@ Int req_pszB_to_req_pszW ( Int req_pszB
/*------------------------------------------------------------*/
+// Align the request size.
+static __inline__
+Int align_req_pszB ( Int req_pszB )
+{
+ Int n = VG_MIN_MALLOC_SZB-1;
+ return ((req_pszB + n) & (~n));
+}
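The mask trick in align_req_pszB only works when the alignment is a power of two: adding align-1 and then clearing the low bits rounds up. A sketch with the alignment as an explicit, checked parameter:

```c
#include <assert.h>

/* Round req up to a multiple of align; align must be a power of two. */
static int align_up_pow2(int req, int align)
{
   int n = align - 1;
   assert(align > 0 && (align & n) == 0);   /* power-of-two check */
   return (req + n) & ~n;
}
```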
+
void* VG_(arena_malloc) ( ArenaId aid, Int req_pszB )
{
- Int req_pszW, req_bszW, frag_bszW, b_bszW, lno;
+ Int req_bszB, frag_bszB, b_bszB, lno;
Superblock* new_sb;
- Word* b;
+ Block* b = NULL;
Arena* a;
void* v;
@@ -988,45 +958,23 @@ void* VG_(arena_malloc) ( ArenaId aid, I
a = arenaId_to_ArenaP(aid);
- vg_assert(req_pszB >= 0);
- vg_assert(req_pszB < 0x7FFFFFF0);
-
- req_pszW = req_pszB_to_req_pszW(req_pszB);
-
- /* Keep gcc -O happy: */
- b = NULL;
-
- /* Start searching at this list. */
- lno = pszW_to_listNo(req_pszW);
+ vg_assert(0 <= req_pszB && req_pszB < MAX_PSZB);
+ req_pszB = align_req_pszB(req_pszB);
+ req_bszB = pszB_to_bszB(a, req_pszB);
- /* This loop finds a list which has a block big enough, or sets
- req_listno to N_LISTS if no such block exists. */
- while (True) {
- if (lno == VG_N_MALLOC_LISTS) break;
- /* If this list is empty, try the next one. */
- if (a->freelist[lno] == NULL) {
- lno++;
- continue;
- }
- /* Scan a->list[lno] to find a big-enough chunk. */
+ // Scan through all the big-enough freelists for a block.
+ for (lno = pszB_to_listNo(req_pszB); lno < VG_N_MALLOC_LISTS; lno++) {
b = a->freelist[lno];
- b_bszW = mk_plain_bszW(get_bszW_lo(b));
+ if (NULL == b) continue; // If this list is empty, try the next one.
while (True) {
- if (bszW_to_pszW(a, b_bszW) >= req_pszW) break;
- b = get_next_p(b);
- b_bszW = mk_plain_bszW(get_bszW_lo(b));
- if (b == a->freelist[lno]) break;
+ b_bszB = mk_plain_bszB(get_bszB_lo(b));
+ if (b_bszB >= req_bszB) goto obtained_block; // success!
+ b = get_next_b(b);
+ if (b == a->freelist[lno]) break; // traversed entire freelist
}
- if (bszW_to_pszW(a, b_bszW) >= req_pszW) break;
- /* No luck? Try a larger list. */
- lno++;
}
- /* Either lno < VG_N_MALLOC_LISTS and b points to the selected
- block, or lno == VG_N_MALLOC_LISTS, and we have to allocate a
- new superblock. */
-
- if (lno == VG_N_MALLOC_LISTS) {
- req_bszW = pszW_to_bszW(a, req_pszW);
- new_sb = newSuperblock(a, req_bszW);
+ // If we reach here, no suitable block was found, so allocate a new superblock.
+ vg_assert(lno == VG_N_MALLOC_LISTS);
+ new_sb = newSuperblock(a, req_bszB);
if (NULL == new_sb) {
// Should only fail for the client arena; otherwise, we should have aborted
@@ -1037,39 +985,38 @@ void* VG_(arena_malloc) ( ArenaId aid, I
new_sb->next = a->sblocks;
a->sblocks = new_sb;
- b = &(new_sb->payload_words[0]);
- lno = pszW_to_listNo(bszW_to_pszW(a, new_sb->n_payload_words));
- mkFreeBlock ( a, b, new_sb->n_payload_words, lno);
- }
+ b = (Block*)&new_sb->payload_bytes[0];
+ lno = pszB_to_listNo(bszB_to_pszB(a, new_sb->n_payload_bytes));
+ mkFreeBlock ( a, b, new_sb->n_payload_bytes, lno);
+ // fall through
- /* Ok, we can allocate from b, which lives in list req_listno. */
+ obtained_block:
+ // Ok, we can allocate from b, which lives in list lno.
vg_assert(b != NULL);
vg_assert(lno >= 0 && lno < VG_N_MALLOC_LISTS);
vg_assert(a->freelist[lno] != NULL);
- b_bszW = mk_plain_bszW(get_bszW_lo(b));
- req_bszW = pszW_to_bszW(a, req_pszW);
- /* req_bszW is the size of the block we are after. b_bszW is the
- size of what we've actually got. */
- vg_assert(b_bszW >= req_bszW);
+ b_bszB = mk_plain_bszB(get_bszB_lo(b));
+ // req_bszB is the size of the block we are after. b_bszB is the
+ // size of what we've actually got.
+ vg_assert(b_bszB >= req_bszB);
- /* Could we split this block and still get a useful fragment?
- Where "useful" means that the payload size of the frag is at
- least one word. */
- frag_bszW = b_bszW - req_bszW;
- if (frag_bszW > overhead_szW(a)) {
- splitChunk(a, b, lno, req_bszW);
+ // Could we split this block and still get a useful fragment?
+ frag_bszB = b_bszB - req_bszB;
+ if (frag_bszB >= min_useful_bszB(a)) {
+ // Yes, split block in two, put the fragment on the appropriate free
+ // list, and update b_bszB accordingly.
+ // printf( "split %dB into %dB and %dB\n", b_bszB, req_bszB, frag_bszB );
+ unlinkBlock(a, b, lno);
+ mkInuseBlock(a, b, req_bszB);
+ mkFreeBlock(a, &b[req_bszB], frag_bszB,
+ pszB_to_listNo(bszB_to_pszB(a, frag_bszB)));
+ b_bszB = mk_plain_bszB(get_bszB_lo(b));
} else {
- /* No, mark as in use and use as-is. */
+ // No, mark as in use and use as-is.
unlinkBlock(a, b, lno);
- /*
- set_bszW_lo(b, mk_inuse_bszW(b_bszW));
- set_bszW_hi(b, mk_inuse_bszW(b_bszW));
- */
- mkInuseBlock(a, b, b_bszW);
+ mkInuseBlock(a, b, b_bszB);
}
- vg_assert(req_bszW <= mk_plain_bszW(get_bszW_lo(b)));
- a->bytes_on_loan
- += sizeof(Word)
- * bszW_to_pszW(a, mk_plain_bszW(get_bszW_lo(b)));
+ // Update stats
+ a->bytes_on_loan += bszB_to_pszB(a, b_bszB);
if (a->bytes_on_loan > a->bytes_on_loan_max)
a->bytes_on_loan_max = a->bytes_on_loan;
@@ -1080,6 +1027,6 @@ void* VG_(arena_malloc) ( ArenaId aid, I
VGP_POPCC(VgpMalloc);
- v = first_to_payload(a, b);
- vg_assert( (((UInt)v) & 7) == 0 );
+ v = get_block_payload(a, b);
+ vg_assert( (((Addr)v) & (VG_MIN_MALLOC_SZB-1)) == 0 );
return v;
}
@@ -1089,9 +1036,9 @@ void VG_(arena_free) ( ArenaId aid, void
{
Superblock* sb;
- UInt* sb_payl_firstw;
- UInt* sb_payl_lastw;
- UInt* other;
- UInt* ch;
- Int ch_bszW, ch_pszW, other_bszW, ch_listno;
+ UByte* sb_start;
+ UByte* sb_end;
+ Block* other;
+ Block* b;
+ Int b_bszB, b_pszB, other_bszB, b_listno;
Arena* a;
@@ -1106,59 +1053,68 @@ void VG_(arena_free) ( ArenaId aid, void
}
- ch = payload_to_first(a, ptr);
+ b = get_payload_block(a, ptr);
# ifdef DEBUG_MALLOC
- vg_assert(blockSane(a,ch));
+ vg_assert(blockSane(a, b));
# endif
- a->bytes_on_loan
- -= sizeof(Word)
- * bszW_to_pszW(a, mk_plain_bszW(get_bszW_lo(ch)));
+ a->bytes_on_loan -= bszB_to_pszB(a, mk_plain_bszB(get_bszB_lo(b)));
- sb = findSb( a, ch );
- sb_payl_firstw = &(sb->payload_words[0]);
- sb_payl_lastw = &(sb->payload_words[sb->n_payload_words-1]);
+ sb = findSb( a, b );
+ sb_start = &sb->payload_bytes[0];
+ sb_end = &sb->payload_bytes[sb->n_payload_bytes - 1];
- /* Put this chunk back on a list somewhere. */
- ch_bszW = get_bszW_lo(ch);
- ch_pszW = bszW_to_pszW(a, ch_bszW);
- ch_listno = pszW_to_listNo(ch_pszW);
- mkFreeBlock( a, ch, ch_bszW, ch_listno );
+ // Put this chunk back on a list somewhere.
+ b_bszB = get_bszB_lo(b);
+ b_pszB = bszB_to_pszB(a, b_bszB);
+ b_listno = pszB_to_listNo(b_pszB);
+ mkFreeBlock( a, b, b_bszB, b_listno );
- /* See if this block can be merged with the following one. */
- other = ch + ch_bszW;
- /* overhead_szW(a) is the smallest possible bszW for this arena.
- So the nearest possible end to the block beginning at other is
- other+o...
[truncated message content] |
|
From: Nicholas N. <nj...@ca...> - 2004-08-11 08:32:21
|
On Tue, 10 Aug 2004, Jeremy Fitzhardinge wrote: > If we can come up with a way to > 1. simplify the syscall/signal handling, AND > 2. make use of multiple CPUs WITHOUT > 3. compromising the things Valgrind is good for > then I would be very happy. But 3) is the tricky part. Yep. One of the best things about Valgrind, that many users have reported, is that "it just works". I think this covers ease of use, plus the reliability of error messages. Any changes that increase the false positive/false negative rate are bad. Any changes that make Valgrind more difficult to use (eg. different modes) are bad. It's a very difficult set of constraints we want to meet. N |
|
From: Jeremy F. <je...@go...> - 2004-08-11 06:07:42
|
On Wed, 2004-08-11 at 05:09 +0200, Eric Estievenart wrote:
> >>- All memory access are considered non-concurrent, i.e.
> >> when a write is made, we can set the memory value and update the
> >> 1-1 corresponding V bits. Testing them (except for helgrind)
> >> can be done without particular synchronization.
> >
> > Why?
>
> Basically the idea is that we could enforce data synchronization
> through sfence-like opcodes when we detect that the program is
> using inter-thread synchronization mechanisms. This would flush
> the data caches and ensure that updates on data and meta-data
> will be seen by the other threads.
That only applies in the SMP case; you still need to deal with a context
switch at any moment in the UP case.
> There could be an option to ensure that the RMW ops are
> synchronized, but for most programs I don't think this is
> what users want (provided they have a supposed good
> synchronization model and are only looking for small glitches...)
I'm not prepared to make any guesses about what users really want, other
than they have a program which is broken in some way (or perhaps it
isn't obviously broken at all, and they want to see what's going on in
the dark corners). Either way
> The problem is to detect these mechanisms, when they are not
> done through libpthread or syscalls, i.e. through lock + test&set
> opcodes...
Yes, so we need to make sure those instructions work with the correct
atomicity properties from the perspective of the target application.
(Note that all bets are off if the memory is shared with another non-
Valgrind process - we don't make any assurances that the memory access
patterns are really atomic with respect to a non-virtualized instruction
stream).
> Valgrind will behave undeterministically for undeterministic
> programs, but they will behave normally, and most errors will
> be caught. Maybe a bit late; maybe some will be missed, but they
> should be caught if run under helgrind, or with extra synchronization.
But it's for the non-deterministic cases when you want V to behave in the
most reliable way. If you get a glitch in run 1, why shouldn't you see
it in run 2, if it doesn't depend on external factors?
> Which means that either helgrind generated code should contain extra
> synchronizations, or should use the internal scheduler
> and not rely on a 1:1 user/kernel thread mapping....
> (so the current signal/threading code must be kept...)
I don't think helgrind is really very different from the other tools.
It is explicitly for looking for concurrency problems, but I find that
memcheck is pretty good at turning up concurrency problems too (thread 2
uses memory before thread 1 has initialized it; and the ever-popular
thread 2 freeing memory before thread 1 has finished with it).
> If memory doesn't (for performance reasons), validity bits
> don't need to for same reasons. Of course it is logical to
> have a mode where they are, if you want extra checks.
Not too keen on lots of different modes with subtle differences like
this.
> > That's hard to determine in general - you can't easily tell if a process
> > has left the handler with longjmp.
>
> If it is not statically-linked, you can hook longjmp, but obviously
> painful if it is. How are sig(set/long)jmp handled in valgrind ?
It isn't - there hasn't been much need.
> But as I think of it, this can be seen easily by other ways,
> because you will never see the thread return to the sighandler
> wrapper, nor to the syscall wrapper. So longjmp can be late-detected
> during the next syscall, and we can restore the current "in signal
> handler" state if the following syscall stack does not contain
> the sighandler wrapper for which we are waiting for the return.
Yes, or you could observe %ESP being incremented over the signal state
on the stack... assuming the longjmp isn't actually a user-level thread
switch to another stack...
> What ? Translating a read or a write to a read or a write ?
> The jit'ed code will (should?) always be atomic concerning the
> user parts, even if run in simultaneous threads, because the
> instructions should be translated to something equivalent;
> just the instructions managing the V bits should not always need
> to be atomic.
The code sequences V generates are not atomic. inc (%eax) is translated
to a read/inc/write sequence, and so doesn't have the same atomicity
properties. Depending on what instrumentation the tool is doing, it may
not be possible to do so.
> > This is the biggest concern - these false/missed reports would be very
> > non-deterministic.
>
> Which should not be the case if the program is deterministic.
> If it is non-deterministic, using the kernel scheduler is likely
> to introduce randomness, which may be good compared to the current mode,
> where threads are always executed in a deterministic way.
>
> For now if you have one thread which:
> - invalidates memory
> - writes valid data to it
>
> and another which reads it, if the first thread code
> is always run in the same code block, the other thread will
> never see it invalid; whereas in real world it will see some garbage
> at some time....
>
> With randomness, you may have a chance to see the error...
The nice thing about Valgrind's complete insight into the program's
operation is that you don't need to use these kinds of stochastic
techniques so much.
If Valgrind controls the scheduling, then you can get it to schedule
threads in any order you want, at any level of granularity. In practice
the kernel scheduler tries to switch as infrequently as possible, but we
switch every 50000 basic blocks (which could easily be adjustable).
So, I think the determinism is an advantage which helps, because we can
control all these aspects to investigate a problem (someone was thinking
about having adjustable scheduling policies so that you could do this
stuff).
> > Basically, it would mean that a tool's model of the state of the system
> > would start to drift away from the real state of the system, in a non-
> > deterministic way. For something like memcheck, where V state is copied
> > around, it could mean there's a large explosion in state drift.
>
> That one is a good point. But if we manage to sync data buses when
> we detect a synchronization primitive, there will be no false positives,
> and the state drift is likely to generate errors only when a
> synchronization is missing. Which should mean "Your mt prog seems to
> lack inter-thread synchro, run it with extra memory synchro (slower),
> and do not forget to try helgrind !"
One of the problems with Helgrind is that it isn't very easy to use. It
has a much too high false error rate, and it can be very complex to
evaluate each error message to see if it's false or not. If you write a
program to be helgrind-clean from the start, then it's probably going to
be very useful, but otherwise it's hard to use.
Therefore, we need to make sure the other more dependable tools, like
memcheck, are still useful in concurrent programs, even if there are
synchronisation problems.
> > Anything which is dealing with instrumented code will need to be thread-
> > aware.
>
> Well... Did I miss something ? I don't feel it's too much compared
> to what is done actually to manage threads, signals and syscalls...
Well, Tools are already pretty complex to write, and in principle they
should be user-serviceable parts. I think it's OK for the core to be more
complex, because it is common to all tools, but tools should be as
simple as possible, so we can encourage people to write more of them.
Also, Valgrind as a tool is useless unless you can rely on it. It's
already pretty complex (perhaps too complex), and any additional
complexity really has to prove its worth.
If we can come up with a way to
1. simplify the syscall/signal handling, AND
2. make use of multiple CPUs WITHOUT
3. compromising the things Valgrind is good for
then I would be very happy. But 3) is the tricky part.
J
|
|
From: Eric E. <eri...@fr...> - 2004-08-11 03:09:15
|
Jeremy Fitzhardinge wrote: >>- Only memory allocations/freeing are a bit synchronized, >> through thread-safe wrapping allocators, which initialize >> the A and initial V bits. >> There is no write concurrency on the A bits, >> and all threads read them without locking. > > > This would mean that the RMW operations on the Tool's metadata would not > be atomic with respect to other threads. This means that your results > would be non-deterministic if there is contention. You could argue that > any code which has non-interlocked shared memory accesses is not > deterministic anyway, but on a UP system, RMW instructions are atomic > with respect to interrupts, which is not something we can enforce after > they've been through Valgrind's instruction wringer. On SMP systems you > need the LOCK prefix to get atomic operations, but we don't really > implement it (though we could). >>- All memory access are considered non-concurrent, i.e. >> when a write is made, we can set the memory value and update the >> 1-1 corresponding V bits. Testing them (except for helgrind) >> can be done without particular synchronization. > > Why? Basically the idea is that we could enforce data synchronization through sfence-like opcodes when we detect that the program is using inter-thread synchronization mechanisms. This would flush the data caches and ensure that updates on data and meta-data will be seen by the other threads. There could be an option to ensure that the RMW ops are synchronized, but for most programs I don't think this is what users want (provided they have a supposed good synchronization model and are only looking for small glitches...) The problem is to detect these mechanisms, when they are not done through libpthread or syscalls, i.e. through lock + test&set opcodes... Valgrind will behave undeterministically for undeterministic programs, but they will behave normally, and most errors will be caught. 
Maybe a bit late; maybe some will be missed, but they should be caught if run under helgrind, or with extra synchronization. Which means that either helgrind generated code should contain extra synchronizations, or should use the internal scheduler and not rely on a 1:1 user/kernel thread mapping.... (so the current signal/threading code must be kept...) >>- Atomic instructions are jit'ed to atomic instructions; >> the test/updates of the V bits does not need to be serialized, >> only helgrind will handle the translations of these instructions >> a bit differently. > > Why wouldn't V state need serializing? If memory doesn't (for performance reasons), validity bits don't need to for same reasons. Of course it is logical to have a mode where they are, if you want extra checks. > That's hard to determine in general - you can't easily tell if a process > has left the handler with longjmp. If it is not statically-linked, you can hook longjmp, but obviously painful if it is. How are sig(set/long)jmp handled in valgrind ? But as I think of it, this can be seen easily by other ways, because you will never see the thread return to the sighandler wrapper, nor to the syscall wrapper. So longjmp can be late-detected during the next syscall, and we can restore the current "in signal handler" state if the following syscall stack does not contain the sighandler wrapper for which we are waiting for the return. >>- Error logging and code instrumentation requests are done >> through a very restricted set of thread-safe calls, >> which perform adequate locking and optionally delegate >> that to a dedicated non-user thread, which can have its >> own signal mask, so as not to interfere with user threads. > > Error reporting is already done through a fairly standard interface. Mmmmmm.... 
As long as you don't have to parse the error reports ;-) Indeed they are; what maybe misses is just a generic error object which could be given to the core once created by the user thread, without returning to the skin code for formatting and printing it. But I agree it is not the main issue. >>This seems fairly simple, and (if I'm not wrong and it works...) would >>ensure that: >>- applications behave exactly as if they were not instrumented > > Hm. All RMW instructions are atomic on a single-processor machine, so > any code which assumes this would break unless we're careful to maintain > that property - which could be expensive. What ? Translating a read or a write to a read or a write ? The jit'ed code will (should?) always be atomic concerning the user parts, even if run in simultaneous threads, because the instructions should be translated to something equivalent; just the instructions managing the V bits should not always need to be atomic. > This is the biggest concern - these false/missed reports would be very > non-deterministic. Which should not be the case if the program is deterministic. If it is non-deterministic, using the kernel scheduler is likely to introduce randomness, which may be good compared to the current mode, where threads are always executed in a deterministic way. For now if you have one thread which: - invalidates memory - writes valid data to it and another which reads it, if the first thread code is always run in the same code block, the other thread will never see it invalid; whereas in real world it will see some garbage at some time.... With randomness, you may have a chance to see the error... > Basically, it would mean that a tool's model of the state of the system > would start to drift away from the real state of the system, in a non- > deterministic way. For something like memcheck, where V state is copied > around, it could mean there's a large explosion in state drift. That one is a good point. 
But if we manage to sync data buses when we detect a synchronization primitive, there will be no false positives, and the state drift is likely to generate errors only when a synchronization is missing. Which should mean "Your mt prog seems to lack inter-thread synchro, run it with extra memory synchro (slower), and do not forget to try helgrind !" >>- instrumented apps will certainly run really faster, due to >> fewer (in fact none) context switches when doing syscalls > > It probably isn't a good idea to make sweeping statements like that - > Valgrind has been very good at behaving in very unexpected ways > performance-wise. It isn't likely to hurt performance though. Yes, sorry. I know speed is not a major concern, and it is really acceptable and impressive. I'm a bit enthousiastic ;-) >>- most valgrind code can be written non thread-safe, as for >> most tools code, since they are protected by the restricted >> thread-safe apis > > Anything which is dealing with instrumented code will need to be thread- > aware. Well... Did I miss something ? I don't feel it's too much compared to what is done actually to manage threads, signals and syscalls... -- Eric |
|
From: <js...@ac...> - 2004-08-11 02:58:06
|
Nightly build on nemesis ( SuSE 9.1 ) started at 2004-08-11 03:50:00 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow sem: valgrind ./sem semlimit: valgrind ./semlimit sha1_test: valgrind ./sha1_test shortpush: valgrind ./shortpush shorts: valgrind ./shorts smc1: valgrind ./smc1 susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 170 tests, 4 stderr failures, 0 stdout failures ================= corecheck/tests/as_mmap (stderr) corecheck/tests/fdleak_fcntl (stderr) memcheck/tests/writev (stderr) memcheck/tests/zeropage (stderr) make: *** [regtest] Error 1 |
|
From: Tom H. <to...@co...> - 2004-08-11 02:25:48
|
Nightly build on dunsmere ( Fedora Core 2 ) started at 2004-08-11 03:20:02 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow smc1: valgrind ./smc1 susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 175 tests, 8 stderr failures, 1 stdout failure ================= corecheck/tests/fdleak_cmsg (stderr) corecheck/tests/fdleak_fcntl (stderr) corecheck/tests/fdleak_ipv4 (stderr) corecheck/tests/fdleak_socketpair (stderr) memcheck/tests/buflen_check (stderr) memcheck/tests/execve (stderr) memcheck/tests/execve2 (stderr) memcheck/tests/writev (stderr) none/tests/exec-sigmask (stdout) make: *** [regtest] Error 1 |
|
From: Tom H. <th...@cy...> - 2004-08-11 02:19:24
|
Nightly build on audi ( Red Hat 9 ) started at 2004-08-11 03:15:03 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow shorts: valgrind ./shorts smc1: valgrind ./smc1 susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 175 tests, 8 stderr failures, 0 stdout failures ================= corecheck/tests/fdleak_cmsg (stderr) corecheck/tests/fdleak_fcntl (stderr) corecheck/tests/fdleak_ipv4 (stderr) corecheck/tests/fdleak_socketpair (stderr) memcheck/tests/buflen_check (stderr) memcheck/tests/execve (stderr) memcheck/tests/execve2 (stderr) memcheck/tests/writev (stderr) make: *** [regtest] Error 1 |
|
From: Tom H. <th...@cy...> - 2004-08-11 02:13:19
|
Nightly build on ginetta ( Red Hat 8.0 ) started at 2004-08-11 03:10:02 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow seg_override: valgrind ./seg_override sem: valgrind ./sem semlimit: valgrind ./semlimit sha1_test: valgrind ./sha1_test shortpush: valgrind ./shortpush shorts: valgrind ./shorts smc1: valgrind ./smc1 susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 175 tests, 3 stderr failures, 0 stdout failures ================= helgrind/tests/race (stderr) helgrind/tests/race2 (stderr) memcheck/tests/writev (stderr) make: *** [regtest] Error 1 |
|
From: Tom H. <th...@cy...> - 2004-08-11 02:08:15
|
Nightly build on alvis ( Red Hat 7.3 ) started at 2004-08-11 03:05:02 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 175 tests, 9 stderr failures, 1 stdout failure ================= addrcheck/tests/toobig-allocs (stderr) helgrind/tests/deadlock (stderr) helgrind/tests/race (stderr) helgrind/tests/race2 (stderr) memcheck/tests/badjump (stderr) memcheck/tests/brk (stderr) memcheck/tests/error_counts (stdout) memcheck/tests/new_nothrow (stderr) memcheck/tests/toobig-allocs (stderr) memcheck/tests/writev (stderr) make: *** [regtest] Error 1 |
|
From: Tom H. <th...@cy...> - 2004-08-11 02:06:59
|
Nightly build on standard ( Red Hat 7.2 ) started at 2004-08-11 03:00:01 BST Checking out source tree ... done Configuring ... done Building ... done Running regression tests ... done Last 20 lines of log.verbose follow rcrl: valgrind ./rcrl readline1: valgrind ./readline1 resolv: valgrind ./resolv rlimit_nofile: valgrind ./rlimit_nofile seg_override: valgrind ./seg_override sem: valgrind ./sem semlimit: valgrind ./semlimit sha1_test: valgrind ./sha1_test shortpush: valgrind ./shortpush shorts: valgrind ./shorts smc1: valgrind ./smc1 susphello: valgrind ./susphello syscall-restart1: valgrind ./syscall-restart1 syscall-restart2: valgrind ./syscall-restart2 system: valgrind ./system yield: valgrind ./yield -- Finished tests in none/tests ---------------------------------------- == 175 tests, 0 stderr failures, 0 stdout failures ================= |
|
From: Jeremy F. <je...@go...> - 2004-08-11 01:25:13
|
On Wed, 2004-08-11 at 02:51 +0200, Eric Estievenart wrote: > Hi guys, > I'm looking at that interesting discussion, and wonder if it is not > possible to do the following thing (but I'm not completely aware > of all V internals): > > - Have each user thread run the instrumented code, and perform all > the syscalls as-is (of course wrapped, but with no delegation > to other thread) > > - Only memory allocations/freeing are a bit synchronized, > through thread-safe wrapping allocators, which initialize > the A and initial V bits. > There is no write concurrency on the A bits, > and all threads read them without locking. This would mean that the RMW operations on the Tool's metadata would not be atomic with respect to other threads. This means that your results would be non-deterministic if there is contention. You could argue that any code which has non-interlocked shared memory accesses is not deterministic anyway, but on a UP system, RMW instructions are atomic with respect to interrupts, which is not something we can enforce after they've been through Valgrind's instruction wringer. On SMP systems you need the LOCK prefix to get atomic operations, but we don't really implement it (though we could). > - All memory access are considered non-concurrent, i.e. > when a write is made, we can set the memory value and update the > 1-1 corresponding V bits. Testing them (except for helgrind) > can be done without particular synchronization. Why? > - Atomic instructions are jit'ed to atomic instructions; > the test/updates of the V bits does not need to be serialized, > only helgrind will handle the translations of these instructions > a bit differently. Why wouldn't V state need serializing? > - Signals are handled normally. > When a syscall blocks in the kernel, that's ok, it blocks > the current thread. 
> Of course, all POSIX signal handling functions (or syscalls) > may need to be wrapped, so V knows which threads are interested > in which signals, and may update some internal stuff in > sig handler wrappers (like setting a flag telling that the > thread is in a sig handler and must not do certain calls...) That's hard to determine in general - you can't easily tell if a process has left the handler with longjmp. > - When a user thread requires some code to be instrumented, it > synchronizes with other threads, so that only one > performs the instrumentation of that chunk of code. I think we can put a big fat lock around the whole inside of the core without much problem (ie, all the translation machinery). > - Error logging and code instrumentation requests are done > through a very restricted set of thread-safe calls, > which perform adequate locking and optionally delegate > that to a dedicated non-user thread, which can have its > own signal mask, so as not to interfere with user threads. Error reporting is already done through a fairly standard interface. > This seems fairly simple, and (if I'm not wrong and it works...) would > ensure that: > - applications behave exactly as if they were not instrumented Hm. All RMW instructions are atomic on a single-processor machine, so any code which assumes this would break unless we're careful to maintain that property - which could be expensive. > - simplifies a lot signal handling stuff > - simplifies a lot the thread structure > - removes syscall delegation > - ensure that there are few false errors (if we update the V bits > after the memory) or few missed errors (if updated before), > and only for threaded processes This is the biggest concern - these false/missed reports would be very non-deterministic. Basically, it would mean that a tool's model of the state of the system would start to drift away from the real state of the system, in a non- deterministic way. 
For something like memcheck, where V state is copied around, it could mean there's a large explosion in state drift. > - instrumented apps will certainly run really faster, due to > fewer (in fact none) context switches when doing syscalls It probably isn't a good idea to make sweeping statements like that - Valgrind has been very good at behaving in very unexpected ways performance-wise. It isn't likely to hurt performance though. > - most valgrind code can be written non thread-safe, as for > most tools code, since they are protected by the restricted > thread-safe apis Anything which is dealing with instrumented code will need to be thread- aware. > - it should work on all platforms... Um, maybe. J |
|
From: Eric E. <eri...@fr...> - 2004-08-11 00:51:20
|
Hi guys, I'm looking at that interesting discussion, and wonder if it is not possible to do the following thing (but I'm not completely aware of all V internals): - Have each user thread run the instrumented code, and perform all the syscalls as-is (of course wrapped, but with no delegation to other thread) - Only memory allocations/freeing are a bit synchronized, through thread-safe wrapping allocators, which initialize the A and initial V bits. There is no write concurrency on the A bits, and all threads read them without locking. - All memory access are considered non-concurrent, i.e. when a write is made, we can set the memory value and update the 1-1 corresponding V bits. Testing them (except for helgrind) can be done without particular synchronization. - Atomic instructions are jit'ed to atomic instructions; the test/updates of the V bits does not need to be serialized, only helgrind will handle the translations of these instructions a bit differently. - Signals are handled normally. When a syscall blocks in the kernel, that's ok, it blocks the current thread. Of course, all POSIX signal handling functions (or syscalls) may need to be wrapped, so V knows which threads are interested in which signals, and may update some internal stuff in sig handler wrappers (like setting a flag telling that the thread is in a sig handler and must not do certain calls...) - When a user thread requires some code to be instrumented, it synchronizes with other threads, so that only one performs the instrumentation of that chunk of code. - Error logging and code instrumentation requests are done through a very restricted set of thread-safe calls, which perform adequate locking and optionally delegate that to a dedicated non-user thread, which can have its own signal mask, so as not to interfere with user threads. This seems fairly simple, and (if I'm not wrong and it works...) 
would ensure that: - applications behave exactly as if they were not instrumented - simplifies a lot signal handling stuff - simplifies a lot the thread structure - removes syscall delegation - ensure that there are few false errors (if we update the V bits after the memory) or few missed errors (if updated before), and only for threaded processes - instrumented apps will certainly run really faster, due to fewer (in fact none) context switches when doing syscalls - most valgrind code can be written non thread-safe, as for most tools code, since they are protected by the restricted thread-safe apis - it should work on all platforms... Just tell me if I missed something, or if I should review my unix manuals... Cheers -- Eric |