|
From: Christoph B. <bar...@or...> - 2007-11-22 09:15:03
|
Hi,
I am currently debugging an application that needs more than 32GB memory.
Valgrind however does not allow me to allocate more. I get the following
message:
Valgrind's memory management: out of memory:
newSuperblock's request for 660615168 bytes failed.
34312749056 bytes have already been allocated.
Valgrind cannot continue. Sorry.
There are several possible reasons for this.
- You have some kind of memory limit in place. Look at the
output of 'ulimit -a'. Is there a limit on the size of
virtual memory or address space?
- You have run out of swap space.
- Valgrind has a bug. If you think this is the case or you are
not sure, please let us know and we'll try to fix it.
Please note that programs can take substantially more memory than
normal when running under Valgrind tools, eg. up to twice or
more, depending on the tool. On a 64-bit machine, Valgrind
should be able to make use of up to 32GB memory. On a 32-bit
machine, Valgrind should be able to use all the memory available
to a single process, up to 4GB if that's how you have your
kernel configured. Most 32-bit Linux setups allow a maximum of
3GB per process.
Is there a way to increase the memory allocation limit for valgrind? 200 GB
would be enough.
Greetings
Christoph
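[Editorial note: the first possibility named in the message above, a resource
limit, can be checked from the shell before starting Valgrind. A minimal
sketch, assuming a bash-compatible shell; `ulimit` is a shell builtin:]

```shell
# Show all resource limits; look for "virtual memory" / "address space"
ulimit -a
# Print just the virtual-memory limit ("unlimited" if none is set)
ulimit -v
```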
|
|
From: Julian S. <js...@ac...> - 2007-11-22 12:39:29
|
> Is there a way to increase the memory allocation limit for valgrind? 200 GB
> would be enough.
You have a machine with 200GB of memory? Is it water cooled? In a lead box?
Try this:
coregrind/m_aspacemgr/aspacemgr-linux.c:1546
aspacem_maxAddr = (Addr)0x800000000 - 1; // 32G
change to
aspacem_maxAddr = (Addr)(0x800000000ULL << 1) - 1; // 64G
Rebuild. See if this allows you to get to 64G without stability
problems. If OK, shift left twice more.
In memcheck/mc_main.c:160
# define N_PRIMARY_BITS 19
Every time you <<= 1 aspacem_maxAddr, add 1 to N_PRIMARY_BITS.
Let us know if it works/fails.
J
|
|
From: Christoph B. <bar...@or...> - 2007-11-22 21:39:03
|
On Thursday, 22 November 2007, Julian Seward wrote:
> > Is there a way to increase the memory allocation limit for valgrind? 200
> > GB would be enough.
>
> You have a machine with 200GB of memory? Is it water cooled? In a lead
> box?

Only 128 GB. But we are willing to let it use swap space to get a bug fixed.
Normally the program uses 64 GB of memory. The 200 GB takes the valgrind
overhead into account.

> Try this:
>
> coregrind/m_aspacemgr/aspacemgr-linux.c:1546
>
> aspacem_maxAddr = (Addr)0x800000000 - 1; // 32G
>
> change to
>
> aspacem_maxAddr = (Addr)(0x800000000ULL << 1) - 1; // 64G
>
> Rebuild. See if this allows you to get to 64G without stability
> problems. If OK, shift left twice more.
>
> In memcheck/mc_main.c:160
>
> # define N_PRIMARY_BITS 19
>
> Every time you <<= 1 aspacem_maxAddr, add 1 to N_PRIMARY_BITS.
>
> Let us know if it works/fails.

First test programs show that it works for them. We are trying it on the
main program. Expected runtime: 2 weeks.

Greetings
Christoph
|
From: Julian S. <js...@ac...> - 2007-11-23 11:46:07
|
> Only 128 GB. But we are willing to let it use swap space to get a bug
> fixed. Normally the program uses 64 GB of memory. The 200GB takes the
> valgrind overhead into account.

You have --freelist-vol= set to a very large number, yes?

J
|
From: Christoph B. <bar...@or...> - 2007-11-23 23:11:57
|
On Friday, 23 November 2007, Julian Seward wrote:
> > Only 128 GB. But we are willing to let it use swap space to get a bug
> > fixed. Normally the program uses 64 GB of memory. The 200GB takes the
> > valgrind overhead into account.
>
> You have --freelist-vol= set to a very large number, yes?

No, I use the default value. But I guess that I should use 10 GB for our
program. Thanks for the hint to set it.

Christoph
|
From: Christoph B. <bar...@or...> - 2007-11-23 17:10:04
|
Hi,

I am now running our program and see excessive memory usage by valgrind
compared to the normal run. Printing arena statistics I get the following
stats:

Allocations = 40000000
--00:00:07:04.784 7075-- core    :    1048576 mmap'd,      28168/     28104 max/curr
--00:00:07:04.784 7075-- tool    : 2038431744 mmap'd, 1294871976/1294871976 max/curr
--00:00:07:04.784 7075-- symtab  :   11395072 mmap'd,    8939048/   8744624 max/curr
--00:00:07:04.784 7075-- client  : 1958924288 mmap'd, 1942031576/1942031576 max/curr
--00:00:07:04.784 7075-- demangle:          0 mmap'd,          0/         0 max/curr
--00:00:07:04.784 7075-- exectxt :    1048576 mmap'd,     384712/    384712 max/curr
--00:00:07:04.784 7075-- errors  :      65536 mmap'd,        576/       576 max/curr
--00:00:07:04.784 7075-- ttaux   :      65536 mmap'd,      21936/     21840 max/curr

Allocations = 50000000
--00:00:07:44.021 7075-- core    :    1048576 mmap'd,      28168/     28104 max/curr
--00:00:07:44.021 7075-- tool    : 2747269120 mmap'd, 1745673512/ 915226960 max/curr
--00:00:07:44.021 7075-- symtab  :   11395072 mmap'd,    8939048/   8744624 max/curr
--00:00:07:44.021 7075-- client  : 2088947712 mmap'd, 2072819304/2072819304 max/curr
--00:00:07:44.021 7075-- demangle:          0 mmap'd,          0/         0 max/curr
--00:00:07:44.021 7075-- exectxt :    1048576 mmap'd,     384712/    384712 max/curr
--00:00:07:44.021 7075-- errors  :      65536 mmap'd,        576/       576 max/curr
--00:00:07:44.021 7075-- ttaux   :      65536 mmap'd,      21936/     21840 max/curr

Is it normal that the relation (tool memory usage / client memory usage)
drops from 0.66 to 0.44?

I also wonder why the utilisation of the arena is so poor for the tool. The
max allocated memory is 1745673512 and mmap'd are 2747269120. Only 64%
utilisation.

Greetings
Christoph
|
From: Julian S. <js...@ac...> - 2007-11-23 18:18:42
|
> I also wonder why the utilisation of the arena is so poor for the tool.
> The max allocated memory is 1745673512 and mmaped are 2747269120. Only 64%
> utilisation.

Hard to say without further data. Are you using standard 4k pages?

J
|
From: Christoph B. <bar...@or...> - 2007-11-23 23:11:59
|
On Friday, 23 November 2007, Julian Seward wrote:
> > I also wonder why the utilisation of the arena is so poor for the tool.
> > The max allocated memory is 1745673512 and mmaped are 2747269120. Only
> > 64% utilisation.
>
> Hard to say without further data. Are you using standard 4k pages?

Yes.

Christoph
|
From: Christoph B. <bar...@or...> - 2007-11-24 09:58:27
|
Hi,

I have made some experiments with the malloc implementation of valgrind and
the implementation I wrote for big programs. Both are running on the latest
trunk. I've set the virtual memory limit to 100 GiB. Both implementations
hit the limit too early in my opinion. Here is the arena statistic just
before the limit is reached and the message when no more memory can be
allocated:

------------------------------------------------------------------------------
Valgrind trunk:

Runtime: 04:46:48.302

Valgrind's memory management: out of memory:
   newSuperblock's request for 2628407296 bytes failed.
   107358945280 bytes have already been allocated.
Valgrind cannot continue. Sorry.

--00:04:46:48.302 7075--
core    :     1048576 mmap'd,       28168/      28104 max/curr
tool    : 46948982784 mmap'd, 28312014304/21759622632 max/curr
symtab  :    11395072 mmap'd,     8939048/    8744624 max/curr
client  : 48017752064 mmap'd, 31079618200/31079618200 max/curr
demangle:           0 mmap'd,           0/          0 max/curr
exectxt :     1048576 mmap'd,      606984/     606984 max/curr
errors  :       65536 mmap'd,         576/        576 max/curr
ttaux   :       65536 mmap'd,       39384/      39096 max/curr

Sum of mmap'd memory: 94980358144

------------------------------------------------------------------------------
My malloc implementation:

Runtime: 04:30:30.425

Valgrind's memory management: out of memory:
   allocate_block's request for 1048576 bytes failed.
   107359825920 bytes have already been allocated.
Valgrind cannot continue. Sorry.

core    :     8388608 mmap'd,       43704/      43618 max/curr
tool    : 42402316352 mmap'd, 42103467986/32486461370 max/curr
symtab  :    13533208 mmap'd,     8946989/    8753482 max/curr
client  : 49339127240 mmap'd, 44231564084/44231564084 max/curr
demangle:           0 mmap'd,           0/          0 max/curr
exectxt :    10485760 mmap'd,      516288/     457360 max/curr
errors  :     1048576 mmap'd,         768/        768 max/curr
ttaux   :     8388608 mmap'd,       44782/      44470 max/curr

Sum of mmap'd memory: 91783288352

I make the following observations:

1. The trunk implementation wastes more memory. The utilisation of mmap'd
   memory for the max case is:
      trunk client: 0.65
      trunk tool:   0.60
      me client:    0.90
      me tool:      0.99
2. The client gets less memory in the trunk version. Only 28.94 GiB instead
   of 41.19 GiB in my version.
3. There is a gap between the mmap'd memory in trunk (94980358144) and the
   total number of bytes allocated (107359825920). Where are the 11.52 GiB
   that are not given to any arena?
4. My version uses only 91783288352 bytes instead of 94980358144. This is
   because of the missing munmap call for valgrind memory. Approx. 3 GiB
   are wasted by the missing call.
5. The trunk version takes longer to reach a state of the program where
   only 3/4 of the memory of my version is used.

Now I wonder what happened to the missing 11.52 GiB of memory?

Greetings
Christoph Bartoschek
|
From: Christoph B. <bar...@or...> - 2007-11-24 21:42:24
|
Hi,

please forget all my remarks regarding my memory allocator. I have changed
the red zones for it. No wonder that it uses less memory.

Christoph
|
From: Christoph B. <bar...@or...> - 2007-11-25 17:56:25
|
I wonder how to reduce the memory overhead of valgrind. Currently I see the
following numbers:
Total allocated memory by valgrind: 130 GiB
  Memory overhead of aspace manager: 15 GiB
  Memory mmap'd by the arenas: 111 GiB
    Memory mmap'd by the client arena: 59 GiB
      Memory given to the client: 38 GiB
      Malloc overhead + wasted mem: 21 GiB
    Memory mmap'd by the tool arena: 52 GiB
      Memory given to the tool: 41 GiB
      Malloc overhead + wasted mem: 11 GiB
  Wasted memory by missing munmap: 3 GiB
Altogether the memory overhead is (130 - 38)/38 ≈ 2.4. I find this number
quite high. I always thought memcheck doubles the used memory.
Is there a way to reduce the overhead of the aspace manager?
Christoph
|
|
From: Christoph B. <bar...@or...> - 2007-11-26 13:39:40
|
On Sunday, 25 November 2007, Christoph Bartoschek wrote:
> I wonder how to reduce the memory overhead of valgrind. Currently I see the
> following numbers:
>
> Total allocated memory by valgrind: 130 GiB
> Memory overhead of aspace manager: 15 GiB
> Memory mmap'd by the arenas: 111 GiB
> Memory mmap'd by the client arena: 59 GiB
> Memory given to the client: 38 GiB
> Malloc overhead + Wasted Mem: 21 GiB
> Memory mmap'd by the tool arena: 52 GiB
> Memory given to the tool: 41 GiB
> Malloc overhead + Wasted Mem: 11 GiB
> Wasted memory by missing munmap: 3 GiB

I think the memory overhead of the aspace manager is in reality the shadow
memory. I missed that no arena is used for it.

Currently my run takes 188 GiB with the following distribution:

core    :      458752 mmap'd,       26432/      26362 max/curr
tool    : 98739159088 mmap'd, 80585380882/65440290042 max/curr
symtab  :     9246176 mmap'd,     8938189/    8743762 max/curr
client  : 80113007878 mmap'd, 50036789540/43109716719 max/curr
demangle:           0 mmap'd,           0/          0 max/curr
exectxt :      721032 mmap'd,      565912/     565912 max/curr
errors  :       65536 mmap'd,         584/        576 max/curr
ttaux   :      524288 mmap'd,       47268/      46972 max/curr

The client takes the biggest part of memory. However it allocates only two
different sizes:

Size: 40  Live blocks: 545464570
Size: 64  Live blocks: 799360540

The allocated chunks with size 40 are the MC_Chunk elements. They have size
32 and occupy 40 bytes with the malloc overhead. The chunks with size 64
are the SecVBitNode elements, if I identify them correctly. They are 32
bytes big, but there are 24 bytes overhead from OSetNode and 8 bytes from
malloc.

Now I wonder whether it is necessary to have the elements in a sorted set.
I do not see where their ordering is used.

By using a malloc without overhead for the MC_Chunk elements one would get
4 GiB back. Using the same for SecVBitNode would save 6 GiB. If I am
correct, one could use a hash table for the SecVBitNode. This would reduce
the overhead by about 8 bytes per element. Additionally, maybe the garbage
collection is no longer necessary because of the average O(1) runtime of
the hash operations. This would save an additional 8 bytes.

Altogether this might save 22 GiB out of 188 GiB. Any comments on this?

Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-11-26 21:31:20
|
On Mon, 26 Nov 2007, Christoph Bartoschek wrote:
> The client takes the biggest part of memory. However it allocates only two
> different sizes:
>
> Size: 40  Live blocks: 545464570
> Size: 64  Live blocks: 799360540
>
> The allocated chunks with size 40 are the MC_Chunk elements. They have size 32
> and occupy 40 bytes with the malloc overhead.
>
> The chunks with size 64 are the SecVBitNode elements if I identify them
> correctly. They are 32 byte big but there are 24 bytes overhead from OSetNode
> and 8 bytes from malloc.
>
> Now I wonder whether it is necessary to have the elements in a sorted set. I
> do not see where their ordering is used.

It's not necessary. That's a surprising number of SecVBit nodes. Does your
program have a lot of partially-defined bytes?

> By using a malloc without overhead for the MC_Chunk elements one would get 4
> GiB back. Using the same for SecVBitNode would save 6 GiB.

How would the 'malloc without overhead' work? You'd have to completely
rewrite Valgrind's allocator.

> If I am correct one could use a hash table for the SecVBitNode. This would
> reduce the overhead by about 8 bytes per element. Additionally maybe the
> garbage collection is no longer necessary because of the average O(1) runtime
> of the hash operations. This would save additional 8 bytes.
>
> Altogether this might save 22 GiB out of 188 GiB.
>
> Any comments on this?

It's hard to say anything based on speculation. Perhaps you could try
implementing some of these things and see what effect they have. You are
right at the bleeding edge here, I don't know of anyone else who is using
Valgrind on programs anything like as big as yours.

Nick
|
From: Christoph B. <bar...@or...> - 2007-11-26 22:42:06
|
On Monday, 26 November 2007, Nicholas Nethercote wrote:
> It's not necessary. That's a surprising number of SecVBit nodes. Does
> your program have a lot of partially-defined bytes?

The program handles very big VLSI routing instances. Therefore we use lots
of bitfields to pack data as densely as possible. Some bits however might
remain uninitialized.

Your comment indicates that
(a) SecVBit elements are only created for bytes that are not fully
    initialized.
(b) we can avoid this memory consumption by setting the undefined bits to a
    default value.

Is this correct? However, I still have to verify that my assumption is
right. I am going to run some tests that count the allocated SecVBitNode
elements.

> How would the 'malloc without overhead' work? You'd have to completely
> rewrite Valgrind's allocator.

Of course there is no malloc without overhead. But it is possible to reduce
it compared to a standard general-purpose memory allocator. The C standard
unfortunately does not have a pair of functions similar to malloc()/free()
where one passes the size of the allocated memory to free(). This means
that malloc has to somehow remember the size of the allocated block. On 64
bits this means at least 8 bytes overhead per allocation. Additionally, the
program also remembers the size of the allocated block, especially if it
stores an array of data. Altogether, the size of the allocated block is
known most of the time when one calls free().

One example is the MC_Chunk object. It stores the size of the malloc
request by the client program. Additionally, the size is stored by the
memory allocator in the allocated chunk. Using the information in MC_Chunk
one could save 8 bytes per allocation from the client program.

As far as I can see, memcheck only allocates lots of objects of type
MC_Chunk and SecVBitNode. The sizes of the objects are always known and
constant during the whole program. There is no need to store this in the
allocated memory blocks. One way to do this is by using a memory allocator
for a specific block size. One could use such a pool allocator for objects
of type MC_Chunk or SecVBitNode. Another solution is using special versions
of malloc()/free() that always get the size of the objects passed. One can
then write a memory allocator that does not store the size of the allocated
blocks and saves precious bytes.

Some months ago I had big performance problems with valgrind on our
instances. During some tests I wrote a memory allocator that does exactly
this. I've posted the allocator in my patch. This allocator is by no means
tuned to reduce the memory overhead. But it helped to reduce the memory
usage for big programs. And it allowed to reduce the memory usage of the
execontext implementation. By using this memory manager instead of the
built-in one, the program runs a little bit faster and uses less memory.
But to save more memory I have to change more parts of valgrind to use its
specific feature of not storing the size of the allocated block.

> It's hard to say anything based on speculation. Perhaps you could try
> implementing some of these things and see what effect they have. You are
> right at the bleeding edge here, I don't know of anyone else who is using
> Valgrind on programs anything like as big as yours.

I will try to implement some things.

Christoph
> > That's a surprising number of SecVBit nodes. Does
> > your program have a lot of partially-defined bytes?
>
> The program handles very big vlsi routing instances. Therefore we use lots of
> bitfields to pack data as dense as possible. Some bits however might remain
> uninitialized.

Use memset() to zero the entire container which contains multiple
bitfields. You will save space and time in memcheck, and might even save
time and space [code] in normal execution. Of course this discards the
ability to detect some usage errors related to uninitialized bitfields.
Also, removing the "extra" calls to memset [say, at some time in the
future] will be risky because any latent errors will have been masked for a
long time. However, you'll probably enjoy the increased speed and reduced
space during memcheck.

Be sure to #include <string.h> so that gcc knows memset is memset. On most
architectures gcc performs significant optimization when the length is a
[small to medium] constant.

--
|
From: Christoph B. <bar...@or...> - 2007-11-26 16:17:35
|
Hi,

I just see that memcheck remembers the size of the allocated chunk in a
MC_Chunk object. Given this, one could use the malloc that does not
remember the size and save an additional 4 GiB of memory.

But now I asked myself: why not store the MC_Chunk in the redzone of the
client request? Is there a reason not to use this memory for this task? I
see a comment about the possibility that a client might mark the redzone
accessible. Why should one want this? Is there any case where this is
useful?

Greetings
Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-11-26 21:33:19
|
On Mon, 26 Nov 2007, Christoph Bartoschek wrote:
> I just see, that memcheck remembers the size of the allocated chunk in a
> MC_Chunk object. Given this, one could use the malloc that does not remember
> the size and safe additional 4 GiB of memory.

"the malloc that does not remember the size" -- what is that?

> But now I asked myself, why not storing the MC_Chunk in the redzone of the
> client request? Is there a reason not to use this memory for this task?

What client request are you referring to?

> I see a comment about the possibility that a client might mark the redzone
> accessible. Why should one want this? Is there any case, where this is
> useful?

You could just try reducing the redzone size for the client arena, eg. from
16B down to 8B. That would save you some space, at the risk of missing some
heap block overruns.

Nick
|
From: Christoph B. <bar...@or...> - 2007-11-26 21:54:05
|
On Monday, 26 November 2007, Nicholas Nethercote wrote:
> "the malloc that does not remember the size" -- what is that?

See the answer to your second mail.

> > But now I asked myself, why not storing the MC_Chunk in the redzone of
> > the client request? Is there a reason not to use this memory for this
> > task?
>
> What client request are you referring to?

The client is the program that is debugged by valgrind. It is analogous to
the arena called client. If I understand it correctly, a malloc by the
client is handled in the following way:

1. Memcheck's version of malloc is called by the client with the requested
   number of bytes SIZE.
2. VG_(cli_malloc) is called with SIZE.
3. VG_(arena_malloc) is called with SIZE aligned.
4. VG_(arena_malloc) adds two red zones around SIZE bytes and some
   maintenance overhead.
5. The red zones and the maintenance information are declared inaccessible
   and SIZE bytes are returned to the client.

Now the red zones are not accessible to the client program and they are not
used by valgrind itself. My idea is to use the red zones as MC_Chunk
objects. The red zones are together 32 unused bytes. I am not sure whether
this might work at all because I do not understand all implications. If I
understand correctly, the user is not able to corrupt this data because it
is protected by valgrind's noaccess mechanism.

> I see a comment about the possibility that a client might mark the
> redzone accessible. Why should one want this? Is there any case, where
> this is useful?

Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-11-26 23:01:45
|
On Mon, 26 Nov 2007, Christoph Bartoschek wrote:
> > > But now I asked myself, why not storing the MC_Chunk in the redzone of
> > > the client request? Is there a reason not to use this memory for this
> > > task?
> >
> > What client request are you referring to?
>
> The client is the program that is debugged by valgrind. It is analogous to
> the arena called client.

Ah... we use "client request" for something else -- the VALGRIND_*
annotations in source code files. See the manual for more info. That's why
I was confused.

> Now the red zones are not accessible to the client program and they are
> not used by valgrind itself. My idea is to use the red zones as MC_Chunk
> objects. The red zones are together 32 unused bytes. I am not sure whether
> this might work at all because I do not understand all implications.

The point of the redzones is to provide some "dead space" to detect any
overruns/underruns past the ends of heap blocks. They have to be marked as
inaccessible, otherwise Memcheck won't complain if they're touched. So we
can't store the MC_Chunk or any other useful data in them.

You could make the redzones smaller, though. Changing MC_MALLOC_REDZONE_SZB
in memcheck/mc_include.h from 16 to 8 will save you some space. Don't
bother making it smaller than 8; any smaller and it will be rounded up to
sizeof(void*). It could cause some overruns/underruns to be missed, though.

> If I understand correctly the user is not able to corrupt this data
> because it is protected by valgrind's noaccess mechanism.

Not quite. The user will get a warning if they corrupt it, but the
corruption will still occur.

Nick
|
From: Nicholas N. <nj...@cs...> - 2007-11-26 23:06:36
|
On Mon, 26 Nov 2007, Christoph Bartoschek wrote:
> The program handles very big vlsi routing instances. Therefore we use lots of
> bitfields to pack data as dense as possible. Some bits however might remain
> uninitialized.
>
> Your comment indicates that
> (a) SecVBit elements are only created for bytes that are not fully
>     initialized.
> (b) we can avoid this memory consumption by setting the undefined bits to a
>     default value.
>
> Is this correct?

Yes. (b) will hopefully make a big difference.

> The C standard unfortunately does not have a pair of functions similar to
> malloc()/free() where one passes the size of the allocated memory to free.
> This means that malloc has to somehow remember the size of the allocated
> block. On 64 bits this means at least 8 Byte overhead per allocation.
> Additionally the program also remembers the size of the allocated block,
> especially if it stores an array of data. Altogether the size of the
> allocated block is known most of the time when one calls free().
>
> One example is the MC_Chunk object. It stores the size of the malloc request
> by the client program. Additionally the size is stored by the memory
> allocator in the allocated chunk. Using the information in MC_Chunk one could
> save 8 bytes per allocation from the client program.

True. But the advantage of storing the size twice is that the MC_Chunk is
stored a long way away from the heap block. So in a buggy program with heap
block overruns, it's much less likely to be corrupted. Along with the
redzones, Valgrind's/Memcheck's allocator uses a lot of space because of
this type of danger.

> As far as I can see memcheck only allocates lots of objects of type MC_Chunk
> and SecVBitNode. The sizes of the objects are always known and constant
> during the whole program. There is no need to store this in the allocated
> memory blocks. One way to do this is by using a memory allocator for a
> specific block size. One could use such a pool allocator for objects of type
> MC_Chunk or SecVBitNode.

I see. Yes, custom allocators for these types would reduce the memory
consumption.

Nick
|
From: Christoph B. <bar...@or...> - 2007-11-27 00:40:38
|
> > The C standard unfortunately does not have a pair of functions similar
> > to malloc()/free() where one passes the size of the allocated memory to
> > free. This means that malloc has to somehow remember the size of the
> > allocated block. On 64 bits this means at least 8 Byte overhead per
> > allocation. Additionally the program also remembers the size of the
> > allocated block, especially if it stores an array of data. Altogether
> > the size of the allocated block is known most of the time when one
> > calls free().
> >
> > One example is the MC_Chunk object. It stores the size of the malloc
> > request by the client program. Additionally the size is stored by the
> > memory allocator in the allocated chunk. Using the information in
> > MC_Chunk one could save 8 bytes per allocation from the client program.
>
> True. But the advantage of storing the size twice is that the MC_Chunk is
> stored a long way away from the heap block. So in a buggy program with
> heap block overruns, it's much less likely to be corrupted. Along with
> the redzones, Valgrind's/Memcheck's allocator uses a lot of space because
> of this type of danger.

One can still store the information far away from the heap block and store
it only once. However I see no reason why one should allow a client program
to corrupt the red zone. One could simply ignore such requests.

Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-11-27 00:58:55
|
On Tue, 27 Nov 2007, Christoph Bartoschek wrote:
> However I see no reason why one should allow a client program to corrupt the
> red zone. One could simply ignore such requests.

I think it's one of those "the difference between theory and practice is
smaller in theory than in practice" things. Overrunning a heap block is
undefined (IIRC) so we could do this, but I suspect some buggy programs
would behave differently if we did. Correctly emulating buggy programs is a
tricky business. People complain when their program seg faults normally but
not under Valgrind, or vice versa, and I imagine they would complain if red
zone corruption was forbidden.

N