From: Christoph B. <bar...@or...> - 2007-08-16 08:41:34
Attachments:
valgrind.big-program.patch.bz2
Hi,

I have bundled all changes to the current trunk I made in the attached patch. The patch allows you to run Valgrind on big CPU-bound programs that do lots of allocations and deallocations and normally run for an hour or more. My experience shows that for such programs Valgrind's Memcheck needs more than a week of runtime to complete. The patch helps you reduce the runtime to one or two days.

When should you try this patch?

* If the ratio between normal runtime and Valgrind runtime gets worse than 40, this patch might help.
* If OProfile shows that less than 40% of the total runtime is spent in code generated by Valgrind, this patch might help.
* If you have millions of allocations and deallocations with millions of different tracebacks, this patch helps to reduce memory consumption and decreases runtime.

What are the drawbacks? The patch has only been tested by three users so far. And there are some regressions in the leak-check tool. I guess this is due to the different way memory allocations are handled. I do not see any regressions in normal Memcheck operations.

Any comments are appreciated.

Christoph Bartoschek
From: Greg S. <gms...@us...> - 2007-08-16 17:00:17
To provide a first-hand success story: this patch proved essential to our development team recently. We have had a long-standing memory corruption bug that was impossible to track down, because running the testcase under Valgrind took upwards of 10 days, and the run consistently died before getting to the problem. With Christoph's patch, the run finished in 2 days and correctly pointed out the bug. No other tool in our arsenal flagged the problem; we are highly dependent on Valgrind for any Linux-specific problems! Thus, I would urge the Valgrind developers to validate and adopt this patch so that it may become official, and the general community may use it.

Thanks for your efforts in making this a world-class tool!

-Greg Schaeffer
From: Julian S. <js...@ac...> - 2007-08-25 08:08:55
On Thursday 16 August 2007 16:25, Greg Schaeffer wrote:
> Thus, I would urge the Valgrind developers to validate and adopt this patch
> so that it may become official, and the general community may use it.

I've merged many but not all parts of the patch into the svn trunk now (as of svn r6778), and would be interested to hear what the effect is for people with large programs. The changes I merged are:

* m_mallocfree.c: use many more freelists.
* m_mallocfree.c: binary search of Superblocks when freeing.

These speed up programs which have millions of blocks live at the same time. I didn't merge in the complete rewrite of this module, though.

* m_execontext.c: allow the ExeContext table to expand dynamically. Contexts are still represented as arrays, though, not as shared linked lists.
* m_hashtable.c: allow these to expand dynamically.

J
From: Julian S. <js...@ac...> - 2007-08-17 16:07:27
Hi. Potentially useful patch. Some explanation of what you did would be useful, though.

> I have bundled all changes to the current trunk I made in the attached
> patch.

m_hashtable: looks ok. AFAICS you have not changed the interface to m_hashtable, just made it do dynamically expanding hash tables. Yes?

m_execontext: you are using a dynamically expanding hash table and also building a tree (trees?) of code locations, so that traces with common tails are represented only once? There is some change to the interface semantics for VG_(extract_stacktrace) but I am not sure what it is.

m_mallocfree: many changes here, can you summarise? Also, what are VG_(arena_malloc_fast) and VG_(arena_free_fast)?

J
From: Christoph B. <bar...@or...> - 2007-08-17 23:37:32
On Friday, 17 August 2007, Julian Seward wrote:
> m_hashtable: looks ok. AFAICS you have not changed the interface to
> m_hashtable, just made it do dynamically expanding hash tables. Yes?

Yes, the hash table is now dynamically expanding. The only minor change in the interface is that users have to call the HT_destruct function to clean up a hash table.

> m_execontext: you are using a dynamically expanding hash table and
> also building a tree (trees?) of code locations, so that traces with
> common tails are represented only once?

The module builds a forest of stack traces. Each stack trace corresponds to a path from a node to the root of a tree. Each node is an address from the stack trace, and the parent of a node is its caller. If two stack traces do not have a common start, they are in different trees. This can easily happen if you limit the depth of the stack trace. The nodes are also kept in a dynamically expanding hash table so they can easily be located when necessary. I have extended the table of prime numbers up to numbers higher than 2^40. For 32-bit this is more than enough; for 64-bit this should be enough for the near future. Because I did not know better, I have copied the primes into m_execontext.

> There is some change to the interface semantics for VG_(extract_stacktrace)
> but I am not sure what it is.

The interface has changed because the callers of the old interface expected to get a pointer to an array of addresses. Such an array now has to be constructed from the information in the tree the execution context belongs to. The array therefore has to be passed as the second parameter so that the function can fill it. One could change the interface to

  UInt VG_(extract_StackTrace) (ExeContext * e, Addr ips[VG_DEEPEST_BACKTRACE])

and return the number of frames in the stack trace.

> m_mallocfree: many changes here, can you summarise? Also,
> what are VG_(arena_malloc_fast) and VG_(arena_free_fast)?

First I had changed the superblock lookup to a binary search.
But then I had to save as much memory as possible to be able to store millions of execution contexts. To do this I decided to replace the memory manager. The memory manager I wrote needs more memory than the old one for small programs, because it is not yet tuned for small programs and because it does not try to merge free blocks again. But for the programs I have targeted, memory consumption is drastically reduced.

As in the old memory manager, there are three kinds of memory chunks one can allocate. There are memory chunks in the linear range from 8 to 256 bytes. Then there is an exponentially growing range from 512 bytes to 1 MiB. And finally there are memory chunks bigger than 1 MiB. The chunks bigger than 1 MiB are allocated via mmap and freed by munmap. The smaller chunks are kept in free lists that hold chunks of identical size; the value of size_table[i] gives you the size of the chunks in list i. If a client requests memory, the corresponding list is searched for a free chunk. If there is none, the memory manager searches the bigger lists for a chunk, and finally has to allocate new memory via sbrk. The allocated memory is then spread over the free lists.

The client has to pass the size of the chunk it wants to allocate and to free, so that the memory manager knows which lists to use. This is done in the arena_malloc_fast and arena_free_fast functions. For a normal malloc and free, one has to remember the size of the memory chunk, because free has no size_t parameter. This is done in the normal arena_malloc and arena_free functions. Most of the time one does know the size of the memory chunk one wants to free; for example, this is the case for the m_execontext module. It does not make sense to waste memory on the size information in each allocated chunk. That is why I use the arena_malloc_fast functions there, and save about 100 MiB in the example program with exponentially many execution contexts.
Big Valgrind-internal requests are currently not freed, because there is no munmap method for Valgrind memory. I did not try to clean up the whole file, so there are still methods from the old memory manager left.

Greetings
Christoph