|
From: Christoph B. <bar...@or...> - 2007-07-19 11:30:20
|
Hi,
I have a new binary of our program and now there is a new problem. The arena
exectxt allocates about 14 GB of memory for lots of ExeContexts. Additionally,
the function VG_(record_ExeContext) has a high runtime. I guess that
VG_(record_ExeContext) stores all stacks with events for the tool; for
memcheck each malloc/free is recorded here.
The obvious explanation for such a big number of contexts would be that we
really have introduced a lot of new contexts. But can the number grow by such
an extent? Another reason could be that I have introduced a bug with my
earlier changes that is now triggered. Do you know other reasons why such a
thing could happen?
The big number of ExeContexts leads to a high runtime because the hashtable
used has a fixed number of entries. Using a growing hashtable would remove
code duplication and reduce the runtime.
Here is the ExeContext stat while only 0.9 GB is used for ExeContexts. I will
post an updated stat when the run reaches 14 GB.
--32605-- exectx: 30,011 lists, 3,200,468 contexts (avg 106 per list)
--32605-- exectx: 738,424,447 searches, 2,246,479,141 full compares (3,042 per 1000)
--32605-- exectx: 0 cmp2, 36 cmp4, 0 cmpAll
Greetings
Christoph
|
From: Christoph B. <bar...@or...> - 2007-07-19 14:28:14
|
Now, with 7GB of ExeContexts, I see the following stats:
--32605-- exectx: 30,011 lists, 26,534,091 contexts (avg 884 per list)
--32605-- exectx: 1,078,173,748 searches, 23,207,430,990 full compares
(21,524 per 1000)
--32605-- exectx: 0 cmp2, 36 cmp4, 0 cmpAll
But I think I have also found the source of the problem: a routine for
intersecting rectangle lists that is implemented recursively:
void TreeInsert(Tree t) {
    // some stuff
    TreeInsert(t.left);
    TreeInsert(t.right);
}
The depth of one example tree is 23. This leads to about 8 million different
execution contexts.
Greetings
Christoph
|
|
From: Christoph B. <bar...@or...> - 2007-07-19 21:07:33
|
Hi,
here is an example program that shows the problem:
void rekur(int a) {
    if (a == 30) {
        char* p = new char;
        delete p;
        return;
    }
    if (a < 9) {
        rekur(a + 1);
        return;
    }
    rekur(a + 1);
    rekur(a + 1);
}

int main() {
    rekur(0);
}
Valgrind 3.2.3 (with --num-callers=32) allocates 870MB of memory and takes 67
seconds. The native run takes 0.3 seconds and uses almost no memory.
Christoph
|
|
From: Christoph B. <bar...@or...> - 2007-07-20 00:07:46
Attachments:
m_execontext.c
|
With the following reimplementation of the execution context, the memory
consumption goes down to 500MB and the runtime goes down to 20 seconds. The
runtime improvement is due to the bigger hashtable, and the size improvement
is due to the way tracebacks are stored now. The idea is to use a linked list
for the traceback and to share common parts between different tracebacks.
To do this, the external interface only changed for extract_StackTrace.
Your comments?
One additional question: Why does eq_ExeContext currently not compare the
pointers? I understand that the whole mechanism guarantees that each context
has a unique pointer.
Greetings
Christoph
|
From: Christoph B. <bar...@or...> - 2007-07-20 00:12:49
|
On Friday, 20 July 2007, Nicholas Nethercote wrote:
> On Thu, 19 Jul 2007, Christoph Bartoschek wrote:
> > I've looked at some tracebacks and I see that the current hashtable with
> > 30011 entries is too small for the program.
>
> A general comment: there's a module m_oset.c which defines the OSet type.
> It implements an AVL tree. When I added it, it was intended to replace the
> Vg_HashTable type, the idea being that it would scale better as it had no
> fixed limits. But it turned out to have noticeably worse constant factors,
> and most people aren't running programs nearly as large as the ones you
> are, so the Vg_HashTable is still used. It might be worth trying the OSet
> instead of the Vg_HashTable for some of these structures. The interfaces
> are pretty similar.

My experience shows that a dynamically resizing hashtable performs better
than an AVL tree. As long as one does not need guaranteed access times and
sorted iteration over the set, there is no reason to use an AVL tree.

Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-07-20 07:49:33
|
On Fri, 20 Jul 2007, Christoph Bartoschek wrote:
> With the following reimplementation of the execution context, the memory
> consumption goes down to 500MB and the runtime goes down to 20 seconds.
> The runtime improvement is due to the bigger hashtable and the size
> improvement is due to the way tracebacks are stored now. The idea is to
> use a linked list for the traceback and share common parts between
> different tracebacks.
>
> To do this the external interface only changed for extract_StackTrace.
>
> Your comments?

Interesting. One more thing to look at when I have time :) You've
undoubtedly realised by now that you are stressing Valgrind's memory
allocation much more than anyone else has, both with the size of the memory
footprint and the number of allocations!

> One additional question: Why is eq_ExeContext currently not comparing the
> pointers? I understand that the whole mechanism guarantees that each
> context has a unique pointer.

The idea is to reduce the number of errors reported. If the same error
occurs with different ExeContexts, and the ExeContexts are sufficiently
similar (eg. the top 4 elements are the same), then it gets reported to the
user only once.

The exact-pointer comparison is done for ExeContexts related to leak
checking if you set --leak-resolution=high.

Nick
|
From: Christoph B. <bar...@or...> - 2007-07-20 08:24:20
Attachments:
m_execontext.c.bz2
|
> > One additional question: Why is eq_ExeContext currently not comparing
> > the pointers? I understand that the whole mechanism guarantees that each
> > context has a unique pointer.
>
> The idea is to reduce the number of errors reported. If the same error
> occurs with different ExeContexts, and the ExeContexts are sufficiently
> similar (eg. the top 4 elements are the same), then it gets reported to
> the user only once.
>
> The exact-pointer comparison is done for ExeContexts related to leak
> checking if you set --leak-resolution=high.

Ok, here is a version that hopefully makes this correct. It uses
arena_malloc_fast for allocation and saves an additional 100MB of memory.

Christoph
|
From: Josef W. <Jos...@gm...> - 2007-07-20 11:03:12
|
On Friday 20 July 2007, Christoph Bartoschek wrote:
> On Friday, 20 July 2007, Nicholas Nethercote wrote:
> > On Thu, 19 Jul 2007, Christoph Bartoschek wrote:
> > > I've looked at some tracebacks and I see that the current hashtable
> > > with 30011 entries is too small for the program.
> >
> > A general comment: there's a module m_oset.c which defines the OSet
> > type. It implements an AVL tree. When I added it, it was intended to
> > replace the Vg_HashTable type, the idea being that it would scale
> > better as it had no fixed limits. But it turned out to have noticeably
> > worse constant factors, and most people aren't running programs nearly
> > as large as the ones you are, so the Vg_HashTable is still used. It
> > might be worth trying the OSet instead of the Vg_HashTable for some of
> > these structures. The interfaces are pretty similar.
>
> My experience shows that a dynamically resizing hashtable performs better
> than an AVL tree. As long as one does not need guaranteed access times
> and sorted iteration over the set, there is no reason to use an AVL tree.

That is my understanding, too. I pretty much would like a general resizable
hash table in the Valgrind core; something like the one you posted (in the
topic "Long valgrind runtimes").

I use them a lot in callgrind for different structures, but have my own
implementation(s).

However, Vg_HashTable uses a simple key for entry comparison, which is not
enough in some cases. What about an optional comparison callback?

Josef
|
From: Christoph B. <bar...@or...> - 2007-07-20 12:42:23
|
> That is my understanding, too. I pretty much would like a general
> resizable hash table in the Valgrind core; something like the one you
> posted (in the topic "Long valgrind runtimes").
>
> I use them a lot in callgrind for different structures, but have my own
> implementation(s).
>
> However, Vg_HashTable uses a simple key for entry comparison, which is
> not enough in some cases. What about an optional comparison callback?

Yes, I am currently working on a m_hashtable.c that uses open addressing
with linear probing. Each table entry stores a pointer to the item. The user
supplies three functions to the implementation:

- A function that calculates the hash of an entry. The default would be to
  return the key. The hash modulo the table size is the bucket of the entry.
- A function that compares two entries for equality. The default would be to
  only compare keys.
- A function that deletes the items in the hash table. On the one hand this
  acts as a destructor that is able to clean up other allocations. On the
  other hand this allows the usage of the arena_malloc_fast/arena_free_fast
  pair of functions.

I hope that such an implementation is at least as fast as the current one
and that it allows me to use less memory for the execution context by
getting rid of a pointer. This would reduce the size of an execution context
in my implementation from 16 bytes (4 bytes padding) to 8 bytes.

Greetings
Christoph
|
From: Nicholas N. <nj...@cs...> - 2007-07-21 01:26:58
|
On Fri, 20 Jul 2007, Christoph Bartoschek wrote: > Yes, I am currently working on a m_hashtable.c that uses open adressing with > linear probing. Each table entry stores a pointer to the item. I would recommend that you look at the interface in pub_tool_oset.h. I designed it pretty carefully, and made it quite generic. Eg. you can even use an OSet to represent ranges. If you could make the hash table implementation use the same interface we could even consider removing the AVL tree implementation and use the hash table as the OSet. Nick |
|
From: Christoph B. <bar...@or...> - 2007-07-22 21:20:47
|
Am Samstag, 21. Juli 2007 schrieb Nicholas Nethercote:
> On Fri, 20 Jul 2007, Christoph Bartoschek wrote:
> > Yes, I am currently working on a m_hashtable.c that uses open addressing
> > with linear probing. Each table entry stores a pointer to the item.
>
> I would recommend that you look at the interface in pub_tool_oset.h. I
> designed it pretty carefully, and made it quite generic. Eg. you can even
> use an OSet to represent ranges. If you could make the hash table
> implementation use the same interface we could even consider removing the
> AVL tree implementation and use the hash table as the OSet.
After reading the header of m_hashtable.c carefully, I have abandoned my
plans to change it further.
The current code implements a hash map with a UInt key, while the execution
context code needs a hash set. There are good reasons to separate these two
ways of using a hash table. Both Java and C++ use two classes to implement a
hash set and a hash map; they do not try to use one class that covers both
functionalities.
Linear probing also seems to be no option for m_hashtable.c right now: the
current hash table interface exposes the chaining implementation, so that
clients can optimize for speed.
A hash map or a hash set cannot replace an AVL implementation, because it is
not possible to have sorted access or to use range searches.
If I find time, I am going to change my implementation of the execution
context to a hash table with linear probing, because it allows me to get rid
of the next pointer in each context. (Valgrind still uses 6GB for all
execution contexts in my application.) However, I see no direct way to embed
execution contexts into the hashtable:
struct ExeContext {
    struct ExeContext* parent;  // calling frame
    Addr ip;
};
To support such a thing, one would need a way to distinguish an empty entry
from a used one, and during a resize an entry that has already been moved
from one that has not. Are there ip values that are not possible? Or values
for parent? I guess I can use the fact that all pointers are 4/8 byte
aligned, but maybe there is a better way.
Greetings
Christoph
|
|
From: Nicholas N. <nj...@cs...> - 2007-07-22 22:36:15
|
On Sun, 22 Jul 2007, Christoph Bartoschek wrote:
> The current code implements a hash map with a UInt key. On the other hand the
> execution context code uses a hash set. There are good reasons to separate
> both ways to use a hash table. Both Java and C++ use two classes to implement
> a hash set and a hash map. They do not try to use one class that covers both
> functionalities.
The OSet allows both: if you specify a comparison function, you can
implement a hash map; if you don't, it defaults to integer keys, effectively
giving a hash set (if I've understood the distinction).
> On the other hand linear probing seems to be no choice now. The current hash
> table interface exposes the chaining implementation such that clients can
> optimize for speed.
How? If you're worried about changing the interface, don't be too worried,
we've changed things like that before.
> A hash map or a hash set can not replace an AVL implementation because it is
> not possible to have sorted access or to use range searches.
Ah, true.
> If I find time, I am going to change my implementation of the execution
> context to a hash list with linear probing because it allows me to get rid of
> the next pointer in each context. (Valgrind still uses 6GB for all execution
> contexts in my application).
Fair enough.
> However I see no direct way to embed execution contexts into the
> hashtable:
>
> struct ExeContext {
> struct ExeContext * parent; //calling frame
> Addr ip;
> };
>
> To support such a thing one would need a way to distinguish an empty entry
> from a used one, and during a resize an entry that has already been moved
> from an unmoved one. Are there ip values that are not possible? Or values
> for parent?
>
> I guess I can use the fact that all pointers are 4/8 byte aligned but maybe
> there is a better way.
0 and 1 are unlikely pointer values, for both 'parent' and 'ip'. It's not
great to special-case them like that, but it might be good enough, so long
as it's an internal detail and not exposed to the users of the hash table.
Nick
|