From: Christoph B. <bar...@gm...> - 2007-07-11 22:19:23
Attachments:
m_hashtable.c
Hi,

I've looked at the profile after the first fix that I sent earlier today, and have seen that the function vgPlain_HT_remove took a lot of the runtime in my testcase program. The cause is a too-small hashtable with chains of up to 99 elements. Therefore I have changed the m_hashtable.c implementation to automatically expand the table if the density of the hash reaches 75% (or 60% in a second test).

The implementation has at least the following flaws:

- The primes are from http://planetmath.org/encyclopedia/GoodHashTablePrimes.html and stop at 31 bits.
- One has to call VG_(HT_destruct)(VgHashTable table), because there are two allocations for each table.
- There is an additional indirection for the chains array. One could get rid of it, but would have to change the HT_add_node interface.

Here are the runtimes on a 2.6 GHz Opteron machine for the testcase from an earlier post:

Native:                                  25 s
Revision 6765 - tool=none:               59 s
Revision 6765 - tool=memcheck:         7711 s
Superblock fix - none:                   58 s
Superblock fix - memcheck:              903 s
Superblock + Hashtable 75% - none:       59 s
Superblock + Hashtable 75% - memcheck:  423 s
Superblock + Hashtable 60% - none:       59 s
Superblock + Hashtable 60% - memcheck:  392 s

Greetings
Christoph
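For readers who want to see the shape of the change: below is a minimal sketch of a separately-chained hash table that grows once its density reaches 75%, walking a short prime list and falling back to doubling-plus-one. All names here are illustrative, not the actual m_hashtable.c interface (the real code uses VG_(malloc), its own node layout, and a prime list going up to 31 bits):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a chained hash table that resizes at 75% density.
   Illustrative names only, not the real m_hashtable.c interface. */

typedef struct Node_ { struct Node_* next; unsigned long key; } Node;

typedef struct {
   unsigned n_chains;   /* current number of buckets */
   unsigned n_nodes;    /* elements currently stored */
   Node**   chains;     /* the extra indirection mentioned above */
} HashTable;

/* A few of the "good" primes; the real list continues to 31 bits. */
static const unsigned primes[] = { 53, 97, 193, 389, 769, 1543 };

HashTable* ht_new(void)
{
   unsigned i;
   HashTable* t = malloc(sizeof(HashTable));
   t->n_chains = primes[0];
   t->n_nodes  = 0;
   t->chains   = calloc(t->n_chains, sizeof(Node*));
   for (i = 0; i < t->n_chains; i++) t->chains[i] = NULL;
   return t;
}

static void ht_resize(HashTable* t)
{
   unsigned i, new_n = 0;
   Node** new_chains;

   /* Pick the next prime; once the list is exhausted, double the
      table size and add 1 (just to keep the size odd). */
   for (i = 0; i < sizeof(primes)/sizeof(primes[0]); i++)
      if (primes[i] > t->n_chains) { new_n = primes[i]; break; }
   if (new_n == 0) new_n = 2 * t->n_chains + 1;

   new_chains = calloc(new_n, sizeof(Node*));
   for (i = 0; i < t->n_chains; i++) {
      Node* node = t->chains[i];
      while (node) {                        /* rehash every node */
         Node* next = node->next;
         unsigned b = node->key % new_n;
         node->next = new_chains[b];
         new_chains[b] = node;
         node = next;
      }
   }
   free(t->chains);
   t->chains   = new_chains;
   t->n_chains = new_n;
}

void ht_add(HashTable* t, Node* node)
{
   unsigned b;
   if (4 * t->n_nodes >= 3 * t->n_chains)   /* density >= 75% */
      ht_resize(t);
   b = node->key % t->n_chains;
   node->next = t->chains[b];
   t->chains[b] = node;
   t->n_nodes++;
}
```

The 75% check is done with integer arithmetic (4*n >= 3*c) so that no floating point is needed on the allocation path.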
From: Nicholas N. <nj...@cs...> - 2007-07-11 22:47:23
On Thu, 12 Jul 2007, Christoph Bartoschek wrote:

> I've looked into the profile after the first fix, that I sent earlier this
> day, and have seen that the method vgPlain_HT_remove took lots of runtime in
> my testcase program. The cause is a too small hashtable and chains of up to
> 99 elements.
>
> Therefore I have changed the m_hashtable.c implementation to automatically
> expand itself if the density of the hash reaches 75% (or 60% in a test).
> [...]
>
> Native: 25 s
> Revision 6765 - tool=none: 59 s
> Revision 6765 - tool=memcheck: 7711 s
> Superblock fix - none 58 s
> Superblock fix - memcheck: 903 s
> Superblock + Hashtable 75% - none: 59 s
> Superblock + Hashtable 75% - memcheck: 423 s
> Superblock + Hashtable 60% - none: 59 s
> Superblock + Hashtable 60% - memcheck: 392 s

Christoph, this is very helpful stuff -- pathological cases like this are a
real problem.  Thanks for hunting it down.  I'm sure it will make its way
into future releases.

If you're feeling motivated, you might like to try running your modified
version(s) on a big program like OpenOffice or Firefox, and see if it
speeds them up too.  They both do a lot of allocations.

Nick
From: Julian S. <js...@ac...> - 2007-07-11 23:57:11
> Christoph, this is very helpful stuff -- pathological cases like this are a

Yes, I agree.  Excellent work.

I looked at the revised m_hashtable.c and it looks plausible, although I'm
not familiar with its implementation, and there is no diff.

> The implementation has at least the following flaws:
>
> - The primes are from
>   http://planetmath.org/encyclopedia/GoodHashTablePrimes.html and stop at
>   31 bits.

Sounds fairly harmless.  AIUI from reading the resize function, if you use
up all the prime numbers then after that you double the size of the table
and add 1.  Yes?

> - One has to call VG_(HT_destruct)(VgHashTable table) because there are
>   two allocations for each table.

Not sure what the significance of that is.  Does it change the behaviour or
required behaviour for users of the hashtable?

> - There is an additional indirection for the chains array. One could get rid
>   of it but would have to change the HT_add_node interface.

Sounds harmless, apart from a minor performance loss, yes?

If the hash table automagically expands to suitable sizes, then maybe we
can get rid of the argument to VG_(HT_construct) and start off at primes[0]?

I'll look at the Superblock fix too.

J
From: Christoph B. <bar...@gm...> - 2007-07-12 06:46:27
On Thursday, 12 July 2007, Julian Seward wrote:

> > Christoph, this is very helpful stuff -- pathological cases like this
> > are a
>
> Yes, I agree.  Excellent work.
>
> I looked at the revised m_hashtable.c and it looks plausible, although
> I'm not familiar with its implementation, and no diff.

You can make the diff against revision 6765.

> > The implementation has at least the following flaws:
> >
> > - The primes are from
> >   http://planetmath.org/encyclopedia/GoodHashTablePrimes.html and stop
> >   at 31 bits.
>
> Sounds fairly harmless.  AIUI from reading the resize function, if you use
> up all the prime numbers then after that you double the size of the table
> and add 1.  Yes?

Yes, just to have an odd number. For current programs and machines with up
to 128 GB of memory the primes should be enough, but for future systems
this might become a problem. At least for 32-bit systems the numbers are
sufficient, because you would fill your address space before hitting the
limit.

> > - One has to call VG_(HT_destruct)(VgHashTable table) because there are
> >   two allocations for each table.
>
> Not sure what the significance of that is.  Does it change the behaviour
> or required behavior for users of the hashtable?

Until now the user could just free the table pointer. Now he has to call
the HT_destruct function. A brief look at the code reveals that there are
at least 6 calls to HT_construct and only 1 call to HT_destruct. There are
two possibilities:

1. The hash tables are not freed.
2. They are freed directly without calling HT_destruct.

> > - There is an additional indirection for the chains array. One could get
> >   rid of it but would have to change the HT_add_node interface.
>
> Sounds harmless, apart from minor performance loss, yes?

Yes.

> If the hash table automagically expands to suitable sizes, then maybe we
> can get rid of the argument to VG_(HT_construct) and start off at
> primes[0] ?

What about also changing it to linear probing? The comment in
pub_tool_hashtable.h mentions that it might be faster, and one could remove
the next pointer from the _VgHashNode struct and save space. However, one
would have to find all other structs that inherit from _VgHashNode.

Christoph
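For comparison, the open-addressing layout hinted at in pub_tool_hashtable.h could look roughly like this. This is a sketch with illustrative names, storing bare keys instead of real nodes; a real conversion would also need a deletion strategy (tombstones or re-insertion) and, as noted above, would have to touch every struct that currently begins with a _VgHashNode:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of linear probing: entries live directly in the array, so
   there is no per-node next pointer.  EMPTY marks unused slots.
   Illustrative only, not the Valgrind API. */

#define EMPTY ((unsigned long)-1)

typedef struct {
   unsigned       n_slots;
   unsigned long* slots;    /* keys only, for brevity */
} ProbeTable;

ProbeTable* pt_new(unsigned n_slots)
{
   unsigned i;
   ProbeTable* t = malloc(sizeof(ProbeTable));
   t->n_slots = n_slots;
   t->slots   = malloc(n_slots * sizeof(unsigned long));
   for (i = 0; i < n_slots; i++) t->slots[i] = EMPTY;
   return t;
}

/* Insert: walk forward from the home slot until a free slot is found.
   The caller must keep the table below 100% occupancy. */
void pt_add(ProbeTable* t, unsigned long key)
{
   unsigned i = key % t->n_slots;
   while (t->slots[i] != EMPTY)
      i = (i + 1) % t->n_slots;
   t->slots[i] = key;
}

/* Lookup: probe until the key or an empty slot is hit. */
int pt_contains(ProbeTable* t, unsigned long key)
{
   unsigned i = key % t->n_slots;
   while (t->slots[i] != EMPTY) {
      if (t->slots[i] == key) return 1;
      i = (i + 1) % t->n_slots;
   }
   return 0;
}
```

The space saving is one pointer per node, at the cost of clustering once the table fills up, so the resize threshold would matter even more here.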
From: Christoph B. <bar...@gm...> - 2007-07-12 06:49:39
> Christoph, this is very helpful stuff -- pathological cases like this are
> a real problem.  Thanks for hunting it down.  I'm sure it will make its
> way into future releases.
>
> If you're feeling motivated, you might like to try running your modified
> version(s) on a big program like OpenOffice or Firefox, and see if it
> speeds them up too.  They both do a lot of allocations.

Do you have a standard testcase to check the runtime? I guess that it is
hard to get meaningful numbers from interactive applications.

Christoph
From: Nicholas N. <nj...@cs...> - 2007-07-12 22:19:46
On Thu, 12 Jul 2007, Christoph Bartoschek wrote:

>> If you're feeling motivated, you might like to try running your modified
>> version(s) on a big program like OpenOffice or Firefox, and see if it
>> speeds them up too.  They both do a lot of allocations.
>
> Do you have a standard testcase to check the runtime? I guess that it is
> hard to get meaningful numbers from interactive tools.

Generally I just start it up and then quit immediately.  The numbers do
vary a bit, but any significant difference (eg. more than 5--10%) should
be obvious.

Nick
From: Christoph B. <bar...@or...> - 2007-07-13 09:19:51
Attachments:
m_mallocfree.c
After about an hour there seems to be only one function that still runs:

  75373374  93.0560  memcheck  memcheck  vgPlain_arena_free

More than 93% of the samples, and increasing, are in vgPlain_arena_free.
However, it seems as if this function is not the real bottleneck. I have
attached the output of opannotate for the file m_mallocfree.c. It shows
that findSb() is the slowest part of the program; I guess it is inlined
into vgPlain_arena_free.

Greetings
Christoph
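To illustrate why findSb() could dominate: if it locates the superblock containing a freed address by scanning the superblock array front to back, every free costs O(number of superblocks). One possible remedy (not something the thread has implemented; types and names below are made up for illustration) is to keep the array sorted by start address and binary-search it, making the lookup O(log n):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a superblock descriptor; the real
   m_mallocfree.c definitions differ. */
typedef struct { char* start; size_t size; } Superblock;

/* O(n): what a linear scan looks like. */
Superblock* find_sb_linear(Superblock* sbs, int n, char* addr)
{
   int i;
   for (i = 0; i < n; i++)
      if (addr >= sbs[i].start && addr < sbs[i].start + sbs[i].size)
         return &sbs[i];
   return NULL;
}

/* O(log n): the same query against an array sorted by start address. */
Superblock* find_sb_binary(Superblock* sbs, int n, char* addr)
{
   int lo = 0, hi = n - 1;
   while (lo <= hi) {
      int mid = lo + (hi - lo) / 2;
      if (addr < sbs[mid].start)
         hi = mid - 1;                            /* look left  */
      else if (addr >= sbs[mid].start + sbs[mid].size)
         lo = mid + 1;                            /* look right */
      else
         return &sbs[mid];                        /* addr inside */
   }
   return NULL;
}
```

With thousands of superblocks alive in a big run, the difference between the two would be exactly the kind of flat 93% profile seen above.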