From: <lut...@fr...> - 2007-04-01 18:00:33
|
Hi, I wrote: > I can think of the following options: > > 1. Shrink package hash tables during SAVE-LISP-AND-DIE. > > 2. In FINISH-SYMBOLS, count the symbols saved for each package and base > the calculation of the sizes of the package hash tables on these > counts. > > 3. Make UNINTERN check whether the package hash tables get too sparse > and rehash them if so. William Newman answered: > I think #3 is worth doing whether or #1 and #2 are done. An > application programmer can reasonably expect an exported > delete-from-table operation to be implemented efficiently. It also > looks relatively easy to test #3, which should tend to make it easier > to set up and maintain. > > #1 and #2 could still be improvements worth doing, but in trying to > determine how much they improve things, I think the best baseline > would be relative to #3. Thanks for the feedback! I have implemented all three variants and tested their effects (on x86-64). It turns out that #3 alone gains nothing, that is, the size of "sbcl.core" is slightly larger with it than without it. Also, most package hash tables are exactly as sparse as before. With hindsight this is easy to explain: While saving the cold core no symbols are uninterned and while saving the warm core only a few, say, 100 in total, for instance in SB-IMPL some 30 out of some 3000. This does not suffice to trigger the down-sizing rehashing. #1 alone shrinks "sbcl.core" by 737 KB (with a target load factor of 0.5), or by 807 KB (load factor: 0.6). #2 alone shrinks "sbcl.core" by 778 KB. I left the parameters in FINISH-SYMBOLS that determine the load factor at their current values, which amounts to a load factor of 0.6. In summary, I still like #1 best because it helps not only SBCL itself but also custom cores. #3 is nice to have in addition, as it will help at runtime in cases where lots of symbols are uninterned. Once #1 is in effect, #2 doesn't affect the final core size any longer, but can nevertheless be added: I measured a slight speedup of the warm build phase with it. All of these changes amount to only a few lines of code, as the common code (resizing and rehashing a package hash table) is easily factored out. So I suggest to simply implement all three parts. This is what the attached patch does. Apart from the changes necessary to this end there are some gratuitous ones: - In ADD-SYMBOL, if the table is full and needs rehashing, I changed the calculation of the new size to disregard deleted entries. This way, if the table contains deleted entries, the resized hash table will not be too sparse. - In MAKE-OR-REMAKE-PACKAGE-HASHTABLE I changed the calculation of the actual package hash table size to start with CEILING instead of TRUNCATE of the number of entries asked for and the rehash threshold. This way the table will in all cases be able to hold the asked for number of entries without rehashing. (The current version, when asked for a table to hold 10 entries, for example, will use a vector of length 13 that can hold only 9 entries.) Kind regards Lutz Euler |