> I can think of the following options:
> 1. Shrink package hash tables during SAVE-LISP-AND-DIE.
> 2. In FINISH-SYMBOLS, count the symbols saved for each package and base
> the calculation of the sizes of the package hash tables on these
> 3. Make UNINTERN check whether the package hash tables get too sparse
> and rehash them if so.
William Newman answered:
> I think #3 is worth doing whether or #1 and #2 are done. An
> application programmer can reasonably expect an exported
> delete-from-table operation to be implemented efficiently. It also
> looks relatively easy to test #3, which should tend to make it easier
> to set up and maintain.
> #1 and #2 could still be improvements worth doing, but in trying to
> determine how much they improve things, I think the best baseline
> would be relative to #3.
Thanks for the feedback!
I have implemented all three variants and tested their effects
It turns out that #3 alone gains nothing, that is, the size of
"sbcl.core" is slightly larger with it than without it. Also, most
package hash tables are exactly as sparse as before. With hindsight
this is easy to explain: While saving the cold core no symbols are
uninterned and while saving the warm core only a few, say, 100 in total,
for instance in SB-IMPL some 30 out of some 3000. This does not suffice
to trigger the down-sizing rehashing.
#1 alone shrinks "sbcl.core" by 737 KB (with a target load factor
of 0.5), or by 807 KB (load factor: 0.6).
#2 alone shrinks "sbcl.core" by 778 KB. I left the parameters in
FINISH-SYMBOLS that determine the load factor at their current values,
which amounts to a load factor of 0.6.
In summary, I still like #1 best because it helps not only SBCL itself
but also custom cores. #3 is nice to have in addition, as it will help
at runtime in cases where lots of symbols are uninterned.
Once #1 is in effect, #2 doesn't affect the final core size any longer,
but can nevertheless be added: I measured a slight speedup of the warm
build phase with it.
All of these changes amount to only a few lines of code, as the common
code (resizing and rehashing a package hash table) is easily factored
out. So I suggest to simply implement all three parts. This is what the
attached patch does.
Apart from the changes necessary to this end there are some gratuitous
- In ADD-SYMBOL, if the table is full and needs rehashing, I changed the
calculation of the new size to disregard deleted entries. This way, if
the table contains deleted entries, the resized hash table will not be
- In MAKE-OR-REMAKE-PACKAGE-HASHTABLE I changed the calculation of the
actual package hash table size to start with CEILING instead of
TRUNCATE of the number of entries asked for and the rehash threshold.
This way the table will in all cases be able to hold the asked for
number of entries without rehashing. (The current version, when asked
for a table to hold 10 entries, for example, will use a vector of
length 13 that can hold only 9 entries.)