Looking at "sbcl.core" with a hex editor, I noticed largish areas
consisting of zeroes sparsely interrupted by nonzero bytes or words.
It turns out that many of these are the arrays that implement package
hash tables, each hash table consisting of one byte array and one word
array, both containing zeroes in empty slots. Many of these tables are
far too sparse, and some are completely empty. For example:
* (inspect (find-package :sb-sys))
#<SB-IMPL::PACKAGE-HASHTABLE :SIZE 1733 :FREE 1730 :DELETED 0>
* (inspect (find-package :sb-ext))
#<SB-IMPL::PACKAGE-HASHTABLE :SIZE 1812 :FREE 1812 :DELETED 0>
In sum, there are about 21500 symbols in SBCL stored in about 115500
pairs of array slots, an overall density of roughly 0.19. Populating
the package hash tables more densely, for example using a density of
0.5, reduces the core size by 2 percent (750 KB on x86-64) and could
reduce cache usage in the reader.
(For reference: when interning a symbol, INTERN doubles the size of the
tables and rehashes the entries once the density exceeds 0.75.)
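To make the numbers above easier to check, here is a back-of-the-envelope sketch in portable Common Lisp. The symbol and slot counts are the ones quoted in this message; the 9 bytes per slot pair (one byte-array slot plus one 8-byte word on x86-64) is my assumption, and the function names are made up for illustration. The estimate ignores any per-table rounding of sizes, so it need not match the measured 750 KB exactly.

```lisp
;; Figures quoted in this message; everything else is derived arithmetic.
(defparameter *symbols* 21500)      ; approximate number of symbols in the core
(defparameter *slot-pairs* 115500)  ; approximate number of slot pairs
;; Assumption: one slot pair costs 1 byte (byte array) + 8 bytes (word array).
(defparameter *bytes-per-pair* 9)

(defun density (entries slots)
  "Fraction of SLOTS occupied by ENTRIES, as an exact rational."
  (/ entries slots))

(defun slots-for-density (entries target-density)
  "Smallest slot count that keeps ENTRIES at or below TARGET-DENSITY."
  (ceiling entries target-density))

(defun estimated-savings (entries slots target-density)
  "Bytes saved by shrinking from SLOTS to the size implied by TARGET-DENSITY."
  (* *bytes-per-pair*
     (- slots (slots-for-density entries target-density))))
```

With these definitions, (float (density *symbols* *slot-pairs*)) is a little under 0.19, (slots-for-density *symbols* 1/2) is 43000, and (estimated-savings *symbols* *slot-pairs* 1/2) is 652500 bytes, which is in the same range as the saving I measured.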
If this is considered a reasonable improvement to SBCL, may I ask for
some advice on how best to implement it?
As far as I understand, the sizes of the hash tables currently get into
sbcl.core as follows: The package hash tables grow as large as needed
while SBCL compiles itself and interns lots of symbols from the source
code. When writing the cold core only the symbols listed in
*COLD-PACKAGE-SYMBOLS* are saved but the hash table sizes are calculated
from all symbols in the corresponding packages (see FINISH-SYMBOLS).
Later, during make-target-2-load.lisp, !UNINTERN-INIT-ONLY-STUFF is
called and uninterns symbols with funny names. It doesn't touch the
package hash table sizes, which end up in sbcl.core when
SAVE-LISP-AND-DIE is finally called.
I can think of the following options:
1. Shrink package hash tables during SAVE-LISP-AND-DIE.
Advantage: Not only SBCL itself benefits but also user-saved cores,
provided the user uninterns unneeded symbols before calling
SAVE-LISP-AND-DIE.
2. In FINISH-SYMBOLS, count the symbols saved for each package and base
the calculation of the sizes of the package hash tables on these
counts.
Disadvantage: This only helps for packages whose symbol count is
approximately the same in the cold and the final core.
3. Make UNINTERN check whether the package hash tables get too sparse
and rehash them if so.
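For option 3, the decision rule might look roughly like the sketch below. It is not SBCL code: RESIZE-DECISION and the shrink and target densities are hypothetical, and only the 0.75 growth threshold comes from the current behaviour of INTERN described above. Any rounding of table sizes that the real implementation performs is ignored.

```lisp
(defparameter *grow-density* 3/4)    ; threshold INTERN uses today (see above)
(defparameter *shrink-density* 1/8)  ; hypothetical: below this, shrink
(defparameter *target-density* 1/2)  ; hypothetical: density right after a shrink

(defun resize-decision (entries slots)
  "Return the new slot count for a table holding ENTRIES symbols in
SLOTS slots (SLOTS must be positive), or NIL to leave the table alone."
  (let ((density (/ entries slots)))
    (cond ((> density *grow-density*)
           ;; Same rule as today: double the table.
           (* 2 slots))
          ((< density *shrink-density*)
           ;; Shrink to the target density, but never below one slot,
           ;; and only if that actually makes the table smaller.
           (let ((new (max 1 (ceiling entries *target-density*))))
             (and (< new slots) new)))
          (t nil))))
```

For the SB-SYS table shown earlier (1733 slots, 3 in use), (resize-decision 3 1733) returns 6. Keeping the shrink threshold well below the target density means a table that has just been shrunk sits between the two thresholds, so alternating INTERN and UNINTERN calls do not cause it to be rehashed over and over.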
Personally, I prefer option 1 as it provides the most control over the
space used by the hash tables in sbcl.core.
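To judge what options 1 or 2 would gain per package, something like the following portable sketch could be run in the final image. It only counts symbols and prints the slot counts that a given density would imply; the actual shrinking would need SBCL internals that I am not showing here. PACKAGE-SYMBOL-COUNTS and REPORT-TARGET-SIZES are names I made up, and the default density of 1/2 is just the example value from above.

```lisp
(defun package-symbol-counts (package)
  "Return two values: the number of internal and of external symbols
present in PACKAGE itself (inherited symbols are not counted)."
  (let ((internal 0)
        (external 0))
    (with-package-iterator (next package :internal :external)
      (loop
        (multiple-value-bind (more symbol accessibility) (next)
          (declare (ignore symbol))
          (unless more (return))
          (ecase accessibility
            (:internal (incf internal))
            (:external (incf external))))))
    (values internal external)))

(defun report-target-sizes (&optional (density 1/2))
  "Print, for every package, its symbol counts and the slot counts
that DENSITY would imply for the internal and external tables."
  (dolist (package (list-all-packages))
    (multiple-value-bind (internal external)
        (package-symbol-counts package)
      (format t "~&~30A internal ~6D -> ~6D slots, external ~6D -> ~6D slots~%"
              (package-name package)
              internal (ceiling internal density)
              external (ceiling external density)))))
```

Calling (report-target-sizes) and comparing the output with the :SIZE values shown by INSPECT should show which packages contribute most of the wasted space.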