<#part type="application/octet-stream" filename="~/src/sb-studio/repos/sbcl/src/code/huffman.lisp" disposition=attachment description="src/code/huffman.lisp">
<#part type="text/x-patch" filename="~/src/sb-studio/repos/sbcl/unicode-names.patch" disposition=inline description=patch>
Date: Tue, 21 Feb 2006 11:36:35 +0000
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Content-Type: text/plain; charset=us-ascii
Attached is a resurrected version of my Unicode character name patch,
against SBCL 0.9.9.35, plus a new file, src/code/huffman.lisp.
Character-to-name and name-to-character mappings are both O(ln
size-of-character-space), which seems reasonable enough to me.
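To illustrate the logarithmic lookup, here is a minimal sketch in Python (not the Lisp code in the patch). It assumes two sorted tables, one ordered by code point and one by name, and does a binary search in each direction. The table contents and the function names code_to_name and name_to_code are made up for the example.

```python
from bisect import bisect_left

# Hypothetical sample data; a real table would cover every named character.
NAMES = [
    (0x0041, "LATIN CAPITAL LETTER A"),
    (0x03BB, "GREEK SMALL LETTER LAMDA"),
    (0x2200, "FOR ALL"),
    (0x263A, "WHITE SMILING FACE"),
]

# One table sorted by code point, one sorted by name.
BY_CODE = sorted(NAMES)
CODES = [code for code, _ in BY_CODE]
BY_NAME = sorted((name, code) for code, name in NAMES)
NAME_KEYS = [name for name, _ in BY_NAME]


def code_to_name(code):
    """Binary search the code-point table; return None if unnamed."""
    i = bisect_left(CODES, code)
    if i < len(CODES) and CODES[i] == code:
        return BY_CODE[i][1]
    return None


def name_to_code(name):
    """Binary search the name table; return None if unknown."""
    i = bisect_left(NAME_KEYS, name)
    if i < len(NAME_KEYS) and NAME_KEYS[i] == name:
        return BY_NAME[i][1]
    return None
```

Both directions cost one binary search over a table whose size is bounded by the number of named characters, which is where the logarithmic bound comes from.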
This bloats the core by 600k or so, with the character names
Huffman-encoded. Without encoding the bloat is around 900k, so the win
is not huge but reasonable, and slightly better than using 6 bits to
encode the 32 symbols encountered in character names.
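For anyone who wants to play with the trade-off, here is a rough sketch in Python (again, not the code in huffman.lisp) that compares the total bit count of a per-symbol Huffman code against a fixed-width code over the same text. The helper names huffman_code_lengths and compare are hypothetical, and the sketch makes no attempt to reproduce the figures above, since it ignores the cost of storing the code tree and any indexing.

```python
import heapq
from collections import Counter
from math import ceil, log2


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique
    # tiebreak keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def compare(names):
    """Return (huffman_bits, fixed_width_bits) for the concatenated names."""
    text = "".join(names)
    freqs = Counter(text)
    lengths = huffman_code_lengths(freqs)
    huffman_bits = sum(freqs[s] * lengths[s] for s in freqs)
    fixed_width = max(1, ceil(log2(len(freqs))))
    return huffman_bits, fixed_width * len(text)
```

Calling compare on a list of character names gives the two totals side by side. The Huffman total can never exceed the fixed-width one, but how much it saves depends entirely on how skewed the symbol frequencies are.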
I did some quick experiments using extended Huffman encoding, but
didn't manage to improve the compression rate. A more sophisticated
scheme should do better, but I'm not sure it is worth the effort.
Do people consider character names worth +600k bloat (as in, should
they be enabled by default or not)?
-- Nikodemus
Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."