Thread: [Sbcl-devel] Unicode character names


Since I was reminded on IRC that I'd done this work, and that it
might be nice to have in SBCL itself, here's a piece of code that
gives Unicode characters names and makes their readable output
ASCII-safe.
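The post doesn't show the output format, so here is a rough Python sketch of the general idea. It is not the author's Lisp code. Printable ASCII characters are written literally. Anything else is written as its Unicode name in a Lisp-style `#\NAME` token, with spaces turned into underscores, so the printed form stays pure ASCII and can still be read back. The `readable_char`/`read_char` names and the `#\U+XXXX` fallback for unnamed characters (such as control codes) are hypothetical.

```python
import unicodedata


def readable_char(ch: str) -> str:
    """Render one character as an ASCII-only, Lisp-style #\\ token.

    Hypothetical format: printable ASCII (other than space) is written
    literally; everything else uses its Unicode name with spaces replaced
    by underscores, or a hex code point when the character has no name.
    """
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:
        return "#\\" + ch
    name = unicodedata.name(ch, None)
    if name is None:
        return "#\\U+%04X" % cp  # hypothetical fallback, e.g. for control codes
    return "#\\" + name.replace(" ", "_")


def read_char(token: str) -> str:
    """Invert readable_char: turn a #\\ token back into the character."""
    body = token[2:]  # drop the leading "#\"
    if len(body) == 1:
        return body
    if body.startswith("U+"):
        return chr(int(body[2:], 16))
    # Unicode names never contain underscores, so this reversal is safe.
    return unicodedata.lookup(body.replace("_", " "))


if __name__ == "__main__":
    for c in "a \u00e9\u03bb\n":
        token = readable_char(c)
        print(token, read_char(token) == c)
```

Writing names rather than raw code points keeps the output both ASCII-safe and human-readable. The cost is that the name table must be available at read time.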

The code-data "array" stores pretty much everything there is to know
about Unicode characters.  The idea was to build a full Unicode string
manipulation library on top of this.  Some of it's been done (not
included here), but I got a little distracted.
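Here is a hedged sketch of what a per-character table built from UnicodeData.txt can look like. It is Python for illustration, not the post's `code-data` array, whose layout isn't shown. The code parses the file's semicolon-separated records into a dictionary keyed by code point. `CharInfo`, `parse_unicode_data`, and `char_info` are hypothetical names, and only four of the fifteen fields are kept. Large blocks such as the CJK ideographs appear in the file only as `<..., First>`/`<..., Last>` pairs, so the sketch stores them as ranges rather than expanding them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A few records in UnicodeData.txt format (15 ';'-separated fields).
# The CJK range end differs between Unicode versions; 9FFF is just a sample.
SAMPLE_UNICODE_DATA = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FFF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""


@dataclass
class CharInfo:
    name: str
    category: str           # general category, field 2 (e.g. "Lu")
    upper: Optional[int]    # simple uppercase mapping, field 12
    lower: Optional[int]    # simple lowercase mapping, field 13


Range = Tuple[int, int, CharInfo]


def _hex_or_none(field: str) -> Optional[int]:
    return int(field, 16) if field else None


def parse_unicode_data(lines) -> Tuple[Dict[int, CharInfo], List[Range]]:
    table: Dict[int, CharInfo] = {}
    ranges: List[Range] = []
    range_start: Optional[Tuple[int, CharInfo]] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        f = line.split(";")
        cp = int(f[0], 16)
        info = CharInfo(f[1], f[2], _hex_or_none(f[12]), _hex_or_none(f[13]))
        if f[1].endswith(", First>"):
            range_start = (cp, info)  # remember until the matching Last
        elif f[1].endswith(", Last>"):
            if range_start is None:
                raise ValueError("range Last without First at %04X" % cp)
            first, first_info = range_start
            ranges.append((first, cp, first_info))
            range_start = None
        else:
            table[cp] = info
    return table, ranges


def char_info(table, ranges, cp: int) -> Optional[CharInfo]:
    """Look up a code point, falling back to the First/Last ranges."""
    if cp in table:
        return table[cp]
    for first, last, info in ranges:
        if first <= cp <= last:
            return info
    return None


if __name__ == "__main__":
    table, ranges = parse_unicode_data(SAMPLE_UNICODE_DATA.splitlines())
    print(char_info(table, ranges, 0x61))
    print(char_info(table, ranges, 0x4E2D).category)
```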

Currently, it builds its tables at load time (because this was
simpler) from the files UnicodeData.txt (already used by SBCL, though
this code has its own copy), PropList.txt, SpecialCasing.txt, and
CaseFolding.txt.  I expect anyone wanting it always on would dump a
core containing it anyway, so I don't see the resulting long load time
as a particularly bad side effect.  The fact that large amounts of
currently unused data are loaded is potentially a drawback, but that
data is fairly easy to cut out if wanted, and I'm still hoping to get
motivated enough to start building Unicode functionality on top of
this again.
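To make the load-time table building a little more concrete, here is a minimal Python sketch of reading one of the other files mentioned, CaseFolding.txt. It is an illustration, not the post's code. Each data line has the form `code; status; mapping; # name`. Statuses C and F together give full case folding, while C and S give simple (one-to-one) folding. The function names and the embedded sample lines are assumptions for the example; a real loader would read the whole file from disk.

```python
# Sample lines in CaseFolding.txt format; a real loader would read the file.
SAMPLE_CASE_FOLDING = """\
# Comment lines and blank lines are ignored.
0041; C; 0061; # LATIN CAPITAL LETTER A
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_case_folding(lines, full: bool = True) -> dict:
    """Map code points to their folded strings.

    Full folding keeps statuses C and F; simple folding keeps C and S.
    Status T (Turkic-specific mappings) is ignored here.
    """
    wanted = {"C", "F"} if full else {"C", "S"}
    fold = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (field.strip() for field in data.split(";")[:3])
        if status in wanted:
            fold[int(code, 16)] = "".join(chr(int(h, 16)) for h in mapping.split())
    return fold


def case_fold(text: str, fold: dict) -> str:
    """Apply a folding table; unmapped characters pass through unchanged."""
    return "".join(fold.get(ord(ch), ch) for ch in text)


if __name__ == "__main__":
    full = parse_case_folding(SAMPLE_CASE_FOLDING.splitlines(), full=True)
    simple = parse_case_folding(SAMPLE_CASE_FOLDING.splitlines(), full=False)
    print(case_fold("SA\u00df", full), case_fold("\u1E9E", simple))
```

Full folding can change a string's length (for example, sharp s becomes "ss"), which is one reason a string library built on these tables has to treat strings rather than single characters as the unit of case conversion.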

Rather than attaching it, I've put it at
  http://www.rojoma.com/sb-char-names.tar.bz2
-- 
Robert Macomber
sf-...@ro...