I'm working on improving SBCL's Unicode support through the Google
Summer of Code. To make my project more useful, I'd like to know what
Unicode-related (or possibly internationalization-related) features
you'd like to see already implemented in SBCL, so you wouldn't have to
roll your own.
So far, I have implemented (on an experimental branch):
- Accessors for many of a character's Unicode properties, such as its
script or general category
- Functions to break a string into graphemes (what users would think of
as "characters"), words, sentences, and lines according to the Unicode
- Unicode standards for case conversion, with optional locale detection
so that certain locale-specific casing rules (such as i uppercasing as
dotted-I (İ) in Turkish) can be applied
- The standard Unicode sorting algorithm
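As a rough illustration of the locale-specific casing rule mentioned in the list above, here is a minimal sketch in Python rather than Lisp. It is not the branch's API: the function name `upper_for_locale` and its `lang` parameter are hypothetical, and only the single Turkish/Azerbaijani dotted-I rule is handled.

```python
def upper_for_locale(text, lang="en"):
    """Uppercase text, applying the Turkish/Azeri dotted-I rule if asked.

    Hypothetical helper for illustration only; real locale-aware casing
    covers more rules than this one.
    """
    if lang in ("tr", "az"):
        # In these locales, i uppercases to
        # LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
        text = text.replace("i", "\u0130")
    # The default Unicode mapping handles everything else,
    # including dotless i (U+0131) uppercasing to plain I.
    return text.upper()


print(upper_for_locale("istanbul", "en"))
print(upper_for_locale("istanbul", "tr"))
```

The locale is passed explicitly here; detecting it from the environment, as the list item describes, would be a separate step.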
I've also added an option to the reader to normalize unescaped symbols,
so that, for example :ë and :ë (LATIN SMALL LETTER E WITH DIAERESIS and
LATIN SMALL LETTER E + COMBINING DIAERESIS, respectively) are EQ with
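The equivalence behind that example can be checked outside the Lisp reader. The sketch below uses Python's standard `unicodedata` module. The helper name `normalize_name` is hypothetical, and the choice of NFKC as the default form is an assumption; the post does not say which normalization form the reader option applies.

```python
import unicodedata

precomposed = "\u00eb"   # LATIN SMALL LETTER E WITH DIAERESIS
decomposed = "e\u0308"   # LATIN SMALL LETTER E + COMBINING DIAERESIS


def normalize_name(name, form="NFKC"):
    """Return name in the given Unicode normalization form.

    Hypothetical helper; the form used by the reader option is assumed.
    """
    return unicodedata.normalize(form, name)


# The raw strings differ code point by code point...
print(precomposed == decomposed)
# ...but they compare equal once both are normalized.
print(normalize_name(precomposed) == normalize_name(decomposed))
```

Two names that differ only in this way would be interned as the same symbol if the reader normalized them before interning.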
What other similar improvements would make things easier for you as an
SBCL user? Please let me know.
support for collating sequences in string comparison functions.
i find numerous reflections on what one might do, but nothing concrete that would make it possible to use collation sequences effectively, particularly on a call-by-call basis:
best regards, from berlin,