On Wed, Oct 03, 2001 at 09:52:31PM -0700, s. champ wrote:
> I heard that CMUCL doesn't support the Unicode character-set. This would
> be a problem with developing a fully "xml-compliant" application. It was
> one of the hangups that's made me unsure about committing to using CMUCL.
> I was guessing it would be the same case with SBCL.
> I don't know C or ASM, yet, but can learn either when needed, given time
> for it, etc.
It is possible to learn by doing -- that's e.g. more or less what I
did with Unix signals when rewriting the SBCL signal handlers. But
it's probably easier when you already have some system programming
experience. If you've never worked in any language at the C/assembly
level, with pointers and machine words and alignment issues and
whatnot, I'd think that this kind of major change to SBCL's low-level
types would be a very difficult choice for a first project.
> So, what would have to be done to get unicode support into SBCL, if it
> isn't there already?
As you can see from Brian Spilsbury's earlier reply, he's already
working on it. He's making new character and string classes which
exist side by side with the old specialized-to-8-bit classes, with all
the character and string operations dynamically dispatched on the type
code. Thus (I think) BASE-CHAR stays 8-bit but CHARACTER becomes
20-bit Unicode, and when you call e.g. CHAR=, it has to deal with all
the possible cases:
* comparing an 8-bit character to an 8-bit character
* comparing an 8-bit character to a 20-bit character
* comparing a 20-bit character to a 20-bit character
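
To make those three cases concrete, here is a minimal sketch in Python
rather than Lisp. It is not SBCL code and not Brian Spilsbury's design:
the names NARROW, WIDE, Char, widen and char_equal are all hypothetical,
and it assumes that the 8-bit codes map onto the first 256 Unicode code
points.

```python
from dataclasses import dataclass

# Hypothetical type codes; a real implementation would use its own tags.
NARROW = 0  # old specialized-to-8-bit character
WIDE = 1    # new wide Unicode character


@dataclass(frozen=True)
class Char:
    type_code: int
    code: int


def widen(c):
    """Convert a narrow character to the wide representation.

    Assumption: 8-bit codes coincide with the first 256 code points.
    """
    return Char(WIDE, c.code)


def char_equal(a, b):
    """Compare two characters, dispatching on their type codes."""
    key = (a.type_code, b.type_code)
    if key == (NARROW, NARROW):
        # 8-bit character vs. 8-bit character
        return a.code == b.code
    if key == (WIDE, WIDE):
        # wide character vs. wide character
        return a.code == b.code
    # Mixed case: widen the narrow operand, then compare as wide.
    narrow, wide = (a, b) if a.type_code == NARROW else (b, a)
    return widen(narrow).code == wide.code
```

Every character and string operation would need the same kind of
dispatch, which is where the extra work comes from.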
In my Sep 24 mail I also suggested an alternative partial solution
which might be an easier way to start: making a Unicode-only variant
of SBCL, dependent on a *SHEBANG-FEATURES* entry :SB-UNICODE-ONLY or
some such thing, so that BASE-CHAR becomes 20-bit Unicode and
SIMPLE-BASE-STRING gets 32-bit cells, each holding a fully-expanded
Unicode char. I think this would be easier because everything which is
only a single case in current SBCL remains a single case in the
variant, e.g. the new CHAR= only has to deal with comparing a 20-bit
character to a 20-bit character. But Brian Spilsbury is the one who's
doing it, and if he finds the full-blown dynamically typed solution
manageable, more power to him. (And since I'm far from perfect about
estimating the size of projects, who knows?)
(Either way is necessarily a lot of work.)
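
As a rough illustration of the Unicode-only variant, here is a minimal
Python sketch (again not SBCL code) in which a string is a flat buffer
of 32-bit cells, one fully-expanded code point per cell. The function
names make_string, string_length, char_at and char_equal are
hypothetical, chosen only for this example.

```python
import struct

CELL_BYTES = 4  # one 32-bit cell per character


def make_string(text):
    """Pack each code point of TEXT into its own little-endian 32-bit cell."""
    return struct.pack(f"<{len(text)}I", *(ord(ch) for ch in text))


def string_length(cells):
    """Number of characters is just the buffer size divided by the cell size."""
    return len(cells) // CELL_BYTES


def char_at(cells, index):
    """Fetch the code point stored in cell INDEX by simple offset arithmetic."""
    return struct.unpack_from("<I", cells, CELL_BYTES * index)[0]


def char_equal(a, b):
    """Only one representation exists, so there is only one case to handle."""
    return a == b
```

With a single representation, indexing is plain offset arithmetic and
the comparison needs no dispatch, at the cost of four bytes per
character even for plain ASCII text.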
William Harold Newman <william.newman@...>
"Ma'am, you might find labor easier if you weren't having twins."
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C