From: Brian Spilsbury <brian@de...> - 2001-12-31 07:33:18
Well, things seem to be settling down nicely.
I found a few bugs, but it seems to work so far.
I've completed an input method and a console-based line buffer system
which works with bi-width characters, and I can now type Korean directly
into the reader, with line editing :].
I've also started to put together a rule-set for mathematical symbols
(i.e. "/=>" maps to the symbol for not-equal-to-or-greater-than), which is
quite cute. If anyone knows of a decent typing input method for
mathematical symbols, please tell me. There are quite a lot of them. :)
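
A minimal sketch of how such a rule-set could be looked up, assuming a
plain association list and a longest-match policy (so "/=>" wins over
"/="). The table name, the function name, the extra rules, and the choice
of U+2271 for "/=>" are all assumptions made for illustration, not the
actual rule-set:

```lisp
;; Hypothetical rule table: ASCII key sequence -> Unicode code point.
;; Code points are kept as integers so the table loads even where
;; CODE-CHAR cannot produce the character itself.
(defparameter *math-symbol-rules*
  '(("/=>" . #x2271)    ; NEITHER GREATER-THAN NOR EQUAL TO (assumed target)
    ("/="  . #x2260)    ; NOT EQUAL TO
    ("<="  . #x2264)    ; LESS-THAN OR EQUAL TO
    (">="  . #x2265)))  ; GREATER-THAN OR EQUAL TO

(defun match-math-rule (input &optional (start 0))
  "Return the code point and key length of the longest rule that
matches INPUT at START, or NIL if no rule matches."
  (let ((best nil))
    (dolist (rule *math-symbol-rules*)
      (let* ((key (car rule))
             (end (+ start (length key))))
        (when (and (<= end (length input))
                   (string= key input :start2 start :end2 end)
                   (or (null best)
                       (> (length key) (length (car best)))))
          (setf best rule))))
    (when best
      (values (cdr best) (length (car best))))))
```

The second return value tells the line editor how many typed characters
to replace with the symbol.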
There are a few things that I'm a little hesitant about...
Characters are full Unicode, so Character streams need to take Characters.
Because of this, I've made the base-level Character Fd-Stream do
UTF-8 encoded input and output.
Byte streams are not affected.
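
For reference, the output side of a UTF-8 character stream comes down to
something like the following. This is only a sketch, not the Fd-Stream
code itself: the function name is made up, it assumes the four-octet form
of UTF-8 with code points up to #x10FFFF, and it does not reject
surrogates:

```lisp
(defun utf8-encode-code-point (code)
  "Return a list of the UTF-8 octets for the integer code point CODE."
  (cond ((< code #x80)                  ; 1 octet: 0xxxxxxx
         (list code))
        ((< code #x800)                 ; 2 octets: 110xxxxx 10xxxxxx
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)               ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        ((< code #x110000)              ; 4 octets: 11110xxx then three 10xxxxxx
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t (error "Code point ~X is outside the Unicode range." code))))
```

A character stream would pass (char-code char) to this and hand each
octet to write-byte on the underlying byte stream.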
As a side effect, Base-Char is reduced to a range of 0-127.
I can't see a sensible alternative, and it seems to work out OK,
although some people who like to put non-ASCII, non-UTF-8 characters in
their source code may be disappointed.
Character-Encoding translation streams are simple enough, except that
they need to gracefully handle characters that they cannot represent...
I'd consider a (register-character-encoding-stream-vendor :euc-kr #'foo)
or something similar to allow
(open "name" :direction :input :encoding :euc-kr)
Although, :encoding isn't a permissible flag to open from my reading of
The more ANSI way would be to allow the definition of character
repertoire types, in which case it would become
(open "name" :direction :input :element-type 'euc-kr-character)
although this would require unsealing the Character type.
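
A rough sketch of what the registration approach could look like. Only
the name register-character-encoding-stream-vendor comes from the
proposal above; everything else is an assumption. To keep it portable,
the "vendor" here wraps a function that accepts one octet rather than a
real stream, and :us-ascii stands in for :euc-kr because a real EUC-KR
table is far too large for an example. Characters the encoding cannot
represent are replaced with a substitute instead of signalling an error:

```lisp
;; Hypothetical registry: encoding keyword -> vendor function.
(defvar *encoding-vendors* (make-hash-table :test #'eq))

(defun register-character-encoding-stream-vendor (name vendor)
  "Associate the encoding NAME with VENDOR and return NAME."
  (setf (gethash name *encoding-vendors*) vendor)
  name)

(defun find-encoding-vendor (name)
  "Return the vendor registered for NAME, or signal an error."
  (or (gethash name *encoding-vendors*)
      (error "No character encoding registered for ~S." name)))

(defun make-ascii-writer (octet-sink &key (substitute #\?))
  "Return a function that writes one character to OCTET-SINK as ASCII,
replacing characters above code 127 with SUBSTITUTE."
  (lambda (char)
    (let ((code (char-code char)))
      (funcall octet-sink
               (if (< code 128) code (char-code substitute)))
      char)))

(register-character-encoding-stream-vendor :us-ascii #'make-ascii-writer)

(defun encode-string (string encoding)
  "Return the list of octets produced by writing STRING through ENCODING."
  (let* ((octets '())
         (writer (funcall (find-encoding-vendor encoding)
                          (lambda (octet) (push octet octets)))))
    (map nil writer string)
    (nreverse octets)))
```

An open that accepted :encoding would do the same lookup and wrap the
byte-level Fd-Stream instead of a closure.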
In any case, it's heading toward being quite usable.
I haven't implemented the UTF-8 encoded strings yet, but I think it's
still quite important, and there are a bunch of things which were fixed
temporarily, but need fixing properly. The reader mostly works, but its
character information database should be rationalised, for example.
Pathnames are another issue: they're being dropped to simple-base-string
via coercions (which just truncate the characters), etc, etc.
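
One possible stop-gap for the pathname case, sketched under the
assumption that Base-Char covers 0-127 as described above, is to check
the string before coercing and signal an error rather than truncate
silently. Both function names are hypothetical:

```lisp
(defun ascii-only-p (string)
  "True if every character of STRING has a code in the range 0-127."
  (every (lambda (ch) (< (char-code ch) 128)) string))

(defun checked-base-string (string)
  "Coerce STRING to a simple-base-string, refusing to lose characters."
  (unless (ascii-only-p string)
    (error "~S contains characters outside the 0-127 range." string))
  (coerce string 'simple-base-string))
```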