On Thu, Apr 11, 2002 at 12:10:00PM -0500, William Harold Newman wrote:
> On Tue, Apr 09, 2002 at 12:41:22AM +1000, Brian Spilsbury wrote:
> > http://designix.com.au/brian/SBCL/sbcl-0.7.0-unicode.p0.gz
> OK, thank you.
I'd like to second this; it looks like a lot of work, particularly since
I know you've effectively done this twice (once for 0.6.13 as well... I
remember the pain of the transition... not a criticism of the renaming
either, as it had to be done, but still :-)
I've got a couple of additional questions (well, actually, I have about
an A4 side of notes from my first read through the patch, but some of
them are cosmetic issues and some can be worked out in detail later)
9. Firstly, can you say why you made base-char have 256 elements? [side
note: revisit the char-code-limit things to actually be exclusive
bounds] Naively, it would seem to me that making base-char have just
ASCII and extended-char the rest of the space would be easier; then on
non-unicode lisps the rest of the iso-8859-foo space could be
extended-char. Is this workable, and would it make sense for an
integrated sbcl unicode to go down this route?
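To make the split in question 9 concrete, here is a minimal sketch using only standard type specifiers. ASCII-CHAR-P and CLASSIFY-CHAR are names made up for this illustration; nothing here is taken from the patch, and where an implementation draws the base-char boundary is exactly the open question.

```lisp
;; Hypothetical helper (not from the patch): true for 7-bit ASCII codes.
(defun ascii-char-p (ch)
  (< (char-code ch) 128))

;; The standard defines EXTENDED-CHAR as (AND CHARACTER (NOT BASE-CHAR)),
;; so every character belongs to exactly one of the two types, wherever
;; the implementation chooses to put the boundary.
(defun classify-char (ch)
  (etypecase ch
    (base-char :base)
    (extended-char :extended)))
```

Under the ASCII-only proposal, ASCII-CHAR-P and (eq (classify-char ch) :base) would agree for every character; with a 256-element base-char they would differ for codes 128 to 255.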
10. Could you comment at a high-level on which bits of the interface to
the external world (foreign code, filesystem, etc) should be
unicode-enabled and which shouldn't? Sometimes declarations are changed
one way (e.g. simple-base-string -> simple-string) and sometimes vice
versa. Consider doing e.g. (deftype foreign-function-designator ()
(#!-unicode ... #!+unicode ...)) in one place, so that it's slightly
more obvious from a consistency point of view (this is something that
would be good in the sbcl code base generally, in any case)
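A rough sketch of the "one place" idea in question 10, written with the ordinary #+ and #- reader conditionals rather than the build-time #!+ and #!- forms. FOREIGN-NAME-STRING and CHECK-FOREIGN-NAME are hypothetical names, :SB-UNICODE is the feature keyword of present-day SBCL rather than the one used in the patch, and which string type belongs on which side is an assumption made only for illustration.

```lisp
;; Sketch only: a single type definition that every foreign-interface
;; declaration could refer to, instead of choosing SIMPLE-STRING or
;; SIMPLE-BASE-STRING separately at each call site.
(deftype foreign-name-string ()
  #+sb-unicode 'simple-string
  #-sb-unicode 'simple-base-string)

;; Hypothetical entry point that checks its argument against the shared type.
(defun check-foreign-name (name)
  (check-type name foreign-name-string)
  name)
```

Changing the policy then means editing the deftype once, and the declarations that use it stay consistent by construction.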
11. I don't understand this change:
- (base-char (char-code (truly-the base-char char)))))
+ (base-char (char-code (truly-the base-char char)))
+ (character (char-code (truly-the base-char char)))))
12. There are some hard-coded limits (256, for instance) in genesis --
if they're not related to char-code-limit or the equivalent, are they
some other limit that should be abstracted away?
13. I was slightly concerned by a quick read of the VOPs; I don't know
what provision there was to guard against this (related to question 9):
(defparameter *a* (make-string 1 :element-type 'base-char
                               :initial-element #\a))
(defparameter *b* (make-string 1 :element-type 'character
                               :initial-element #\a))
(char= (char *a* 0) (char *b* 0)) -> NIL?
I'm not at all sure that this can happen, but I couldn't see it ruled
out.
I had some other thoughts, but that's probably enough for now...
> +#ifdef LISP_FEATURE_UNICODE
> + "character", /* yes, this is dubious */
This one I think I figured out, possibly; I think it actually wanted to
but I'm not sure. It's to get ldb to print the types right, I think.
> 7. I still don't understand why the +Unicode system can't bootstrap
> normally under CMU CL or SBCL. Do you have any hints, or should I just
> try it myself and see how it fails?
At a guess, it's something to do with the separation of types that
previously were the same; maybe cross-typep needs to be smarter? I dunno
-- I haven't tried yet either.
> 8. Might it be possible to do the patch in smaller pieces? E.g.
> in three phases, each adding some testable functionality:
> 1. Make the system manipulate Unicode data (reading and
> writing it, representing it in characters and
> strings) but not know anything about its properties
> other than what's a BASE-CHAR and what's not.
> (I.e. no upcasing, symbolic names for Unicode chars,
> or other messy stuff.)
> 2. Add Unicode-capable implementation of upcasing.
> 3. Add Unicode-capable implementation of symbolic char names.
My initial thought was that there are things of value in this patch
that stand apart from the unicodization; for instance, systemization of
foreign interaction (alluded to in my question 10), making the reader
hash-table based rather than alist based; then if it turns out that types
are the problem, maybe it might be worth trying to separate base-char
(ASCII) and character (high-bit 1) next, without worrying about
unicode, and then finally adding unicode support.
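As an illustration of the alist versus hash-table point, here is a small sketch of the two lookup styles. The table contents, variable names and handler symbols are all invented for the example; this is not the reader's actual data structure in either the existing code or the patch.

```lisp
;; Hypothetical mapping from a dispatch sub-character to a handler name.
(defparameter *handler-alist*
  '((#\$ . dollar-handler) (#\' . quote-handler)))

;; The same mapping held in a hash table keyed on the character.
(defparameter *handler-hash*
  (let ((table (make-hash-table :test 'eql)))
    (loop for (ch . handler) in *handler-alist*
          do (setf (gethash ch table) handler))
    table))

;; Alist lookup walks the list entry by entry.
(defun lookup-alist (ch)
  (cdr (assoc ch *handler-alist*)))

;; Hash lookup does not depend on how many entries precede the key.
(defun lookup-hash (ch)
  (values (gethash ch *handler-hash*)))
```

Both functions return the same handler for a known character and NIL otherwise, so a change like this can be made and tested independently of any character-type work.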
But as I say, I haven't tried anything computational yet; I won't feel
comfortable doing so until I've read through it again and tried to
absorb a little more.
> Also, a few style quibbles:
Aside from Bill's, there's a fair amount that's commented out in your
patch -- are these blind alleys, or works-in-progress, or previous
implementations of things? It would be helpful to know...
I hope I'm not coming across as too negative -- as a speaker of several
languages, it would be useful for me to have Unicode support too, and
I'm willing to do some work to help it happen.
Thanks very much,
Jesus College, Cambridge, CB5 8BL +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/ (defun pling-dollar
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)