William Harold Newman wrote:
>On Fri, Apr 05, 2002 at 10:17:00AM +1000, Brian Spilsbury wrote:
>>This version supplies a 21 bit character object and an 8 bit base-char
>>object, with (simple-array base-char (*)) and (simple-array character
>>(*)) primitives to go with it.
>>It also redefines string and simple-string as union types.
>Do you happen to know whether this is compatible with ANSI's
>definition of STRING as a system class? Off the top of my head it
>seems as though the CMU CL type system tends to use special hacks to
>represent other system classes (like NUMBER) which you'd naively
>expect could be union types. I don't know whether this is required
>for ANSI compatibility or whether it happens for some other reason.
I believe this is ANSI compatible.
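
For illustration, the union-type idea can be sketched in portable
Common Lisp. The names SKETCH-SIMPLE-STRING, SKETCH-STRING and
STRING-FLAVOUR are hypothetical, chosen so as not to clash with the
standard STRING and SIMPLE-STRING. This is not taken from the patch,
which changes the system's own definitions. As far as I read the
standard, it describes STRING as the union of the vector types whose
element type is a subtype of CHARACTER, which is the same shape.

```lisp
;; Hypothetical names for illustration only; not from the patch.
(deftype sketch-simple-string ()
  '(or (simple-array base-char (*))
       (simple-array character (*))))

(deftype sketch-string ()
  '(or (vector base-char)
       (vector character)))

(defun string-flavour (object)
  "Return :BASE, :WIDE, or NIL depending on OBJECT's representation.
On an implementation where BASE-CHAR and CHARACTER coincide, every
string reports :BASE."
  (typecase object
    ((vector base-char) :base)
    ((vector character) :wide)
    (t nil)))
```

With this, (string-flavour (make-string 2 :element-type 'base-char))
gives :BASE, and a non-string such as #(1 2 3) gives NIL.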
>>You cannot build sbcl-0.7.0-unicode with sbcl or cmucl.
>>This is due to the type-information leakage from the host environment
>>into the build.
>I assume I'm responsible for some of this. Sorry.
Oh, we can blame those evil cmucl people :)
>>In order to build this system, you need to use my sbcl-0.6.13-unicode
>>patch, which does all sorts of horrible things to the host system to
>>build itself, because of the sacrifices made here, sbcl-0.7.0-unicode is
>Are the "horrible things" CMU-CL-style one-off bootstrapping hacks? My
>strong preference would be to try to rearrange things so that it can
>be built under any sufficiently ANSI CL, including non-Unicode
>versions of SBCL or CMU CL. I don't know how hard that will be, though.
>It's conceivable that the changes involved in getting the sbcl-0.7.2
>system to bootstrap under CLISP could help with this. Not that I've
>been doing any work to speak of on this, or anything else on SBCL, in
>the last few weeks...
In 0.6.13-unicode it's mostly the deranged hacks such as at the top of
>>Secondly, because the changes are so wide-spread, I suspect that it will
>>not be practical to apply it as a patch to 0.7.2 or whatever (although I
>>haven't tried yet), and it may be necessary to step backward to
>>0.7.0-unicode and apply the patches to this base, working forward. (I'll
>>give this a go in a day or two).
>There also may be some confusion because we've disagreed off and on
>for a year or so about what is appropriate to be merged into SBCL, and
>it sounds as though all the issues are now combined in one big patch.:-|
Well, there are two levels here.
One is the primitive changes to the system to make the types work properly.
That more or less comes as a unitary lump; I don't see that you can
break it up in any meaningful or workable fashion.
Then there is the stuff that uses this, which can be broken up.
All of the work has been going into the primitive end so far, with the
minimum amount of crud required to test/use it, which is why
unicode.lisp, etc. are the way they are.
There's plenty of work left to do at the higher end, but I can't afford
to keep chasing major version changes :\
What I'd hope for at this point, now that there is a working thing, is
to open a new branch of sbcl-0.7.13, bring it into sync with the main
branch, and perhaps transition slowly across.
I don't expect that it would be desirable to drop so many changes so
quickly into a production release, especially since my understanding of
the deep systems of sbcl is certainly not perfect, and I am certain that
I have introduced a significant number of dubiousnesses. (Although a lot
fewer than in 0.6.13 :] )
>>I'd like a couple of days to shake down some of the more obvious
>>problems that I've almost certainly introduced, I plan to put up;
>>* a complete 0.7.0-unicode source tree.
>>* a patch against 0.7.0
>>* a complete binary set for 0.7.0-unicode/x86
>>* a complete binary set for 0.6.13-unicode/x86
>>The 0.6.13-unicode patch is already available at
>OK. I just skimmed it. Wow...
>I have a few remarks and questions. (I plan to look at Christophe
>Rhodes' "get atom subtype..." patch first, and possibly other patches
>which are simple enough to jump the queue, before thinking about the
>Unicode stuff with any care.)
Sounds reasonable, I'm mostly just happy to have reached this milestone,
and want people to think about what should be done next.
>* src/code/unicode-han.lisp seems to be about 60% of the 6 Mbyte patch,
> and src/code/unicode.lisp seems to be much of the rest of it. I think
> we might want some way not to have that much language-specific data
> embedded in the source tree, or for that matter in the #+UNICODE
> executable for people whose language needs are focused differently. I
> imagine this is an issue that all Unicode systems face. AFAIK, Java
> systems support Unicode but don't ship with some 4 Mbytes of
> character set data in their runtimes. Do you happen to know how
> they address the problem of carrying around Cyrillic when
> the user is only interested in Arabic and Thai? (Or am I just
> confused, and in fact they don't address the problem?)
I've thought about it a bit. Practically all of the work has been in
getting the primitive support working, so a lot of the actual unicode
code has been neglected except for what was necessary to make it work.
I'm intending to rewrite unicode.lisp and unihan.lisp to parse supplied
In that case, it should be easy to distribute partial unicode datasets
and build from those.
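
A minimal sketch of what building from a partial dataset could look
like, assuming the input is in the semicolon-separated layout of the
Unicode Character Database file UnicodeData.txt (code point in hex,
then name, then general category). The function names and the idea of
filtering by a code point range are assumptions for illustration, not
what unicode.lisp does.

```lisp
;; Hypothetical helpers; not taken from the patch.
(defun split-fields (line)
  "Split LINE on semicolons, returning a list of strings."
  (loop with start = 0
        for pos = (position #\; line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun parse-entry (line)
  "Return (code-point name general-category) for one data line."
  (let ((fields (split-fields line)))
    (list (parse-integer (first fields) :radix 16)
          (second fields)
          (third fields))))

(defun load-partial-table (stream low high)
  "Read entries from STREAM, keeping only code points in [LOW, HIGH].
Returns a hash table mapping code point to (name general-category)."
  (let ((table (make-hash-table)))
    (loop for line = (read-line stream nil nil)
          while line
          unless (zerop (length line))
            do (let ((entry (parse-entry line)))
                 (when (<= low (first entry) high)
                   (setf (gethash (first entry) table) (rest entry)))))
    table))
```

Calling (load-partial-table stream #x0400 #x04FF) on such a file would
keep only the Cyrillic block, so someone who only wants Arabic and Thai
could build with different ranges and leave the rest out.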
> * You seem to have used "diff" in a way which sometimes makes
> things unnecessarily confusing:
> ** Some things (e.g., early on in the patch, make-host-2.sh and
> make-target-2.sh) seem to have lots of unchanged text, but appear
> in the patch as though all their text is new.
> ** The "out" file probably doesn't belong in the patch. Also
> "src/code/class.lisp-old", "src/code/diff", and perhaps
Yeah, that patch isn't very nice.
> * I noticed some unexplained, perhaps unused stuff in there.
> ** The ZAPPED and UNICODE-ZAPPED features are tested but never
> set and never documented. Are they superfluous now?
Yes, they should not be there.
Since it started working to the point where I could compile 0.7.x with
it, all work on 0.6.13-unicode stopped.
> ** %MAKE-RANDOM-STATE
> * If you're going to change things more-or-less orthogonal to
> Unicode (e.g. changing the default from 'sbcl --noprogrammer'
> to 'sbcl' in slam.sh) it'd probably be best to do it in a
> separate patch, and to explain why you're changing them.
Hmm, oh, I thought I explained that in the readme.
You need to do some manual error correction in the build.
Basically, I do not suggest that 0.6.13-unicode be used by anyone for
anything, except for building 0.7.x-unicode.
I certainly would not like to see that merged into anything mainstream. :)
The 0.7.13-unicode contains a lot less evil, though there is still a
fair bit of cruft to clean up.
Oh, I fixed those three warnings in the build of
src/compiler/x86/array.lisp, btw :)