On Fri, Apr 05, 2002 at 10:17:00AM +1000, Brian Spilsbury wrote:
> Finally after many weeks of agonising debugging and code-rewriting, I
> have rewritten my unicode support for sbcl-0.7.0, and it has produced
> what appears to be a working core image.
> Hurrah :)
> This version supplies a 21 bit character object and an 8 bit base-char
> object, with (simple-array base-char (*)) and (simple-array character
> (*)) primitives to go with it.
> It also redefines string and simple-string as union types.
Do you happen to know whether this is compatible with ANSI's
definition of STRING as a system class? Off the top of my head it
seems as though the CMU CL type system tends to use special hacks to
represent other system classes (like NUMBER) which you'd naively
expect could be union types. I don't know whether this is required
for ANSI compatibility or whether it happens for some other reason.
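One rough way to probe this from the outside is to ask the type system a few standard questions on both a Unicode and a non-Unicode image. The following is only a sketch under my own assumptions: STRING-SYSTEM-CLASS-CHECKS is a made-up name, not anything in SBCL or CMU CL, and it uses only standard ANSI operators. As far as I can tell from the standard, every check should come back true on a conforming implementation, whether or not STRING is represented internally as a union type.

```lisp
;; Hypothetical helper (not part of SBCL or CMU CL). It uses only standard
;; ANSI operators, so it can be run on a Unicode or a non-Unicode image.
(defun string-system-class-checks ()
  "Return a list of generalized booleans, one per check."
  (let ((base (make-string 2 :element-type 'base-char :initial-element #\a))
        (full (make-string 2 :element-type 'character :initial-element #\a)))
    (list
     ;; Both string representations must be of type STRING.
     (typep base 'string)
     (typep full 'string)
     ;; The base-char specializations must be recognizable subtypes.
     (values (subtypep 'base-string 'string))
     (values (subtypep 'simple-base-string 'simple-string))
     ;; STRING is a system class, so a class object must exist for it...
     (not (null (find-class 'string nil)))
     ;; ...and each string's class must be that class or a subclass of it.
     (values (subtypep (class-of base) (find-class 'string)))
     (values (subtypep (class-of full) (find-class 'string))))))
```

Something like (every #'identity (string-system-class-checks)) then gives a single yes/no answer. It does not settle whether the union-type representation is the right design, only whether the externally visible behaviour still looks like a system class.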
> You cannot build sbcl-0.7.0-unicode with sbcl or cmucl.
> This is due to the type-information leakage from the host environment
> into the build.
I assume I'm responsible for some of this. Sorry.
> In order to build this system, you need to use my sbcl-0.6.13-unicode
> patch, which does all sorts of horrible things to the host system to
> build itself. Because of the sacrifices made there, sbcl-0.7.0-unicode
> is much cleaner.
Are the "horrible things" CMU-CL-style one-off bootstrapping hacks? My
strong preference would be to try to rearrange things so that it can
be built under any sufficiently ANSI CL, including non-Unicode
versions of SBCL or CMU CL. I don't know how hard that will be, though.
It's conceivable that the changes involved in getting the sbcl-0.7.2
system to bootstrap under CLISP could help with this. Not that I've
been doing any work to speak of on this, or anything else on SBCL, in
the last few weeks...
> Secondly, because the changes are so widespread, I suspect that it will
> not be practical to apply it as a patch to 0.7.2 or whatever (although I
> haven't tried yet), and it may be necessary to step backward to
> 0.7.0-unicode and apply the patches to this base, working forward. (I'll
> give this a go in a day or two).
There also may be some confusion because we've disagreed off and on
for a year or so about what is appropriate to be merged into SBCL, and
it sounds as though all the issues are now combined in one big patch. :-|
> I'd like a couple of days to shake down some of the more obvious
> problems that I've almost certainly introduced. I plan to put up:
> * a complete 0.7.0-unicode source tree.
> * a patch against 0.7.0
> * a complete binary set for 0.7.0-unicode/x86
> * a complete binary set for 0.6.13-unicode/x86
> The 0.6.13-unicode patch is already available at
OK. I just skimmed it. Wow...
I have a few remarks and questions. (I plan to look at Christophe
Rhodes' "get atom subtype..." patch first, and possibly other patches
which are simple enough to jump the queue, before thinking about the
Unicode stuff with any care.)
* src/code/unicode-han.lisp seems to be about 60% of the 6 Mbyte patch,
and src/code/unicode.lisp seems to be much of the rest of it. I think
we might want some way not to have that much language-specific data
embedded in the source tree, or for that matter in the #+UNICODE
executable for people whose language needs are focused differently. I
imagine this is an issue that all Unicode systems face. AFAIK, Java
systems support Unicode but don't ship with some 4 Mbytes of
character set data in their runtimes. Do you happen to know how
they address the problem of carrying around Cyrillic when
the user is only interested in Arabic and Thai? (Or am I just
confused, and in fact they don't address the problem?)
* You seem to have used "diff" in a way which sometimes makes
things unnecessarily confusing:
** Some things (e.g., early on in the patch, make-host-2.sh and
make-target-2.sh) seem to have lots of unchanged text, but appear
in the patch as though all their text is new.
** The "out" file probably doesn't belong in the patch. Also
"src/code/class.lisp-old", "src/code/diff", and perhaps
* I noticed some unexplained, perhaps unused stuff in there.
** The ZAPPED and UNICODE-ZAPPED features are tested but never
set and never documented. Are they superfluous now?
* If you're going to change things more-or-less orthogonal to
Unicode (e.g. changing the default from 'sbcl --noprogrammer'
to 'sbcl' in slam.sh) it'd probably be best to do it in a
separate patch, and to explain why you're changing them.
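On the ZAPPED and UNICODE-ZAPPED point above: assuming those features are tested with ordinary #+ reader conditionals (I haven't checked every use), a feature that is never pushed onto *FEATURES* means the guarded forms are simply skipped by the reader. A small sketch of that behaviour, using hypothetical helper names of my own and only standard operators:

```lisp
;; Hypothetical helpers, not part of SBCL.

;; Report whether FEATURE (a keyword) is in the running image's *FEATURES*.
(defun feature-present-p (feature)
  (and (member feature *features*) t))

;; Read a list containing one form guarded by #+UNICODE-ZAPPED.
;; Feature names in #+ expressions are read in the KEYWORD package, so
;; this tests for :UNICODE-ZAPPED. When the feature is absent, the reader
;; discards :GUARDED and only :ALWAYS survives.
(defun read-guarded-form ()
  (values (read-from-string "(#+unicode-zapped :guarded :always)")))
```

On an image where nothing sets :UNICODE-ZAPPED, (feature-present-p :unicode-zapped) is NIL and (read-guarded-form) returns (:ALWAYS), which is why code behind an unset feature is effectively dead.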
William Harold Newman <william.newman@...>
"Now it's a couple of guys sitting in a living room with laptops. (And
jeans turn out not to be the last word in informality.)"
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C B9 25 FB EE E0 C3 E5 7C