Thread: [Sbcl-devel] unicode progress (or lack thereof) | Steel Bank Common Lisp


[Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-04-29 07:42:58

Well, given the lack of discussion, I'm inclined to proceed as I see 
fit, which will be more or less along the lines of my last post.

If this causes inconvenience further down the track, that's too bad.

I have things to get done, and I cannot keep on waiting for a week on 
each response.

Regards,

Brian.

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: William H. N. <wil...@ai...> - 2002-04-29 18:48:00

(I hope this message isn't a duplicate or otherwise messed up. I did
the infamous upgrade of sendmail from 8.11 to 8.12 this weekend, and
I'm still trying to straighten out the consequences.)

On Mon, Apr 29, 2002 at 05:42:24PM +1000, Brian Spilsbury wrote:
> Well, given the lack of discussion, I'm inclined to proceed as I see 
> fit, which will be more or less along the lines of my last post.
> 
> If this causes inconvenience further down the track, that's too bad.
> 
> I have things to get done, and I cannot keep on waiting for a week on 
> each response.

Sorry. It isn't even that I've been too busy to reply, but I reacted
badly to the perceived tone of your previous message, throwing up my
hands and stalking away from my computer, and I never got back to it.

from your previous message:
- I am not going to write the code to make the current versions of 
- sbcl/cmucl bootstrap this, someone who finds this a major priority is 
- welcome to do so.

Well, I certainly find this a major priority. They say "never say
never", so: if I get a lot of pressure from people I respect (or
better yet, compelling arguments from people I can understand) to
merge the patches into SBCL even though they make SBCL
unbootstrappable, then maybe. Failing that, the patches are extremely
unlikely to be accepted until bootstrappablized, whether at your hand
or someone else's.

Furthermore, since your personal style grates on my nerves (as, trying
to be fair, I suspect my style grates on yours), and since I have no
personal need for Unicode, the likelihood that your unbootstrappable
code will be cleaned up at my hand seems pretty low right now.

- I will need the unicode form to be default, and then to later produce 
- code to make it efficient for the cases where extended character sets 
- are not utilised. It is not workable to maintain such fundamental 
- disparities for such minimal returns.

I didn't really understand this. First, when you write "I will need"
it sounds as though you're talking about stuff that you expect us to
do for you, but I can't figure out what. Second, when you say unicode
form will be "the default", do you mean what you wrote elsewhere
  - what I have already produced, but with the
  - character data-base files removed, and the #[!][+-]unicode options 
  - removed to make the unicode form default
so that if we apply your patch, SBCL will always be compiled into a
with-Unicode form even for people who don't need or want Unicode? This
might be OK, and certainly isn't the same sort of nearly-nonnegotiable
issue as making SBCL unbootstrappable. However, you should realize
that doing things this way will probably raise the bar on the quality
(not just correctness, but size and performance as well) for your
patch to be considered for inclusion. (Maybe a lot, depending on how
many users and developers really want Unicode.)

- At some point I will require a feature freeze so that I can synchronise 
- things properly.

- I will require a sequential list of patches to apply to the source to 
- bring it up to the normal level, applying a single diff across the whole 
- system, I suspect will not be workable, but it should be doable to apply 
- the patches in sequence, and fix them as they break.

I didn't understand either of these.

The rate of change of SBCL since 0.7.0 seems unlikely to break
anyone's code, so I don't understand the need for a freeze. How much
of a freeze do you think you would require? Over how much time? Over
how much code?

When you write "I will require" do you mean "I will produce"?

-- 
William Harold Newman <wil...@ai...>
Anyone who says you can have a lot of widely dispersed people hack
away on a complicated piece of code and avoid total anarchy has never
managed a software project. -- Andy Tanenbaum, quoted in
<http://www.cs.dartmouth.edu/~perrone/oldsite/feud.html>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-04-30 04:04:18

William Harold Newman wrote:

>(I hope this message isn't a duplicate or otherwise messed up. I did
>the infamous upgrade of sendmail from 8.11 to 8.12 this weekend, and
>I'm still trying to straighten out the consequences.)
>
>On Mon, Apr 29, 2002 at 05:42:24PM +1000, Brian Spilsbury wrote:
>
>>Well, given the lack of discussion, I'm inclined to proceed as I see 
>>fit, which will be more or less along the lines of my last post.
>>
>>If this causes inconvenience further down the track, that's too bad.
>>
>>I have things to get done, and I cannot keep on waiting for a week on 
>>each response.
>>
>
>Sorry. It isn't even that I've been too busy to reply, but I reacted
>badly to the perceived tone of your previous message, throwing up my
>hands and stalking away from my computer, and I never got back to it.
>
>from your previous message:
>- I am not going to write the code to make the current versions of 
>- sbcl/cmucl bootstrap this, someone who finds this a major priority is 
>- welcome to do so.
>
>Well, I certainly find this a major priority. They say "never say
>never", so: if I get a lot of pressure from people I respect (or
>better yet, compelling arguments from people I can understand) to
>merge the patches into SBCL even though they make SBCL
>unbootstrappable, then maybe. Failing that, the patches are extremely
>unlikely to be accepted until bootstrappablized, whether at your hand
>or someone else's.
>
>Furthermore, since your personal style grates on my nerves (as, trying
>to be fair, I suspect my style grates on yours), and since I have no
>personal need for Unicode, the likelihood that your unbootstrappable
>code will be cleaned up at my hand seems pretty low right now.
>
What grates on my nerves is the repetition of questions which have been 
answered months before, along with requests that something which can't 
effectively be broken up be broken up into smaller pieces.

You can cut those datafiles off, but cut pretty much anything else and it 
won't compile; that patch is more or less the minimum needed to get the 
system to boot, with some test code left over from the last version, and 
the datafiles for testing.

The only reason that the current sbcl/cmucl can't compile this is that 
the current sbcl/cmucl is broken.
Perhaps clisp will prove less broken for once, although I can't say that 
I'm particularly optimistic.

Having already hacked around this lossage once, I do not care to do so 
again.

>- I will need the unicode form to be default, and then to later produce 
>- code to make it efficient for the cases where extended character sets 
>- are not utilised. It is not workable to maintain such fundamental 
>- disparities for such minimal returns.
>
>I didn't really understand this. First, when you write "I will need"
>it sounds as though you're talking about stuff that you expect us to
>do for you, but I can't figure out what. Second, when you say unicode
>form will be "the default", do you mean what you wrote elsewhere.
>
I have very few expectations at this point.

However, what that meant is: "to do this practically we can't expect 
to maintain both the broken code which conflates character and base-char 
and the code where they are properly separate."

It means that I'm not going to attempt to maintain the #!-unicode stuff, 
and will elide it presently.

There's no point - it doesn't give any real advantage, and has many 
significant drawbacks.

The correct solution is to support strings specialised onto 
character repertoires, and to translate between a repertoire and the 
Unicode form where necessary. Since base-char-reg is already 32 bits, 
it should be no burden to support 32k 8-bit repertoires and 
256 16-bit repertoires.
The only cost comes when different repertoires are used together, 
although it might take a bit of doing to make the primitive sufficiently 
flexible.

This will allow Europeans to run in random 8-bit charsets without 
penalty (as long as the streams share the same representation; 
otherwise you'll have a small mapping cost). It will also allow CJK users to 
use 16-bit charsets in a similar fashion, while retaining integration 
with the full Unicode set.
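A minimal Common Lisp sketch of the repertoire idea above, under stated assumptions: every name here (REPERTOIRE, MAKE-REPERTOIRE-STRING, REPERTOIRE-STRING=, *LATIN-1*) is hypothetical and is not part of SBCL or of the unicode patch. A string stores 8-bit codes plus a reference to a table mapping each code to a Unicode code point. Comparing two strings in the same repertoire touches only the raw octets; translation to code points happens only when the repertoires differ, which is the mixing cost described above.

```lisp
;;; Hypothetical sketch only: none of these names exist in SBCL.

(defstruct (repertoire (:constructor make-repertoire (name code-points)))
  "An 8-bit character repertoire: octet N stands for Unicode code point
CODE-POINTS[N]."
  name
  code-points)

(defstruct (repertoire-string
            (:constructor make-repertoire-string (repertoire octets)))
  "A string stored as octets and interpreted through its REPERTOIRE."
  repertoire
  octets)

(defun repertoire-string-code-point (rstring index)
  "Translate the octet at INDEX into its Unicode code point."
  (svref (repertoire-code-points (repertoire-string-repertoire rstring))
         (aref (repertoire-string-octets rstring) index)))

(defun repertoire-string-to-string (rstring)
  "Expand RSTRING into an ordinary Lisp string (the Unicode form)."
  (let* ((octets (repertoire-string-octets rstring))
         (result (make-string (length octets))))
    (dotimes (i (length octets) result)
      (setf (char result i)
            (code-char (repertoire-string-code-point rstring i))))))

(defun repertoire-string= (a b)
  "Same repertoire: compare raw octets with no translation.
Different repertoires: translate both sides to code points first."
  (let ((oa (repertoire-string-octets a))
        (ob (repertoire-string-octets b)))
    (and (= (length oa) (length ob))
         (if (eq (repertoire-string-repertoire a)
                 (repertoire-string-repertoire b))
             (every #'= oa ob)
             (loop for i below (length oa)
                   always (= (repertoire-string-code-point a i)
                             (repertoire-string-code-point b i)))))))

;; ISO-8859-1 maps octet N to code point N.
(defparameter *latin-1*
  (make-repertoire "ISO-8859-1"
                   (coerce (loop for code below 256 collect code)
                           'simple-vector)))

(defparameter *hi*
  (make-repertoire-string
   *latin-1* (coerce #(72 105) '(simple-array (unsigned-byte 8) (*)))))

;; (repertoire-string-to-string *hi*) => "Hi"
```

A 16-bit repertoire would be the same structure with a larger table and (unsigned-byte 16) storage; keeping the table per string rather than per character is what makes same-repertoire operations free.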

>  - what I have already produced, but with the
>  - character data-base files removed, and the #[!][+-]unicode options 
>  - removed to make the unicode form default
>so that if we apply your patch, SBCL will always be compiled into a
>with-Unicode form even for people who don't need or want Unicode? This
>might be OK, and certainly isn't the same sort of nearly-nonnegotiable
>issue as making SBCL unbootstrappable. However, you should realize
>that doing things this way will probably raise the bar on the quality
>(not just correctness, but size and performance as well) for your
>patch to be considered for inclusion. (Maybe a lot, depending on how
>many users and developers really want Unicode.)
>
I've come to the conclusion that the only way to move forward is to do 
so separately, and to expect no co-operation.

Fortunately I have downloaded the cvs snapshot from sourceforge, so it 
should not cause any further inconvenience to yourself.

If at some point you want to try to re-integrate these things, then you 
can do so, if not, that's fine too.

Regards,

Brian.

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Christophe R. <cs...@ca...> - 2002-04-30 08:39:15

On Tue, Apr 30, 2002 at 02:03:38PM +1000, Brian Spilsbury wrote:
> The only reason that the current sbcl/cmucl can't compile this is that 
> the current sbcl/cmucl is broken.

Could you please provide more details?  Or are you referring to the fact
that cmucl and sbcl (validly) have no CHARACTERs that are not
BASE-CHARs?

> Perhaps clisp will prove less broken for once, although I can't say that 
> I'm particularly optimistic.

:-) well, no.  In this respect:

  [6]> (typep (code-char 1024) 'base-char)
  T
  [7]> (typep (code-char 2048) 'base-char)
  T
  [8]> (subtypep 'simple-string 'simple-base-string)
  T ;
  T

it looks like clisp also has everything as a BASE-CHAR...

> If at some point you want to try to re-integrate these things, then you 
> can do so, if not, that's fine too.

I'm not giving up the hope of a unicode-aware sbcl in the mainline -- I
hope that you will not diverge the trees too much, and I hope to have
the time to investigate these issues more fully soon...

Thanks,

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-04-30 09:39:04

Christophe Rhodes wrote:

>On Tue, Apr 30, 2002 at 02:03:38PM +1000, Brian Spilsbury wrote:
>
>>The only reason that the current sbcl/cmucl can't compile this is that 
>>the current sbcl/cmucl is broken.
>>
>
>Could you please provide more details?  Or are you referring to the fact
>that cmucl and sbcl (validly) have no CHARACTERs that are not
>BASE-CHARs?
>
The real problem is the bootstrap process is broken, and inherits much 
type information that it should not.

The conflation of base-char and character is one such inheritance that 
should not happen.

Normally you can ignore it since cmucl and sbcl agree about most type 
issues at that level.

>>Perhaps clisp will prove less broken for once, although I can't say that 
>>I'm particularly optimistic.
>>
>
>:-) well, no.  In this respect:
>
>  [6]> (typep (code-char 1024) 'base-char)
>  T
>  [7]> (typep (code-char 2048) 'base-char)
>  T
>  [8]> (subtypep 'simple-string 'simple-base-string)
>  T ;
>  T
>  
>it looks like clisp also has everything as a BASE-CHAR...
>
Oh well, so much for that :)
It won't avoid the bootstrap problem then.

>>If at some point you want to try to re-integrate these things, then you
>>can do so, if not, that's fine too.
>>
>I'm not giving up the hope of a unicode-aware sbcl in the mainline -- I
>hope that you will not diverge the trees too much, and I hope to have
>the time to investigate these issues more fully soon...
>
Well, see what happens. I have a number of things that I want to do, 
which are not directly related to unicode, which will need to go in 
there as well.

In the meanwhile I'll be adding in the patches which are made to the 
main-line where they apply, so divergence should be reasonably 
controlled, at least over the next six months.

Regards,

Brian.

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: William H. N. <wil...@ai...> - 2002-04-30 13:03:04

On Tue, Apr 30, 2002 at 07:38:02PM +1000, Brian Spilsbury wrote:
> Christophe Rhodes wrote:
> 
> >On Tue, Apr 30, 2002 at 02:03:38PM +1000, Brian Spilsbury wrote:
> >
> >>The only reason that the current sbcl/cmucl can't compile this is that 
> >>the current sbcl/cmucl is broken.
> >>
> >
> >Could you please provide more details?  Or are you referring to the fact
> >that cmucl and sbcl (validly) have no CHARACTERs that are not
> >BASE-CHARs?
> >
> The real problem is the bootstrap process is broken, and inherits much 
> type information that it should not.

Could you give an example?

(I know of one big sin in this department, that lots of IEEE floating
point behavior is, mostly implicitly, assumed to be present on the
cross-compilation host, mostly affecting calculation and folding of
constants. But the only such brokenness which affects characters, that
I can think of offhand, is the assumption that the xc host's BASE-CHAR
set is sufficiently ASCII-like that we can use
non-ANSI-Common-Lisp-standard whitespace characters like TAB and ^L in
the SBCL sources.)

> The conflation of base-char and character is one such inheritance that 
> should not happen.
> 
> Normally you can ignore it since cmucl and sbcl agree about most type 
> issues at that level.
> 
> >>Perhaps clisp will prove less broken for once, although I can't say that 
> >>I'm particularly optimistic.
> >>
> >
> >:-) well, no.  In this respect:
> >
> > [6]> (typep (code-char 1024) 'base-char)
> > T
> > [7]> (typep (code-char 2048) 'base-char)
> > T
> > [8]> (subtypep 'simple-string 'simple-base-string)
> > T ;
> > T
> > 
> >it looks like clisp also has everything as a BASE-CHAR...
> >
> Oh well, so much for that :)
> It won't avoid the bootstrap problem then.

I have never understood (or, speaking of unanswered questions, perhaps
have never seen?) your explanation of this bootstrap problem.
Fundamentally I don't see why the +Unicode cross-compiler can't be
written in terms of ANSI Common Lisp, without depending on a dialect
which implements characters other than STANDARD-CHAR and which makes a
distinction between BASE-CHAR and CHARACTER. The cross-compiler
generates code which deals with characters other than STANDARD-CHAR,
but why does the host Lisp need to know about them? (aside from the
trivial reason of dealing with TAB and ^L in the SBCL sources) Writing
a cross-compiler which supports types not supported in the host
language isn't weird rocket science, it's standard practice. What's
different here?

-- 
William Harold Newman <wil...@ai...>
"Palantir great. Better than cable."
  -- <http://home.nyu.edu/~amw243/diaries/saruman.html>

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-04-30 14:24:38

William Harold Newman wrote:

>
>I have never understood (or, speaking of unanswered questions, perhaps
>have never seen?) your explanation of this bootstrap problem.
>
Well, I've answered this question a few times now.

>Fundamentally I don't see why the +Unicode cross-compiler can't be
>written in terms of ANSI Common Lisp, without depending on a dialect
>which implements characters other than STANDARD-CHAR and which makes a
>distinction between BASE-CHAR and CHARACTER. The cross-compiler
>generates code which deals with characters other than STANDARD-CHAR,
>but why does the host Lisp need to know about them?
>
It can be, and is, written in terms of ANSI Common Lisp.

The host doesn't need to know, and in fact does not know.

It is the target's use of the host's type-system to build parts of 
itself, in the bootstrapping, which causes the problem.

> (aside from the
>trivial reason of dealing with TAB and ^L in the SBCL sources) Writing
>a cross-compiler which supports types not supported in the host
>language isn't weird rocket science, it's standard practice. What's
>different here?
>

The problem is that the bootstrap doesn't work to redefine existing 
types which it can't distinguish.

In cmucl and sbcl character is aliased to base-char...

So, when sbcl bootstraps, and it uses host's type definitions to build 
your defknowns, etc, where you write 'character' it writes 'base-char'.

The next phase inherits the broken defknowns, and naturally the next step is

***boom***

It is a bit more complex than that iirc, but that's the gist of it.

You are welcome to look at the hacks to get around it which are in the 
sbcl-0.6.13-unicode patch.

Regards,

Brian.

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: William H. N. <wil...@ai...> - 2002-04-30 15:36:04

On Wed, May 01, 2002 at 12:23:55AM +1000, Brian Spilsbury wrote:
> William Harold Newman wrote:
> 
> >
> >I have never understood (or, speaking of unanswered questions, perhaps
> >have never seen?) your explanation of this bootstrap problem.
> >
> Well, I've answered this question a few times now.
> 
> >Fundamentally I don't see why the +Unicode cross-compiler can't be
> >written in terms of ANSI Common Lisp, without depending on a dialect
> >which implements characters other than STANDARD-CHAR and which makes a
> >distinction between BASE-CHAR and CHARACTER. The cross-compiler
> >generates code which deals with characters other than STANDARD-CHAR,
> >but why does the host Lisp need to know about them?
> >
> It can be, and is, written in terms of ANSI Common Lisp.
> 
> The host doesn't need to know, and in fact does not know.
> 
> It is the target's use of the host's type-system to build parts of 
> itself, in the bootstrapping, which causes the problem.
> 
> >(aside from the
> >trivial reason of dealing with TAB and ^L in the SBCL sources) Writing
> >a cross-compiler which supports types not supported in the host
> >language isn't weird rocket science, it's standard practice. What's
> >different here?
> >
> 
> The problem is that the bootstrap doesn't work to redefine existing 
> types which it can't distinguish.
> 
> In cmucl and sbcl character is aliased to base-char...
> 
> So, when sbcl bootstraps, and it uses host's type definitions to build 
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> your defknowns, etc, where you write 'character' it writes 'base-char'.
  ^^^^^^^^^^^^^^
> The next phase inherits the broken defknowns, and naturally the next step is
> 
> ***boom***
> 
> It is a bit more complex than that iirc, but that's the gist of it.

Well, OK, I am somewhat familiar with misbegotten interactions between
the host type system and the cross-compiler type system, having done a
fair amount of work to remove them in the process of making SBCL
bootstrap itself.:-| What I don't see is why they show up here. (And
beyond that what I really don't understand is why when they show up
here, you consider them a feature, or at least an unavoidable part of
bootstrapping Unicodeness, and not a bug.)

You do say something about where host/xc type system interactions show
up, talking about DEFKNOWNs in particular. Ordinarily I would expect
that the types which appear in DEFKNOWNs should be used only by the
cross-compiler's type system (i.e. turned into SB!KERNEL:CTYPE objects
by calls to SB!KERNEL:SPECIFIER-TYPE, and then further manipulated by
other machinery in the SB!KERNEL package). Evidently, from what you
say about "uses host's definitions", they're also being passed to the
*host's* type system (with CL:SUBTYPEP or whatever). If I could see
how that is happening, I'd probably consider it a fairly fundamental
problem in SBCL and be interested in fixing it.
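As an illustration of the separation described above (a sketch, not SBCL's actual SB!KERNEL code), a cross-compiler can keep its own table of target types and answer subtype questions from it, so the host's answer to (SUBTYPEP 'CHARACTER 'BASE-CHAR) never enters the picture. The names XC-CHAR-TYPE, XC-SPECIFIER-TYPE and XC-CSUBTYPEP, and the code ranges used, are assumptions made for this example only.

```lisp
;;; Hypothetical sketch: target character types represented as the
;;; cross-compiler's own objects, independent of the host's type system.

(defstruct (xc-char-type (:constructor make-xc-char-type (low high)))
  "A target character type: the inclusive range [LOW, HIGH] of target
character codes."
  (low 0 :type (integer 0))
  (high 0 :type (integer 0)))

(defparameter *xc-char-types*
  ;; Assumed target code ranges, chosen for illustration only.
  (list (cons 'base-char (make-xc-char-type 0 255))
        (cons 'character (make-xc-char-type 0 #x10FFFF))))

(defun xc-specifier-type (specifier)
  "Parse SPECIFIER with the cross-compiler's own table, never the host's."
  (or (cdr (assoc specifier *xc-char-types*))
      (error "No cross-compiler type known for ~S" specifier)))

(defun xc-csubtypep (type1 type2)
  "TYPE1 is a subtype of TYPE2 when its code range lies inside TYPE2's."
  (<= (xc-char-type-low type2)
      (xc-char-type-low type1)
      (xc-char-type-high type1)
      (xc-char-type-high type2)))

;; Same answers whatever the host thinks about CHARACTER:
;; (xc-csubtypep (xc-specifier-type 'base-char) (xc-specifier-type 'character)) => T
;; (xc-csubtypep (xc-specifier-type 'character) (xc-specifier-type 'base-char)) => NIL
```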

> You are welcome to look at the hacks to get around it which are in the 
> sbcl-0.6.13-unicode patch.

Right.

I grepped for "defknown" in the sbcl-0.7.0-unicode.p0 patch, and
didn't get any immediate insights.

Perhaps this is a blind spot of mine, where I've been able to overlook
a rat's nest of dependency bugs because the xc hosts I've worked with
have always been too similar to the target I'm creating. But blind
spot or whatever, I would probably recognize it faster if I got a more
specific hint than "the problem is implicit in the uncommented
workaround hacks in this multi-megabyte patch of mine [which
incidentally also implicitly conveys my working estimate of the value
of others' time compared to mine, since I can't be bothered e.g.
either to remove irrelevant files or to provide a summary which would
let the reader confidently determine for himself which files are
irrelevant]".

-- 
William Harold Newman <wil...@ai...>
"Programming should be fun, programs should be beautiful."
  -- P. (Paul?) Graham, quoted in comp.lang.lisp by David E. Young

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-05-01 00:24:47

William Harold Newman wrote:

>
>Well, OK, I am somewhat familiar with misbegotten interactions between
>the host type system and the cross-compiler type system, having done a
>fair amount of work to remove them in the process of making SBCL
>bootstrap itself.:-| What I don't see is why they show up here. (And
>beyond that what I really don't understand is why when they show up
>here, you consider them a feature, or at least an unavoidable part of
>bootstrapping Unicodeness, and not a bug.)
>
I don't.

I consider it a design flaw in sbcl.

I have worked around it to my satisfaction, and I do not care to spend a 
bazillion years fixing this, and then trying to get it into the main 
release.

You are welcome to do so.

>You do say something about where host/xc type system interactions show
>up, talking about DEFKNOWNs in particular. Ordinarily I would expect
>that the types which appear in DEFKNOWNs should be used only by the
>cross-compiler's type system (i.e. turned into SB!KERNEL:CTYPE objects
>by calls to SB!KERNEL:SPECIFIER-TYPE, and then further manipulated by
>other machinery in the SB!KERNEL package). Evidently, from what you
>say about "uses host's definitions", they're also being passed to the
>*host's* type system (with CL:SUBTYPEP or whatever). If I could see
>how that is happening, I'd probably consider it a fairly fundamental
>problem in SBCL and be interested in fixing it.
>
No, the defknowns are being _built_ using the host system's datatypes.
They don't need to pass anything back to the host's type-system for this 
to break things.
Especially in the case where one type is being aliased to another in the 
host, and therefore they inherit this implicit aliasing.
(i.e. they cannot refer to the type 'character', in this particular 
example, since all references to 'character' turn into 'base-char', 
which you can imagine might cause problems for defknowns, etc. which want 
to know about functions that deal with 'character')
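To make that failure mode concrete, here is a hypothetical Common Lisp sketch (these functions do not exist in SBCL or in the patch): if type specifiers are normalised by consulting the host's SUBTYPEP, a host on which CHARACTER and BASE-CHAR are the same type silently rewrites one into the other, and anything built from the result inherits the aliasing.

```lisp
;;; Hypothetical illustration only: how a type specifier can lose
;;; information when it is normalised through the *host's* type system.

(defun host-types-equivalent-p (a b)
  "True when the host Lisp says type specifiers A and B denote the same set."
  (and (subtypep a b) (subtypep b a) t))

(defun normalise-with-host (specifier)
  "Rewrite CHARACTER as BASE-CHAR whenever the host considers them the
same type, recursing into compound specifiers such as (FUNCTION ...)."
  (cond ((and (eq specifier 'character)
              (host-types-equivalent-p 'character 'base-char))
         'base-char)
        ((consp specifier)
         (mapcar #'normalise-with-host specifier))
        (t specifier)))

;; On a host where CHARACTER = BASE-CHAR (as described for CMU CL and
;; SBCL in this thread), (function (character) fixnum) comes back as
;; (FUNCTION (BASE-CHAR) FIXNUM): the distinction is gone before the
;; target compiler ever sees it.  On a host that separates the two
;; types, the specifier is returned unchanged.
```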

>>You are welcome to look at the hacks to get around it which are in the 
>>sbcl-0.6.13-unicode patch.
>>
>
>Right.
>
>I grepped for "defknown" in the sbcl-0.7.0-unicode.p0 patch, and
>didn't get any immediate insight.
>
No, sbcl-0.6.13 is not sbcl-0.7.0.

The sbcl-0.7.0-unicode version avoids all of this garbage by using the 
sbcl-0.6.13-unicode to bootstrap from.

As the sbcl-0.6.13 provides the correct host-level type-structure this 
problem is not present in the sbcl-0.7.0 patch.

>Perhaps this is a blind spot of mine, where I've been able to overlook
>a rat's nest of dependency bugs because the xc hosts I've worked with
>have always been too similar to the target I'm creating. But blind
>spot or whatever, I would probably recognize it faster if I got a more
>specific hint than "the problem is implicit in the uncommented
>workaround hacks in this multi-megabyte patch of mine [which
>incidentally also implicitly conveys my working estimate of the value
>of others' time compared to mine, since I can't be bothered e.g.
>either to remove irrelevant files or to provide a summary which would
>let the reader confidently determine for himself which files are
>irrelevant]".
>
At this point you are asking me to explain this as a favour to you, 
since I no longer give a damn.

Extracted from sbcl-0.6.13/src/compiler/fndb.lisp, and cut and pasted 
once again.

Here is a simple english summary of what this horrible code does:

(a) change the system types into user types so that we can redefine them.
(b) destroy the cached type information for 'character', which will 
cause it to split off from base-char.
(c) redefine a bunch of types to include 'string' explicitly, so that they 
expand into the new definitions. In the case of sequence this is wrong, 
but there was probably some reason to do it at the time. The array stuff 
isn't relevant to compiler/fndb, but is to compiler/generic/vm-fndb, 
where this is included a second time.

#!+(and sb-xc-host (or sbcl cmu) unicode)
(progn
 #+unicode-bootstrap
 (progn
  ; be naughty and allow these types to be redefined
  ; so that the right type definitions leak in...
  ; *sigh* [BTS]

  ; once for loaded things
  #.(setf (sb-c::info :type :kind 'sb-c::simple-string) :defined)
  #.(setf (sb-c::info :type :kind 'sb-c::string) :defined)
  #.(setf (sb-c::info :type :kind 'sb-c::character) :defined)
  #.(setf (sb-c::info :type :kind 'sb-c::sequence) :defined)
  #.(setf (sb-c::info :type :kind 'sb-c::vector) :defined)

  ; once again for compiled things - probably a better way FIXME [BTS]
  (setf (sb-c::info :type :kind 'sb-c::simple-string) :defined)
  (setf (sb-c::info :type :kind 'sb-c::string) :defined)
  (setf (sb-c::info :type :kind 'sb-c::character) :defined)
  (setf (sb-c::info :type :kind 'sb-c::vector) :defined)

  (setf (sb-c::info :type :builtin 'sb-c::character) nil)
  (sb-c::values-specifier-type-cache-clear)
  (sb!c::values-specifier-type-cache-clear)

  ; I wonder if this will work?
  (def!type base-string (&optional size)
    `(array base-char (,size)))
  (def!type simple-base-string (&optional size)
    `(simple-array base-char (,size)))
  (def!type simple-string-32 (&optional size)
    `(simple-array character (,size)))
  (def!type simple-string (&optional size)
    `(or (simple-array character (,size))
              (simple-base-string ,size)))
  (def!type complex-string (&optional size)
    `(array character (,size)))
  (def!type string (&optional size)
    `(or (array character (,size))
         (base-string ,size)
         (simple-string ,size)))

  (def!type sequence (&optional size)
    `(or cons
         (member nil)
         vector
         string))

  (def!type vector (&optional element-type size)
    `(or (array ,element-type (,size))
         (string ,size)))

  (def!type pathname-designator ()
    '(or string pathname stream))

  (def!type logical-host-designator ()
    '(or host simple-string string simple-string-32))

  (def!type simple-unboxed-array (&optional dims)
    `(or (simple-array bit ,dims)
         (simple-array base-char ,dims)
         (simple-array character ,dims)
         (simple-array (unsigned-byte 2) ,dims)
         (simple-array (unsigned-byte 4) ,dims)
         (simple-array (unsigned-byte 8) ,dims)
         (simple-array (unsigned-byte 16) ,dims)
         (simple-array (unsigned-byte 32) ,dims)
         (simple-array (signed-byte 8) ,dims)
         (simple-array (signed-byte 16) ,dims)
         (simple-array (signed-byte 30) ,dims)
         (simple-array (signed-byte 32) ,dims)
         (simple-array (complex single-float) ,dims)
         (simple-array (complex double-float) ,dims)
         #!+long-float (simple-array (complex long-float) ,dims) ; hmm?
         (simple-array single-float ,dims)
         (simple-array double-float ,dims)
         #!+long-float (simple-array long-float ,dims)))

  (def!type unboxed-array (&optional dims)
    `(or (array bit ,dims)
         (array base-char ,dims)
         (array character ,dims)
         (array (unsigned-byte 2) ,dims)
         (array (unsigned-byte 4) ,dims)
         (array (unsigned-byte 8) ,dims)
         (array (unsigned-byte 16) ,dims)
         (array (unsigned-byte 32) ,dims)
         (array (signed-byte 8) ,dims)
         (array (signed-byte 16) ,dims)
         (array (signed-byte 30) ,dims)
         (array (signed-byte 32) ,dims)
         (array (complex single-float) ,dims)
         (array (complex double-float) ,dims)
         #!+long-float (array (complex long-float) ,dims) ; hmm?
         (array single-float ,dims)
         (array double-float ,dims)
         #!+long-float (array long-float ,dims)))))

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: William H. N. <wil...@ai...> - 2002-05-01 16:44:20

On Wed, May 01, 2002 at 10:23:54AM +1000, Brian Spilsbury wrote:
> William Harold Newman wrote:
> 
> >
> >Well, OK, I am somewhat familiar with misbegotten interactions between
> >the host type system and the cross-compiler type system, having done a
> >fair amount of work to remove them in the process of making SBCL
> >bootstrap itself.:-| What I don't see is why they show up here. (And
> >beyond that what I really don't understand is why when they show up
> >here, you consider them a feature, or at least an unavoidable part of
> >bootstrapping Unicodeness, and not a bug.)
> >
> I don't.
> 
> I consider it a design flaw in sbcl.

[...]

> No, sbcl-0.6.13 is not sbcl-0.7.0.

(Oops, sorry. I did actually make this mistake, but then I realized it
myself and corrected it by looking at the sbcl-0.6.13 patch as well as
0.7.0. Alas, I neglected to correct what I wrote in my email.)

> At this point you are asking me to explain this as a favour to you, 
> since I no longer give a damn.

OK. Thank you for, despite this, highlighting in the rest of your
message what you did to work around the design flaw. As per my annoyed
remark in my previous message, I appreciate not having to sift the
information out of all the other stuff in the patch. However, I will
apologize in advance for quite possibly still not figuring out what
you mean. If our conversation hadn't degenerated so badly, the
information that I'd hope for would be *why* you did this stuff (like
hacking SB-C::INFO as opposed to SB!C::INFO), or at least what broke
when you tried doing the less twisted alternatives. That is, I would
like to know the actual symptoms of the design flaw, and/or your
detailed understanding of the design flaw, not just the things you did
to work around the design flaw after you came to an understanding of it. 

When you just tell me the weird things you did without saying why, it
seems a bit like someone sending a bug report "Emacs has a design
flaw. I have to reload my .emacs file every time I close a file. EOT."
Maybe this really represents a design flaw in emacs, but a maintainer
receiving this report is going to suspect that the design flaw is on
the sender's end. Even if the maintainer knows that the sender is a
brilliant programmer who is unlikely to be mistaken about the design,
the maintainer faces a difficult logic problem trying to work
backwards from the workaround to the problem. Of course, it's not a
perfect analogy, since there is much more detail in what you have done
to work around it than just reloading a .emacs file. Thus, to the
extent that no longer giving a damn doesn't go both ways, the
information here might give me a decent chance of working backwards
from your actions to your thinking and determining the actual problem
for myself. But still...

I don't always write this kind of "why" text myself, but I do try to
do it when I think of it. There's a fair amount of it in
doc/FOR-CMUCL-DEVELOPERS, e.g.
  Even after we've implemented the UNCROSS hack, a lot of the code inside
  EVAL-WHEN forms is still broken, because it does things like CL:DEFMACRO
  to define macros which are intended to show up in the target, and
  under the new system we really need it to do SB-XC:DEFMACRO instead
  in order to achieve the desired effect. So...
Working backward from "what" to "why" can be hard. If you edited all
the "why" text out of FOR-CMUCL-DEVELOPERS, then asked a CMU CL
developer to read it, work out what the changes are for, and write his
own explanation, it might take him a while, and he might even make
some mistakes. And a lot of the stuff in FOR-CMUCL-DEVELOPERS is
probably fairly straightforward compared to some other cases where you
could do the same exercise. Consider e.g. the changes CSR has made to
the type system over the last few months, where I certainly would not
have wanted to have to reverse engineer what he was thinking.

I would just as soon fix, or at least catalog, problems in SBCL, so if
you feel like explaining more at some point, I'll probably be
interested. On the other hand, it's also OK if we drop the whole
thing. I can understand you not giving a damn, since I too have had
email correspondences that I've found more productive and enjoyed
more; and I don't need Unicode, so I can afford to be patient. :-|

-- 
William Harold Newman <wil...@ai...>
"My experience has shown that many people find it hard to make their
design ideas precise. They are willing to express their ideas in loose,
general terms, but are unwilling to express them with the precision
needed to make them into patterns." -- Christopher Alexander, _The
Timeless Way of Building_
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: Brian S. <br...@de...> - 2002-05-03 05:10:21

William Harold Newman wrote:

>
>OK. Thank you for, despite this, highlighting in the rest of your
>message what you did to work around the design flaw. As per my annoyed
>remark in my previous message, I appreciate not having to sift the
>information out of all the other stuff in the patch. However, I will
>apologize in advance for quite possibly still not figuring out what
>you mean. If our conversation hadn't degenerated so badly, the
>information that I'd hope for would be *why* you did this stuff (like
>hacking SB-C::INFO as opposed to SB!C::INFO), or at least what broke
>when you tried doing the less twisted alternatives. That is, I would
>like to know the actual symptoms of the design flaw, and/or your
>detailed understanding of the design flaw, not just the things you did
>to work around the design flaw after you came to an understanding of it. 
>
I did say that this was to modify the host's type-system, so naturally 
this needs to be done to sb-c::info when the host is sbcl.

Part of the problem is that my understanding of it has been imperfect, 
although I think I've found some reasons for that now.

>
>I would just as soon fix, or at least catalog, problems in SBCL, so if
>you feel like explaining more at some point, I'll probably be
>interested. On the other hand, it's also OK if we drop the whole
>thing. I can understand you not giving a damn, since I too have had
>email correspondences that I've found more productive and enjoyed
>more; and I don't need Unicode, so I can afford to be patient. :-|
>
I decided to try compiling with 0.7.2 to re-look over the problem, and 
surprisingly the main issues had disappeared, which leads me to suspect 
that the byte-code compiler may have been at fault, since I can't think 
of any other good reason.

This would also explain why I was never able to really track down a 
sensible reason for why it behaved as it did.

In any case, hopefully this peculiar issue has died with 0.6.13.

--

Having taken some time to cool down, I think that there are a couple of 
reasons as to why I've become incrementally more angry over this last month.

I think the main thing has been a combination of forgetting, repetition, 
and a sense of a profound lack of communication over this whole period.

1/ the re-raising of the 'string type as union of array types' issue, 
which had been dealt with earlier.
2/ the bootstrapping problem was initially raised and discussed a 
few months back as well.
3/ the long delays before replies over this whole period, about a week 
on average or so.
4/ the sense of absolutely no communication, or even some weird negative 
communication where I've said one thing, and you've replied to me 
indicating that you think that I've said the opposite.

I'm sure that sensible reasons exist for the first three, and the fourth 
is undoubtedly a collaborative effort.
I'm not making accusations, just giving an explanation of my perception 
of this.

Originally what I wanted was feedback on the approach to take, and what 
I needed to do and could expect in return. As the replies continued to 
be more or less vague, and mostly concerning either what I understand to 
be trivial (i.e. cutting off the character databases) or impossible things 
(i.e. chopping the patch up into small pieces which could be incrementally 
used to move toward a working system), I tried to turn this around by 
stating what I expected and what I intended to do, which was a mistake in 
hindsight.

Overall the process has been one of progressive disillusionment and 
perplexity, and an increasing sense of unreality.

At this time I do not bear any ill-will toward yourself or sbcl, however 
my current feeling is that there is little point trying to submit 
extensive core changes which the maintainer is disinterested in (and 
while back-end ports are certainly extensive efforts, they do not affect 
the core of the code significantly imho).

This being the case, I will pursue these in a different fork of sbcl, 
and when the code is mature and if the maintainers of sbcl become 
interested, then they can work out what they want to do to integrate 
these things, and I will be happy to announce if/when things become 
usable or stable, and provide cvs access.

Hopefully future communication will be less broken. :)

Regards,

Brian.

Re: [Sbcl-devel] unicode progress (or lack thereof)

From: William H. N. <wil...@ai...> - 2002-05-03 14:32:59

On Fri, May 03, 2002 at 03:09:13PM +1000, Brian Spilsbury wrote:
> Having taken some time to cool down, I think that there are a couple of 
> reasons as to why I've become incrementally more angry over this last month.
> 
> I think the main thing has been a combination of forgetting, repetition, 
> and a sense of a profound lack of communication over this whole period.
> 
> 1/ the re-raising of the 'string type as union of array types' issue, 
> which had been dealt with earlier.
> 2/ the bootstrapping problem was initially raised and discussed a 
> few months back as well.

On these things I was still uncertain. If this was just me being spacy
and forgetting how the issues were previously resolved, I'm sorry, I
can certainly see how that would be aggravating.

> 3/ the long delays before replies over this whole period, about a week 
> on average or so.

Yes. I've been falling down on that, and I'm sorry. At this point I'm
not very energetic about SBCL for at least two reasons, (1) it works
well enough for my application programming that I'm more motivated to
do application programming in it than I am to improve it, and (2) I'm
still recovering from an overrun of my "how much time do I want to
allocate to SBCL" time budget in 0.7.0.

Incidentally, in case I didn't convey it before or you didn't pick it
up before, this is related to the way that I've tried to press you to
break things up into smaller chunks, and to develop any messy
still-under-development parts of the system in a loosely-coupled way
under your own change control. It isn't only that I have some abstract
s/w engineering aesthetic idea that loose coupling is good (though I
do) but also that I thought it would help in this particular
situation. I have only so much time and energy, less now than at some
other times, and I thought it would be wiser to spend my time and
energy on things which really need to be tightly integrated. Merging e.g.
the recent APD compiler patches and NJF PCL patches is mandatory,
since they're not very meaningful independently. I think the low-level
wide-character support is similar. But I don't think the high level
Unicode stuff needs to go through me, and I had hoped that by keeping
it independent we'd avoid the aggravation of the bottleneck (which is
aggravation not just for you but for me).

> 4/ the sense of absolutely no communication, or even some weird negative 
> communication where I've said one thing, and you've replied to me 
> indicating that you think that I've said the opposite.

Uh, yeah, what you said. I think I've referred before to similar
feelings on this end. I'm not sure what's wrong, but for sure it
hasn't worked well.

> At this time I do not bear any ill-will toward yourself or sbcl, however 
> my current feeling is that there is little point trying to submit 
> extensive core changes which the maintainer is disinterested in (and 
> while back-end ports are certainly extensive efforts, they do not affect 
> the core of the code significantly imho).
>
> This being the case, I will pursue these in a different fork of sbcl, 
> and when the code is mature and if the maintainers of sbcl become 
> interested, then they can work out what they want to do to integrate 
> these things, and I will be happy to announce if/when things become 
> usable or stable, and provide cvs access.

OK. At this point I can be enthusiastic about any solution which
solves the bottleneck problem without having as a prerequisite any
agreement between Texas and Australia. :-|

Meanwhile, as I've said, if at some time you put the
tightly-integrated-into-SBCL parts of your patch (changes to the type
system, VOPs, etc.) into what I consider mainstream-SBCL-friendly form
(bootstrappable, not very expensive or dangerous for people who are
satisfied with STANDARD-CHAR, not too much stuff which doesn't follow
from the ANSI specification, ideally some explanations to help
reviewers over the confusing parts...) then you should be welcome to
merge them. But there's no rush. (Or to the extent that there is time
pressure that I'm not aware of, if past is prologue, someone else may
get impatient and port it.)

> Hopefully future communication will be less broken. :)

Amen. 

-- 
William Harold Newman <wil...@ai...>
"Palantir great. Better than cable."
  -- <http://home.nyu.edu/~amw243/diaries/saruman.html>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C

[Sbcl-devel] progress

From: Brian S. <br...@de...> - 2002-05-04 05:56:05

William Harold Newman wrote:

>
>OK. At this point I can be enthusiastic about any solution which
>solves the bottleneck problem without having as a prerequisite any
>agreement between Texas and Australia. :-|
>
>Meanwhile, as I've said, if at some time you put the
>tightly-integrated-into-SBCL parts of your patch (changes to the type
>system, VOPs, etc.) into what I consider mainstream-SBCL-friendly form
>(bootstrappable, not very expensive or dangerous for people who are
>satisfied with STANDARD-CHAR, not too much stuff which doesn't follow
>from the ANSI specification, ideally some explanations to help
>reviewers over the confusing parts...) then you should be welcome to
>merge them. But there's no rush. (Or to the extent that there is time
>pressure that I'm not aware of, if past is prologue, someone else may
>get impatient and port it.)
>
The bootstrap issue is gone, but the only thing that I can really cut 
down on is the character databases. I've produced an ASCII database 
which allows 0.7.2 to compile it, but the reader, read-table, streams, 
printer, filesystem, pathname, sequence, array, type-system, etc. bits 
can't be chopped off, or you'll get a broken build.

In the other direction, it should be possible to patch some peripheral 
systems, such as the read-table, filesys, format, and so on, without 
supporting the deeper changes.

I'll look into cleaning these up, and doing the refactoring which I've 
avoided so far in order to minimise changes.

At the moment, I'm manually synchronising my code with a recent cvs 
snapshot from sourceforge, so, I guess I'll see how long that takes to 
complete first.

Regards,

Brian.