William Harold Newman wrote:
>On Fri, Oct 05, 2001 at 03:54:00PM +1000, Brian Spilsbury wrote:
>>What isn't done yet is fixing the string support.
>>simple-string needs to be duplicated into simple-immutable-string and
>>simple-byte-string, the latter for the ffi interface mostly.
>As long as there is any specialized support for 8-bit characters left
>in the implementation, shouldn't they be BASE-CHARs, and then
>shouldn't this be SIMPLE-BASE-STRING?
Well, base-char is now 21 bit, so... not sure, but yes, I've left it as
simple-string for now.
>>then a (make-literal-string) method is necessary which produces a
>>simple-immutable-string if all the characters qualify,
>>or a utf8-immutable-string if they don't.
>I'd prefer separating "make read only" and "squash into BASE-STRING
>if possible" into separate orthogonal operations. Perhaps
> * a new SB-EXT:READ-ONLY-STRING function analogous to STRING, which
> coerces its argument to a read-only string; or a :READ-ONLY option
> to STRING, as an alternate interface to the same thing
> * a new keyword argument (perhaps :COMPACT, I dunno) to functions
> like COPY-STRING, STRING, and READ-ONLY-STRING, to cause the
> squash-into-BASE-STRING-if-possible operation
>(But also see my growing reservations about read-only-ness below.)
Yes, I was thinking of (make-immutable-string) as a pair to
(make-string), and having the reader naturally use it for string
literals, which would allow us to by default use the fat 32-bit char
strings without too much anguish, and support coercion when the
type-specs are sorted out.
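
A minimal sketch of the "squash into BASE-STRING if possible" step on its
own, kept separate from any read-only marking as suggested above.
COMPACT-STRING is a hypothetical name, not an existing ANSI or SBCL
function; it uses only standard operators.

```lisp
;;; Hypothetical helper: COMPACT-STRING is not part of ANSI CL or SBCL.
(defun compact-string (string)
  "Return a fresh simple string with the contents of STRING.
The result is a SIMPLE-BASE-STRING when every character is a BASE-CHAR,
otherwise an ordinary copy of STRING."
  (if (every (lambda (ch) (typep ch 'base-char)) string)
      ;; Every character fits the narrow representation; MAP always
      ;; allocates a new sequence of the requested type.
      (map 'simple-base-string #'identity string)
      ;; At least one character needs the wide representation.
      (copy-seq string)))
```

With this, (compact-string (make-string 3 :initial-element #\a)) would
give back the narrow representation, whatever width MAKE-STRING itself
defaults to.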
>>complex-string is likewise UCS-4 encoded.
>>I think that covers things pretty well, I'm not sure what kind of
>>overhead we can expect on string dispatching though, and I don't know of
>>an ansi typespecifier for an immutable string... which is annoying since
>>they effectively specify them in the literals section.
>Of course you're right that there's no ANSI type specifier for an
>immutable string. (And you're also right that there are some annoying
>things about the ANSI standard.:-) Some sort of extension would be
>needed. And now that I think about it, I'm afraid that might be a
>jumbo-size can of worms.
I don't think we need a type specifier for ANSI CL code: in ANSI code
the only immutables will be literals anyhow, and modifying those is
undefined behaviour. However, we can add a non-ANSI type specifier for
SBCL-specific code, such as immutable-simple-string, where we can afford
some ugliness. immutable-p might also be useful.
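
One way such a non-ANSI specifier could be layered on the standard ones,
sketched in portable Common Lisp. MARK-IMMUTABLE, the side table, and the
exact definitions of IMMUTABLE-P and IMMUTABLE-SIMPLE-STRING are all
assumptions made for illustration; a real implementation would more
likely keep a bit in the object header.

```lisp
;;; Hypothetical sketch; none of these definitions exist in SBCL.

(defvar *immutable-objects* (make-hash-table :test 'eq)
  "Side table of objects that have been marked immutable.
Note that it holds its keys strongly, so marked objects are never
garbage collected in this sketch.")

(defun mark-immutable (object)
  "Record OBJECT as immutable and return it."
  (setf (gethash object *immutable-objects*) t)
  object)

(defun immutable-p (object)
  "Return true if OBJECT has been marked immutable."
  (values (gethash object *immutable-objects*)))

;; A non-ANSI type specifier built from a standard one plus SATISFIES.
(deftype immutable-simple-string ()
  '(and simple-string (satisfies immutable-p)))
```

The catch with SATISFIES is that SUBTYPEP cannot in general reason about
it, so this gives TYPEP a usable answer without giving the compiler
anything to optimise on.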
>I don't see any very nice way to put read-only-ness in a Common Lisp
>type specifier. The C++ idea that 'const char' is used where you'd use
>'char' doesn't seem like an obvious fit to Common Lisp type syntax
>(and there might even be deep problems in extending the semantics that
>way). You could avoid any deep syntax problems by extending the Common
>Lisp type system to include READ-ONLY-ARRAY and READ-ONLY-STRING and
>so forth, but I'm afraid it'd be quite a mess, since the system is
>already big enough that it can be hard to remember everything.
>So it might be hard to put immutability in the type specifier.
>Unfortunately, I think there are some good reasons that immutability
>ought to show up there.
> * A lot of optimization code in the compiler works by passing
> around type information, so if you want to be able to compile
> things like (SETF AREF) efficiently in the presence of
> read-only-ness, you'll probably need to pass around
> read-only-ness (and writeable-ness) as part of the type.
> * The idea that (TYPE-OF FOO) could return a standard type like
> SIMPLE-STRING, but then when you tried to do (SETF (SCHAR FOO 0) #\0)
> you'd get a runtime error, makes me uneasy. (Although I
> might be able to live with it if it's only used to signal an error
> in the ANSI-undefined case (SETF (SCHAR (SYMBOL-NAME 'PRINT) 0) #\p).)
>When some time ago I asked whether you were designing the extension
>yourself or porting something from some other Lisp, I was mostly
>worried about the problem of generalizing all the operators in Common
>Lisp to do the right thing when presented with read-only inputs, and
>to construct read-only outputs when wanted. (If I do DELETE-IF on an
>immutable string, should I get back an immutable string?) But now
>that I think about it, I'm even more worried about the problems of the
>type system than I am about the problems with operators.
My feeling is that we should simply signal a condition on any attempt to
mutate an immutable object.
Initially, though, I will silently ignore attempts to mutate such
strings, for simplicity.
Signalling a condition seems a polite way to satisfy the spec, which
leaves the behaviour undefined in that case anyhow.
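
Both behaviours (silently ignoring the write at first, signalling a
condition later) can be sketched with a wrapper around an ordinary
string. Every name here (IMMUTABLE-OBJECT-ERROR, GUARDED-STRING,
GUARDED-CHAR, *IGNORE-IMMUTABLE-WRITES*) is hypothetical, and the wrapper
stands in for what would really be a change to the string representation
itself.

```lisp
;;; Hypothetical sketch; a wrapper struct stands in for a real
;;; immutable string representation.

(define-condition immutable-object-error (error)
  ((object :initarg :object :reader immutable-object-error-object))
  (:report (lambda (condition stream)
             (format stream "Attempt to modify immutable object ~S."
                     (immutable-object-error-object condition)))))

(defvar *ignore-immutable-writes* nil
  "When true, writes to read-only strings are silently dropped.")

(defstruct (guarded-string
            (:constructor make-guarded-string (data &key read-only)))
  (data "" :type string :read-only t)
  (read-only nil :type boolean :read-only t))

(defun guarded-char (gs index)
  "Read the character at INDEX of the wrapped string."
  (char (guarded-string-data gs) index))

(defun (setf guarded-char) (new-char gs index)
  "Write NEW-CHAR at INDEX unless GS is read-only."
  (cond ((not (guarded-string-read-only gs))
         (setf (char (guarded-string-data gs) index) new-char))
        ;; The initial, simple policy: drop the write without comment.
        (*ignore-immutable-writes* new-char)
        ;; The polite policy: tell the caller what went wrong.
        (t (error 'immutable-object-error :object gs))))
```

Since the condition is an ERROR, a caller who wants the write ignored can
still arrange that with HANDLER-CASE, which is harder to do in the other
direction.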
>I was receptive to the idea of read-only-ness, even in the absence of
>a complete design, because I sorta thought that even if a complete
>design of operator behavior turned out to be hard, you could punt.
>Just add a READ-ONLY keyword argument to MAKE-ARRAY and MAKE-STRING
>and COPY-STRING and declare victory! I still think that that simple
>result would be somewhat useful (especially since it would make SBCL's
>system data, e.g. SYMBOL-NAME values, a little harder to corrupt).
>However, I had also assumed it would be complete, and now that I've
>thought more about the type system issues, I'm no longer sure of that.
> * What will TYPE-OF return when it's called on a read-only string?
> * What will TYPE-OF return when it's called on an ordinary
> ANSI-standard string, one which supports (SETF SCHAR)?
> (TYPE-OF (MAKE-STRING 51))
> * (SUBTYPEP (TYPE-OF (SYMBOL-NAME 'FOO)) (TYPE-OF (MAKE-STRING 3))) => ?
> * (SUBTYPEP (TYPE-OF (MAKE-STRING 3)) (TYPE-OF (SYMBOL-NAME 'FOO))) => ?
> * If there are new type specifiers, how much code can be exposed
> to the new type specifiers by portable constructs, e.g.
> (CONCATENATE (TYPE-OF (SYMBOL-NAME SYM)) THIS THAT)
> (MAP (TYPE-OF (SYMBOL-NAME SYM)) *FROBBER* THOSE THESE)
> (COERCE MY-NAME (TYPE-OF (SYMBOL-NAME SYM)))
> and so must necessarily be made to work with the new type specifiers
> before the extended system is ANSI-compliant?
> * Is there a way (some sort of extension?) to explicitly declare a
> string argument to be writable, so that (SETF SCHAR) can be compiled
> efficiently? Will portable string-handling code, without such
> extended declarations, still be compiled reasonably efficiently?
Well, initially type-of will return weird names for the extended
strings, but that's a good question.
It may be possible to transform the weird type into a polite type at the
surface level, while retaining the true nature for the compiler and for
those non-ANSI programs which care to use the weird names. I'm not sure
about that, though; I'll read the HyperSpec some more.
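
The TYPE-OF and SUBTYPEP questions can at least be put to whatever
implementation is at hand, before and after any change. The function
below is a hypothetical helper using only standard operators; its
answers are implementation-dependent, so no particular result is claimed
here.

```lisp
;;; Hypothetical helper for comparing how an implementation types a
;;; symbol name and a fresh string of the same length.
(defun string-type-report (&optional (symbol 'foo))
  "Return a property list describing the types of (SYMBOL-NAME SYMBOL)
and of a MAKE-STRING result of the same length, and whether each type
is a recognisable subtype of the other."
  (let* ((name (symbol-name symbol))
         (name-type (type-of name))
         (fresh-type (type-of (make-string (length name)))))
    (list :name-type name-type
          :fresh-type fresh-type
          ;; Only the primary value of SUBTYPEP is kept here.
          :name<=fresh (subtypep name-type fresh-type)
          :fresh<=name (subtypep fresh-type name-type))))
```

If the two types ever stop being mutually compatible, portable code that
feeds one into CONCATENATE, MAP or COERCE is where the difference will
surface.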
I believe that by default all strings which are not literals should be
mutable for ansi compatibility.
This means that all our (make-string) results will be ucs-4 encoded, and
we pay in size, but we don't usually do a huge amount of this.
String operations between representations are a bit messy, but it isn't
We can use the same operators for comparison (of strings) for all
simple-string, immutable-simple-string, immutable-utf8-strings.
Comparison between simple-string and ucs-4-string is fairly
straight-forward, comparing ucs-4-string and immutable-utf8-string is
the most expensive since we need to decode the utf-8-string character by
character.
Fortunately utf-8-string will mostly be in symbol-names (I expect),
where it will generally be compared with other symbol-names and printed,
which are operations that do not require random access.
I believe with some care that the result can be acceptably efficient.
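
A sketch of the expensive case, comparing UTF-8 data against a fixed-width
string by decoding one character at a time with no random access.
DECODE-UTF8-CHAR and UTF8-STRING= are hypothetical names, the UTF-8 data
is modelled as a plain vector of octets, the decoder assumes well-formed
input, and CHAR-CODE is assumed to return Unicode code points.

```lisp
;;; Hypothetical sketch; no validation of malformed UTF-8 is attempted.

(defun decode-utf8-char (octets start)
  "Decode one code point from OCTETS beginning at START.
Return the code point and the index of the following sequence."
  (let ((b0 (aref octets start)))
    (flet ((cont (i)
             ;; Low six bits of the I-th continuation octet.
             (ldb (byte 6 0) (aref octets (+ start i)))))
      (cond ((< b0 #x80)                ; 1 octet: 0xxxxxxx
             (values b0 (+ start 1)))
            ((< b0 #xE0)                ; 2 octets: 110xxxxx
             (values (logior (ash (ldb (byte 5 0) b0) 6)
                             (cont 1))
                     (+ start 2)))
            ((< b0 #xF0)                ; 3 octets: 1110xxxx
             (values (logior (ash (ldb (byte 4 0) b0) 12)
                             (ash (cont 1) 6)
                             (cont 2))
                     (+ start 3)))
            (t                          ; 4 octets: 11110xxx
             (values (logior (ash (ldb (byte 3 0) b0) 18)
                             (ash (cont 1) 12)
                             (ash (cont 2) 6)
                             (cont 3))
                     (+ start 4)))))))

(defun utf8-string= (octets string)
  "Return true if OCTETS decode to exactly the characters of STRING."
  (let ((pos 0)
        (end (length octets)))
    (loop for ch across string
          do (when (>= pos end)
               ;; The UTF-8 data ran out before STRING did.
               (return-from utf8-string= nil))
             (multiple-value-bind (code next) (decode-utf8-char octets pos)
               (unless (= code (char-code ch))
                 (return-from utf8-string= nil))
               (setf pos next)))
    ;; Equal only if the UTF-8 data is exhausted as well.
    (= pos end)))
```

The walk is strictly sequential, which is the property that makes it
tolerable for symbol names. Two UTF-8 strings, by contrast, can be
compared for equality octet by octet without decoding at all.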