[Sbcl-devel] Re: Re: octet strings SBCL broken


Hi Christophe Rhodes,

> Adam Warner <li...@co...> writes:
> 
>> Thanks for the clarifications. It's now clear to me that SBCL must be
>> built with Unicode support to continue to be as useful:
>> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=284602>
> 
> "useful" seems to be in the eye of the beholder, here.  I prefer to be
> able to type an e-acute at my lisp prompt and have it be recognized as
> an e-acute rather than as some binary junk.  In addition, in the bug
> report you have ignored, deliberately or not, what I wrote in my
> previous mail about running under a given locale; if you run SBCL in
> an ISO-8859-1 or ASCII locale, then the octets are directly
> interpreted as characters.  Try
>   $ LANG=C sbcl
>   * "whatever binary stuff you want"

Thank you for your persistence. In your original reply I thought you were
encouraging me to run my terminal in ISO-8859-1 which would have solved
the arbitrary octet problem at the expense of not being able to print
characters with code points above 255.

Now I appreciate that I can simply lie to SBCL about the current character
encoding so I can read and build UTF-8 octet sequences using CHAR-CODE and
CODE-CHAR respectively. This has tremendous implications for being able to
mix strings, code and binary data over a character stream from one Lisp
implementation to another. Binary data can be sent over the character
stream with virtually no overhead (such as first converting the binary
data to an ASCII subset and then decoding that ASCII subset at the other
end).
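
To make the idea above concrete, here is a minimal sketch of the round trip, assuming SBCL was started in a single-byte locale (e.g. LANG=C) so that each octet arrives as one character. The function names are hypothetical, made up for illustration; only MAP, CHAR-CODE, CODE-CHAR and the arithmetic operators are standard Common Lisp.

```lisp
;; Hypothetical helpers, not part of SBCL.
(defun octets->latin1-string (octets)
  "Return a string whose Nth character has the Nth octet as its code."
  (map 'string #'code-char octets))

(defun latin1-string->octets (string)
  "Return the character codes of STRING as an octet vector.
Assumes every character code in STRING is below 256."
  (map '(vector (unsigned-byte 8)) #'char-code string))

(defun decode-two-octet-utf-8 (string)
  "Decode the first two characters of STRING, taken as a two-octet
UTF-8 sequence, into a Unicode code point.  No validation is done."
  (let ((lead (char-code (char string 0)))
        (tail (char-code (char string 1))))
    (logior (ash (logand lead #x1F) 6)
            (logand tail #x3F))))

;; The UTF-8 encoding of e-acute (U+00E9) is the octets #xC3 #xA9.
(defparameter *e-acute* (octets->latin1-string #(#xC3 #xA9)))
```

Here (latin1-string->octets *e-acute*) gives back the two original octets, and (decode-two-octet-utf-8 *e-acute*) returns 233, the code point of e-acute. A full decoder would also need to handle one-, three- and four-octet sequences and reject malformed input.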

Thanks again.

>> I'd drop the support unless you later intend to revert to naive CMUCL-like
>> string handling. In the naive situation one could use CHAR-CODE to
>> manually decode the Unicode code points. Without the former naive string
>> handling one can't even read the string in the first place, so there's
>> very little point in keeping it as a build option (i.e. to be useful one
>> would also need to disable external format support).
> 
> This seems a little broad -- are you really asserting that naive
> CMUCL-like string handling is the only possible use of a
> non-wide-character build?

You've demonstrated that even Unicode SBCL can be used for precisely this!
The benefit over Unicode SBCL is: strings take significantly less storage
space with European character sets. The cost is: No Unicode code point
character handling. Unicode SBCL sounds like the build most people will
want to use even when reading arbitrary octets over character streams
(since they also have the option to turn them into sequences of Unicode
code point strings).

Regards,
Adam
