Re: [Sbcl-devel] OUTPUT-REPLACEMENT restart for fd-streams external-format


On Sep 7, 2009, at 9:24 AM, Christophe Rhodes wrote:

> John Fremlin <jf...@ms...> writes:
>
>> On CCL the implementation checks for invalid code points by itself but
>> on SBCL, the encoder lets them through. I agree that it is nice to
>> deal with it, but the sensible default option is not `error'.
>
> I strongly disagree, and here's why: in a dynamic scope, with
> sufficiently expressive restarts (as I'm trying to provide in the
> patches you're not apparently commenting on) the programmer can specify
> the recovery strategy; when an `error' is not in fact the same as a
> program `crash', there is no particular need to fear it, and because
> "UTF-8" has a standardized meaning and specifies certain conditions as
> error situations, it seems reasonable to model those conditions as
> Common Lisp errors.
>
> As an example of a situation where one recovery strategy does not fit
> all, imagine a user deciding that, when reading files corresponding to
> source code, a decoding error while reading a string literal should
> cause Unicode replacement characters to be substituted, but a decoding
> error in other contexts should be an error that demands human
> intervention -- for a simpler example of that, consider how sbcl deals
> with decoding errors within comments.
>
> The exception, of course, is when presentation of error information and
> the error recovery strategies available would cause a further error:
> such as when attempting to write a string with a noncharacter in it to
> the same low-level stream as would be used for the debugger.  As I said
> in the message you replied to, my aim is to provide external formats
> with, effectively, the recovery strategy predetermined for such cases.
>
> If the OUTPUT-REPLACEMENT restart I've implemented, along with the
> analogous INPUT-REPLACEMENT restart for decoding errors, is not
> sufficient to express most useful recovery strategies, then clearly I'm
> going down the wrong path.  But I think it is sufficient for many
> purposes; for example, output of #\uFFFD for each encoding error is
>  (handler-bind ((encoding-error
>                  (lambda (c)
>                    (invoke-restart 'output-replacement #\uFFFD))))
>    ...)

Does the encoding-error condition include a slot with the erroneous
code sequence?

Could we provide several characters as output-replacement?
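
To make these two questions concrete, here is a minimal portable sketch of the behaviour being asked about.  Every name in it (DEMO-ENCODING-ERROR, DEMO-OUTPUT-REPLACEMENT, DEMO-ENCODE-CHAR, DEMO-ENCODE-STRING) is hypothetical and is not SBCL's actual API; the "encoder" simply treats anything outside ASCII as unencodable.  It only illustrates a condition that carries the offending code and a restart that accepts a string designator, so that several characters can be substituted.

```lisp
;; Hypothetical names throughout; this is NOT SBCL's internal API.
(define-condition demo-encoding-error (error)
  ((code :initarg :code :reader demo-encoding-error-code))
  (:report (lambda (c stream)
             (format stream "Cannot encode code point #x~X."
                     (demo-encoding-error-code c)))))

(defun demo-encode-char (char out)
  "Write CHAR to the character stream OUT.
Assumption for the demo: anything outside ASCII is unencodable."
  (let ((code (char-code char)))
    (if (< code 128)
        (write-char char out)
        (restart-case (error 'demo-encoding-error :code code)
          (demo-output-replacement (replacement)
            :report "Write a replacement string instead."
            ;; REPLACEMENT is a string designator: a character or a string.
            (write-string (string replacement) out))))))

(defun demo-encode-string (string)
  "Encode STRING, replacing each unencodable character with <U+XXXX>."
  (with-output-to-string (out)
    (handler-bind ((demo-encoding-error
                     (lambda (c)
                       ;; The handler reads the offending code from the
                       ;; condition and supplies a multi-character replacement.
                       (invoke-restart 'demo-output-replacement
                                       (format nil "<U+~4,'0X>"
                                               (demo-encoding-error-code c))))))
      (map nil (lambda (ch) (demo-encode-char ch out)) string))))
```

With this, (demo-encode-string (format nil "a~Cb" (code-char #xE9))) writes the one unencodable character as the eight-character replacement "<U+00E9>".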

Given a Lisp string, how could we output mostly a UTF-8 byte sequence,
but with some invalid codes interspersed (i.e. to reproduce the
original byte stream)?

It seems to me that in a number of situations, it would be desirable to
transparently transmit the "error" in UTF-8 data.  One way to do so
would be to encode invalid UTF-8 byte sequences as a sequence of
"non-character codepoints" (U+FDD0..U+FDEF) when reading, and of course,
to do the reverse transformation when writing, assuming these
"non-character codepoints" are Lisp CHARACTERs.  Or better, some other
Lisp CHARACTER, if there exist characters beyond the Unicode set.
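
A rough sketch of one such mapping follows.  The particular layout is an assumption made for illustration only: each invalid octet becomes two noncharacters, one per nibble, taken from U+FDD0..U+FDDF.  The function names are hypothetical, and the code assumes that CODE-CHAR returns a character for these code points (as it does in SBCL).

```lisp
;; Assumed layout: one noncharacter per nibble, offset from U+FDD0.
(defconstant +escape-base+ #xFDD0
  "First code point of the noncharacter block U+FDD0..U+FDEF.")

(defun octet->escape-string (octet)
  "Return a two-character string standing in for the invalid OCTET."
  (let ((high (ldb (byte 4 4) octet))   ; upper nibble
        (low  (ldb (byte 4 0) octet)))  ; lower nibble
    (coerce (list (code-char (+ +escape-base+ high))
                  (code-char (+ +escape-base+ low)))
            'string)))

(defun escape-char-p (char)
  "True if CHAR is one of the sixteen noncharacters used as escapes."
  (<= +escape-base+ (char-code char) (+ +escape-base+ 15)))

(defun escape-chars->octet (high-char low-char)
  "Reverse transformation: rebuild the octet from its two escape characters."
  (dpb (- (char-code high-char) +escape-base+)
       (byte 4 4)
       (- (char-code low-char) +escape-base+)))
```

A decoder would call OCTET->ESCAPE-STRING for each octet it cannot decode, and an encoder would recognise pairs of escape characters and emit the original octet.  The round trip is only unambiguous if valid input never contains these noncharacters itself.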

Concerning the use of conditions, perhaps efficiency considerations
would call for a more proactive mechanism.  For example, in clisp, the
handling of invalid code sequences may be specified in the encoding
structure (which can be used as external-format):
http://clisp.cons.org/impnotes/encoding.html#make-encoding
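
For illustration, a CLISP-only sketch of such an encoding.  The variable name *LENIENT-UTF-8* is made up; the keyword arguments are the ones described in the impnotes page above, and the form is guarded with #+clisp because EXT:MAKE-ENCODING does not exist elsewhere.

```lisp
;; CLISP only: the error policy is part of the encoding object itself,
;; so no condition is signalled and no handler is needed.
#+clisp
(defparameter *lenient-utf-8*
  (ext:make-encoding
   :charset charset:utf-8
   ;; Invalid byte sequences read from the stream become U+FFFD.
   :input-error-action (code-char #xFFFD)
   ;; Characters that cannot be encoded are silently dropped on output.
   :output-error-action :ignore))
```

Such an object can then be passed as the :EXTERNAL-FORMAT argument to OPEN.  The trade-off compared to restarts is that the policy is fixed per stream rather than chosen in the dynamic scope of each operation.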

-- 
__Pascal Bourguignon__
http://www.informatimago.com
