On Aug 9, 2009, at 6:32 PM, Christophe Rhodes wrote:
> Does all of this accord with other people's interpretations of SBCL
Yes, that all sounds right to me.
> Is there code out there which actively takes a different
> interpretation? I'm planning to implement better support for Unicode
> data querying and algorithms (such as normalization, collation and
> comparison); I suspect that we will also want to implement a more
> permissive UTF-8 variant, which allows ill-formed UTF-8 through or
> replaces ill-formed sequences with a replacement character, for
> interactive use.
I think it'd be nice to have an API like Python's, which separates out
the encoding and the error handling.
SBCL currently just raises a condition when there's an
encoding/decoding error. You can catch it, but that's not convenient,
since the handling is then not associated with a particular stream.
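For comparison, here is a minimal sketch of what "associated with a
particular stream" looks like on the Python side, using io.TextIOWrapper,
which takes the encoding and the error policy as separate arguments. The
byte strings and variable names are made up for illustration.

```python
import io

# Hypothetical input: one valid line, then a line starting with a stray 0xFF byte.
data = b"ok\n\xffbad\n"

# The error policy is a property of this stream, separate from the encoding.
lenient = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors="replace")
lines = lenient.read().splitlines()

# A second stream over the same bytes can pick a different policy.
strict = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors="strict")
try:
    strict.read()
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
```

Each stream carries its own policy, so callers reading from it don't need
to wrap every read in a handler.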
Their docs website is apparently down at the moment, so I reproduce
part of the docs below:
> To simplify and standardize error handling, the encode() and
> decode() methods may implement different error handling schemes by
> providing the errors string argument. The following string values
> are defined and implemented by all standard Python codecs:
> Value                Meaning
> 'strict'             Raise UnicodeError (or a subclass); this is the
>                      default.
> 'ignore'             Ignore the character and continue with the next.
> 'replace'            Replace with a suitable replacement character;
>                      Python will use the official U+FFFD REPLACEMENT
>                      CHARACTER for the built-in Unicode codecs on
>                      decoding and ‘?’ on encoding.
> 'xmlcharrefreplace'  Replace with the appropriate XML character
>                      reference (only for encoding).
> 'backslashreplace'   Replace with backslashed escape sequences (only
>                      for encoding).
> The set of allowed values can be extended via register_error().
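To make the quoted table concrete, here is a short sketch of the standard
handlers plus one custom handler installed with codecs.register_error().
The sample data and the "hexmarker" handler name are hypothetical, chosen
only for illustration.

```python
import codecs

# Hypothetical sample: valid UTF-8 for "café", a space, then a stray 0xFF byte.
bad_utf8 = b"caf\xc3\xa9 \xff!"

try:
    bad_utf8.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = bad_utf8.decode("utf-8", errors="ignore")    # drops the bad byte
replaced = bad_utf8.decode("utf-8", errors="replace")  # inserts U+FFFD

# Encoding-side handlers, applied to a character ASCII cannot represent.
text = "caf\u00e9"
as_question = text.encode("ascii", errors="replace")
as_xml = text.encode("ascii", errors="xmlcharrefreplace")
as_backslash = text.encode("ascii", errors="backslashreplace")

def hex_marker(exc):
    """Hypothetical handler: show each undecodable byte as <XX>."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # Return the replacement text and the position at which to resume.
        return ("".join("<%02X>" % b for b in bad), exc.end)
    raise exc

codecs.register_error("hexmarker", hex_marker)
marked = bad_utf8.decode("utf-8", errors="hexmarker")
```

The point for SBCL is that the codec ("utf-8") and the policy ("replace",
"hexmarker", ...) are named independently, so a new policy works with
every existing encoding.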
PEP-383 added the new error handler "surrogateescape" which, when used
as an error handler with UTF-8 encoding/decoding, implements UTF-8b.
But it can be used just as well with any other ASCII-compatible