"Robert J. Macomber" <sf-sbcl-2@...> writes:
> Here's a patch that makes file-string-length report the change in
> file-position correctly for non-unibyte encodings.
Thanks. I merged this into sbcl-0.9.7.2.

Here's a new version of file-string-length which takes into account
encoding issues. If FSL is passed a (string containing a) character
which can't be represented in the output stream's external-format, FSL
returns NIL rather than, as it used to, making a sort of wild guess.

I'm a touch less happy with it because, in order to keep it minimally
intrusive, I'm somewhat abusing the current define-external-format
definition. DEF/variable-width now gets an additional parameter
specifying the maximum number of bytes to which a single character can
encode. The sizer function which was added in the last patch now
actually _does the encoding_ (this is the bit I'm not entirely happy
with) into a dummy octet array and catches the stream-encoding-error
if it's signalled. The alternative, though, is to add a
"character-encodable-p"-style clause to define-external-format and
modify all the uses of that macro to provide them, but since "check if
encodable" and "encode" are such very similar things to do to a
character I didn't see a tremendously OAOO way to separate them.
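
The "size it by actually encoding it" approach described above can be
illustrated outside SBCL. What follows is only a rough sketch in Python,
not the patch itself: the names char_size, file_string_length and
MAX_BYTES_PER_CHAR are hypothetical, the per-format maximum is an assumed
stand-in for the new DEF/variable-width parameter, and Python's
UnicodeEncodeError plays the role of stream-encoding-error.

```python
# Assumed table standing in for the "maximum bytes per character"
# parameter; the three entries are illustrative only.
MAX_BYTES_PER_CHAR = {"latin-1": 1, "iso8859-15": 1, "utf-8": 4}


def char_size(ch, encoding):
    """Size one character by really encoding it into a scratch buffer.

    Returns the number of bytes used, or None if the character cannot
    be represented in the given encoding.
    """
    # Dummy octet array, large enough for any single character.
    scratch = bytearray(MAX_BYTES_PER_CHAR[encoding])
    try:
        encoded = ch.encode(encoding)
    except UnicodeEncodeError:
        # The encoder signalled an error: report "not encodable".
        return None
    scratch[:len(encoded)] = encoded  # contents are thrown away
    return len(encoded)


def file_string_length(text, encoding):
    """Sum the per-character sizes; None as soon as one is unencodable."""
    total = 0
    for ch in text:
        size = char_size(ch, encoding)
        if size is None:
            return None
        total += size
    return total
```

The point of the sketch is the trade-off mentioned above: "can this be
encoded" and "encode this" share one code path, at the cost of doing a
throwaway encode just to learn the size.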

I've added a couple of tests to external-format.impure.lisp. The
first one checks that FSL, given a latin-1 stream, returns 1 for
character codes 0 through 255 and NIL for every code from 256 up to
(but not including) char-code-limit. This one is not fast, since it goes
character-by-character through the entire set of Unicode code points;
FSLing a string looks up the external format's sizer function and sets
up the encoding error condition handler only once, so
(file-string-length utf8-stream "all 1114112 characters") is much
faster. The other is a "spot check" for latin-9, asserting that
#\euro-sign has a file-string-length of 1, and that FSL returns NIL for
#\coptic-capital-letter-hori.
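
For readers who want to see the shape of those two checks without an SBCL
build to hand, here is an analogous sketch in Python. It does not
reproduce the tests in external-format.impure.lisp: encoded_length and the
two check functions are hypothetical names, and Python's "iso8859-15"
codec is assumed to be an adequate stand-in for latin-9.

```python
def encoded_length(text, encoding):
    """Byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


def check_latin1_exhaustive():
    # Walk all 1114112 code points one character at a time (slow by design).
    for code in range(0x110000):
        expected = 1 if code < 256 else None
        assert encoded_length(chr(code), "latin-1") == expected, hex(code)


def check_latin9_spot():
    # The euro sign exists in latin-9; the Coptic letter does not.
    assert encoded_length("\N{EURO SIGN}", "iso8859-15") == 1
    assert encoded_length("\N{COPTIC CAPITAL LETTER HORI}", "iso8859-15") is None


check_latin1_exhaustive()
check_latin9_spot()
```

As with the Lisp test, the exhaustive loop pays the error-handling cost
once per unencodable character, which is why it is the slow one.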