#78 text re-encoding sometimes fails

open
5
2006-04-07
2006-04-07
No

All of the various text encoding conversion functions
in strings.h can produce entirely wrong output if the
input string is not valid in its source encoding.

All of the converters, such as utf8toansi, use
converttextencoding() on the Mac.

converttextencoding() uses TECConvertText(), which is
provided by the OS.

However, all calls to the TEC API ignore the
OSStatus results returned by those calls.

When a badly-formed UTF-8 string is passed to
utf8toansi, then to converttextencoding, then to
TECConvertText, the result is a converted string that
is ***truncated*** at the first non-UTF-8 character.

(I found this because there's also a bug in the string
verb for decoding quoted-printable strings!)
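
To make the failure mode concrete, here is a minimal portable
sketch. It is not the kernel code and it does not call
TECConvertText(); sketch_utf8_to_latin1 is a hypothetical
stand-in that converts UTF-8 to Latin-1 and stops at the first
byte it cannot convert. A caller that ignores the return value
sees only the shortened output, which is the truncation
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a utf8toansi-style converter (assumption:
 * Latin-1 as the target, no OS converter involved).
 *
 * Returns true only if all of src was converted. On failure, *outlen is
 * the number of bytes written before conversion stopped, i.e. the
 * truncated result a caller gets if it never checks the status.
 */
bool sketch_utf8_to_latin1(const unsigned char *src, size_t srclen,
                           unsigned char *dst, size_t dstcap,
                           size_t *outlen)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL && outlen != NULL);

    while (in < srclen) {
        unsigned char c = src[in];

        if (out >= dstcap)
            break;                      /* destination buffer is full */

        if (c < 0x80) {
            /* ASCII maps to itself */
            dst[out++] = c;
            in += 1;
        } else if ((c == 0xC2 || c == 0xC3) && in + 1 < srclen
                   && (src[in + 1] & 0xC0) == 0x80) {
            /* two-byte sequences covering U+0080..U+00FF */
            dst[out++] = (unsigned char)(((c & 0x03) << 6)
                                         | (src[in + 1] & 0x3F));
            in += 2;
        } else {
            break;                      /* malformed or not in Latin-1 */
        }
    }

    *outlen = out;
    return in == srclen;                /* false means output is partial */
}
```

With the input bytes 'a' 'b' 0xE9 'c' (a bare Latin-1 e-acute,
which is not valid UTF-8) the function writes "ab", sets
*outlen to 2 and returns false. Dropping that return value
reproduces the silent truncation.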

Discussion

  • Seth Dillingham

    Seth Dillingham - 2006-04-07

    Logged In: YES
    user_id=1171838

    The quick fix here is for converttextencoding() to return a
    boolean. It'll return true if everything works, and false if
    TECConvertText returns an error.

    A better fix would be a new set of error messages to go
    along with text encoding conversions, to really let the
    script know what happened.

    I need to do more testing before I can submit my changes.

     
  • Seth Dillingham

    Seth Dillingham - 2006-04-08

    Logged In: YES
    user_id=1171838

    I have this working as I described, but it's still not
    right. We really need the error codes. Failing to convert
    the text encoding, without explaining why, is just plain rude.

    Microsoft's converter only offers four error messages,
    whereas Apple's offers lots of them. Of course. It looks
    like MS's four errors can be mapped to some of Apple's, so
    I'm going to add Apple's errors to the resource list and map
    the error codes from either platform to the appropriate
    resource number.

    Unless someone says otherwise Real Soon Now. :-) (I'm
    working on this stuff right now.)

     
  • Andre Radke

    Andre Radke - 2006-04-09

    Logged In: YES
    user_id=1137587

    No complaints from me, just go ahead.

    Using common error messages for both platforms sounds good.

    If there are invalid characters in the string to be
    converted, would it be possible to make the error message
    report the position of those characters?

     
  • Seth Dillingham

    Seth Dillingham - 2006-04-09

    Logged In: YES
    user_id=1171838

    >If there are invalid characters in the string to be
    >converted, would it be possible to make the error message
    >report the position of those characters?

    I think so, because (at least in Apple's converter) the
    result string is truncated at the first invalid character
    found in the stream.
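
On reporting the position of the invalid characters: the length
of the truncated result is counted in the target encoding, so
it need not equal the byte offset in the source string when
multi-byte characters come first. One way to get the source
offset directly is to scan the input. The sketch below is only
an illustration; sketch_first_invalid_utf8 is a hypothetical
helper, not something from the kernel or from either OS
converter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the byte offset of the first malformed UTF-8 sequence in s,
 * or len when the whole buffer is well formed. Rejects stray
 * continuation bytes, cut-off sequences, overlong forms, surrogates
 * and code points above U+10FFFF.
 */
size_t sketch_first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return i;                   /* stray continuation or bad lead */
        }

        if (len - i - 1 < need)
            return i;                   /* sequence cut off by end of input */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return i;               /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;                   /* overlong, out of range, surrogate */

        i += need + 1;
    }
    return len;
}
```

An error message could then include the returned offset
whenever it is less than the input length.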

     
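The shared error list proposed in the 2006-04-08 comment could
be driven by a pair of lookup tables, one per platform. This is
a sketch under several assumptions: the shared numbers in
sketch_conv_error are hypothetical (the thread does not give
the real resource IDs), the Windows side is assumed to use
MultiByteToWideChar/WideCharToMultiByte and their four
documented GetLastError codes, and the pairing of codes is a
guess at a sensible mapping. The platform codes are written as
plain numbers so the sketch builds anywhere; they should be
checked against the Apple and Windows headers before use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared error numbers (stand-ins for resource IDs). */
enum sketch_conv_error {
    SKETCH_CONV_OK = 0,
    SKETCH_CONV_MALFORMED_INPUT,
    SKETCH_CONV_UNMAPPABLE_CHAR,
    SKETCH_CONV_BUFFER_TOO_SMALL,
    SKETCH_CONV_BAD_ARGUMENT,
    SKETCH_CONV_UNKNOWN
};

struct sketch_map_entry {
    long platform_code;
    enum sketch_conv_error shared;
};

/* Apple TEC result codes; values to be verified against the headers. */
static const struct sketch_map_entry sketch_mac_map[] = {
    {     0, SKETCH_CONV_OK },               /* noErr */
    { -8739, SKETCH_CONV_MALFORMED_INPUT },  /* kTextMalformedInputErr */
    { -8753, SKETCH_CONV_MALFORMED_INPUT },  /* kTECPartialCharErr */
    { -8754, SKETCH_CONV_UNMAPPABLE_CHAR },  /* kTECUnmappableElementErr */
    { -8750, SKETCH_CONV_BUFFER_TOO_SMALL }, /* kTECBufferBelowMinimumSizeErr */
    { -8785, SKETCH_CONV_BUFFER_TOO_SMALL }  /* kTECOutputBufferFullStatus */
};

/* Windows GetLastError() codes; values to be verified against winerror.h. */
static const struct sketch_map_entry sketch_win_map[] = {
    {    0, SKETCH_CONV_OK },                /* ERROR_SUCCESS */
    { 1113, SKETCH_CONV_MALFORMED_INPUT },   /* ERROR_NO_UNICODE_TRANSLATION */
    {  122, SKETCH_CONV_BUFFER_TOO_SMALL },  /* ERROR_INSUFFICIENT_BUFFER */
    {   87, SKETCH_CONV_BAD_ARGUMENT },      /* ERROR_INVALID_PARAMETER */
    { 1004, SKETCH_CONV_BAD_ARGUMENT }       /* ERROR_INVALID_FLAGS */
};

/* Linear search; anything not listed falls back to SKETCH_CONV_UNKNOWN. */
static enum sketch_conv_error
sketch_lookup(const struct sketch_map_entry *map, size_t count, long code)
{
    size_t i;

    assert(map != NULL);
    for (i = 0; i < count; i++) {
        if (map[i].platform_code == code)
            return map[i].shared;
    }
    return SKETCH_CONV_UNKNOWN;
}

enum sketch_conv_error sketch_map_mac_error(long osstatus)
{
    return sketch_lookup(sketch_mac_map,
                         sizeof sketch_mac_map / sizeof sketch_mac_map[0],
                         osstatus);
}

enum sketch_conv_error sketch_map_win_error(long lasterror)
{
    return sketch_lookup(sketch_win_map,
                         sizeof sketch_win_map / sizeof sketch_win_map[0],
                         lasterror);
}
```

Keeping the tables as data means Apple codes with no Windows
counterpart can be added later without touching the lookup
code, and unlisted codes still produce a defined result.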
