Strange behavior of encoding detection

Francesco
2010-08-17
2012-10-17
  • Francesco
    2010-08-17

    Hello,

    If I set UTF-8 as the default encoding and create a new file, everything
    is OK (the status bar shows "UTF-8").

    On the other hand, if I open a pure ASCII LaTeX file, I expect TeXmakerX to
    choose what I set in the preferences, that is, UTF-8, since ASCII, UTF-8 and
    ISO-8859-1 do not differ in the first 128 characters.
    Instead, tmx chooses ISO-8859-1. I think this is wrong.
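The premise can be checked directly: a file whose bytes are all below 0x80 decodes to the same text under all three encodings. A small Python sketch (the LaTeX snippet is a made-up sample):

```python
# A made-up pure-ASCII LaTeX snippet.
data = b"\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"

# Every byte is in the 7-bit range, so the file is pure ASCII.
is_ascii = all(b < 0x80 for b in data)

# Decode the same bytes with each candidate encoding.
decoded = {enc: data.decode(enc) for enc in ("ascii", "utf-8", "iso-8859-1")}

# All three decodings are identical, so any of them is a correct choice.
same = len(set(decoded.values())) == 1
print(is_ascii, same)  # prints: True True
```

For such a file the choice only matters later, when the user types a non-ASCII character and saves.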

    Regards.
    Fra

  • Hi,
    Well, I had three reasons for defaulting to latin1 for unrecognized files:
    1) latin1 has a character for every possible byte, so if a non-latin1 file is
    opened and saved as latin1, it is not modified, whereas UTF-8 can destroy a
    non-UTF-8 file.
    2) The UTF-8 detector couldn't report a file as ASCII, only as UTF-16, UTF-8 or
    not UTF-8.
    3) The encoding detector doesn't know about your default encoding, because it
    is part of QCodeEdit, and qce should stay independent of tmx.
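Reason 1 can be illustrated with Python's codecs (this shows the general property of the encodings, not TeXmakerX's own load/save path):

```python
# Every possible byte value, 0x00..0xFF.
raw = bytes(range(256))

# Latin-1 maps each byte to exactly one code point, so a decode/encode
# round trip reproduces the original bytes.
latin1_roundtrip = raw.decode("latin-1").encode("latin-1")
print(latin1_roundtrip == raw)  # True

# Bytes 0x80..0xFF in this order are not valid UTF-8. Decoding with
# errors="replace" substitutes U+FFFD for them, so saving the text back
# as UTF-8 changes the file contents.
utf8_roundtrip = raw.decode("utf-8", errors="replace").encode("utf-8")
print(utf8_roundtrip == raw)  # False
```

A strict UTF-8 decode of the same bytes would raise `UnicodeDecodeError` instead; either way, the original bytes are not preserved.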

    Anyway, I've changed it now.

    Encoding detection now works as follows:
    If QDocument detects the file as UTF-16LE/BE, use that encoding
    Else if QDocument detects UTF-8 {
        If LatexParser::guessEncoding finds an encoding, use that
        Else use UTF-8
    } Else {
        If LatexParser::guessEncoding finds an encoding, use that
        Else if QDocument detects ASCII (only 7-bit characters) {
            If default encoding == UTF-16: use UTF-8 as a fallback (because UTF-16
            can be reliably detected and the user seems to like Unicode)
            Else use the default encoding
        }
        Else {
            If default encoding == UTF-16 or UTF-8: use latin1 (because the file
            contains invalid Unicode characters)
            Else use the default encoding
        }
    }
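The decision order above can be sketched in Python. This is not TeXmakerX's actual code: `detect()` is a hypothetical stand-in for QDocument's detector (assumed here to recognize UTF-16 only by its byte-order mark), and `guess_encoding()` is a hypothetical stand-in for LatexParser::guessEncoding that only looks at a `\usepackage[...]{inputenc}` line.

```python
import re
from typing import Optional

UTF16 = {"UTF-16LE", "UTF-16BE"}

def detect(data: bytes) -> str:
    # Hypothetical stand-in for QDocument: UTF-16 only via a byte-order mark.
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if all(b < 0x80 for b in data):
        return "ASCII"  # only 7-bit characters
    try:
        data.decode("utf-8")
        return "UTF-8"  # valid multi-byte UTF-8 present
    except UnicodeDecodeError:
        return "OTHER"  # not valid Unicode

# Hypothetical stand-in for LatexParser::guessEncoding (inputenc only).
INPUTENC = {"utf8": "UTF-8", "latin1": "ISO-8859-1"}
PATTERN = re.compile(r"\\usepackage\[([^\]]*)\]\{inputenc\}")

def guess_encoding(data: bytes) -> Optional[str]:
    # Latin-1 decoding never fails, so the regex can scan any byte sequence.
    match = PATTERN.search(data.decode("latin-1"))
    if not match:
        return None
    for option in match.group(1).split(","):
        name = INPUTENC.get(option.strip())
        if name:
            return name
    return None

def choose_encoding(data: bytes, default: str) -> str:
    detected = detect(data)
    if detected in UTF16:
        return detected
    guessed = guess_encoding(data)
    if detected == "UTF-8":
        return guessed or "UTF-8"
    if guessed:
        return guessed
    if detected == "ASCII":
        # UTF-16 would have been detected, so fall back to UTF-8 for a UTF-16 default.
        return "UTF-8" if default in UTF16 else default
    # The file contains invalid Unicode, so a Unicode default cannot be right.
    return "ISO-8859-1" if default in UTF16 | {"UTF-8"} else default
```

With this ordering, Francesco's case (a pure-ASCII file with a UTF-8 default) ends up as UTF-8, while a file containing a stray 0xE9 byte falls back to latin1.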

    Benito

  • Francesco
    2010-08-18

    Thanks for the explanation and for this new behavior!

    Regards.
    Fra