Strange behavior of encoding detection

  • Francesco

    Francesco - 2010-08-17


    If I set utf8 as the default encoding and create a new file, everything
    is OK ("UTF-8" appears in the status bar).

    On the other hand, if I open a pure-ASCII LaTeX file, I expect TeXmakerX to
    choose what I set in the preferences, that is UTF-8, since ASCII, UTF-8 and
    ISO-8859-1 do not differ in the first 128 characters.
    Instead, TMX chooses ISO-8859-1. I think this is wrong.
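The premise that a pure-ASCII file cannot distinguish these encodings can be checked with a short Python sketch (an illustration only, not TeXmakerX code): every byte value from 0x00 to 0x7F decodes to the same text under all three codecs.

```python
# All 7-bit byte values, i.e. everything a pure-ASCII file can contain.
ascii_bytes = bytes(range(128))

# Decode the same bytes with each of the three encodings under discussion.
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")
as_latin1 = ascii_bytes.decode("iso-8859-1")

# The three decodings are identical, so the bytes alone cannot tell them apart.
print(as_ascii == as_utf8 == as_latin1)  # True
```

Because the bytes carry no distinguishing information, the choice for such a file has to come from somewhere else, such as the user's preference.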


  • Benito van der Zander

    Well, I had three reasons to use the latin1 default for unrecognized files:
    1) latin1 has a character for every possible byte, so if a non-latin1 file is
    opened and saved as latin1, it is not modified, while utf-8 can destroy a
    non-utf8 file.
    2) The utf8 detector couldn't report a file as ascii, only as utf16, utf8 and …
    3) The encoding detector doesn't know about your default encoding, because it
    is contained in qcodeedit, and qce should stay independent of tmx.
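Reason 1 can be illustrated with a small Python sketch. It assumes an editor that, when decoding as UTF-8, replaces invalid byte sequences with U+FFFD (as Python's `errors="replace"` does); that assumption is for illustration and is not a description of how QCodeEdit handles invalid input.

```python
# Every possible byte value; most of 0x80-0xFF are invalid as standalone UTF-8.
data = bytes(range(256))

# Latin-1 maps each of the 256 byte values to exactly one character,
# so opening and saving the file returns the original bytes.
latin1_roundtrip = data.decode("latin-1").encode("latin-1")

# Decoding as UTF-8 turns invalid bytes into U+FFFD (assumed editor behavior);
# saving that text writes EF BF BD instead of the original bytes.
utf8_roundtrip = data.decode("utf-8", errors="replace").encode("utf-8")

print(latin1_roundtrip == data)  # True
print(utf8_roundtrip == data)    # False
```

This asymmetry is why latin1 is the "safe" fallback for files whose encoding cannot be identified.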

    Anyway, I have changed it now.

    The encoding detection now works as follows:
    If QDocument detects that the file is UTF16LE/BE, use that encoding
    Else if QDocument detects UTF-8 {
        If LatexParser::guessEncoding finds an encoding, use that
        Else use UTF-8
    } Else {
        If LatexParser::guessEncoding finds an encoding, use that
        Else if QDocument detects ascii (only 7-bit characters) {
            If default encoding == utf16: use utf-8 as fallback (because utf16
            can be reliably detected and the user seems to like unicode)
            Else use default encoding
        } Else {
            If default encoding == utf16/8: use latin1 (because the file
            contains invalid unicode characters)
            Else use default encoding
        }
    }
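The decision order above can be sketched as a single Python function. This is a hypothetical model, not TeXmakerX source: the string labels for the detector result (`"UTF-16LE"`, `"UTF-16BE"`, `"UTF-8"`, `"ASCII"`, `"unknown"`) and the `latex_guess` argument standing in for the result of `LatexParser::guessEncoding` are assumptions made for illustration.

```python
from typing import Optional


def choose_encoding(detected: str, latex_guess: Optional[str], default: str) -> str:
    """Hypothetical model of the fallback order described in the post.

    detected:    assumed detector result, one of "UTF-16LE", "UTF-16BE",
                 "UTF-8", "ASCII" or "unknown".
    latex_guess: encoding hinted by the LaTeX source, or None.
    default:     the user's default encoding from the preferences.
    """
    is_utf16_default = default.upper().startswith("UTF-16")

    # UTF-16 is detected reliably, so it always wins.
    if detected in ("UTF-16LE", "UTF-16BE"):
        return detected
    # Valid UTF-8: prefer the LaTeX hint, else UTF-8.
    if detected == "UTF-8":
        return latex_guess or "UTF-8"
    # Otherwise a LaTeX hint takes precedence.
    if latex_guess:
        return latex_guess
    # Pure 7-bit file: it cannot be UTF-16, so a UTF-16 default falls
    # back to UTF-8; any other default is used as-is.
    if detected == "ASCII":
        return "UTF-8" if is_utf16_default else default
    # Invalid Unicode bytes: a Unicode default would be wrong, so use latin1.
    if is_utf16_default or default.upper() == "UTF-8":
        return "ISO-8859-1"
    return default


print(choose_encoding("ASCII", None, "UTF-8"))    # UTF-8
print(choose_encoding("unknown", None, "UTF-8"))  # ISO-8859-1
```

The first call is the case from the original question: a pure-ASCII file with UTF-8 as the default now opens as UTF-8.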


  • Francesco

    Francesco - 2010-08-18

    Thanks for the explanation and for this new behavior!



