Encoding detection, ANSI (Windows 1252) vs. UTF-8 (w/o BOM)

Eigencon
2013-08-17
2013-08-19
  • Eigencon

    Eigencon - 2013-08-17

    Are there ways to configure encoding detection when opening an existing file?

    I'd like NP++ to detect ANSI files that contain only standard (7-bit ASCII) characters, and are therefore byte-for-byte identical to UTF-8 (w/o BOM) if my understanding is correct, as "ANSI" and not as "ANSI as UTF-8".
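
To illustrate why such files cannot be told apart, here is a minimal Python sketch. It assumes "ANSI" means Windows-1252 (Python's "cp1252" codec); the helper name same_bytes is made up for this example.

```python
# Compare how the same text is stored as Windows-1252 ("ANSI") and as UTF-8.
ascii_text = "Hello World"
umlaut_text = "Hallöchen Welt"


def same_bytes(text: str) -> bool:
    """Return True if the text encodes to identical bytes in cp1252 and UTF-8."""
    return text.encode("cp1252") == text.encode("utf-8")


print(same_bytes(ascii_text))   # True: pure ASCII looks the same in both encodings
print(same_bytes(umlaut_text))  # False: "ö" is 0xF6 in cp1252 but 0xC3 0xB6 in UTF-8
```

As long as a file contains only ASCII, any label an editor shows for it is a guess; the guess only starts to matter once the first non-ASCII character is saved.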

    (I'll post my "problematic scenario", if requested, to explain why one would want such a thing.)

     
    Last edit: Eigencon 2013-08-17
  • cchris

    cchris - 2013-08-17

    Please do, because what difference does it make to the text file?

    CChris

     
  • Eigencon

    Eigencon - 2013-08-18

    If "non-standard" characters like "ä", "ö", "ü" are added to a file that up to this point contained only standard characters, the encoding that was detected when the file was opened determines how these special characters are encoded on save (ANSI or UTF-8).

    (The setting "Preferences -> New Document -> Encoding" does not seem to matter here.)

    -

    Try the following experiment:

    1.0 Set "NP++ Preferences -> New Document" to "Format: Windows" and "Encoding: ANSI"

    2.0 Restart NP++ just to be sure

    3.1 Create a new file

    3.2 Enter/Copy the line
    echo Hello World>>"%~dp0test.log"

    3.3 Save the file as "test1.cmd" (don't close)

    3.4 Add the line
    echo Hallöchen Welt>>"%~dp0test.log"

    3.5 Save "test1.cmd" again

    4.1 Create another new file

    4.2 Enter/Copy the line
    echo Hello World>>"%~dp0test.log"
    (like step 3.2)

    4.3 Save the file as "test2.cmd" and close the file in NP++

    4.4 Reopen "test2.cmd" with NP++

    4.5 Add the line
    echo Hallöchen Welt>>"%~dp0test.log"
    (like step 3.4)

    4.6 Save "test2.cmd" again

    Result:

    • The first thing to note is that "test1.cmd" is now encoded with "ANSI (Windows 1252)", while "test2.cmd" is encoded with "UTF-8 (w/o BOM)". The files are not identical, because we "forgot" to manually change the encoding of "test2.cmd" to ANSI before we entered the problematic characters (Step 4.5).

    • If you know a bit of Windows scripting, you will have noticed that both files contain commands that append output to the same text file, "test.log".

    Please run both scripts in any order and look at the output file "test.log". It will look like this:

    Hello World
    Hallöchen Welt
    Hello World
    Hallöchen Welt

    or this

    Hello World
    Hallöchen Welt
    Hello World
    Hallöchen Welt

    We now have produced an ugly case of mixed encoding. :(
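
The effect can be reproduced without running the scripts. The following Python sketch rebuilds the bytes that end up in "test.log" and then reads them back with each of the two encodings. It assumes that "ANSI" is Windows-1252, that test1.cmd runs before test2.cmd, and that cmd.exe passes the bytes of each echo line through unchanged; the variable names are made up for this example.

```python
# Simulate test.log after running test1.cmd (saved as ANSI) and then test2.cmd (saved as UTF-8).
lines = ["Hello World", "Hallöchen Welt"]

log_bytes = b""
for script_encoding in ("cp1252", "utf-8"):  # test1.cmd first, then test2.cmd
    for line in lines:
        log_bytes += line.encode(script_encoding) + b"\r\n"

# Interpret the very same bytes once as ANSI and once as UTF-8.
as_ansi = log_bytes.decode("cp1252").splitlines()
as_utf8 = log_bytes.decode("utf-8", errors="replace").splitlines()

print(as_ansi)  # the UTF-8 "ö" shows up as the two characters "Ã¶"
print(as_utf8)  # the ANSI "ö" (byte 0xF6) is invalid UTF-8 and becomes U+FFFD
```

Whichever encoding a viewer picks for the log, one of the two "Hallöchen" lines comes out wrong, which is exactly the mixed-encoding case described above.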

    -

    I work in a large environment where I can't force every administrator to use UTF-8 only. Most of the scripts we use call several subscripts, and the output they produce goes into the same text file. In most cases this is just a log file, but it can also be an important config file.

    Windows Notepad prefers ANSI by default, and I'm looking for a way to make NP++ also prefer ANSI when loading files that conform to "UTF-8 (w/o BOM)", to avoid cases of mixed encoding.
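
The behaviour I am after could be sketched like this in Python. This is not NP++'s actual detection code, only an illustration of the idea; the function detect_encoding and its ascii_as_utf8 switch are hypothetical names.

```python
def detect_encoding(data: bytes, ascii_as_utf8: bool) -> str:
    """Classify raw file bytes as 'UTF-8-BOM', 'UTF-8' or 'ANSI'.

    ascii_as_utf8 is a hypothetical switch: it decides how a pure-ASCII
    file, which is valid in both encodings, gets labelled.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8-BOM"
    if data.isascii():
        # Ambiguous case: the bytes are identical in ANSI and UTF-8.
        return "UTF-8" if ascii_as_utf8 else "ANSI"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the ANSI code page.
        return "ANSI"


script = b'echo Hello World>>"%~dp0test.log"\r\n'
print(detect_encoding(script, ascii_as_utf8=True))   # UTF-8
print(detect_encoding(script, ascii_as_utf8=False))  # ANSI
```

With the switch off, a script that so far contains only ASCII keeps its "ANSI" label, so an umlaut added later is saved as a single Windows-1252 byte.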

     
    Last edit: Eigencon 2013-08-18
  • Eigencon

    Eigencon - 2013-08-19

    Problem solved!

    Under "Preferences -> New Document -> Encoding -> UTF-8 without BOM" there is a sub-setting called "Apply to opened ANSI files" that does exactly what I need. In my case, this option needs to be unchecked.

    This check-box corresponds to openAnsiAsUTF8 in config.xml.

    My unattended installation was still using a very old config.model.xml, which led to a faulty configuration that most likely cannot be produced through the GUI.
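
For unattended installations it may help to check the value before deploying. Below is a rough Python sketch; only the name openAnsiAsUTF8 comes from my config.xml. The surrounding XML excerpt (element names, the "NewDocDefaultSettings" entry, the "yes"/"no" values) and the function name are assumptions for illustration, so compare them with a config.xml written by your own NP++ version.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical excerpt of a config.xml; the layout is an assumption, not a reference.
CONFIG_EXCERPT = """
<NotepadPlus>
    <GUIConfigs>
        <GUIConfig name="NewDocDefaultSettings" format="0" encoding="0" openAnsiAsUTF8="yes" />
    </GUIConfigs>
</NotepadPlus>
"""


def ansi_opened_as_utf8(config_xml: str) -> Optional[bool]:
    """Return the openAnsiAsUTF8 flag, or None if no element carries it."""
    root = ET.fromstring(config_xml)
    for node in root.iter("GUIConfig"):
        value = node.get("openAnsiAsUTF8")
        if value is not None:
            return value.strip().lower() == "yes"
    return None


print(ansi_opened_as_utf8(CONFIG_EXCERPT))  # True for the excerpt above
```

A result of None means the attribute is missing altogether, which is worth a closer look if the file was derived from an old template.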

    I give NP++ 98 of 100 stars! I can hardly thank you enough for this great application!

     
    Last edit: Eigencon 2013-08-19