[BUG]Always "Encode in ANSI"?

2007-11-22
2012-11-13
  • Nobody/Anonymous

    First of all, sorry for my poor English. I'm not sure I can explain clearly what I've run into, but I will try.

    When I open a "UCS-2 Little Endian" txt file for the first time, the "Format" menu always shows "Encode in ANSI" as the selected entry. After I switch to another window and switch back to NP++, it shows the right entry, "UCS-2 Little Endian". The next time I open the file, it also shows the right entry.

    I see this bug starting with v4.3; it does not occur in v4.2.2.

     
    • Philippe Verdy

      Philippe Verdy - 2007-11-24

      Note for example that a compliant Java compiler will reject a Java source file starting with a BOM, even though Java source files CAN be encoded in Unicode.
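As a small illustration of the BOM problem for such tools, here is a minimal Python sketch (the sample bytes are made up for the example). It shows that a plain UTF-8 decode keeps the BOM as a U+FEFF character at the start of the text, while Python's `utf-8-sig` codec drops it when present.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a line of source code.
raw = codecs.BOM_UTF8 + b"class A {}"

# A plain UTF-8 decode keeps the BOM as a leading U+FEFF character.
kept = raw.decode("utf-8")

# The "utf-8-sig" codec strips one leading BOM if there is one.
stripped = raw.decode("utf-8-sig")

# The same codec also accepts data that has no BOM at all.
no_bom = b"class A {}".decode("utf-8-sig")

print(repr(kept[0]), repr(stripped[0]))
```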

      Don't assume that all programming source files are using an 8-bit only encoding (or even UTF-8). Don't assume that source files are ASCII-based or use some (non portable) ANSI or OEM encoding.

      Note that there's NO such encoding as "ANSI" or "OEM". These are just placeholders for codepages that vary from system to system, including on Windows. The "ANSI" and "OEM" encodings are just MISNOMERS inherited from OLD Windows API compatibility; they are even officially seriously deprecated on Windows (only some OEM encodings are really needed for some kernel drivers at boot time when they need to log some messages on the debug console, when the internationalization support is still not loaded). You should not advertise these ambiguous terms.

      Please refer to actual codepage numbers or names in Notepad++, and PLEASE let users select the appropriate codepage (don't assume the local system default as the file may come from another host using different system settings for the "ANSI" or "OEM" pseudo-codepages!)
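To make the ambiguity concrete, here is a short Python sketch. The sample bytes and the choice of three codepages are assumptions for the example. The same byte sequence decodes to different text depending on which codepage "ANSI" or "OEM" happens to mean on a given system.

```python
# Hypothetical 8-bit file content; the last byte is 0xE9.
raw = b"caf\xe9"

# Three codepages that "ANSI" or "OEM" could stand for on different systems:
# cp1252 (Western European), cp1251 (Cyrillic), cp437 (original IBM PC OEM).
codepages = ("cp1252", "cp1251", "cp437")

# Decode the identical bytes under each codepage.
decoded = {cp: raw.decode(cp) for cp in codepages}

for cp, text in decoded.items():
    print(cp, text, hex(ord(text[-1])))
```

Only the first result is the intended word; the other two end in a Cyrillic and a Greek letter, which is why naming the actual codepage matters.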

       
    • Philippe Verdy

      Philippe Verdy - 2007-11-24

      Actually, Notepad++ cannot even open a UTF-16LE text file if it contains no leading BOM!
      Notepad++ can ONLY perform autodetection of the encoding if there's a leading BOM, but does not allow specifying the encoding for loading a file.

      It can be admissible to open a file without first specifying the encoding, but there REALLY SHOULD exist a menu option to reload the file (or reinterpret it) with another encoding.
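A "reload with another encoding" option could work roughly like the Python sketch below. This is not how Notepad++ is implemented; `load_as` is a hypothetical helper and the sample file is created only for the demonstration. The idea is to keep the raw bytes untouched and simply decode them again with the encoding the user picks.

```python
import tempfile
from pathlib import Path

def load_as(path, encoding):
    """Read the raw bytes and decode them with the caller-chosen encoding.

    No auto-detection is done; undecodable bytes become U+FFFD.
    """
    raw = Path(path).read_bytes()
    return raw.decode(encoding, errors="replace")

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample: UTF-16LE text written WITHOUT a BOM.
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("h\u00e9llo".encode("utf-16-le"))

    # First load with a wrong default: every other character is a NUL.
    first_guess = load_as(sample, "cp1252")

    # "Reload as UTF-16LE": same bytes, different interpretation.
    reloaded = load_as(sample, "utf-16-le")

print(repr(first_guess))
print(repr(reloaded))
```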

      DON'T ASSUME that a leading BOM is present in all Unicode encoded files (UTF-8, UTF-16BE, UTF-16LE, UTF-32LE, UTF-32BE) because in fact this is wrong, and for some applications, storing a BOM even in a UTF-8 file will be incorrect.
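A minimal Python sketch of BOM handling that does not depend on a BOM being present. The names `sniff_bom` and `decode_text` are hypothetical. A BOM is used as a hint when found, and otherwise the caller-supplied fallback encoding is used.

```python
import codecs

# Longer BOMs are checked first: the UTF-32LE BOM (FF FE 00 00) starts with
# the UTF-16LE BOM (FF FE). A UTF-16LE file whose first character after the
# BOM is NUL would be misread here; that case is ignored in this sketch.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) when there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_text(data, fallback):
    """Decode using the BOM if present, else the caller's fallback encoding."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)

print(sniff_bom(b"\xff\xfeA\x00"))            # BOM found
print(sniff_bom(b"A\x00"))                    # no BOM: caller must decide
print(decode_text(b"A\x00B\x00", "utf-16-le"))
```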

      Note that storing a leading BOM when saving to a UTF-16LE or UTF-16BE or UTF-32LE or UTF-32BE file is NON-CONFORMING, and COMPLETELY INCORRECT according to the Standard. BOMs are ONLY acceptable in UTF-8, UTF-16 (not UTF-16LE or UTF-16BE) and UTF-32 (not UTF-32LE or UTF-32BE).

      This is a serious COMPATIBILITY BUG, including for some Windows system files that DON'T WANT any BOM in some files encoded in UTF-16LE (not UTF-16!).
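The difference between "UTF-16" and "UTF-16LE" can be observed in Python's standard codecs, used here only as an illustration of the distinction and not as a statement about Notepad++. The `utf-16` codec writes and consumes a BOM, while `utf-16-le` writes none and treats a leading U+FEFF as ordinary text.

```python
import codecs

text = "A"

# "utf-16": the encoder emits a BOM, then the text in native byte order.
with_bom = text.encode("utf-16")

# "utf-16-le": byte order is fixed by the name, so no BOM is written.
without_bom = text.encode("utf-16-le")

# Decoding BOM-prefixed little-endian data with each codec:
data = codecs.BOM_UTF16_LE + without_bom
as_utf16 = data.decode("utf-16")       # BOM is consumed
as_utf16le = data.decode("utf-16-le")  # BOM survives as a U+FEFF character

print(without_bom, repr(as_utf16), repr(as_utf16le))
```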