#4596 "ANSI as UTF-8" not applied on empty files

All
open
nobody
9
1 day ago
2013-12-05
Hubert Hansen
No

Hey there,

Since your last (few) releases, Notepad++ no longer opens ANSI files as "UTF-8 without BOM", even if I tick the option in the settings.

You can reproduce this by enabling the option in the settings and opening an empty text file. The file must be created OUTSIDE of Notepad++: you really need to open it, not create it from inside Notepad++.
You will see that the status bar at the bottom right says "ANSI". If you type a special character like "é", save the file (without changing the encoding), then close and reopen it, you will see the major issue with this.
The file will be saved with ANSI encoding and not with UTF-8 without BOM.
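
To confirm which encoding the saved file actually ended up with, the raw bytes can be inspected outside of Notepad++. This is only a minimal sketch: it assumes the system "ANSI" code page is Windows-1252 and that the file contains nothing but the single "é" from the steps above. The helper name `saved_as` is made up for illustration.

```python
# 'é' is a single byte in Windows-1252 but two bytes in UTF-8.
# Assumption: the system "ANSI" code page is Windows-1252.
ANSI_BYTES = "é".encode("cp1252")   # b'\xe9'
UTF8_BYTES = "é".encode("utf-8")    # b'\xc3\xa9'
UTF8_BOM = b"\xef\xbb\xbf"


def saved_as(data: bytes) -> str:
    """Report how a file containing only 'é' was saved (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        return "UTF-8 with BOM"
    if data == UTF8_BYTES:
        return "UTF-8 without BOM"
    if data == ANSI_BYTES:
        return "ANSI (Windows-1252)"
    return "unknown"
```

For example, `saved_as(open("empty.txt", "rb").read())` on the file from the steps above (the file name is a placeholder) reports "ANSI (Windows-1252)" when the bug is present.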

This is pretty huge, since this could break whole applications.

I do not know why this behavior was changed, or maybe it is just a bug, but it needs to be fixed immediately, please.

Gr33tZ
Rn

Discussion

  • Hubert Hansen
    2013-12-05

    Ah, I forgot: I think this has been an issue at some point since you changed the appearance of the settings dialog.

     
  • Chinoto Vokro
    2014-04-21

    What happens when you set the encoding, save the file with a non-ASCII character, then open it again? I'm guessing there is nothing to detect that it is "UTF-8 without BOM" otherwise.

     
  • Hubert Hansen
    2014-04-25

    What do you mean by "set the encoding"?

    The option stating to open ANSI files with UTF-8 without BOM should not need anything in the file. It should just choose the encoding by itself even if there is "nothing to detect".

     
  • Chinoto Vokro
    2014-04-28

    Sorry, I didn't actually test it last time. You're right: if an 'é' is saved in "ANSI"/ISO 8859-1/Windows-1252 encoding, Notepad++ will incorrectly decide to use "UTF-8 without BOM" and show an "xE9" placeholder in its place, regardless of the encoding preference for new documents.

    Notepad++ should realize that there are no valid UTF-8 characters there. But what about when there are valid characters, such as "Ã " (the second character is a non-breaking space) in ISO 8859-1 being "à" in UTF-8? I don't believe that's really a problem, since the [\x80-\xff] characters are usually used by themselves in ISO 8859-1, not together, whereas in UTF-8 they must appear together to be valid. Perhaps encoding detection could be based on the ratio of valid UTF-8 characters to invalid ones?
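
The ratio idea could look roughly like the following. This is just a sketch of the suggestion, not how Notepad++ detects encodings; the function name `utf8_ratio` and the choice to return `None` when there are no non-ASCII bytes are assumptions made for illustration.

```python
from typing import Optional


def utf8_ratio(data: bytes) -> Optional[float]:
    """Share of non-ASCII units that are well-formed UTF-8 sequences.

    Returns None when the data has no bytes >= 0x80, i.e. when there is
    nothing to detect (empty or pure ASCII content).
    """
    valid = invalid = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:            # plain ASCII, identical in both encodings
            i += 1
            continue
        # Expected sequence length from the lead byte (0 = not a lead byte).
        if 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0
        chunk = data[i:i + n]
        ok = False
        if n and len(chunk) == n:
            try:
                chunk.decode("utf-8")   # strict: rejects bad continuations
                ok = True
            except UnicodeDecodeError:
                ok = False
        if ok:
            valid += 1
            i += n
        else:
            invalid += 1        # stray high byte, e.g. a lone 0xE9
            i += 1
    total = valid + invalid
    return None if total == 0 else valid / total
```

A lone `b"\xe9"` gives a ratio of 0.0, while the ambiguous pair `b"\xc3\xa0"` gives 1.0, so a threshold somewhere in between would separate the two cases; where to put that threshold is left open here.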

    Side note: The "Apply to opened ANSI files" checkbox under the "UTF-8 without BOM" option does nothing for files that are completely empty.
    Clarification: I quote "ANSI" and use "ISO 8859-1" instead because "ANSI" is an organization, not an encoding.

     
  • Hubert Hansen
    2014-05-02

    It worked previously as I stated above.
    I expect Notepad++ to open ANY file in UTF-8 when that checkbox is checked.
    Since Notepad++ already has a perfectly working conversion feature, I do not expect this to be a problem, especially given that it worked before.

     
  • daljun
    2014-05-05

    @Chinoto Vokro

    I quote "ANSI" and use "ISO 8859-1" instead because "ANSI" is an organization, not an encoding.

    Note that ANSI is not a synonym for ISO 8859-1. In this comment I described it in more detail: https://sourceforge.net/p/notepad-plus/bugs/3263/#a813

     
  • Chinoto Vokro
    2014-05-05

    I was squirming about that when I wrote it, but I felt better using any actual encoding name rather than "ANSI". Windows-1252 would have been more fitting than ISO 8859-1 based on Notepad++'s "ANSI" and Windows-1252 displaying exactly the same. Thanks for the clarification link.

     
  • alex357
    1 day ago

    The problem exists not only for "UTF-8 without BOM". I need to encode all my files in Windows-1251 (I selected it in the lower combobox but have to set it manually for all new files, otherwise "ANSI" encoding is set). This is the behavior of NP++ regarding the option:
    - creating a new file from within NP++: the option is respected
    - opening an empty file created outside of NP++: the option is ignored
    - passing a non-existent file name as a command-line parameter: the option is ignored
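
The behavior requested in this ticket could be sketched as below. This is not Notepad++ code: `initial_encoding`, its parameters, and the `detect` callback are hypothetical names, and the sketch only assumes that a missing or empty file gives the editor nothing to detect.

```python
from typing import Callable, Optional


def initial_encoding(
    content: Optional[bytes],
    preferred: str,
    detect: Callable[[bytes], str],
) -> str:
    """Pick the encoding for a document when it is opened (hypothetical).

    content is None for a new document or a non-existent file name,
    and b"" for an empty file created outside the editor.
    """
    if not content:
        # Nothing to detect: fall back to the user's configured preference
        # in all three cases from the list above.
        return preferred
    # Non-empty files go through whatever detection the editor uses.
    return detect(content)
```

With this shape, all three cases from the list end up with the configured encoding (for example Windows-1251), and only non-empty files depend on detection.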