A file with a .php extension and Thai characters will not open as "UTF-8 without BOM". Here are some examples, and all work except a .php file with Thai characters. Note that a .html file with Thai opens correctly.
File with .html extension and Spanish characters: opens correctly as "UTF-8 without BOM"
File with .html extension and Thai characters: opens correctly as "UTF-8 without BOM"
File with .php extension and Spanish characters: opens correctly as "UTF-8 without BOM"
File with .php extension and Thai characters: OPENS AS ANSI AND TRASHES THE THAI CHARACTERS
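One visible difference between the Spanish and Thai test cases is how the characters are stored in UTF-8: accented Latin letters take two bytes, while Thai letters take three. The small Python sketch below only illustrates that difference. The sample characters and the helper name utf8_hex are chosen here for illustration and are not taken from the sample files.

```python
# Illustration only: compare UTF-8 byte sequences of a Spanish and a Thai character.
# The sample characters are assumptions, not the contents of SampleFiles.zip.

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

samples = {
    "Spanish n with tilde (U+00F1)": "\u00f1",
    "Thai letter ko kai (U+0E01)": "\u0e01",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_hex(ch)}")
```

Any detection code that handles two-byte sequences correctly but mishandles some three-byte sequences would show this kind of Spanish-works, Thai-fails pattern, although that alone does not explain why the file extension matters.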
Even if I enable the "UTF-8 without BOM" encoding setting and check the "Apply to opened ANSI files" option, the file still opens as ANSI.
Sample files can be downloaded from http://ic.payap.ac.th/SampleFiles.zip.
Can someone help with a workaround other than manually selecting the "Encode in UTF-8 without BOM" menu option every time? Should this be added as a bug?
Any help is greatly appreciated.
Your failing file fools Notepad++ by indicating UTF-8 charset and not having a BOM. Since the file is thus inconsistent, N++ opens it as ANSI.
After removing the charset meta attribute, the file opens in UTF-8.
Thanks for your response. Unfortunately the file still opens as ANSI for me. I've tried removing the whole meta Content-Type line and tried removing just the charset=UTF-8 portion, but in all cases it still opens as ANSI for me.
If the meta tag line is the problem then why does it work if the file has a .html extension and why does it work with a .php file that has Spanish characters?
I guess I just don't understand why it seems that file extension matters to N++ when deciding if the file should open as ANSI or UTF-8 without BOM. Why doesn't it open the .php file the same as the .html file?
I don't think I would have to deal with this issue if the web server was running PHP5, but it is running PHP4 and I don't have any control over that so I'm stuck with having to save the files as UTF-8 without BOM.
Thanks again for your time.
What is the locale on your OS? And your OS, for that matter? On XP with French locale, I get the file to open as UTF-8 when I remove the charset: part.
I'm using WinXP with U.S. locale.
When you say yours opened as UTF-8, do you mean "UTF-8" or "UTF-8 without BOM"? If you look in the Encoding menu, which one of those options is selected? I need mine to be "UTF-8 without BOM", since I cannot save the BOM character: the website doesn't display properly with it when using PHP4 (which I have to use for now).
Thanks for the help, but I've decided to try and request that the server be upgraded to PHP5. I think that will be better in the long run. Hopefully this won't be an issue then because I think I'll be able to save as UTF-8 with BOM.
With the charset attribute, the file opens as ANSI.
Without the charset directive, it opens as UTF-8 (with BOM).
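To tell "UTF-8" from "UTF-8 without BOM" independently of what the editor reports, you can look at the first three bytes of the file: a UTF-8 BOM is the byte sequence EF BB BF. Here is a minimal Python sketch of that check, plus a helper that drops the BOM for servers that choke on it. The function names has_utf8_bom and strip_utf8_bom are made up for this example.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.

def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other content is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "<?php echo 'ok'; ?>".encode("utf-8")
    print(has_utf8_bom(with_bom))                  # file saved as "UTF-8"
    print(has_utf8_bom(strip_utf8_bom(with_bom)))  # "UTF-8 without BOM"
```

To check a real file, read it in binary mode (open(path, "rb")) and pass the bytes to has_utf8_bom.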
I can confirm having this issue as well. I have a file I have been using in a project for some time, and suddenly it stopped opening as UTF-8. Nothing I tried would make Notepad++ open it as UTF-8 (without BOM), and eventually I discovered it was due to the addition of Thai characters.
It really seems like a bug to me, since it works without these characters.
I cannot believe how old this post is and this bug still exists.
I just wasted hours today, not to mention the time I wasted previously, all because a Thai character ("฿" in this case) completely breaks the encoding detection. Any file with that character (and, I would assume after reading the above, other Thai characters) will open as ANSI regardless of settings or any previous encoding conversion.
You're right. That's ridiculous. The UTF-8 detection code had a bug: for a certain range of characters (U+0E3F, your Thai Baht symbol, is one of them), it would incorrectly decide that the bytes were an invalid UTF-8 sequence and that the file must therefore be in an 8-bit encoding (i.e. "ANSI").
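For anyone curious what such a check involves, here is a minimal Python sketch of a strict UTF-8 validity test. It is not the Notepad++ code and not the patch. The function name is_valid_utf8 is made up for illustration, and the byte ranges follow the table of well-formed UTF-8 byte sequences in the Unicode Standard. Note that U+0E3F encodes as E0 B8 BF, and that the lead byte E0 has a narrower allowed range for its second byte (A0 to BF) than most other lead bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence (illustrative sketch)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # single-byte ASCII
            i += 1
            continue
        # need = number of trailing bytes; lo..hi = allowed range of the first one
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:                   # excludes overlong forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:                   # excludes UTF-16 surrogates
            need, lo, hi = 2, 0x80, 0x9F
        elif b0 == 0xF0:                   # excludes overlong forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:                   # excludes code points above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                              # stray continuation byte or invalid lead
            return False
        if i + need >= n:                  # sequence is truncated
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):       # remaining bytes are plain continuations
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True

if __name__ == "__main__":
    baht = "\u0e3f".encode("utf-8")        # Thai Baht sign
    print(baht.hex(" "), is_valid_utf8(baht))
```

A detector that wrongly rejected E0 B8 BF would fall back to ANSI for any file containing the Baht sign, which matches the symptom reported in this thread.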
I've created a patch on the patch tracker to fix this. https://sourceforge.net/p/notepad-plus/patches/513/
My build now works correctly with this patch for your scenario.
Hopefully this will be integrated by Don soon.