How does NP++ (or other editors) detect the actual encoding (ANSI, UTF-8, UTF-8 with BOM, UCS-2 LE, …) of a file?
When it loads a file, it is at first just a stream of bytes.
Is there some kind of encoding marker in the first bytes of the file?
Or does NP++ really read the full content first, then interpret it and make a guess?
Is such a guess always correct?
The detection method must be compatible with other editors, since they should be able to read a file written by NP++.
If the file has a BOM, NP++ detects it and knows about the encoding.
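BOM sniffing boils down to comparing the first few bytes of the file against the known signatures. A minimal sketch (this is an illustrative helper, not Notepad++'s actual code):

```python
# Known BOM signatures and the encodings they imply.
BOMS = [
    (b'\xef\xbb\xbf', 'UTF-8 with BOM'),
    (b'\xff\xfe',     'UCS-2 LE'),
    (b'\xfe\xff',     'UCS-2 BE'),
]

def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None  # no BOM: fall through to other heuristics

print(sniff_bom(b'\xef\xbb\xbfhello'))  # UTF-8 with BOM
print(sniff_bom(b'plain bytes'))        # None
```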
If the file is HTML or XML, the encoding is read from the first line of the file.
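For HTML/XML the declaration itself names the encoding, so an editor can scan the first chunk of bytes for it. A hedged sketch (the regexes and function name are illustrative, not what NP++ actually uses):

```python
import re

def declared_encoding(first_chunk):
    """Look for an encoding declared in an XML prolog or an HTML meta tag."""
    # The declaration itself is plain ASCII, so a lossy decode is safe here.
    text = first_chunk.decode('ascii', errors='replace')
    m = re.search(r'<\?xml[^>]*encoding=["\']([\w.-]+)["\']', text)
    if m:
        return m.group(1)
    m = re.search(r'<meta[^>]*charset=["\']?([\w.-]+)', text, re.IGNORECASE)
    if m:
        return m.group(1)
    return None

print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?>'))  # ISO-8859-1
print(declared_encoding(b'<meta charset="utf-8">'))                       # utf-8
```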
Otherwise, NP++ makes a guess between UCS-2 LE, UCS-2 BE and ANSI. You cannot tell the difference between a file encoded in UTF-8 without a BOM and a file in ANSI with plenty of high-ASCII characters.
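The ambiguity is easy to demonstrate: the same byte sequence can decode without error both as UTF-8 and as an ANSI codepage (Windows-1252 in this sketch), just with different results, so a pure byte-level check cannot prove which one the author meant.

```python
data = b'caf\xc3\xa9'  # 'café' encoded as UTF-8

# Both decodes succeed; only the interpretation differs.
as_utf8 = data.decode('utf-8')    # -> 'café'
as_ansi = data.decode('cp1252')   # -> 'cafÃ©' (also a "valid" ANSI file)

print(as_utf8)
print(as_ansi)
```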