If a UTF-8 (no BOM) file contains certain characters, it is always detected as 8-bit ASCII, because the encoded character is wrongly marked as invalid UTF-8. Thai baht (U+0E3F) is one example, but anything from U+0800 to U+0FFF should trigger it, since all of those code points are encoded with a first byte of 0xE0.
This patch fixes the UTF-8 validity tests.
For 3-byte codes, the ONLY thing we can say about the first byte is that ([byte] & 0xE0) == 0xE0. The result of ANDing with 0x0F depends entirely on the code point, and it is 0 for every code point in the range U+0800 to U+0FFF, so the old test rejects them. Thai baht (U+0E3F, encoded as E0 B8 BF) is a good example. See https://sourceforge.net/p/notepad-plus/discussion/331754/thread/0fac5f5e/?limit=25
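To make the arithmetic concrete, here is a minimal sketch of the 3-byte case. It is not the code from Utf8_16.cpp: the names encode3, oldLeadCheck and patchedLeadCheck are hypothetical, and the "old" check is only my reading of the behaviour described above (a zero low nibble in the first byte is treated as invalid).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode a code point in U+0800..U+FFFF as the three
// UTF-8 bytes 1110xxxx 10xxxxxx 10xxxxxx. Surrogates are not rejected here.
inline std::array<std::uint8_t, 3> encode3(std::uint32_t cp) {
    assert(cp >= 0x0800 && cp <= 0xFFFF);
    return {{
        static_cast<std::uint8_t>(0xE0 | (cp >> 12)),          // top 4 bits
        static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),  // middle 6 bits
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F)),         // low 6 bits
    }};
}

// Assumed shape of the old test: the low nibble of the first byte must be
// non-zero. It is zero whenever the top 4 bits of the code point are zero,
// which is the whole range U+0800..U+0FFF.
inline bool oldLeadCheck(std::uint8_t lead) {
    return (lead & 0x0F) != 0;
}

// Test as described for the patch: only the marker bits are examined.
// In isolation this also matches 0xF0..0xFF; (lead & 0xF0) == 0xE0 would be
// the stricter form if the byte has not already been range-checked.
inline bool patchedLeadCheck(std::uint8_t lead) {
    return (lead & 0xE0) == 0xE0;
}
```

With these definitions, encode3(0x0E3F) gives E0 B8 BF, so oldLeadCheck rejects the first byte while patchedLeadCheck accepts it. A code point outside the affected range, such as U+20AC (E2 82 AC), passes both.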
This patch also corrects the two-byte check to test for (byte & 0xC0) == 0xC0. The previous test worked, but only because the first byte of every valid two-byte character lies in a range (0xC2 to 0xDF) where byte & 0x1F is never 0. The new test, (byte & 0xC0) == 0xC0, is more "correct" and says exactly what we should be checking for.
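The two-byte case can be checked exhaustively, since there are only a few possible first bytes. The sketch below is an illustration and not the patched source: oldTwoByteCheck, patchedTwoByteCheck and verifyTwoByteLeads are hypothetical names, and the old check is again assumed to treat a zero result as invalid.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of the old test: the low 5 bits of the first byte must be
// non-zero.
inline bool oldTwoByteCheck(std::uint8_t lead) {
    return (lead & 0x1F) != 0;
}

// Test as described for the patch: both marker bits must be set.
inline bool patchedTwoByteCheck(std::uint8_t lead) {
    return (lead & 0xC0) == 0xC0;
}

// Walk every first byte that can start a well-formed two-byte sequence
// (0xC2..0xDF) and confirm that both checks accept it.
inline void verifyTwoByteLeads() {
    for (int b = 0xC2; b <= 0xDF; ++b) {
        const auto lead = static_cast<std::uint8_t>(b);
        assert(oldTwoByteCheck(lead));
        assert(patchedTwoByteCheck(lead));
    }
}
```

Within 0xC0..0xDF the two checks disagree only on 0xC0, which never occurs in well-formed UTF-8, so the change in the two-byte test should not alter the result for any valid file.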
This submission includes a patch file that can be applied with the standard patch.exe, along with the complete Utf8_16.cpp file, which can simply replace the existing one if nothing else in it has changed.