In SciTE 3.5.4, if a UTF-16LE file contains a character encoded as 4 bytes (a surrogate pair) starting at byte position 131070, so that the two wchar_t code units fall in the middle of the file read buffer, it causes a buffer overrun and the character is displayed incorrectly.
In the attached file uni2.txt, the last character, the valid Unicode character U+1D11E, is displayed incorrectly.
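One plausible reading of the report is that each read block is converted independently, so a surrogate pair can be split between blocks. A minimal sketch of the alternative, in Python with hypothetical names (`Utf16LEChunkConverter` and `utf8_of_code_point` are not SciTE or Scintilla code), carries two pieces of state from one block to the next: an odd trailing byte and a high surrogate still waiting for its low half. The usage lines assume a 131072-byte (128 KiB) read block, chosen so that the high surrogate at position 131070 ends one block and the low surrogate starts the next.

```python
def utf8_of_code_point(cp):
    """Encode one code point as UTF-8. A lone surrogate gets the 3-byte ED xx xx form."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


class Utf16LEChunkConverter:
    """Hypothetical streaming UTF-16LE -> UTF-8 converter that tolerates arbitrary block splits."""

    def __init__(self):
        self.odd_byte = b""       # half of a code unit left over from the previous block
        self.pending_high = None  # high surrogate whose low half has not arrived yet

    def feed(self, chunk):
        data = self.odd_byte + chunk
        usable = len(data) - (len(data) % 2)
        self.odd_byte = data[usable:]  # keep an odd trailing byte for the next block
        out = bytearray()
        for i in range(0, usable, 2):
            out += self._unit(data[i] | (data[i + 1] << 8))  # little-endian code unit
        return bytes(out)

    def finish(self):
        """Call at end of file: a dangling high surrogate is emitted on its own."""
        out = b""
        if self.pending_high is not None:
            out = utf8_of_code_point(self.pending_high)
            self.pending_high = None
        return out

    def _unit(self, unit):
        out = b""
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: hold it until the next unit
            if self.pending_high is not None:
                out = utf8_of_code_point(self.pending_high)  # previous one was unpaired
            self.pending_high = unit
            return out
        if 0xDC00 <= unit <= 0xDFFF and self.pending_high is not None:  # completes a pair
            cp = 0x10000 + ((self.pending_high - 0xD800) << 10) + (unit - 0xDC00)
            self.pending_high = None
            return utf8_of_code_point(cp)
        if self.pending_high is not None:  # unpaired high surrogate followed by something else
            out = utf8_of_code_point(self.pending_high)
            self.pending_high = None
        return out + utf8_of_code_point(unit)  # BMP character or lone low surrogate


# Usage (assumption: 128 KiB read blocks, U+1D11E at byte offset 131070)
BLOCK = 131072
data = b"A\x00" * 65535 + "\U0001d11e".encode("utf-16-le")
conv = Utf16LEChunkConverter()
out = b"".join(conv.feed(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)) + conv.finish()
print(out[-4:].hex(" ").upper())  # F0 9D 84 9E
```

Because the odd byte is carried forward as well, splitting the same data at 131071 (in the middle of a code unit) gives the same result.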
While this should be fixed properly, a quicker partial fix would be to ensure that both surrogates making up the character U+1D11E (D834, DD1E) are encoded as UTF-8 inside Scintilla as (ED, A0, B4, ED, B4, 9E), so that the file will at least be saved out the same as it was read in. Currently the character becomes the bytes (9D, 87, BD, ED, B4, 9E) and is saved out, further mangled, as UTF-16 (9D, 87, BD, DD1E).
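The byte values in the partial fix follow from the standard surrogate arithmetic. A minimal Python sketch (the function names are hypothetical, not Scintilla's) derives the pair for U+1D11E, encodes each surrogate on its own with the 3-byte UTF-8 pattern, and then reverses that step to show that saving reproduces the original UTF-16LE bytes.

```python
def surrogate_pair(cp):
    """Split a supplementary code point (>= 0x10000) into UTF-16 high and low surrogates."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def three_byte_utf8(unit):
    """Encode one 16-bit code unit with the 3-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.

    Applied to a lone surrogate this gives the ED xx xx form the partial fix asks for.
    """
    return bytes([0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)])


def from_three_byte_utf8(b):
    """Inverse of three_byte_utf8: recover the 16-bit code unit from 3 bytes."""
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


high, low = surrogate_pair(0x1D11E)
internal = three_byte_utf8(high) + three_byte_utf8(low)   # bytes held in the editor
print(f"{high:04X} {low:04X}", internal.hex(" ").upper())  # D834 DD1E ED A0 B4 ED B4 9E

# Saving: turn each 3-byte sequence back into one 16-bit unit, written little-endian.
units = [from_three_byte_utf8(internal[i:i + 3]) for i in range(0, len(internal), 3)]
saved = b"".join(u.to_bytes(2, "little") for u in units)
print(saved.hex(" ").upper())  # 34 D8 1E DD
```

For comparison, a complete fix would combine the pair into U+1D11E and store the single 4-byte UTF-8 sequence F0 9D 84 9E. Encoding each surrogate separately, as above, is the form CESU-8 uses: it round-trips, but it is not valid UTF-8.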
It's unlikely I will look at this again for several weeks.
A partial fix that avoids file corruption has been committed as [9403f1].
This should be fixed by [0a9464]. It's a complex change, so please check it.
Tested the 3.5.6 preview without any problems.
Thanks