#1710 SciTE 3.5.4: reading a UTF-16 file containing a 4-byte character can cause trouble

Bug
closed-fixed
5
2015-05-26
2015-04-16
Olivier
No

In SciTE 3.5.4, if a UTF-16LE file contains a character encoded in 4 bytes that starts at position 131070 (so its 2 wchar units are split in the middle by the file read buffer), it causes a buffer overrun and the character is displayed wrongly.
In the attached file uni2.txt, the last character in the file, which is the valid Unicode character U+1D11E, is displayed wrongly.
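
For readers without the attachment, the layout described above can be sketched in Python. This is a hypothetical reconstruction, not the actual uni2.txt: it assumes the file starts with a UTF-16LE BOM, is padded with ASCII filler, and is read in 128 KiB blocks (`BLOCK_SIZE` is an assumption; the report only gives the offset 131070).

```python
# Hypothetical reproduction of the layout in the report; the real uni2.txt may differ.
BLOCK_SIZE = 131072   # ASSUMPTION: 128 KiB read buffer, not stated in the report
PAIR_OFFSET = 131070  # byte offset of the 4-byte character, from the report


def make_repro_bytes() -> bytes:
    """Build UTF-16LE data whose final character, U+1D11E, starts at PAIR_OFFSET."""
    bom = "\ufeff".encode("utf-16-le")            # 2 bytes: FF FE
    filler_units = (PAIR_OFFSET - len(bom)) // 2  # number of 2-byte filler characters
    filler = ("a" * filler_units).encode("utf-16-le")
    clef = "\U0001D11E".encode("utf-16-le")       # surrogate pair D834 DD1E
    return bom + filler + clef


data = make_repro_bytes()

# Splitting at the assumed block size separates the two surrogates.
first_block, second_block = data[:BLOCK_SIZE], data[BLOCK_SIZE:]
print(first_block[-2:].hex(), second_block.hex())
```

Under these assumptions the first block ends with the high surrogate and the second block holds only the low surrogate, which is the situation a block-by-block reader has to handle.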

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2015-04-16
    • labels: --> scite, unicode
    • status: open --> open-accepted
    • assigned_to: Neil Hodgson
     
  • Neil Hodgson

    Neil Hodgson - 2015-04-21

    While this should be fixed properly, a quicker part-fix would be to ensure that both the surrogates making up this character 0x1D11E: (D834, DD1E) are encoded as UTF-8 inside Scintilla as (ED, A0, B4, ED, B4, 9E) so that the file will at least save out the same as it was read in. Currently the character becomes bytes (9D, 87, BD, ED, B4, 9E) and is saved out with further mangling as UTF-16 (9D, 87, BD, DD1E).

    It's unlikely I will look at this again for several weeks.
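
The byte values quoted in the comment above can be checked with a short Python sketch. The helper names `surrogate_pair` and `encode_unit_3byte` are made up for illustration and are not Scintilla functions; the sketch only shows the arithmetic of encoding each surrogate separately as a 3-byte UTF-8-style sequence.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def encode_unit_3byte(unit: int) -> bytes:
    """Encode one 16-bit unit in the 0x0800..0xFFFF range as a 3-byte UTF-8-style sequence."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def decode_unit_3byte(seq: bytes) -> int:
    """Inverse of encode_unit_3byte."""
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


high, low = surrogate_pair(0x1D11E)
per_surrogate = encode_unit_3byte(high) + encode_unit_3byte(low)
proper_utf8 = "\U0001D11E".encode("utf-8")

print(f"{high:04X} {low:04X}")        # surrogates
print(per_surrogate.hex(" ").upper()) # each surrogate encoded on its own
print(proper_utf8.hex(" ").upper())   # standard 4-byte UTF-8 for comparison
```

Because each 3-byte sequence decodes back to exactly one 16-bit unit, data held this way can be written out as the same UTF-16 units that were read, which is the round-trip property the part-fix relies on.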

     
  • Neil Hodgson

    Neil Hodgson - 2015-04-21
    • status: open-accepted --> open-later
     
  • Neil Hodgson

    Neil Hodgson - 2015-05-15

    Partial fix that avoids file corruption committed as [9403f1].

     

    Related

    Commit: [9403f1]

  • Neil Hodgson

    Neil Hodgson - 2015-05-16

    Should be fixed with [0a9464]. It's a complex change so please check.

     

    Related

    Commit: [0a9464]
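
For context, one common way to decode UTF-16LE block by block without splitting a surrogate pair is to hold back a trailing high surrogate until the next block arrives. The Python sketch below illustrates that general technique only. It is not the code in [0a9464], and the function name `decode_utf16le_blocks` and the 128 KiB block size are assumptions.

```python
from typing import Iterable


def decode_utf16le_blocks(blocks: Iterable[bytes]) -> str:
    """Decode UTF-16LE delivered in arbitrary blocks, carrying incomplete data over."""
    pending = b""  # bytes held back from the previous block
    pieces = []
    for block in blocks:
        buf = pending + block
        usable = len(buf) - (len(buf) % 2)  # never decode half of a 16-bit unit
        if usable >= 2:
            last_unit = int.from_bytes(buf[usable - 2:usable], "little")
            if 0xD800 <= last_unit <= 0xDBFF:
                usable -= 2  # trailing high surrogate: wait for its low surrogate
        pieces.append(buf[:usable].decode("utf-16-le"))
        pending = buf[usable:]
    if pending:
        # Truncated input at end of file: replace rather than raise.
        pieces.append(pending.decode("utf-16-le", errors="replace"))
    return "".join(pieces)


# Usage: a character starting at byte 131070, read in assumed 128 KiB blocks.
BLOCK_SIZE = 131072
text = "a" * 65535 + "\U0001D11E"
raw = text.encode("utf-16-le")
chunks = [raw[i:i + BLOCK_SIZE] for i in range(0, len(raw), BLOCK_SIZE)]
decoded = decode_utf16le_blocks(chunks)
```

Decoding each chunk on its own would fail or mangle the final character here, while the carry-over version returns the original text.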

  • Neil Hodgson

    Neil Hodgson - 2015-05-16
    • status: open-later --> open-fixed
     
  • Olivier

    Olivier - 2015-05-25

    Tested preview 3.5.6 without any problems
    Thanks

     
  • Neil Hodgson

    Neil Hodgson - 2015-05-26
    • status: open-fixed --> closed-fixed
     
