
#2116 Bugs on processing isolated invalid bytes

Bug
Status: closed-fixed
Priority: 4
Updated: 2019-10-24
Created: 2019-06-24
Creator: Zufu Liu
Private: No

See the screenshot: put the caret before \xE5 and press Delete, and three bytes are deleted; put the caret after \xE5 and press Backspace, and only \xE5 is deleted.
Also, double-clicking on \xE5 selects the next two characters too.

2 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2019-06-24
    • labels: --> scintilla, utf-8, invalid
    • Priority: 5 --> 4
     
  • Neil Hodgson

    Neil Hodgson - 2019-06-24

    While it would be better to treat invalid bytes more consistently, this doesn't seem too important. The user can fix the invalid area in the text although it might be a little more work.

     
  • Zufu Liu

    Zufu Liu - 2019-06-26

    The bug is in Document::LenChar(); it causes the caret bug, and maybe others.

     
  • Zufu Liu

    Zufu Liu - 2019-06-26

    Copying some code from Document::CharacterAfter() into Document::LenChar() would fix deleting and the caret (but not the block caret, where the text is drawn as U+00E5 LATIN SMALL LETTER A WITH RING ABOVE).
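
    A minimal sketch of that idea, assuming a plain std::string_view buffer instead of Scintilla's CellBuffer: the length announced by the lead byte is trusted only when the expected continuation bytes are really there, so an isolated \xE5 counts as one byte and Delete would remove only that byte. `LenCharUTF8` is a hypothetical name, not Scintilla's API, and the sketch ignores the CR LF and DBCS handling that the real Document::LenChar also does.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for Document::LenChar in UTF-8 mode: returns the number
// of bytes in the character starting at pos. Invalid or truncated sequences
// count as one byte, and positions outside the text also return 1 so that
// callers always make progress. Overlong forms and surrogates are not
// rejected here; a full validator would also check those.
int LenCharUTF8(std::string_view text, std::ptrdiff_t pos) {
    const auto length = static_cast<std::ptrdiff_t>(text.size());
    if (pos < 0 || pos >= length)
        return 1;
    const auto lead = static_cast<unsigned char>(text[static_cast<std::size_t>(pos)]);
    int expected = 0;
    if (lead >= 0xC2 && lead <= 0xDF)
        expected = 2;
    else if (lead >= 0xE0 && lead <= 0xEF)
        expected = 3;
    else if (lead >= 0xF0 && lead <= 0xF4)
        expected = 4;
    else
        return 1;  // ASCII, stray continuation byte, or a byte that never starts UTF-8
    if (pos + expected > length)
        return 1;  // sequence cut off by the end of the text
    for (int i = 1; i < expected; i++) {
        const auto trail = static_cast<unsigned char>(text[static_cast<std::size_t>(pos + i)]);
        if ((trail & 0xC0) != 0x80)
            return 1;  // lead byte not followed by a continuation byte
    }
    return expected;
}
```

    With this rule, "\xE5" followed by "ab" gives 1 at position 0, while the complete sequence "\xE5\x8F\xA3" gives 3.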

    The double-click selection bug happens because 0xFFFD is treated as punctuation.
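
    A sketch of how giving invalid bytes their own class would stop the selection from growing, assuming a hypothetical per-character class array rather than Scintilla's actual CharClassify and word-extension code. If an invalid byte is classified as punctuation (as U+FFFD was), a double-click on it joins the neighbouring punctuation run, which would produce the reported behaviour when the following characters are punctuation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical character classes; an invalid byte gets its own class instead of
// being decoded to U+FFFD and classified as punctuation.
enum class CharClass { Space, Word, Punctuation, Invalid };

// Returns the half-open range [first, second) selected by a double-click on
// character i: the maximal run of characters sharing i's class, except that an
// invalid byte is always selected on its own (and also ends neighbouring runs).
std::pair<std::size_t, std::size_t> WordExtent(const std::vector<CharClass> &classes,
                                               std::size_t i) {
    const CharClass cls = classes[i];
    if (cls == CharClass::Invalid)
        return {i, i + 1};
    std::size_t start = i;
    while (start > 0 && classes[start - 1] == cls)
        --start;
    std::size_t end = i + 1;
    while (end < classes.size() && classes[end] == cls)
        ++end;
    return {start, end};
}
```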

     

    Last edit: Zufu Liu 2019-06-26
  • Zufu Liu

    Zufu Liu - 2019-06-26

    Reading the old code again, I don't understand why it returns 1 for pos < 0.

    If it returns 1 for pos < 0, is the added check for pos >= Length() likely to break something?

        if (pos < 0) {
            return 1;
        } else if (pos >= Length()) {
            return 0;
        }
    

    Changed to return 1 for invalid pos.

     

    Last edit: Zufu Liu 2019-06-27
  • Neil Hodgson

    Neil Hodgson - 2019-06-27

    LenChar is from the original Unicode support change set, which expected valid UTF-8. Returning 1 instead of 0 for out-of-bounds positions defends against hanging in a loop that goes (or starts) out of bounds.
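
    To illustrate that kind of hang, here is a sketch with a hypothetical ASCII-only `LenChar` whose out-of-bounds result is a parameter, plus a typical "advance until target" loop. When the target lies past the end, returning 0 leaves pos stuck at the end of the text, while returning 1 keeps it moving so the loop terminates. The step limit exists only so the demonstration itself cannot hang; both function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical ASCII-only length function: 1 inside the text,
// outOfBounds for any position outside it.
int LenChar(std::string_view text, std::ptrdiff_t pos, int outOfBounds) {
    if (pos < 0 || pos >= static_cast<std::ptrdiff_t>(text.size()))
        return outOfBounds;
    return 1;
}

// Typical caller pattern: step forward from 0 until reaching target.
// Returns the number of steps taken, or -1 if maxSteps was hit, which with
// outOfBounds == 0 happens because pos stopped advancing at the end of the text.
long StepsToReach(std::string_view text, std::ptrdiff_t target, int outOfBounds, long maxSteps) {
    std::ptrdiff_t pos = 0;
    long steps = 0;
    while (pos < target) {
        if (steps == maxSteps)
            return -1;  // an unguarded loop would spin here forever
        pos += LenChar(text, pos, outOfBounds);
        ++steps;
    }
    return steps;
}
```

    For "abc" and a target of 5, returning 1 out of bounds finishes in 5 steps, while returning 0 never gets past position 3.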

     
  • Zufu Liu

    Zufu Liu - 2019-06-27

    Added the explanation in the comment.

     
  • Zufu Liu

    Zufu Liu - 2019-06-28

    \xE5 is drawn as U+00E5 LATIN SMALL LETTER A WITH RING ABOVE because of the DrawTextClipped call at the end of DrawBlockCaret().

    When using the block caret in UTF-8 mode, every invalid single byte will be drawn as Latin-1.
    Some invalid bytes may also be passed to the underlying platform in DBCS/SBCS encodings, which may cause failures like the ones that happened before on Cocoa.

     

    Last edit: Zufu Liu 2019-06-28
  • Zufu Liu

    Zufu Liu - 2019-06-28

    Possibly, after this call (in EditView::DrawCarets):

    const int widthChar = model.pdoc->LenChar(posCaret.Position());
    widthOverstrikeCaret = ll->positions[offset + widthChar] - ll->positions[offset];
    

    We could find out (e.g. by adding an optional bool pointer parameter to LenChar) that the byte at the caret position is invalid, and then not draw a block caret; instead, draw a line or bar caret depending on inOverstrike, to preserve the hex representation of the invalid byte.

     
    • Neil Hodgson

      Neil Hodgson - 2019-06-28

      This should be approached directly with a check inside DrawBlockCaret instead of a change that avoids the problem far from its occurrence.

      LenChar doesn't know about representations and adding a partial check there is just going to lead to more issues on other cases.

       
      • Zufu Liu

        Zufu Liu - 2019-06-29

        Do you mean checking the text to see whether it has a representation?

        const std::string_view text(&ll->chars[offsetFirstChar], numCharsToDraw);
        
         
        • Neil Hodgson

          Neil Hodgson - 2019-06-29

          The problem is that DrawBlockCaret is not implementing the same text drawing logic as DrawForeground.
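
          A rough sketch of that direction, not the committed patch: the block caret would derive its text from the same rule the normal drawing uses, so a lone invalid byte is shown as a hex blob instead of being handed to the platform, where it may be rendered as Latin-1 "å". The `x%02X` format for the blob text and the name `BlockCaretText` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical: the text a block caret should draw for the character under it.
// A single byte >= 0x80 can never be a complete UTF-8 character, so it is
// shown as hex (e.g. "xE5") rather than passed on as a raw byte.
std::string BlockCaretText(std::string_view character) {
    if (character.size() == 1 && static_cast<unsigned char>(character[0]) >= 0x80) {
        char hex[4];  // "x", two hex digits, terminating NUL
        std::snprintf(hex, sizeof(hex), "x%02X", static_cast<unsigned char>(character[0]));
        return hex;
    }
    return std::string(character);
}
```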

           
  • Zufu Liu

    Zufu Liu - 2019-06-28

    Patch to fix the block caret.

    I think the whole block that calculates widthOverstrikeCaret can be skipped for the line caret.

     
  • Neil Hodgson

    Neil Hodgson - 2019-07-01
     
  • Neil Hodgson

    Neil Hodgson - 2019-07-01

    Committed lenChar-3.cxx as [1031c1].

     

    Related

    Commit: [1031c1]

  • Neil Hodgson

    Neil Hodgson - 2019-10-24
    • status: open --> closed-fixed
     

