
#2309 Parentheses and arrows character displaying wrongly in RTL AND crash

Milestone: Bug
Status: open-accepted
Owner: nobody
Priority: 5
Updated: 2022-02-25
Created: 2022-01-16
Private: No

On git master, run geany in an RTL language, like Hebrew:

env LANG=he_IL.UTF-8 scite

Then open a code file.

Some parentheses and arrow characters appear in the wrong place on the line.

For example, the code char buf[16] ; is displayed as char buf[16;].
When you move the cursor to the ']' character, the display changes to the correct order.

More examples:
The code self->area_width looks like self>-area_width.
The code step[0] >= w looks like step[0] => w.

I also see a crash when I try to zoom in on the text.

See the screenshot and gdb output (attached and at https://pastebin.com/YzVgx4MZ).

Tested on SciTE Version 5.1.6 Scintilla:5.1.5 Lexilla:5.1.4, Compiled for GTK:3.24.31, ArchLinux.

I saw the same bug in scite 5.1.5 from the Arch Linux repo.

I first saw this when I built geany from git master; in the geany stable version (1.38) this bug doesn't happen ( https://github.com/geany/geany/issues/3101 ).

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2022-01-16
    • labels: RTL --> RTL, scintilla
    • status: open --> open-accepted
    • Priority: 7 --> 5
     
  • Neil Hodgson

    Neil Hodgson - 2022-01-16

    Scintilla does not support RTL languages on GTK.
    https://www.scintilla.org/ScintillaDoc.html#SCI_SETBIDIRECTIONAL

    Scintilla and SciTE have their own encoding settings (code.page, character.set), separate from LANG, so that one instance can display text in multiple encodings. These default to an 8-bit Latin encoding, but the LANG override here tells GTK to assume UTF-8.
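
To illustrate the mismatch described above, here is a minimal sketch showing how the same bytes yield a different number of characters depending on the assumed encoding. The sample string and the use of Latin-1 as the 8-bit encoding are assumptions for illustration only; this is not Scintilla or GTK code.

```python
# Hypothetical sample: four Hebrew letters, written with escapes.
text = "\u05e9\u05dc\u05d5\u05dd"

# The bytes a UTF-8 file would contain.
raw = text.encode("utf-8")

# A reader that assumes UTF-8 sees four characters.
as_utf8 = raw.decode("utf-8")

# A reader that assumes one byte per character (Latin-1 used here
# as a stand-in for "8-bit Latin") sees eight characters.
as_latin1 = raw.decode("latin-1")

print(len(raw), len(as_utf8), len(as_latin1))  # prints: 8 4 8
```

When the two layers disagree like this, byte offsets and character offsets no longer line up between them.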

    There is an rtlCheck fallback for RTL in the 8-bit (code.page=0) case, but that requires character.set to be 177 for Hebrew or 178 for Arabic. With defaults and LANG=he_IL.UTF-8, iteration over neutral characters goes backwards, and the code is written for forwards iteration. It could detect the backwards case and either do the right thing or fail more pleasantly without crashing.

    Set SciTE to UTF-8 and some progress can be made. In SciTEUser.properties:

    code.page=65001
    

    This avoids the crash for me but leaves the parentheses wrongly ordered/positioned. In RTL languages, neutral characters adapt to the text around them but, in Scintilla, code is lexed into different styles. Each style run is drawn independently, without context, so the ";]" (operator style) isn't surrounded by Latin text and may be assumed to be RTL and drawn in the wrong order.
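
The loss of context can be sketched with a simplified model: a run takes the direction of its first strong character, and a run with no strong character falls back to an ambient direction (RTL under a Hebrew locale). This follows the first-strong idea from the Unicode Bidirectional Algorithm but is only an approximation, not Scintilla's or Pango's actual logic. The function name and the way the line is split into runs are hypothetical.

```python
import unicodedata


def run_direction(run: str, fallback: str) -> str:
    """Return 'L' or 'R' for a run of text drawn on its own.

    The first strong character decides. A run made only of neutrals,
    digits or spaces has no strong character and takes the fallback.
    """
    for ch in run:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):
            return "R"
    return fallback


line = "char buf[16] ;"
# Hypothetical style runs, roughly as a lexer might split the line.
runs = ["char", " ", "buf", "[", "16", "]", " ", ";"]

# Drawn as one piece, the line has Latin letters to anchor it.
print("whole line:", run_direction(line, fallback="R"))

# Drawn run by run, the punctuation has nothing to anchor it.
for run in runs:
    print(repr(run), run_direction(run, fallback="R"))
```

In this model the whole line resolves to 'L', while "[", "]", ";" and the digits resolve to 'R' when each is considered alone with an RTL fallback.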

     
  • Neil Hodgson

    Neil Hodgson - 2022-01-16
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -21,4 +21,4 @@
    
     I saw the same bug in scite 5.1.5 from archlinux repo.
    
    -I see this in the first time when built geany from git master, in geany stable version (1.38) this bug dosn't happen (https://github.com/geany/geany/issues/3101).
    +I see this in the first time when built geany from git master, in geany stable version (1.38) this bug dosn't happen ( https://github.com/geany/geany/issues/3101 ).
    
     
  • Neil Hodgson

    Neil Hodgson - 2022-02-25

    Tracing some punctuation /**:

    ClusterIterator 4 [/** ]
    ClusterIterator{ 4..3 0 7.33301
    ClusterIterator+ 3..2 7.33301 7.33301
    ClusterIterator+ 2..1 14.666 7.33301
    ClusterIterator+ 1..0 21.999 7.33301
    ClusterIterator} 0..4
    

    The float numbers are the x position and width of each cluster. So it's iterating in visual order, which is inverted from memory order.

    The code was changed to detect the situation where the iteration starts with a non-0 index and to exit quickly after evenly dividing the total layout width over the characters (bytes, actually). This makes that segment take the correct width (at least in tested cases), so layout of other segments will be OK and crashes are avoided. It also means that text may move around as parts are selected, since they then become separate segments and lose or gain neutrality.

    Change committed as [0759b3].
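
The fallback described above can be sketched roughly as follows. This is an illustration of the idea only, not the committed code: the function names and tuple layout are hypothetical, and taking the total width as the sum of the cluster widths is an assumption. The sample data is taken from the trace above.

```python
def even_positions(total_width: float, length: int) -> list[float]:
    """End x position of each byte when the width is shared equally."""
    return [total_width * (i + 1) / length for i in range(length)]


def layout_positions(clusters, length):
    """Compute the end x position of each byte in a segment.

    clusters: (start, end, x, width) tuples in iteration order, where
    start and end are byte indices and x and width are in pixels.
    """
    if clusters and clusters[0][0] != 0:
        # Iteration began at a non-zero index, so visual order is
        # inverted from memory order. Give up on per-cluster
        # positions and share the total width equally over the bytes.
        total = sum(width for _, _, _, width in clusters)
        return even_positions(total, length)

    positions = [0.0] * length
    for start, end, x, width in clusters:
        # Simplification: every byte of a cluster gets its right edge.
        for i in range(start, end):
            positions[i] = x + width
    return positions


# Clusters from the "/** " trace: indices run 4..3, 3..2, 2..1, 1..0.
trace = [
    (4, 3, 0.0, 7.33301),
    (3, 2, 7.33301, 7.33301),
    (2, 1, 14.666, 7.33301),
    (1, 0, 21.999, 7.33301),
]
print(layout_positions(trace, 4))
```

For the trace data the segment still ends up with its full width, with each of the four bytes taking an equal share of it.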

     

    Related

    Commit: [0759b3]

