On git master, run geany in an RTL locale such as Hebrew:
env LANG=he_IL.UTF-8 scite
then open a source code file.
You can see some parentheses in the wrong place on the line.
For example, the code char buf[16] ; is displayed as char buf[16;].
When you move the cursor to the ']' character, the display changes to the correct order.
More examples:
The code self->area_width looks like self>-area_width.
The code step[0] >= w looks like step[0] => w.
I also see a crash when I try to zoom in on the text.
See the screenshot and gdb output (attached as a file and at https://pastebin.com/YzVgx4MZ).
Tested on SciTE Version 5.1.6 Scintilla:5.1.5 Lexilla:5.1.4, Compiled for GTK:3.24.31, ArchLinux.
I saw the same bug in SciTE 5.1.5 from the Arch Linux repo.
I first saw this when I built geany from git master; in the stable geany version (1.38) this bug doesn't happen ( https://github.com/geany/geany/issues/3101 ).
Scintilla does not support RTL languages on GTK.
https://www.scintilla.org/ScintillaDoc.html#SCI_SETBIDIRECTIONAL
Scintilla and SciTE have their own settings (code.page, character.set) for encoding, separate from LANG, to allow an instance to display text in multiple encodings. These default to an 8-bit Latin encoding, but the LANG override here tells GTK to assume UTF-8.
There is an rtlCheck fallback for RTL in the 8-bit (code.page=0) case, but that requires character.set to be 177 for Hebrew or 178 for Arabic. With defaults and LANG=he_IL.UTF-8, iteration over neutral characters goes backwards, and the code is written for forwards iteration. It could detect the backwards case and either do the right thing or fail more pleasantly without crashing.
Set SciTE to UTF-8 and some progress can be made. In SciTEUser.properties:
code.page=65001
This avoids the crash for me but leaves the parentheses wrongly ordered/positioned. In RTL languages, neutral characters adapt to the text around them but, in Scintilla, code is lexed into different styles. Each style run is drawn independently without context, so the ";]" (operator style) isn't surrounded by Latin text and may be assumed RTL and drawn in the wrong order.
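The effect of drawing each style run without context can be imitated with a toy model. The sketch below is not Scintilla or Pango code: the names HasStrongLTR, VisualOrder and DrawRuns are hypothetical, ASCII letters and digits are simply assumed to be strongly left-to-right, and bracket mirroring is ignored. A run that holds only neutral characters is reversed when the base direction is RTL, which is enough to reproduce the orderings reported above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Toy assumption: ASCII letters and digits count as strongly left-to-right,
// every other character is neutral. Real bidi classification is richer.
bool HasStrongLTR(const std::string &run) {
    return std::any_of(run.begin(), run.end(), [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    });
}

// Visual order of one run drawn in isolation. A run with no strong
// character takes the base direction, so it is reversed when that is RTL.
std::string VisualOrder(const std::string &run, bool baseRTL) {
    if (baseRTL && !HasStrongLTR(run)) {
        return std::string(run.rbegin(), run.rend());
    }
    return run;
}

// Runs are placed left to right, but each one is ordered on its own,
// without seeing its neighbours.
std::string DrawRuns(const std::vector<std::string> &runs, bool baseRTL) {
    std::string line;
    for (const std::string &run : runs) {
        line += VisualOrder(run, baseRTL);
    }
    return line;
}
```

In this model, DrawRuns({"self", "->", "area_width"}, true) gives self>-area_width, while passing the whole text as a single run leaves it unchanged, because the neutral "->" then has Latin text on both sides.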
Diff:
Tracing some punctuation (/**):
The float numbers are the x position and width of each cluster. So it's iterating in visual order, which is inverted from memory order.
The code was changed to detect the situation where the iteration starts with a non-0 index and to exit quickly after evenly dividing the total layout width over the characters (bytes, actually). This makes that segment take the correct width (at least in tested cases), so the layout of other segments will be OK and crashes are avoided. It also means that text may move around as parts are selected, since they become separate segments and lose or gain neutrality.
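A minimal sketch of that fallback, under stated assumptions: this is not the committed change, and the Cluster struct and MeasureSegment function are hypothetical names invented for illustration. Clusters are assumed to arrive in iteration order, each with the byte index where it starts and the x coordinate of its trailing edge. An empty cluster list is also treated as the fallback case here, which is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cluster record: starting byte index and the x coordinate
// of the cluster's trailing edge, as reported by a layout iterator.
struct Cluster {
    std::size_t byteIndex;
    double xEnd;
};

// Returns an end position for every byte of one text segment.
std::vector<double> MeasureSegment(const std::vector<Cluster> &clusters,
                                   std::size_t lengthBytes, double totalWidth) {
    assert(lengthBytes > 0);
    std::vector<double> positions(lengthBytes, 0.0);
    if (clusters.empty() || clusters.front().byteIndex != 0) {
        // Iteration does not start at byte 0, so it is running in visual
        // (backwards) order. Exit early with the total width divided
        // evenly over the bytes so the segment still has the right width.
        for (std::size_t i = 0; i < lengthBytes; i++) {
            positions[i] = totalWidth * static_cast<double>(i + 1) /
                           static_cast<double>(lengthBytes);
        }
        return positions;
    }
    // Forwards iteration: each cluster covers the bytes from its own
    // index up to the index of the next cluster.
    for (std::size_t c = 0; c < clusters.size(); c++) {
        const std::size_t start = clusters[c].byteIndex;
        const std::size_t end =
            (c + 1 < clusters.size()) ? clusters[c + 1].byteIndex : lengthBytes;
        for (std::size_t i = start; i < end && i < lengthBytes; i++) {
            positions[i] = clusters[c].xEnd;
        }
    }
    return positions;
}
```

The even division gives up on accurate per-character positions inside the segment, but the last position still equals the total width, which is what keeps the layout of the following segments correct.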
Change committed as [0759b3].