Scintilla / Bugs / #2094 GTK/a11y: drastically improve performances

Neil Hodgson - 2019-04-05

Committed as [01aab5].

There are many different techniques for optimizing GetRelativePosition use. Subsequent calls are often very close (or identical) to previous calls and a single cached value (with cache invalidation or fixing after modification) may be a big help.

Another possibility is that when there are no multi-byte characters on a line, there is a 1:1 character:byte offset so simpler calculation. This could be optimized further by returning the index line start and index line start of the next line in one call, avoiding the log(n) access to the next line index. Maintaining a flag whether there are any multi-byte characters in the file may be a large win for many cases.

Its difficult to estimate which particular optimizations are going to be a win. Providing raw access allows client code to try different strategies. It may be better to provide a slightly cooked API GetRelativePositionUsingLineIndex that does something reasonable while still retaining GetRelativePosition as raw access.

It'd likely be worth optimizing the UTF-8 forward moving case in GetRelativePosition to avoid some of the NextPosition overhead.

Related

Commit: [01aab5]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Neil Hodgson - 2019-04-06
  
  by returning the index line start and index line start of the next line in one call, avoiding the log(n) access
  
  Actually, that one doesn't make sense but I'm sure there are ways to use linear searches with better memory locality in some situations instead of binary searches.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-04-05

labels: gtk, a11y --> gtk, a11y, scintilla

status: open --> open-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-04-06

There are problems here with encodings other than UTF-8 as the indices are not allocated unless the encoding is UTF-8. This leads to crashes when IndexLineStart called. Uncertain which layer to add checks for this. Could be in ScintillaGTKAccessible to remove all logic with test

if (sci->pdoc->LineCharacterIndex() & SC_LINECHARACTERINDEX_UTF32)

or could be in CellBuffer/Document.

There's also an issue with SciTE not calling SCI_SETACCESSIBILITY when the user switches to a UTF-8 encoding.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Colomban Wendling - 2019-04-06

There are problems here with encodings other than UTF-8 as the indices are not allocated unless the encoding is UTF-8. […]

Oops, I didn't think about this (I designed this in Geany that always use UTF-8). I can re-introduce the custom cache implementation when the encoding is different (or rather, when the cache is missing) so it still works. It adds complexity and is slower, but with the optimizations I made before coming to the patch state it's already reasonable.
I'm just wondering how I could minimize ovehead for this checks, but maybe it's not so relevant. I'll have to also catch encoding changes or such, anything that can alter the caches.

Do you think it's better to re-introduce the previous custom caching for non-UTF-8 documents, or disable a11y for those? I don't really like dsiabling without a good reason, but if position cache for non-UTF-8 is planed "soon" it might be OK in the meantime and not worth the effort -- which however can be not so big given that I already have much of the optimization work done.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Neil Hodgson - 2019-04-06
  
  There are 3 cases here: UTF-8, DBCS, and SBCS. Single byte characters sets like Latin1 / ISO8859-1 are quite common for source code and they are very simple to implement: byte position == character position. DBCS is more complex as it is irregular with different ranges used for lead bytes in Japanese, Korean and both Chinese encodings.
  
  In the long term, core Scintilla should support DBCS with SC_LINECHARACTERINDEX_UTF32 (which is equivalent to UTF16 for DBCS) but that is a fairly low priority.
  
  For now, I think it is good enough to fallback to 1:1 if Document::LineCharacterIndex() indicates the UTF32 index is not available. Then I note this in the documentation. If the feature is important enough then someone may step forward to implement DBCS support.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Colomban Wendling - 2019-04-07
    
    For now, I think it is good enough to fallback to 1:1 if Document::LineCharacterIndex() indicates the UTF32 index is not available.
    
    Okay, fair enough and fine by me. Do you think it would make sense for the position cache to return 1:1 if it's not available, or should that be implemented on the consumer side?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Neil Hodgson - 2019-04-07
      
      I think there is more room for optimisation if the check is made in the client code. Attached is a quick fix which may be incomplete but prevented crashes in a brief test.
      
      CheckIndex.patch
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-04-11

Committed the above index checking fix as [2c8b52].

Related

Commit: [2c8b52]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Colomban Wendling - 2019-04-12

Sorry for the delay, handn't time earlier to look at it. I dind't test the change yet, but it looks fine to me.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-04-16

status: open-fixed --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

GTK/a11y: drastically improve performances

Group

Searches

Help

#2094 GTK/a11y: drastically improve performances

Discussion

Related

Related