Menu

#2094 GTK/a11y: drastically improve performances

Bug
closed-fixed
nobody
5
2019-04-16
2019-04-05
No

https://github.com/geany/geany/issues/2092 showed that while during common editing the a11y code is nto getting in the way, when performing huge amount of search & replace can lead to critical slowdowns. This is mostly due do byte/character position computation, which happens on each editing operation in order to notify the OS a11y layer.

Attached patch improves the situation a lot by making use of the new built-in character cache that is noticeably faster than the custom code that was used eralier (even after optimizing the custom code to avoid clearing computed positions and such, it still was about 3× slower in some situations).
Another important improvement is making use of the position cache also when mapping a byte offset to a UTF32 offset, instead of only making use of GetRelativePosition(). BTW, this might be an interesting optimization to add to GetRelativePosition() when the UTF32 position cache is enabled and the given range is fairly large. Not sure how much sense it makes, it probably depends on the typical use case callers have for it to know whether the extra checks are worth it for the gain it gives when the range spans multiple lines.

Anyway, in some extreme situations like mentioned in the above-linked report, this can lead to speedups so that something that took several hours before now only takes about a second or so.

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2019-04-05

    Committed as [01aab5].

    There are many different techniques for optimizing GetRelativePosition use. Subsequent calls are often very close (or identical) to previous calls and a single cached value (with cache invalidation or fixing after modification) may be a big help.

    Another possibility is that when there are no multi-byte characters on a line, there is a 1:1 character:byte offset so simpler calculation. This could be optimized further by returning the index line start and index line start of the next line in one call, avoiding the log(n) access to the next line index. Maintaining a flag whether there are any multi-byte characters in the file may be a large win for many cases.

    Its difficult to estimate which particular optimizations are going to be a win. Providing raw access allows client code to try different strategies. It may be better to provide a slightly cooked API GetRelativePositionUsingLineIndex that does something reasonable while still retaining GetRelativePosition as raw access.

    It'd likely be worth optimizing the UTF-8 forward moving case in GetRelativePosition to avoid some of the NextPosition overhead.

     

    Related

    Commit: [01aab5]

    • Neil Hodgson

      Neil Hodgson - 2019-04-06

      by returning the index line start and index line start of the next line in one call, avoiding the log(n) access

      Actually, that one doesn't make sense but I'm sure there are ways to use linear searches with better memory locality in some situations instead of binary searches.

       
  • Neil Hodgson

    Neil Hodgson - 2019-04-05
    • labels: gtk, a11y --> gtk, a11y, scintilla
    • status: open --> open-fixed
     
  • Neil Hodgson

    Neil Hodgson - 2019-04-06

    There are problems here with encodings other than UTF-8 as the indices are not allocated unless the encoding is UTF-8. This leads to crashes when IndexLineStart called. Uncertain which layer to add checks for this. Could be in ScintillaGTKAccessible to remove all logic with test

    if (sci->pdoc->LineCharacterIndex() & SC_LINECHARACTERINDEX_UTF32)
    

    or could be in CellBuffer/Document.

    There's also an issue with SciTE not calling SCI_SETACCESSIBILITY when the user switches to a UTF-8 encoding.

     
  • Colomban Wendling

    There are problems here with encodings other than UTF-8 as the indices are not allocated unless the encoding is UTF-8. […]

    Oops, I didn't think about this (I designed this in Geany that always use UTF-8). I can re-introduce the custom cache implementation when the encoding is different (or rather, when the cache is missing) so it still works. It adds complexity and is slower, but with the optimizations I made before coming to the patch state it's already reasonable.
    I'm just wondering how I could minimize ovehead for this checks, but maybe it's not so relevant. I'll have to also catch encoding changes or such, anything that can alter the caches.

    Do you think it's better to re-introduce the previous custom caching for non-UTF-8 documents, or disable a11y for those? I don't really like dsiabling without a good reason, but if position cache for non-UTF-8 is planed "soon" it might be OK in the meantime and not worth the effort -- which however can be not so big given that I already have much of the optimization work done.

     
    • Neil Hodgson

      Neil Hodgson - 2019-04-06

      There are 3 cases here: UTF-8, DBCS, and SBCS. Single byte characters sets like Latin1 / ISO8859-1 are quite common for source code and they are very simple to implement: byte position == character position. DBCS is more complex as it is irregular with different ranges used for lead bytes in Japanese, Korean and both Chinese encodings.

      In the long term, core Scintilla should support DBCS with SC_LINECHARACTERINDEX_UTF32 (which is equivalent to UTF16 for DBCS) but that is a fairly low priority.

      For now, I think it is good enough to fallback to 1:1 if Document::LineCharacterIndex() indicates the UTF32 index is not available. Then I note this in the documentation. If the feature is important enough then someone may step forward to implement DBCS support.

       
      • Colomban Wendling

        For now, I think it is good enough to fallback to 1:1 if Document::LineCharacterIndex() indicates the UTF32 index is not available.

        Okay, fair enough and fine by me. Do you think it would make sense for the position cache to return 1:1 if it's not available, or should that be implemented on the consumer side?

         
        • Neil Hodgson

          Neil Hodgson - 2019-04-07

          I think there is more room for optimisation if the check is made in the client code. Attached is a quick fix which may be incomplete but prevented crashes in a brief test.

           
  • Neil Hodgson

    Neil Hodgson - 2019-04-11

    Committed the above index checking fix as [2c8b52].

     

    Related

    Commit: [2c8b52]

  • Colomban Wendling

    Sorry for the delay, handn't time earlier to look at it. I dind't test the change yet, but it looks fine to me.

     
  • Neil Hodgson

    Neil Hodgson - 2019-04-16
    • status: open-fixed --> closed-fixed
     

Log in to post a comment.