Improve performance of PositionCache::MeasureWidths() (for large file)
by ensure the size of PositionCache is always power of two, the hash then can be masked instead of do a modular.
Using a power of two for a hash table can actually lead to worse distribution with a simple hash function. Some authors recommend always using a prime size.
I would want to see benchmark evidence before changing anything here and its unlikely that performing a modulo will be particularly noticeable compared with other aspects such as the loop in PositionCacheEntry::Hash.
Its possible that there are big performance wins in PositionCache but its likely the biggest win would be to simply use a larger default size since memory is much more available than it was 10 years ago when this code was written. If someone wants to work on PositionCache performance then the first stage should be to build a test program (possibly with a mocked but expensive Surface::MeasureWidths) that just exercises this feature. Run a profiler to get a feel for where the time goes then look at the times with different cache sizes on some representative files.
The check for pces.empty() was likely to allow turning off the position cache with SCI_SETPOSITIONCACHE(0) although I don't know if that path actually works.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That's a very unusual file with 4.5 million NUL bytes followed by a small amount of readable text. Its not a case that represents common usage so does not deserve much special treatment.
If fast performance with files like this is an important use of your application, then the line layout cache level can be increased with SCI_SETLAYOUTCACHE(SC_CACHE_PAGE). This could either be unconditional or be triggered by very wide lines. Due to changes to allow lines to overlap, SC_CACHE_CARET is no longer as effective as it was.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think a better simplification of the case forcing code would be to unify the two portions (checking and assignment) into a separate function that returned the case forced value then call that in both situations.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That diff exits the loop early when a mismatch found so style_byte isn't the last value when compared just after the loop.
The current code checks for membership in the set of current word characters for camel case forcing but this appears excessive for the application: displaying ASCII-letter-only keywords in a preferred way. Patch 7474 simplifies this by replacing Document::IsASCIIWordByte with new IsUpperOrLowerCase function. Document::IsASCIIWordByte is only used for this feature so is removed.
Patch 7475 hoists the case forcing code into its own function, simplifying the two loops.
Further simplifications and optimizations could be performed but its unlikely that much additional complexity would be justified by performance increases. Most changes change the line length so don't even go into this code.
Its difficult to find a case where the checking code shows up in a profile. Best example so far is to open a 1 Megabyte C++ file, set layout caching to SC_CACHE_DOCUMENT, turn on wrap (to style/relayout whole file after each change), set identifiers to case:upper, then replace all '=' with '!' to affect many lines without changing their length, then undo and redo this until there is a minute of profile data.
The result is the allSame loop taking up 0.3% of the whole profile and 5% of EditView::LayoutLine. The profile:
Last edit: Neil Hodgson 2019-04-27
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It maybe has no difference. I also found the difference (early exiting) seems only happens on current edited line and depends on the edit position.
with SC_CACHE_PAGE, I only got less than 4% of whole lineLength == ll->numCharsInLineif block where allSame=false.
Not related to this issue, in ScintillaBase.h, only the first method is implemented:
LayoutLine is a very central method and does have big performance effects in some modes but the allSame loop doesn't appear to be a significant contributor to performance so should be written as clearly as possible. At a high level, its just checking whether the characters and styles have changed since the positions were calculated. Case forcing makes it more complicated.
which works but doesn't account for case forcing. It may also move the gap which isn't so nice for a read-only operation although an iterator could be built to make this more natural. Due to the need to access the corresponding style byte for case forcing and the previous character for camel case, it doesn't even fit with a lambda:
An actual loop appears to be the clearest option. Checking allSame at loop head doesn't complicate the code that much and will likely be a tiny win.
At more than 200 lines, it may be worthwhile moving some of the LayoutLine code out into separate methods and checking whether the characters and styles are different would be a reasonable candidate. The wrapping code (or the core wrapping loop without the preparation) might also be worth its own method particularly if it becomes more complex.
in ScintillaBase.h, only the first method is implemented
These were moved to LexInterface / LexState at some point. This code will be modified much more as part of isolating lexing functionality into an optionally separate Lexilla library. I don't want to make other changes wile working on this isolation as it moves the code away from when the isolation work branched.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
with
changed to
maybe
if ((probe != 0) && (len < 30)) {
can be changed toif ((len < 30)) {
.Using a power of two for a hash table can actually lead to worse distribution with a simple hash function. Some authors recommend always using a prime size.
I would want to see benchmark evidence before changing anything here and its unlikely that performing a modulo will be particularly noticeable compared with other aspects such as the loop in PositionCacheEntry::Hash.
Its possible that there are big performance wins in PositionCache but its likely the biggest win would be to simply use a larger default size since memory is much more available than it was 10 years ago when this code was written. If someone wants to work on PositionCache performance then the first stage should be to build a test program (possibly with a mocked but expensive Surface::MeasureWidths) that just exercises this feature. Run a profiler to get a feel for where the time goes then look at the times with different cache sizes on some representative files.
The check for pces.empty() was likely to allow turning off the position cache with SCI_SETPOSITIONCACHE(0) although I don't know if that path actually works.
Screenshots using VS2017 15.9.1 CPU Usage Profiler (SC_TECHNOLOGY_DIRECTWRITE on Win10 i5-4570), with text from https://github.com/zufuliu/notepad2/issues/79.
That's a very unusual file with 4.5 million NUL bytes followed by a small amount of readable text. Its not a case that represents common usage so does not deserve much special treatment.
If fast performance with files like this is an important use of your application, then the line layout cache level can be increased with SCI_SETLAYOUTCACHE(SC_CACHE_PAGE). This could either be unconditional or be triggered by very wide lines. Due to changes to allow lines to overlap, SC_CACHE_CARET is no longer as effective as it was.
Find a redundent loop in EditView::LayoutLine() (line 428-431):
Removed with [40f158].
Related
Commit: [40f158]
Improve readability for EditView::LayoutLine() .
I think a better simplification of the case forcing code would be to unify the two portions (checking and assignment) into a separate function that returned the case forced value then call that in both situations.
That diff exits the loop early when a mismatch found so style_byte isn't the last value when compared just after the loop.
The current code checks for membership in the set of current word characters for camel case forcing but this appears excessive for the application: displaying ASCII-letter-only keywords in a preferred way. Patch 7474 simplifies this by replacing Document::IsASCIIWordByte with new IsUpperOrLowerCase function. Document::IsASCIIWordByte is only used for this feature so is removed.
Patch 7475 hoists the case forcing code into its own function, simplifying the two loops.
Further simplifications and optimizations could be performed but its unlikely that much additional complexity would be justified by performance increases. Most changes change the line length so don't even go into this code.
Committed these with minor changes and documentation as [cdb7af] and [93d24d].
Related
Commit: [93d24d]
Commit: [cdb7af]
If the loop exits early, allSame will be false, so the result
LineLayout::llInvalid
not changed.Other idea (since force casing is rarely used):
Full code is at https://github.com/zufuliu/notepad2/blob/master/scintilla/src/EditView.cxx#L401
Last edit: Zufu Liu 2019-04-26
Its difficult to find a case where the checking code shows up in a profile. Best example so far is to open a 1 Megabyte C++ file, set layout caching to SC_CACHE_DOCUMENT, turn on wrap (to style/relayout whole file after each change), set identifiers to case:upper, then replace all '=' with '!' to affect many lines without changing their length, then undo and redo this until there is a minute of profile data.
The result is the allSame loop taking up 0.3% of the whole profile and 5% of EditView::LayoutLine. The profile:
Last edit: Neil Hodgson 2019-04-27
It maybe has no difference. I also found the difference (early exiting) seems only happens on current edited line and depends on the edit position.
with SC_CACHE_PAGE, I only got less than 4% of whole
lineLength == ll->numCharsInLine
if block where allSame=false.Not related to this issue, in ScintillaBase.h, only the first method is implemented:
LayoutLine is a very central method and does have big performance effects in some modes but the allSame loop doesn't appear to be a significant contributor to performance so should be written as clearly as possible. At a high level, its just checking whether the characters and styles have changed since the positions were calculated. Case forcing makes it more complicated.
Checking the characters isn't that dissimilar to
which works but doesn't account for case forcing. It may also move the gap which isn't so nice for a read-only operation although an iterator could be built to make this more natural. Due to the need to access the corresponding style byte for case forcing and the previous character for camel case, it doesn't even fit with a lambda:
An actual loop appears to be the clearest option. Checking allSame at loop head doesn't complicate the code that much and will likely be a tiny win.
At more than 200 lines, it may be worthwhile moving some of the LayoutLine code out into separate methods and checking whether the characters and styles are different would be a reasonable candidate. The wrapping code (or the core wrapping loop without the preparation) might also be worth its own method particularly if it becomes more complex.
These were moved to LexInterface / LexState at some point. This code will be modified much more as part of isolating lexing functionality into an optionally separate Lexilla library. I don't want to make other changes wile working on this isolation as it moves the code away from when the isolation work branched.
Closing this as the power-of-two change that initiated the issue doesn't appear worth pursuing.
There were some other concerns bought up but they don't seem sufficiently important to keep this issue open.