std::clamp() inside ActionDuration::AddSample() hides the fact that long lines take much more time than short lines, and always returns maxDuration (1e-4) when that happens.
When computing linesInAllowedTime for ws == WrapScope::wsIdle inside Editor::WrapLines(), it is wrongly assumed that approximately 100 lines will take only 0.01 second.
When wrapping consecutive long lines (e.g. a MySQL dump with extended inserts, see https://stackoverflow.com/questions/1293529/how-to-deal-with-enormous-line-lengths-created-by-mysqldump), linesInAllowedTime is always larger than expected, so the application becomes unresponsive until idle wrapping finishes or idle wrapping reaches a block of short lines.
I added draft changes at https://github.com/zufuliu/notepad2/commit/6d81c09cca199b5eb10b7e7b5b2fdc97de711892
linesInAllowedTime is changed to a minimum of 8 lines (if this still takes longer than expected, the unresponsiveness is not fixed).
std::clamp() in ActionDuration::AddSample() is changed to std::max() (a sketch follows below):
For short lines, the behaviour is the same.
For long lines, std::max() seems to better reflect reality: while idle wrapping is still inside a block of consecutive long/short lines, the duration stays at the same level; it then decreases/increases over several runs once idle wrapping enters a block of short/long lines.
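A minimal sketch of the clamp-to-max substitution, assuming an exponential-moving-average update of the per-line duration; the smoothing factor alpha, the small-sample guard, and the class layout are illustrative, not the exact Scintilla or notepad2 source:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the proposed change in ActionDuration::AddSample().
class ActionDurationSketch {
	double duration;
	const double minDuration;
	const double maxDuration;
public:
	ActionDurationSketch(double duration_, double minDuration_, double maxDuration_) noexcept :
		duration(duration_), minDuration(minDuration_), maxDuration(maxDuration_) {}
	void AddSample(size_t numberActions, double durationOfActions) noexcept {
		if (numberActions < 8)
			return;	// ignore tiny samples to avoid instability
		const double alpha = 0.25;	// assumed smoothing factor
		const double durationOne = durationOfActions / static_cast<double>(numberActions);
		const double smoothed = duration * (1.0 - alpha) + durationOne * alpha;
		// Proposed: keep only the lower bound so samples from long lines can push
		// the estimate above maxDuration instead of being flattened to it.
		duration = std::max(smoothed, minDuration);
		// Previous behaviour clamped both ends:
		// duration = std::clamp(smoothed, minDuration, maxDuration);
	}
	double Duration() const noexcept {
		return duration;
	}
};
```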
Since all the std::chrono clocks are wall clocks, not CPU time (or effort) clocks, the clamp is there to avoid being perturbed by transient system overloads.
If just max is used, how rapidly does it converge back to a reasonable value after a large exogenous spike?
Tweaking the settings here is likely to only relieve a minority of cases, so the particular cases need measuring. How long are the problematic lines and how much time do they take on particular machines?
The LinesOnScreen term is there to avoid breaking up the visible range into multiple wraps plus (expensive) repaints which could look worse. LinesInAllowedTime removes choice of policy from code concerned with a particular feature and the constraints on that feature.
It's unlikely any changes can be incorporated before Scintilla 5.0.
With the changes to ActionDuration::AddSample() described above, the attachments show results on an i5 for enwiki-20200801-redirect.sql (557 lines, 491 MiB) and enwiki-20200801-flaggedrevs.sql (280 lines, 210 MiB) from https://dumps.wikimedia.org/enwiki/20200801/
The change to std::max() still does not fix the case where the first page (from ws == WrapScope::wsVisible) contains only short lines but is immediately followed by long lines (e.g. enwiki-20200801-flaggedrevs.sql), which is more common (at least for MySQL dumps, where the file starts with short comments and DDL statements).
I added a new method to measure the time for ActionDuration, see https://github.com/zufuliu/notepad2/commit/f7e9ef12702c767e739077ea33038acbe0b0934c
Instead of measuring the time for one line, measuring the time for 1024 bytes is more reliable and works better. Finding the lines that need wrapping is then changed to finding the lines that contain the allowed number of bytes for the time slot. Long lines are wrapped line by line; short lines are wrapped block by block. A rough sketch of the per-byte bookkeeping is shown below.
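A minimal sketch of the per-byte idea, assuming a 1024-byte measurement unit; the class name, smoothing factor, and bounds are illustrative and not the code in the linked commit:

```cpp
#include <algorithm>
#include <cstddef>

// Track elapsed time per 1024-byte unit instead of per line, so complex or
// very long lines feed the estimate in proportion to their size.
class BytesPerUnitDuration {
	double durationPerUnit;		// seconds per 1024 bytes, exponentially smoothed
	static constexpr double unitBytes = 1024.0;
	static constexpr double alpha = 0.25;
	static constexpr double minDuration = 1e-6;
public:
	explicit BytesPerUnitDuration(double initial) noexcept : durationPerUnit(initial) {}
	void AddSample(size_t bytesProcessed, double seconds) noexcept {
		if (bytesProcessed < 1024)
			return;	// too small a sample to be meaningful
		const double units = static_cast<double>(bytesProcessed) / unitBytes;
		const double durationOne = seconds / units;
		durationPerUnit = std::max(minDuration,
			durationPerUnit * (1.0 - alpha) + durationOne * alpha);
	}
	// How many bytes can be processed within secondsAllowed at the current rate.
	size_t BytesInAllowedTime(double secondsAllowed) const noexcept {
		return static_cast<size_t>(unitBytes * secondsAllowed / durationPerUnit);
	}
};
```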
Comparing the code in the #if ActionDuration_MeasureTimeByBytes ... #endif blocks, the change to measure time per 1024 bytes is small.

I don't really want to spend much time on this, but I'm seeing somewhat different numbers (maybe 40%) from yours on enwiki-20200801-flaggedrevs.sql, and an almost-5-year-old i7 won't be that much faster than an i5 since this runs on a single core. Perhaps you are running in debug mode, or syntax highlighting (which makes PositionCache work better) is turned off.
Per-byte statistics may be a better approach and I'll have a look after Scintilla 5.
This issue is looking at batch size from the context of files with huge lines but the opposite problem is where the adaptive rate came from. Short batches made files with huge numbers of reasonable length lines slow as there was a lot of per-batch overhead from redisplay, scroll bar updates, and application notifications (and applications processing those notifications).
Aspects of timing that should be further explored include overhead and accuracy.
A different technique for maintaining smoothness is to give short operations a time budget (likely between 5 and 50 milliseconds) and stop once that budget has been exhausted. A clock read is likely a system call so is not something that should be done too often - in common scenarios with short lines and good caching, once per line is probably too often. There may have to be a limit to clock reading, with some (dynamically calculated?) minimum amount of processing (bytes wrapped or styled) between reads.
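A minimal sketch of that budget pattern, where WrapOneLine is a hypothetical stand-in for the real per-chunk work and checkInterval limits how often the clock is read:

```cpp
#include <chrono>

// Placeholder for the real per-chunk work (wrapping or styling one line).
static void WrapOneLine(int line) {
	(void)line;
}

// Process lines [nextLine, lastLine] in chunks, reading the clock only every
// checkInterval chunks. Returns true if everything finished within the budget,
// false if it stopped early; nextLine reports where to resume.
static bool WrapWithinBudget(int &nextLine, int lastLine,
                             std::chrono::milliseconds budget) {
	using Clock = std::chrono::steady_clock;
	const Clock::time_point start = Clock::now();
	constexpr int checkInterval = 64;	// chunks between clock reads
	int sinceCheck = 0;
	while (nextLine <= lastLine) {
		WrapOneLine(nextLine);
		++nextLine;
		if (++sinceCheck >= checkInterval) {
			sinceCheck = 0;
			if (Clock::now() - start >= budget)
				return false;	// budget exhausted, resume later
		}
	}
	return true;
}
```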
MinGW was particularly imprecise, with (IIRC) the 60 Hz system tick clock being used for high_resolution_clock. That may have been fixed, but it made the minimum period (17 ms) close to the desired short operation duration. It can lead to measurement producing many zero values and makes stopping after a budget work poorly. If all platforms had a precision better than 5 milliseconds then the budget technique may be OK. Without that precision, indirect budgets like the current scheme can limit operations more effectively.
Since the benefits of different designs depend on overhead and accuracy, measurements of these are needed for a wide variety of targets. While simple slow machines may seem to be the worst case, it's possible that more complex systems may actually be more of a problem. Perhaps a NUMA machine (I don't have access to one) requires complex synchronization between CPUs when reading a clock, or requires waking up other processes or even hardware. Or a virtualized machine only receives a subset of host clock ticks as an optimization.
high_resolution_clock in MinGW is not fixed (checked with 10.2); it is system_clock.
I think we should use https://en.cppreference.com/w/cpp/chrono/steady_clock instead.
https://github.com/microsoft/STL/blob/master/stl/inc/chrono#L577
https://github.com/microsoft/STL/blob/master/stl/inc/chrono#L621
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/c%2B%2B11/chrono.cc
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/chrono#L1163
[1] In libc++, high_resolution_clock is system_clock or steady_clock depending on _LIBCPP_HAS_NO_MONOTONIC_CLOCK.
https://github.com/llvm/llvm-project/blob/master/libcxx/include/chrono#L1585
https://github.com/llvm/llvm-project/blob/master/libcxx/src/chrono.cpp
Edit: MSVC and libc++ on Windows use GetSystemTimePreciseAsFileTime for system_clock when it is available. mingw-w64 gettimeofday (and clock_gettime(CLOCK_REALTIME)) always uses GetSystemTimeAsFileTime; clock_gettime(CLOCK_MONOTONIC) is the same as QueryPerformanceCounter.
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/misc/gettimeofday.c
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-libraries/winpthreads/src/clock.c
Originally, ElapsedTime (the per-platform implementation before std::chrono) was for debugging performance issues and didn't affect any user functionality. When the number of lines handled was bounded, it was reused because it was convenient. The two jobs are really different. Bounding short operations only needs around 5 millisecond accuracy as it is trying to maximize user perceived smoothness and users are unlikely to perceive operations shorter than 50 milliseconds in most cases (except twitch games). When experimenting with different approaches in code, measuring processing a single line or lexeme may need better accuracy like QueryPerformanceCounter.
It's unlikely _LIBCPP_HAS_NO_MONOTONIC_CLOCK is worth worrying about. It's tied to no threading support, which may mean it is for cases like drivers, not userland code.
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20141201/119573.html
Even if steady_clock is unavailable in some situations that may occur for building Scintilla, it should be directly checked for instead of using a barely documented library implementation symbol.
Changed ElapsedPeriod to use steady_clock when available.
See discussion on other project https://github.com/cocos2d/cocos2d-x/issues/16046
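For reference, roughly the shape ElapsedPeriod takes once based on steady_clock; treat this as a sketch, as the actual class may differ in details:

```cpp
#include <chrono>

// Measure elapsed wall time with a monotonic clock so system clock
// adjustments cannot produce negative or wildly wrong durations.
class ElapsedPeriodSketch {
	using ElapsedClock = std::chrono::steady_clock;
	ElapsedClock::time_point tp;
public:
	// Capture the starting moment.
	ElapsedPeriodSketch() noexcept : tp(ElapsedClock::now()) {}
	// Return the elapsed time as floating point seconds, optionally resetting
	// the starting point for the next measurement.
	double Duration(bool reset = false) noexcept {
		const ElapsedClock::time_point tpNow = ElapsedClock::now();
		const std::chrono::duration<double> period = tpNow - tp;
		if (reset) {
			tp = tpNow;
		}
		return period.count();
	}
};
```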
By testing enwiki-20200801-langlinks.sql (1491 lines, 1.37 GiB) line by line, I found that Arabic (e.g. lines 287-288), Hebrew (e.g. lines 633-645) and Hindi (e.g. lines 646-651) take much more time than Latin text.
Per-byte statistics look better, as complex scripts take more bytes to encode and more time to decode.
Log added into Editor::WrapLines().
Script to sort log lines by time.
Changed _LIBCPP_HAS_NO_MONOTONIC_CLOCK to NO_CXX11_STEADY_CLOCK.
Added an incomplete table for "Preprocessors affect building" in ScintillaDoc.html.
Simplified the loop inside LineLayout::SetLineStart() since newLineStarts is already zeroed.
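A minimal sketch of that simplification, assuming the new array comes from std::make_unique<int[]> (which value-initializes its elements to zero), so only the used prefix needs copying; the free-function wrapper and names are illustrative, not the actual LineLayout code:

```cpp
#include <algorithm>
#include <memory>
#include <utility>

// Grow the lineStarts array to hold at least `line` entries. Elements past
// lenLineStarts are already zero after make_unique, so the explicit zero-fill
// branch of the old loop is unnecessary and the copy can use std::copy.
void GrowLineStarts(std::unique_ptr<int[]> &lineStarts, int &lenLineStarts, int line) {
	if ((line >= lenLineStarts) && (line != 0)) {
		const int newMaxLines = line + 20;
		std::unique_ptr<int[]> newLineStarts = std::make_unique<int[]>(newMaxLines);
		std::copy(lineStarts.get(), lineStarts.get() + lenLineStarts, newLineStarts.get());
		lineStarts = std::move(newLineStarts);
		lenLineStarts = newMaxLines;
	}
}
```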
Changed to steady clock with [3932d3], updated documentation [8feaac], and changed loop to std::copy [673cd8].
Related
Commit: [3932d3]
Commit: [673cd8]
Commit: [8feaac]
Committed with per-byte measurement and estimation as [8b128a] and [89dd67].
The behaviour with this change shows that there are significant differences in the per-byte rate of progress depending on factors like whether the styled text contains many identical lexemes that are cached well by PositionCache or has most layout data cached in LineLayoutCache. This makes it more important that the current rate remain clamped so it doesn't get too extreme when hitting an easy or difficult patch.
Related
Commit: [89dd67]
Commit: [8b128a]
How about changing Document::StyleToAdjustingLineDuration() to the following:
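One possible shape, sketched under the assumption that the goal is to skip the duration sample when EnsureStyledTo() does no work; the member names (such as durationStyleOneByte) follow the per-byte change but the exact proposal may differ:

```cpp
// Sketch of the Document member function, relying on the surrounding
// Document class for GetEndStyled(), EnsureStyledTo() and the
// durationStyleOneByte member; details are illustrative.
void Document::StyleToAdjustingLineDuration(Sci::Position pos) {
	const Sci::Position stylingStart = GetEndStyled();
	if (pos <= stylingStart) {
		return;	// already styled far enough: EnsureStyledTo() would do nothing
	}
	ElapsedPeriod epStyling;
	EnsureStyledTo(pos);
	const Sci::Position stylingEnd = GetEndStyled();
	if (stylingEnd > stylingStart) {
		// Record a sample only when styling actually advanced.
		durationStyleOneByte.AddSample(static_cast<size_t>(stylingEnd - stylingStart),
			epStyling.Duration());
	}
}
```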
EnsureStyledTo() might be a no-op.
What scenarios does this affect? Reentrance (enteredStyling != 0) is rare so unlikely to play an important role.