std::clamp() inside ActionDuration::AddSample() hides the fact that long lines take much more time than short lines, and always returns maxDuration (1e-4) when that happens.
When computing linesInAllowedTime for ws == WrapScope::wsIdle inside Editor::WrapLines(), it is wrongly assumed that approximately 100 lines will take only 0.01 second.
When wrapping consecutive long lines (e.g. a MySQL dump with extended inserts, see https://stackoverflow.com/questions/1293529/how-to-deal-with-enormous-line-lengths-created-by-mysqldump), linesInAllowedTime is always larger than expected, so the application becomes unresponsive until idle wrapping finishes or idle wrapping reaches a block of short lines.
I added draft changes at https://github.com/zufuliu/notepad2/commit/6d81c09cca199b5eb10b7e7b5b2fdc97de711892
linesInAllowedTime is changed to a minimum of 8 lines (if this still takes longer than expected, the unresponsiveness is not fixed).
std::clamp() in ActionDuration::AddSample() is changed to std::max() (a sketch follows below):
For short lines, the behaviour is the same.
For long lines, std::max() seems to better reflect reality: while idle wrapping is still inside a block of consecutive long/short lines, the duration stays at the same level; it then decreases/increases over several runs once idle wrapping enters a block of short/long lines.
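A minimal sketch of the clamp-to-max substitution, assuming an exponential-moving-average update of the per-line duration; the smoothing factor alpha, the small-sample guard, and the class layout are illustrative, not the exact Scintilla or notepad2 source:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the proposed change in ActionDuration::AddSample().
class ActionDurationSketch {
	double duration;
	const double minDuration;
	const double maxDuration;
public:
	ActionDurationSketch(double duration_, double minDuration_, double maxDuration_) noexcept :
		duration(duration_), minDuration(minDuration_), maxDuration(maxDuration_) {}
	void AddSample(size_t numberActions, double durationOfActions) noexcept {
		if (numberActions < 8)
			return;	// ignore tiny samples to avoid instability
		const double alpha = 0.25;	// assumed smoothing factor
		const double durationOne = durationOfActions / static_cast<double>(numberActions);
		const double smoothed = duration * (1.0 - alpha) + durationOne * alpha;
		// Proposed: keep only the lower bound so samples from long lines can push
		// the estimate above maxDuration instead of being flattened to it.
		duration = std::max(smoothed, minDuration);
		// Previous behaviour clamped both ends:
		// duration = std::clamp(smoothed, minDuration, maxDuration);
	}
	double Duration() const noexcept {
		return duration;
	}
};
```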
Since all the std::chrono clocks are wall clocks, not CPU time (or effort) clocks, the clamp is there to avoid being perturbed by transient system overloads.
If just max is used, how rapidly does it converge back to a reasonable value after a large exogenous spike?
Tweaking the settings here is likely to only relieve a minority of cases, so the particular cases need measuring. How long are the problematic lines and how much time do they take on particular machines?
The LinesOnScreen term is there to avoid breaking up the visible range into multiple wraps plus (expensive) repaints which could look worse. LinesInAllowedTime removes choice of policy from code concerned with a particular feature and the constraints on that feature.
It's unlikely any changes can be incorporated before Scintilla 5.0.
With the changes to ActionDuration::AddSample() described above, the attachments show results on an i5 for enwiki-20200801-redirect.sql (557 lines, 491 MiB) and enwiki-20200801-flaggedrevs.sql (280 lines, 210 MiB) from https://dumps.wikimedia.org/enwiki/20200801/
The change to std::max() still does not fix the case where the first page (from ws == WrapScope::wsVisible) contains only short lines but is immediately followed by long lines (e.g. enwiki-20200801-flaggedrevs.sql), which is more common (at least for MySQL dumps, where the file starts with short comments and DDL statements).
I added a new method to measure the time for ActionDuration, see https://github.com/zufuliu/notepad2/commit/f7e9ef12702c767e739077ea33038acbe0b0934c
Instead of measuring the time for one line, measuring the time for 1024 bytes is more reliable and works better. Finding the lines that need wrapping is then changed to finding the lines that contain the allowed number of bytes for the time slot. Long lines are wrapped line by line; short lines are wrapped block by block. A rough sketch of the per-byte bookkeeping is shown below.
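A minimal sketch of the per-byte idea, assuming a 1024-byte measurement unit; the class name, smoothing factor, and bounds are illustrative and not the code in the linked commit:

```cpp
#include <algorithm>
#include <cstddef>

// Track elapsed time per 1024-byte unit instead of per line, so complex or
// very long lines feed the estimate in proportion to their size.
class BytesPerUnitDuration {
	double durationPerUnit;		// seconds per 1024 bytes, exponentially smoothed
	static constexpr double unitBytes = 1024.0;
	static constexpr double alpha = 0.25;
	static constexpr double minDuration = 1e-6;
public:
	explicit BytesPerUnitDuration(double initial) noexcept : durationPerUnit(initial) {}
	void AddSample(size_t bytesProcessed, double seconds) noexcept {
		if (bytesProcessed < 1024)
			return;	// too small a sample to be meaningful
		const double units = static_cast<double>(bytesProcessed) / unitBytes;
		const double durationOne = seconds / units;
		durationPerUnit = std::max(minDuration,
			durationPerUnit * (1.0 - alpha) + durationOne * alpha);
	}
	// How many bytes can be processed within secondsAllowed at the current rate.
	size_t BytesInAllowedTime(double secondsAllowed) const noexcept {
		return static_cast<size_t>(unitBytes * secondsAllowed / durationPerUnit);
	}
};
```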
Comparing the code in the #if ActionDuration_MeasureTimeByBytes ... #endif blocks, the change to measure time per 1024 bytes is small.

I don't really want to spend much time on this, but I'm seeing somewhat different numbers (maybe 40%) from yours on enwiki-20200801-flaggedrevs.sql, and an almost-5-year-old i7 won't be that much faster than an i5 since this runs on a single core. Perhaps you are running in debug mode, or syntax highlighting (which makes PositionCache work better) is turned off.
Per-byte statistics may be a better approach and I'll have a look after Scintilla 5.
This issue is looking at batch size from the context of files with huge lines but the opposite problem is where the adaptive rate came from. Short batches made files with huge numbers of reasonable length lines slow as there was a lot of per-batch overhead from redisplay, scroll bar updates, and application notifications (and applications processing those notifications).
Aspects of timing that should be further explored include overhead and accuracy.
A different technique for maintaining smoothness is to give short operations a time budget (likely between 5 and 50 milliseconds) and stop once that budget has been exhausted. A clock read is likely a system call so is not something that should be done too often - in common scenarios with short lines and good caching, once per line is probably too often. There may have to be a limit to clock reading, with some (dynamically calculated?) minimum amount of processing (bytes wrapped or styled) between reads.
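A minimal sketch of that budget pattern, where WrapOneLine is a hypothetical stand-in for the real per-chunk work and checkInterval limits how often the clock is read:

```cpp
#include <chrono>

// Placeholder for the real per-chunk work (wrapping or styling one line).
static void WrapOneLine(int line) {
	(void)line;
}

// Process lines [nextLine, lastLine] in chunks, reading the clock only every
// checkInterval chunks. Returns true if everything finished within the budget,
// false if it stopped early; nextLine reports where to resume.
static bool WrapWithinBudget(int &nextLine, int lastLine,
                             std::chrono::milliseconds budget) {
	using Clock = std::chrono::steady_clock;
	const Clock::time_point start = Clock::now();
	constexpr int checkInterval = 64;	// chunks between clock reads
	int sinceCheck = 0;
	while (nextLine <= lastLine) {
		WrapOneLine(nextLine);
		++nextLine;
		if (++sinceCheck >= checkInterval) {
			sinceCheck = 0;
			if (Clock::now() - start >= budget)
				return false;	// budget exhausted, resume later
		}
	}
	return true;
}
```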
MinGW was particularly imprecise, with (IIRC) the 60 Hz system tick clock being used for high_resolution_clock. That may have been fixed, but it made the minimum period (17 ms) close to the desired short operation duration. It can lead to measurement producing many zero values and makes stopping after a budget work poorly. If all platforms had a precision better than 5 milliseconds then the budget technique may be OK. Without that precision, indirect budgets like the current scheme can limit operations more effectively.
Since the benefits of different designs depend on overhead and accuracy, measurements of these are needed for a wide variety of targets. While simple slow machines may seem to be the worst case, it's possible that more complex systems may actually be more of a problem. Perhaps a NUMA machine (I don't have access to one) requires complex synchronization between CPUs when reading a clock, or requires waking up other processes or even hardware. Or a virtualized machine only receives a subset of host clock ticks as an optimization.
high_resolution_clock in MinGW is not fixed (checked with 10.2); it is system_clock.
I think we should use https://en.cppreference.com/w/cpp/chrono/steady_clock instead.
https://github.com/microsoft/STL/blob/master/stl/inc/chrono#L577
https://github.com/microsoft/STL/blob/master/stl/inc/chrono#L621
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/c%2B%2B11/chrono.cc
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/chrono#L1163
[1] In libc++, high_resolution_clock is system_clock or steady_clock depending on _LIBCPP_HAS_NO_MONOTONIC_CLOCK.
https://github.com/llvm/llvm-project/blob/master/libcxx/include/chrono#L1585
https://github.com/llvm/llvm-project/blob/master/libcxx/src/chrono.cpp
Edit: MSVC and libc++ on Windows use GetSystemTimePreciseAsFileTime for system_clock when it is available. mingw-w64 gettimeofday (and clock_gettime(CLOCK_REALTIME)) always uses GetSystemTimeAsFileTime; clock_gettime(CLOCK_MONOTONIC) is the same as QueryPerformanceCounter.
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/misc/gettimeofday.c
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-libraries/winpthreads/src/clock.c
Originally, ElapsedTime (the per-platform implementation before std::chrono) was for debugging performance issues and didn't affect any user functionality. When the number of lines handled was bounded, it was reused because it was convenient. The two jobs are really different. Bounding short operations only needs around 5 millisecond accuracy as it is trying to maximize user perceived smoothness and users are unlikely to perceive operations shorter than 50 milliseconds in most cases (except twitch games). When experimenting with different approaches in code, measuring processing a single line or lexeme may need better accuracy like QueryPerformanceCounter.
It's unlikely _LIBCPP_HAS_NO_MONOTONIC_CLOCK is worth worrying about. It's tied to no threading support, which may mean it is for cases like drivers, not userland code.
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20141201/119573.html
Even if steady_clock is unavailable in some situations that may occur for building Scintilla, it should be directly checked for instead of using a barely documented library implementation symbol.
Changed ElapsedPeriod to use steady_clock when available.
See discussion on other project https://github.com/cocos2d/cocos2d-x/issues/16046
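For reference, roughly the shape ElapsedPeriod takes once based on steady_clock; treat this as a sketch, as the actual class may differ in details:

```cpp
#include <chrono>

// Measure elapsed wall time with a monotonic clock so system clock
// adjustments cannot produce negative or wildly wrong durations.
class ElapsedPeriodSketch {
	using ElapsedClock = std::chrono::steady_clock;
	ElapsedClock::time_point tp;
public:
	// Capture the starting moment.
	ElapsedPeriodSketch() noexcept : tp(ElapsedClock::now()) {}
	// Return the elapsed time as floating point seconds, optionally resetting
	// the starting point for the next measurement.
	double Duration(bool reset = false) noexcept {
		const ElapsedClock::time_point tpNow = ElapsedClock::now();
		const std::chrono::duration<double> period = tpNow - tp;
		if (reset) {
			tp = tpNow;
		}
		return period.count();
	}
};
```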
By testing enwiki-20200801-langlinks.sql (1491 lines, 1.37 GiB) line by line, I found that Arabic (e.g. lines 287-288), Hebrew (e.g. lines 633-645) and Hindi (e.g. lines 646-651) take much more time than Latin text.
Per-byte statistics look better, as complex scripts take more bytes to encode and more time to decode.
Log added into Editor::WrapLines().
Script to sort log lines by time.
Changed _LIBCPP_HAS_NO_MONOTONIC_CLOCK to NO_CXX11_STEADY_CLOCK.
Added an incomplete table for "Preprocessors affect building" in ScintillaDoc.html.
Simplified the loop inside LineLayout::SetLineStart() since newLineStarts is already zeroed.
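A minimal sketch of that simplification, assuming the new array comes from std::make_unique<int[]> (which value-initializes its elements to zero), so only the used prefix needs copying; the free-function wrapper and names are illustrative, not the actual LineLayout code:

```cpp
#include <algorithm>
#include <memory>
#include <utility>

// Grow the lineStarts array to hold at least `line` entries. Elements past
// lenLineStarts are already zero after make_unique, so the explicit zero-fill
// branch of the old loop is unnecessary and the copy can use std::copy.
void GrowLineStarts(std::unique_ptr<int[]> &lineStarts, int &lenLineStarts, int line) {
	if ((line >= lenLineStarts) && (line != 0)) {
		const int newMaxLines = line + 20;
		std::unique_ptr<int[]> newLineStarts = std::make_unique<int[]>(newMaxLines);
		std::copy(lineStarts.get(), lineStarts.get() + lenLineStarts, newLineStarts.get());
		lineStarts = std::move(newLineStarts);
		lenLineStarts = newMaxLines;
	}
}
```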
Changed to steady clock with [3932d3], updated documentation [8feaac], and changed loop to std::copy [673cd8].
Related
Commit: [3932d3]
Commit: [673cd8]
Commit: [8feaac]
Committed with per-byte measurement and estimation as [8b128a] and [89dd67].
The behaviour with this change shows that there are significant differences in the per-byte rate of progress depending on factors like whether the styled text contains many identical lexemes that are cached well by PositionCache or has most layout data cached in LineLayoutCache. This makes it more important that the current rate remain clamped so it doesn't get too extreme when hitting an easy or difficult patch.
Related
Commit: [89dd67]
Commit: [8b128a]
How about changing Document::StyleToAdjustingLineDuration() to the following:
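One possible shape, sketched under the assumption that the goal is to skip the duration sample when EnsureStyledTo() does no work; the member names (such as durationStyleOneByte) follow the per-byte change but the exact proposal may differ:

```cpp
// Sketch of the Document member function, relying on the surrounding
// Document class for GetEndStyled(), EnsureStyledTo() and the
// durationStyleOneByte member; details are illustrative.
void Document::StyleToAdjustingLineDuration(Sci::Position pos) {
	const Sci::Position stylingStart = GetEndStyled();
	if (pos <= stylingStart) {
		return;	// already styled far enough: EnsureStyledTo() would do nothing
	}
	ElapsedPeriod epStyling;
	EnsureStyledTo(pos);
	const Sci::Position stylingEnd = GetEndStyled();
	if (stylingEnd > stylingStart) {
		// Record a sample only when styling actually advanced.
		durationStyleOneByte.AddSample(static_cast<size_t>(stylingEnd - stylingStart),
			epStyling.Duration());
	}
}
```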
EnsureStyledTo() might be a no-op.
What scenarios does this affect? Reentrance (enteredStyling != 0) is rare so unlikely to play an important role.