Menu

#1563 Optimize selection serialization

Committed
closed
5
2025-11-10
2025-06-27
Zufu Liu
No

Some ideas to optimize selection serialization and deserialization, may not worth it.

  1. deserialization can be implemented without forward lookup sv.find(tag) by change serialization to following form:
[selType] [# mainRange ,] [anchor [v virtualSpace] [- caret [v virtualSpace]]] [, [anchor [v virtualSpace] [- caret [v virtualSpace]]]]

std::from_chars() returns pointer after all digits, it can be used to advance std::string_view sv to next tag or separator character.

  1. serialization can be implemented with local buffer + std::to_chars() (which returns pointer after last written byte), when local buffer does not have enough space to write a SelectionRange, copy local buffer to result string.

  2. number (only non-negative size_t and Sci::Position) can be serialized in more compact form, e.g. with following ULEB128 like encoding:

char *to_string(char *p, size_t value) noexcept {
    do {
        *p++ = static_cast<char>(0x80 | (value & 0x7f));
        value >>= 7;
    } while (value);
    return p;
}

Discussion

  • Neil Hodgson

    Neil Hodgson - 2025-06-28

    Putting the main range first may be OK but it adds an extra ','. It went at the end as it's not needed for some uses so could be optional.

    Using from_chars_result::ptr for advancement is good if it can be done without adding much more code. Its unlikely string_view::find is expensive in any realistic scenarios.

    1. Most of the detail level SelectionPosition::ToString, SelectionRange::ToString are going to fit inside the small string optimization limit and avoid allocation. Without some strong problem cases, complicating the code with explicit buffer management isn't worth it.

    2. Readable, easily debugged text seems reasonable for the intensity of use of this feature.

    to_string above should include masking the last byte inside the function as its confusing otherwise. Unless it's relying on low-byte values always being structuring punctuation.

    The main alternative I've considered is using a single string for the whole history with '\n' between items to make it easy to serialize the whole history to/from files. Either with explicit <index>: prefixes on lines or many empty lines for multi-change steps. Since its mostly treated as a stack, almost all work occurs just on the end so there wouldn't be much scanning.

     
  • Zufu Liu

    Zufu Liu - 2025-06-28

    Putting the main range first may be OK but it adds an extra ','. It went at the end as it's not needed for some uses so could be optional.

    Here is at least one SelectionRange, the comma is needed to separate numbers. mainRange is decoded first and used for both branches inside Selection(std::string_view sv), so it looks good to encode it first.

    using sv.find() (memchr like, should has not performance problem) to separate numbers / ranges looks to me just decode from random position and discard any non-digits, so not strict.

    Unless it's relying on low-byte values always being structuring punctuation.

    All number has prefixed with a tag or separator, the binary result with above to_string can't be saved to external file.

     
  • Zufu Liu

    Zufu Liu - 2025-06-29

    Patch without forward lookup, it makes Scintilla.dll smaller.

     
    • Neil Hodgson

      Neil Hodgson - 2025-07-01

      There's some sv.front() that may need protection with !sv.empty().

      The unit tests in scintilla\test\unit\testSelection.cxx need updating.

       
  • Zufu Liu

    Zufu Liu - 2025-07-01

    Updated test code, removed redundant res.ptr == sv.data() check in Selection-0629.diff (std::from_chars()returns errc::invalid_argument when no digit is found) .

    sv.empty() check for mainRange block can be omitted since here should be at least one digit after selType or mainRange.

     

    Last edit: Zufu Liu 2025-07-01
  • Neil Hodgson

    Neil Hodgson - 2025-07-02
    • Group: Initial --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2025-07-02

    Committed with [b19406].

     

    Related

    Commit: [b19406]

  • Neil Hodgson

    Neil Hodgson - 2025-11-10
    • status: open --> closed
     

Log in to post a comment.