Menu

#1347 Optimize CellBuffer::BasicInsertString()

Committed
closed
5
2020-06-02
2020-04-13
Zufu Liu
No

First part (the BasicInsertString.diff) is to Optimize CellBuffer::BasicInsertString() to only patch up CRLF before the loop to insert lines, this improved loading file with CR+LF line endings (commonly used on Windows).
For non-UTF-8 line endings, the loop can be optimized with AVX2, SSE2 or table loop to make detection for end of line more efficient; The code implements these optimizations is available at https://github.com/zufuliu/notepad2/blob/master/scintilla/src/CellBuffer.cxx#L987

This is a small optimization, allocation for starts or call perLine->InsertLine() in plv->InsertLine() actually consumes more time, especially the later.

I think perLine->InsertLine() can be disabled (by set it to nullptr) when there is no active per-line data (i.e. length is zero), this avoid calling Document::InsertLine() on each plv->InsertLine(). On the first time application adding text (for most applications, this is the whole document loaded from disk), there would have no per-line data before the document been styled.

Implementation brief idea: in Document::InsertString(), check perLineData to see whether there are active per-line data, if no then call cb.SetPerLine(nullptr), cb.InsertString() and cb.SetPerLine(this); else directly call cb.InsertString(); The code implements this change is available at https://github.com/zufuliu/notepad2/blob/master/scintilla/src/Document.cxx#L1212

1 Attachments

Related

Feature Requests: #1370

Discussion

1 2 > >> (Page 1 of 2)
  • Zufu Liu

    Zufu Liu - 2020-04-13

    Replace the loop in ContractionState<LINE>::InsertLines() with InsertSpace() and FillRange() will make it even faster. The loop in ContractionState<LINE>::DeleteLines() can be changed to DeleteRange() too,

     
  • Neil Hodgson

    Neil Hodgson - 2020-04-13

    Do you have timing results for this change?

     
  • Zufu Liu

    Zufu Liu - 2020-04-14

    OK, I will try to find some public big files on the Internet and do the timing.

    off topic:
    Patch lambda-noexcept.diff changes all lambda to noexcept, they were found by rg "\]\(" command, https://github.com/BurntSushi/ripgrep
    Other noexcept can be found with Visual Studio 2019's Code Analysis, like bool operator()(int a, int b) (can be changed to const noexcept) in Sorter (in AutoComplete.cxx).

    Edit: ripgrep doesn't support .hgignore, but the silver searcher does (command ag "\]\("). https://github.com/ggreer/the_silver_searcher or https://github.com/k-takata/the_silver_searcher-win32

     

    Last edit: Zufu Liu 2020-04-14
  • Neil Hodgson

    Neil Hodgson - 2020-04-14

    Isn't it possible to avoid duplicating logic for the last character by running the loop one more iteration but guarding aginst reading past end?

     
  • Zufu Liu

    Zufu Liu - 2020-04-14

    It can be change to that with following changes:

    unsigned char ch = ' ';
    const char * const end = s + insertLength;
    while (ptr < end) {
        ch = *ptr++;
        switch (ch) {
        case '\r':
            if (ptr < end && *ptr == '\n') {
                ++ptr;
            }
            [[fallthrough]];
    
     

    Last edit: Zufu Liu 2020-04-14
  • Zufu Liu

    Zufu Liu - 2020-04-14

    Whole changes to avoid last character, the performance lost is very subtle (5ms for a 4,962,027 lines file with CR+LF line endings on i5).

    Type Time
    old loop 231ms
    with testing last character 169ms
    avoid testing last character 174ms
    SSE2 73ms
    AVX2 69ms
     

    Last edit: Zufu Liu 2020-04-14
  • Neil Hodgson

    Neil Hodgson - 2020-04-15

    Adding IsActive to PerLine appears good. The modularity appears better to me if CellBuffer decides how to respond to PerLine::IsActive by calling that early in BasicInsertString then passing the result as an extra flag to CellBuffer::InsertLine and LineVector::InsertLine, repurposing the lineStart argument as a bit set.

     
    • Zufu Liu

      Zufu Liu - 2020-04-15

      Sorry, I seems forgot something. Optimize out perLine inside CellBuffer::BasicInsertString() is not worth for most normal operations: after document been lexered, there may have line levels or line states, so perLine is likely always active. And it also depends on document content, as it only been called once per-line. So I only optimize out it in Document::InsertString() for large chunk of string.

       
  • Neil Hodgson

    Neil Hodgson - 2020-04-15

    While there is a significant performance improvement with the SSE/AVX2 code, it does make the code more complex and difficult to change. More line end modes could be added in the future: form feed and vertical tab may be treated as line ends or isolated carriage returns could be treated as not line ends.

    A mode that only allowed \n and \r\n line ends would cover almost all files (old-style Mac and other archaic system files are now quite rare) and be simpler and perform better as line splitting can be done after every \n. It would match the behaviour of some other editors such as vi. Some lexers may fail against such files though (if they assume an isolated \r increments the line number) so could be validated and marked as OK.

    Most SSE code I've seen does an initial loop to achieve aligned loads before the main loop assumes aligned loads whereas this patch does (potentially) unaligned loads. Since the buffer is provided by the application, it can choose to align. Have you looked at the speed difference between aligned and unaligned buffers with this code?

    For now, the SSE change doesn't appear worth including to me but that could change in the future.

     
  • Zufu Liu

    Zufu Liu - 2020-04-15

    MSVC seems never generates aligned loading when the alignment is not known as compile time even with __assume (tested assembly generated for IsUTF8() and IsUTF7() in https://github.com/zufuliu/notepad2/blob/master/src/EditEncoding.c ).
    For end of line detection, because following code, aligned loading is not possible.

    if (*next == '\n') {
        // CR+LF across boundary
        ++next;
    }
    

    AVX2 is 7 years old, while SSE2 is already 20 years old, all x64 CPU supports it. I think the SSE2/AVX2 code (about 60 lines) can be inserted before while (ptr < end) in BasicInsertString2.diff, possible after moving to C++20, where there is a portable CTZ (std::countr_zero() in <bit>).

    How about BasicInsertString2.diff (optimize patch up out of the loop) in in https://sourceforge.net/p/scintilla/feature-requests/1347/#9c49 I think it stil make code faster.

     

    Last edit: Zufu Liu 2020-04-17
    • Neil Hodgson

      Neil Hodgson - 2020-04-19

      The current plan for C++20 is for core Scintilla to start using it 12 months after strong support of almost all C++20 features in the 3 main compilers.

       
  • Neil Hodgson

    Neil Hodgson - 2020-04-18

    For InsertLine, it may be faster to buffer changes then make multiple at once replacing InsertLine with

    void InsertLines(Sci::Line lineFirst, const Sci::Position *positions, size_t lines, bool lineStart);
    

    and continuing this into PerLine which is likely to improve performance in other cases such as when some of the per-line data is active.

    When avoiding many reallocations of the line starts, the number of lines can be estimated at the end of BasicInsertString or in the Document code:

    // If this is the first block loaded and huge body allocated
    // then estimate line size and thus number of lines likely
    // then allocate that number of lines. May avoid much resizing
    // of line starts array. (with obvious added methods)  
    if ((position == 0) && (substance.AllocatedLength() > 10'000'000)) {
        const size_t lineSizeMean = (substance.Length() / Lines());
        // Over-allocate lines by 10% in case first block non-typical.
        const size_t linesDoc = substance.AllocatedLength() / lineSizeMean * 11 / 10;
        plv->AllocateLines(linesDoc);
    }
    
     
  • Zufu Liu

    Zufu Liu - 2020-04-18

    Change to InsertLines() indeed make the function faster, I added the changes (without adding InsertLines to PerLine) in https://github.com/zufuliu/notepad2/commit/acc51b7ff526515290bc6dbcc582aac41e2f7e32
    After some tests, it seems the cache count affects performance, too small or too large cache count will decrease the improvement.
    Tomorrow I will measure (by enable AttachConsole() in wWinMain() in Notepad2.c, and uncomment ElapsedPeriod related lines in CellBuffer.cxx) the change on i5 with the same file used in above table.

    I added SetInitLineCount() to LineVector in https://github.com/zufuliu/notepad2/commit/09de3a28b992c90e586b351c83a847d005ca9a91 to make starts not resizing. For Notepad2, the exact total line count is known after line endings detection (EditDetectEOLMode() in Edit.c, implemented with AVX2/SSE2 + POPCNT or table lookup).

    CellBuffer::ResetLineEnds() can take advantage of current line count or allocated size.

     
  • Zufu Liu

    Zufu Liu - 2020-04-19

    Add InsertLines(Sci::Line lineFirst, Sci::Line lineCount) to PerLine improved the performance a lot, nearly same effect as set perLine to nullptr.
    https://github.com/zufuliu/notepad2/commit/1472bafa24e1c7e7050729241d9088bfd519c1c9

    The change can be measured by change (in CellBuffer.h)

    // set perLine to nullptr
    #define Enable_WithoutPerLine       1
    #define EnablePerLine_InsertLines   0
    // without InsertLines
    #define Enable_WithoutPerLine       0
    #define EnablePerLine_InsertLines   0
    // with InsertLines
    #define Enable_WithoutPerLine       0
    #define EnablePerLine_InsertLines   1
    

    LineMarkers::InsertLines() and LineTabstops::InsertLines() have side effect: InsertValue() doesn't accept nullptr, for simplicity, InsertEmpty() is used.

     
  • Zufu Liu

    Zufu Liu - 2020-04-19

    I seems got strange result after measuring Notepad2 on i5 with enwiki-20200401-pages-articles-multistream-index.txt.bz2 from https://dumps.wikimedia.org/enwiki/20200401/
    The file is 894 MiB, 20,128,864 lines (only LF).

     
  • Neil Hodgson

    Neil Hodgson - 2020-04-19

    An external SCI_ API for setting the initial line count doesn't work as well for SciTE as it is using the ILoader interface from a background thread and loading 128K blocks. Blocks are used to avoid having an extra copy of the file in memory. When loading huge files, its common to be near the limit of physical RAM in the machine so an extra copy may switch to paging, possibly thrashing. Having the estimation inside BasicInsertString meant it was applicable to SciTE and to other applications without modifying them. Its probably better to leave this to the application and add another method to ILoader in the next major release. It should be possible to call SetInitLineCount at any time without destroying any data (it may be called after loading the first block) so could be called something similar to AllocateLines.

    Inserting multiple partitions at once also helps. This is made more complex by the partitions being sized by either int or ptrdiff_t so there is one that method that does a bulk copy and another that performs casts. This patch also includes some commented out code that moved whether the line indices are active out to LineVector but that isn't important once the lines are being batched.

    diff -r 7a8ff85aa5a8 src/CellBuffer.cxx
    --- a/src/CellBuffer.cxx    Fri Apr 17 20:26:40 2020 +1000
    +++ b/src/CellBuffer.cxx    Mon Apr 20 07:51:14 2020 +1000
    @@ -65,6 +65,7 @@
        virtual void SetPerLine(PerLine *pl) noexcept = 0;
        virtual void InsertText(Sci::Line line, Sci::Position delta) noexcept = 0;
        virtual void InsertLine(Sci::Line line, Sci::Position position, bool lineStart) = 0;
    
    +   virtual void InsertLines(Sci::Line line, const Sci::Position *positions, size_t lines, bool lineStart) = 0;
        virtual void SetLineStart(Sci::Line line, Sci::Position position) noexcept = 0;
        virtual void RemoveLine(Sci::Line line) = 0;
        virtual Sci::Line Lines() const noexcept = 0;
    @@ -177,6 +178,29 @@
                perLine->InsertLine(line);
            }
        }
    +   void InsertLines(Sci::Line line, const Sci::Position *positions, size_t lines, bool lineStart) override {
    +       const POS lineAsPos = static_cast<POS>(line);
    +       if constexpr (sizeof(Sci::Position) == sizeof(POS)) {
    +           starts.InsertPartitions(lineAsPos, positions, lines);
    +       } else {
    +           starts.InsertPartitionsWithCast(lineAsPos, positions, lines);
    +       }
    +       /*
    +       if (activeIndices) {
    +           if (activeIndices & SC_LINECHARACTERINDEX_UTF32) {
    +               startsUTF32.InsertLines(line, lines);
    +           }
    +           if (activeIndices & SC_LINECHARACTERINDEX_UTF16) {
    +               startsUTF16.InsertLines(line, lines);
    +           }
    +       }
    +       if (perLine) {
    +           if ((line > 0) && lineStart)
    +               line--;
    +           perLine->InsertLines(line, lines);
    +       }
    +       */
    +   }
        void SetLineStart(Sci::Line line, Sci::Position position) noexcept override {
            starts.SetPartitionStartPosition(static_cast<POS>(line), static_cast<POS>(position));
        }
    diff -r 7a8ff85aa5a8 src/CellBuffer.h
    --- a/src/CellBuffer.h  Fri Apr 17 20:26:40 2020 +1000
    +++ b/src/CellBuffer.h  Mon Apr 20 07:51:14 2020 +1000
    @@ -166,6 +166,7 @@
        Sci::Line LineFromPosition(Sci::Position pos) const noexcept;
        Sci::Line LineFromPositionIndex(Sci::Position pos, int lineCharacterIndex) const noexcept;
        void InsertLine(Sci::Line line, Sci::Position position, bool lineStart);
    +   void InsertLines(Sci::Line line, const Sci::Position *position, size_t lines, bool lineStart);
        void RemoveLine(Sci::Line line);
        const char *InsertString(Sci::Position position, const char *s, Sci::Position insertLength, bool &startSequence);
    
    diff -r 7a8ff85aa5a8 src/Partitioning.h
    --- a/src/Partitioning.h    Fri Apr 17 20:26:40 2020 +1000
    +++ b/src/Partitioning.h    Mon Apr 20 07:51:14 2020 +1000
    @@ -122,6 +122,24 @@
            stepPartition++;
        }
    
    
    +   void InsertPartitions(T partition, const T *positions, size_t length) {
    +       if (stepPartition < partition) {
    +           ApplyStep(partition);
    +       }
    +       body->InsertFromArray(partition, positions, 0, length);
    +       stepPartition += static_cast<T>(length);
    +   }
    +
    +   void InsertPartitionsWithCast(T partition, const ptrdiff_t *positions, size_t length) {
    +       // Used for 64-bit builds when T is 32-bits
    +       if (stepPartition < partition) {
    +           ApplyStep(partition);
    +       }
    +       for (size_t i = 0; i < length; i++)
    +           body->Insert(partition+i, static_cast<T>(positions[i]));
    +       stepPartition += static_cast<T>(length);
    +   }
    +
        void SetPartitionStartPosition(T partition, T pos) noexcept {
            ApplyStep(partition+1);
            if ((partition < 0) || (partition > body->Length())) {
    
     
  • Neil Hodgson

    Neil Hodgson - 2020-04-19

    The UTF-8 line ends case can be optimized because the bytes that end LS, PS, and NEL are 0xa8, 0xa9, and 0x85 so only performing the complex test for non-ASCII bytes like:

    if (utf8LineEnds && (ch >= 0x80)) {
        const unsigned char back3[3] = { chBeforePrev, chPrev, ch };
        if (UTF8IsSeparator(back3) || UTF8IsNEL(back3 + 1)) {
    
     
  • Zufu Liu

    Zufu Liu - 2020-04-20

    Add (ch >= 0x80) seems does nothing, I end up with a table lookup like

    uint8_t eolTable[256]{};
    eolTable[static_cast<uint8_t>('\n')] = 1;
    eolTable[static_cast<uint8_t>('\r')] = 2;
    if (utf8LineEnds) {
        // see UniConversion.h for LS, PS and NEL
        eolTable[0x85] = 4;
        eolTable[0xa8] = 3;
        eolTable[0xa9] = 3;
    }
    

    change to InsertPartitions() improved a lot for Win32 builds;
    the table lookup improved a lot for UTF-8 line ends; form feed or line feed can be added in the further when needed.

    attached test results.

     
    • Neil Hodgson

      Neil Hodgson - 2020-04-20

      The reason for the good speedup in Win32 builds is that it matches the sizeof(Sci::Position) == sizeof(POS) condition so calls InsertFromArray. The same would occur for a 64-bit build with largeDocument set.

       
  • Zufu Liu

    Zufu Liu - 2020-04-20

    Build Scintilla with attached changes (table lookup) indeed make SciTE faster.

    inserting single character is handled at the end of loop.

     
    • Neil Hodgson

      Neil Hodgson - 2020-04-20

      While nice and fast, that change doesn't work with some Unicode line end manipulations. test/simpleTests.py has several tests for LS, PS, and NEL and the results follow. I couldn't quickly see the problem but hope to get to it later. simpleTests.py can be run after building the DLL and require matching bit widths: 32-bit Python with a 32-bit DLL or both 64-bit. To see which particular test step caused the failure, add a 'msg=' argument to assertEquals.

      >pyw -u "simpleTests.py"
      simpleTests
      .................................................................................................................................F.F...............F.F........................................
      ======================================================================
      FAIL: testNELBreakApartEOL (simpleTests.TestSimple)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "C:\u\hg\scintilla\test\simpleTests.py", line 481, in testNELBreakApartEOL
          self.assertEquals(self.ed.LineCount, 2)
      AssertionError: 1 != 2
      
      ======================================================================
      FAIL: testNELFragmentedEOLStart (simpleTests.TestSimple)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "C:\u\hg\scintilla\test\simpleTests.py", line 456, in testNELFragmentedEOLStart
          self.assertEquals(self.ed.LineCount, 2)
      AssertionError: 1 != 2
      
      ======================================================================
      FAIL: testUBreakApartEOL (simpleTests.TestSimple)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "C:\u\hg\scintilla\test\simpleTests.py", line 399, in testUBreakApartEOL
          self.assertEquals(self.ed.LineCount, 2)
      AssertionError: 1 != 2
      
      ======================================================================
      FAIL: testUFragmentedEOLStart (simpleTests.TestSimple)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "C:\u\hg\scintilla\test\simpleTests.py", line 374, in testUFragmentedEOLStart
          self.assertEquals(self.ed.LineCount, 2)
      AssertionError: 1 != 2
      
      ----------------------------------------------------------------------
      Ran 190 tests in 0.230s
      
      FAILED (failures=4)
      
      self.assertEquals(self.ed.LineCount, 2, msg=f"Failed with {i}")
      
       

      Last edit: Neil Hodgson 2020-04-21
  • Zufu Liu

    Zufu Liu - 2020-04-21

    originally code

    simpleTests2
    testNELBreakApartEOL
    BasicInsertString insertLength=4, chBeforePrev=00, chPrev=00
            78 c2 85 79
            InsertLine line=1, position=3, lineStart=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=00
            78
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=78
            c2
            InsertLine line=1, position=3, lineStart=0
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=78, chPrev=c2
            85
            InsertLine line=1, position=3, lineStart=0
    BasicInsertString insertLength=1, chBeforePrev=c2, chPrev=85
            79
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.016s
    
    OK
    

    table lookup

    simpleTests2
    testNELBreakApartEOL
    BasicInsertString insertLength=4, chBeforePrev=00, chPrev=00
            78 c2 85 79
            InsertLine line=1, position=3, lineStart=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=00
            78
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=78
            c2
    F
    ======================================================================
    FAIL: testNELBreakApartEOL (simpleTests2.TestSimple2)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "F:\Libs\scintilla\test\simpleTests2.py", line 47, in testNELBreakApartEOL
        self.assertEquals(self.ed.LineCount, 2)
    AssertionError: 1 != 2
    
    ----------------------------------------------------------------------
    Ran 1 test in 0.003s
    
    FAILED (failures=1)
    
     
  • Zufu Liu

    Zufu Liu - 2020-04-21

    Updated log, it seems something is wrong after the loop.
    old code

    testNELBreakApartEOL
    BasicInsertString insertLength=4, chBeforePrev=00, chPrev=00
            78 c2 85 79
            InsertLine line=1, position=3, lineStart=1
            chBeforePrev=85, chPrev=79, ch=79, chAfter=00
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=00
            78
            chBeforePrev=00, chPrev=78, ch=78, chAfter=c2
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=78
            c2
            chBeforePrev=78, chPrev=c2, ch=c2, chAfter=85
            InsertLine line=1, position=3, lineStart=0
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=78, chPrev=c2
            85
            InsertLine line=1, position=3, lineStart=0
            chBeforePrev=c2, chPrev=85, ch=85, chAfter=79
    BasicInsertString insertLength=1, chBeforePrev=c2, chPrev=85
            79
            chBeforePrev=85, chPrev=79, ch=79, chAfter=00
    

    table lookup

    testNELBreakApartEOL
    BasicInsertString insertLength=4, chBeforePrev=00, chPrev=00
            78 c2 85 79
            InsertLine line=1, position=3, lineStart=1
            chBeforePrev=c2, chPrev=85, ch=79, chAfter=00
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=00
            78
            chBeforePrev=00, chPrev=00, ch=78, chAfter=c2
            RemoveLine line=1
    BasicInsertString insertLength=1, chBeforePrev=00, chPrev=78
            c2
            chBeforePrev=00, chPrev=78, ch=c2, chAfter=85
    
     
  • Zufu Liu

    Zufu Liu - 2020-04-21

    the failure can be fixed by assigning chBeforePrev and chPrev like:

        } else if (utf8LineEnds && !UTF8IsAscii(chAfter)) {
            chBeforePrev = chPrev;
            chPrev = ch;
            // May have end of UTF-8 line end in buffer and start in insertion
    

    in originally code, they are assigned at the of the loop; but in table lookup code, they not assigned after last character

        ch = *end;
        if (ptr == end) {
    
     
  • Zufu Liu

    Zufu Liu - 2020-04-21

    Whole changes with SSE2 and AVX2 optimization for non-UTF-8 line endings, don't need C++20.
    I will check whether Intel C++ could generate TZCNT (_tzcnt_u32) without any flags.

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.

MongoDB Logo MongoDB