Menu

#1533 Optimize brace match

Initial
closed
5
2024-12-18
2024-11-12
Zufu Liu
No

I find Document::BraceMatch() can be optimized while try to fix https://github.com/zufuliu/notepad4/issues/911

  1. Style comparison can be delayed to when match brace is found, this avoided some unneeded calls and compares.
  2. Call to NextPosition() (takes much time than style comparison) can be eliminated:
    2.1 UTF-8 and single byte code pages are safe for finding ASCII braces in both directions, a single position += direction; is enough.
    2.2 For DBCS code pages, ASCII is safe for forward search, bytes less than minimum trail byte are safe for backward search.

First approach is making faster version (only position += direction;) for UTF-8 and single byte code pages like Document::FindText().

I'm using following safe char approach for Notepad4, for UTF-8, it has no obvious performance lost.

safeChar = 0xff;
if (dbcsCodePage != 0 && dbcsCodePage != CpUtf8) {
    safeChar = (direction >= 0) ? 0x80 : 0x30 /*get min trailByte - 1*/;
}
if (chAtPos <= safeChar) {
    position += direction;
} else {
    // NextPosition(position, direction)
}

This approach (which different safe char values) can be used to simplify following block in Document::FindText():

// current code
if (forward && UTF8IsAscii(leadByte)) {
    pos++;
} else {
    if (dbcsCodePage) {
        if (!NextCharacter(pos, increment)) {
            break;
        }
    } else {
        pos += increment;
    }
}

// safe char
if (leadByte <= safeChar) {
    pos += increment;
} else if (!NextCharacter(pos, increment)) {
    break;
}

Discussion

  • Zufu Liu

    Zufu Liu - 2024-11-12

    First patch to delay style comparison:

    @@ -2840,14 +2840,12 @@
        position = useStartPos ? startPos : NextPosition(position, direction);
        while ((position >= 0) && (position < LengthNoExcept())) {
            const char chAtPos = CharAt(position);
    -       const int styAtPos = StyleIndexAt(position);
    -       if ((position > GetEndStyled()) || (styAtPos == styBrace)) {
    -           if (chAtPos == chBrace)
    -               depth++;
    -           if (chAtPos == chSeek)
    -               depth--;
    -           if (depth == 0)
    -               return position;
    +       if (chAtPos == chBrace || chAtPos == chSeek) {
    +           if ((position > GetEndStyled()) || (StyleIndexAt(position) == styBrace)) {
    +               depth += (chAtPos == chBrace) ? 1 : -1;
    +               if (depth == 0)
    +                   return position;
    +           }
            }
            const Sci::Position positionBeforeMove = position;
            position = NextPosition(position, direction);
    

    Full change for my safe char approach:
    https://github.com/zufuliu/notepad4/commit/7e449bcb1e5a02096ba38eb7f1246199de147beb

     
  • Zufu Liu

    Zufu Liu - 2024-11-12
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -5,7 +5,7 @@
     2.1 UTF-8 and single byte code pages are safe for finding ASCII braces in both directions, a single `position += direction;` is enough.
     2.2 For DBCS code pages, ASCII is safe for forward search, bytes less than minimum trail byte are safe for backward search.
    
    -First approach is making first version (only `position += direction;`) for UTF-8 and single byte code pages like `Document::FindText()`.
    +First approach is making faster version (only `position += direction;`) for UTF-8 and single byte code pages like `Document::FindText()`.
    
     I&#39;m using following safe char approach for Notepad4, for UTF-8, it has no obvious performance lost.
     ```cpp
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-14

    it has no obvious performance lost.

    I measured in the wrong place, results (from fast to slow) after moving ElapsedPeriod code into case Message::BraceMatch: in Editor::WndProc() to measure Document::BraceMatch() method call:
    1. only a single block (e.g. treat DBCS as ASCII safe) with position += direction;.
    2. two blocks, first one is ASCII safe, move DBCS safe char approach into separate function and call it.
    3. two blocks, first one is ASCII safe, DBCS safe char approach as later block.
    4. single safe char approach.

     
  • Neil Hodgson

    Neil Hodgson - 2024-11-14

    Moving the style check (BraceMatch-1112.diff) is good. I have a 4.3 GB file used for testing containing repeated copies of scite/src/SciTEBase.cxx and typing '{' at the start to produce a match failure with the '{' coloured red takes 22 seconds with current code as it scans the whole file. This change reduces the time to 17.4 seconds.

    Using safeChar reduces this further to 10.8 seconds, so half the current code's time. Not checking the character and always doing position += direction; improves the time to 8.6 seconds. It would be fine to use specialized loops, checking for DBCS and using safeChar in its loop but not for the more common encodings.

    However, I've recently been looking at the case-sensitive search code which does a similar scan and is much faster due to calling memchr which is optimized and checks 32 bytes at a time in its inner loop. Searching for each instance of an identifier that starts with an unpopular character takes 0.39 seconds which is 85% of the memory bandwidth on the test machine which has base-speed DDR4. An implementation that looked through the search string for the least common character (SaveAsXML -> search for X) could use this to improve performance.

    A variant of memchr that looked for either of the chBrace and chSeek would be great here as it would work correctly in UTF-8 and 8-bit encodings with the set of supported brace characters. However, for portability, I don't want to add SSE code like Microsoft's memchr implementation. This could be implemented by calling memchr for both characters and returning the minimum.

    const void *mem2chr(const void *s, int c1, int c2, size_t n) {
        const void *m1 = memchr(s, c1, n);
        const void *m2 = memchr(s, c2, n);
        if (!m1)
            return m2;
        if (!m2)
            return m1;
        // Neither null so return first
        return std::min(m1, m2);
    }
    

    Calling mem2chr on the whole document may run into bad worst cases where there aren't any more of one or not for a long distance. A limit could be placed on the distance checked for both then moved on if there isn't a match. Something like this which hasn't been tested much so may contain major bugs.

    // SplitFindChar but looking for one of two char for use when brace matching
    constexpr size_t findEitherBlock = 1024;
    ptrdiff_t SplitFindEitherChar(const SplitView &view, size_t start, size_t length, int ch1, int ch2) noexcept {
        const size_t end = start + length;
        while (start < end) {
            if (start < view.length1) {
                const size_t range1Length = std::min({ length, view.length1 - start, findEitherBlock });
                const char *match = static_cast<const char *>(mem2chr(view.segment1 + start, ch1, ch2, range1Length));
                if (match) {
                    return match - view.segment1;
                }
                start += range1Length;
                length -= range1Length;
            } else {
                const size_t range2Length = std::min({ length, view.length - start, findEitherBlock });
                const char *match2 = static_cast<const char *>(mem2chr(view.segment2 + start, ch1, ch2, range2Length));
                if (match2) {
                    return match2 - view.segment2;
                }
                start += range2Length;
                length -= range2Length;
            }
        }
        return -1;
    }
    

    Performance is sensitive to the findEitherBlock value but was best at 1024 with 2.8 seconds. If this code handled all cases then I'd want it even though it is starting to get bulky but it doesn't handle backwards scanning which is more complex as Visual C++ doesn't implement memrchr. DBCS isn't handled but that may be fixable with an inside-character check on the result. DBCS is also a small proportion of use so using safeChar for DBCS and accepting much better optimization of the common encodings is fine.

    There are further potential optimizations: the 'losing' value from the min is forgotten and will eventually be needed so could be remembered.

    Since waiting even a second for the brace to turn red would make brace matching more annoying than useful and 22 seconds is dreadful, there should be some limit to how far brace matching will scan.

     

    Last edit: Neil Hodgson 2024-11-14
  • Zufu Liu

    Zufu Liu - 2024-11-15

    DBCS is also a small proportion of use so using safeChar for DBCS and accepting much better optimization of the common encodings is fine.

    Even for DBCS, ( and ) are not trail byte in all DBCS code pages, < and > are only trail byte in Johab. if (chBrace <= asciiBackwardSafeChar) then use specialized loops.

    there should be some limit to how far brace matching will scan.

    I was thought adding a timeout (e.g. 200ms) instead of byte limit, after checked every 1MB bytes, if timeout and remaining text lager than a threshold, then returns -position to indicate timeout instead of miss match.

    Using SplitFindEitherChar() is indeed faster, so I'm planning to add SSE version (faster than mem2chr, not test strpbrk) for Notepad4, something like following (lacks boundary check for now):

    char operator[](size_t position) const noexcept {
        if (position < length1) {
            return segment1[position];
        }
        return segment2[position];
    }
    
    uint32_t SplitFindEitherChar(const SplitView &cbView, ptrdiff_t &position, __m128i mmBrace, __m128i mmSeek) noexcept {
        size_t start = position;
        uint32_t mask = 0;
        if (start < cbView.length1) {
            const __m128i *ptr = reinterpret_cast<const __m128i *>(cbView.segment1 + start);
            do {
                const __m128i chunk = _mm_loadu_si128(ptr);
                mask = _mm_movemask_epi8(_mm_or_si128(_mm_cmpeq_epi8(chunk, mmBrace), _mm_cmpeq_epi8(chunk, mmSeek)));
                if (mask != 0) {
                    position = start;
                    return mask;
                }
                ptr++;
                start += sizeof(chunk);
            } while (start < cbView.length1);
            start = cbView.length1;
        }
        if (start < cbView.length) {
            const __m128i *ptr = reinterpret_cast<const __m128i *>(cbView.segment2 + start);
            do {
                const __m128i chunk = _mm_loadu_si128(ptr);
                mask = _mm_movemask_epi8(_mm_or_si128(_mm_cmpeq_epi8(chunk, mmBrace), _mm_cmpeq_epi8(chunk, mmSeek)));
                if (mask != 0) {
                    break;
                }
                ptr++;
                start += sizeof(chunk);
            } while (start < cbView.length);
        }
        position = start;
        return mask;
    }
    
    const __m128i mmBrace = _mm_set1_epi8(chBrace);
    const __m128i mmSeek = _mm_set1_epi8(chSeek);
    while (position < length) {
        uint32_t mask = SplitFindEitherChar(cbView, position, mmBrace, mmSeek);
        Sci::Position index = position;
        position += sizeof(mmBrace);
        while (mask) {
            if (mask & 1) {
                if ((index > GetEndStyled()) || (StyleIndexAt(index) == styBrace)) {
                    const unsigned char chAtPos = cbView[index];
                    depth += (chAtPos == chBrace) ? 1 : -1;
                    if (depth == 0) {
                        return index;
                    }
                }
            }
            index++;
            mask >>= 1;
        }
    }
    
     
  • Neil Hodgson

    Neil Hodgson - 2024-11-16

    Committed the style optimization as [7a8105]. A version of the safe char approach committed as [1ce781] with new DBCSMinTrailByte method.

     

    Related

    Commit: [1ce781]
    Commit: [7a8105]

  • Neil Hodgson

    Neil Hodgson - 2024-11-16

    Shorter and around 6% faster version of SplitFindEitherChar. Speed due to inlining 2 argument version of std::min where initializer list version is a call. This isn't being committed but may be worked on further.

    ptrdiff_t SplitFindEitherChar(const SplitView &view, size_t start, size_t length, int ch1, int ch2) noexcept {
        while (length > 0) {
            const bool scanFirst = start < view.length1;
            const char * const segment = scanFirst ? view.segment1 : view.segment2;
            const size_t segmentLength = scanFirst ? view.length1 : view.length;
            const size_t rangeLength = std::min(std::min(length, segmentLength - start), findEitherBlock);
            const char *match = static_cast<const char *>(mem2chr(segment + start, ch1, ch2, rangeLength));
            if (match) {
                return match - segment;
            }
            start += rangeLength;
            length -= rangeLength;
        }
        return -1;
    }
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-17

    Just micro-optimization, I'm currently use something like following to avoid NextPosition() for backward DBCS match when brace is found:

    @@ -2856,9 +2856,7 @@
        if (chSeek == '\0')
            return - 1;
        const int styBrace = StyleIndexAt(position);
    -   int direction = -1;
    -   if (chBrace == '(' || chBrace == '[' || chBrace == '{' || chBrace == '<')
    -       direction = 1;
    +   const int direction = (chBrace < chSeek) ? 1 : -1;
        int depth = 1;
        position = useStartPos ? startPos : NextPosition(position, direction);
    
    @@ -2876,14 +2874,13 @@
                    if (depth == 0)
                        return position;
                }
    -       }
    -       if (chAtPos <= maxSafeChar) {
    +           position += direction;
    +       } else if (chAtPos <= maxSafeChar) {
                position += direction;
            } else {
    -           const Sci::Position positionBeforeMove = position;
    -           position = NextPosition(position, direction);
    -           if (position == positionBeforeMove)
    +           if (!NextCharacter(position, direction)) {
                    break;
    +           }
            }
        }
        return - 1;
    

    NextCharacter() is used to avoid MSVC treat return -1; as hot path and reorder it before style comparison block, so this change may not worth it.

    Will try the short version SplitFindEitherChar().

     
  • Zufu Liu

    Zufu Liu - 2024-11-18

    Single std::min seems is also working:

    ptrdiff_t SplitFindEitherChar(const SplitView &view, size_t start, size_t length, int ch1, int ch2) noexcept {
        constexpr size_t findEitherBlock = 1024;
        do {
            const bool scanFirst = start < view.length1;
            const char * const segment = scanFirst ? view.segment1 : view.segment2;
            const size_t segmentLength = scanFirst ? view.length1 : length;
            const size_t rangeLength = std::min(segmentLength - start, findEitherBlock);
            const char *match = static_cast<const char *>(mem2chr(segment + start, ch1, ch2, rangeLength));
            if (match) {
                return match - segment;
            }
            start += rangeLength;
        } while (start < length)
        return -1;
    }
    
    while (position < length) {
        position = SplitFindEitherChar(view, position, length, chBrace, chSeek);
        if (position < 0) {
            break;
        }
        if ((position > GetEndStyled()) || (StyleIndexAt(position) == styBrace)) {
        }
        position += 1;
    }
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-20

    std::string_view::find_first_of() like SplitFindEitherChar(), it's faster than current code, slower and less magic than mem2chr(), can be made to work for backward search without memrchr().

    ptrdiff_t SplitFindEitherChar(const SplitView &view, size_t start, const bool (&charSet)[256]) noexcept {
        if (start < view.length1) {
            const char *ptr = view.segment1 + start;
            const char *end = view.segment1 + view.length1;
            do {
                const unsigned char ch = *ptr;
                if (charSet[ch]) {
                    return ptr - view.segment1;
                }
                ++ptr;
            } while (ptr < end);
            start = view.length1;
        }
        if (start < view.length) {
            const char *ptr = view.segment2 + start;
            const char *end = view.segment2 + view.length;
            do {
                const unsigned char ch = *ptr;
                if (charSet[ch]) {
                    return ptr - view.segment2;
                }
                ++ptr;
            } while (ptr < end);
        }
        return -1;
    }
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-23

    inlined SplitFindEitherChar():

    while (position < length) {
        const bool scanFirst = static_cast<size_t>(position) < cbView.length1;
        const Sci::Position segmentLength = scanFirst ? cbView.length1 : length;
        const char * const segment = scanFirst ? cbView.segment1 : cbView.segment2;
        constexpr size_t findEitherBlock = 1024;
        const char *match = nullptr;
        do {
            const size_t rangeLength = std::min<size_t>(segmentLength - position, findEitherBlock);
            match = static_cast<const char *>(mem2chr(segment + position, chBrace, chSeek, rangeLength));
            if (match) {
                break;
            }
            position += rangeLength;
        } while (position < segmentLength);
        if (match) {
            position = match - segment;
            if (position > GetEndStyled() || StyleIndexAt(position) == styBrace) {
                const unsigned char chAtPos = *match;
                depth += (chAtPos == chBrace) ? 1 : -1;
                if (depth == 0) {
                    return position;
                }
            }
            position++;
        }
    }
    
     

    Last edit: Zufu Liu 2024-11-24
  • Zufu Liu

    Zufu Liu - 2024-11-23

    Optimize forward brace match is worth for following satiations:
    1. Open large JSON file and match top level { or [ at file begging.
    2. Open large XML file and match < (<?xml) at file begging, lead to whole document scanning due to ?> currently colored with different style.

     
  • Zufu Liu

    Zufu Liu - 2024-11-26

    Patch optimized DBCS brace match:

    @@ -2863,29 +2857,23 @@
        int depth = 1;
        position = useStartPos ? startPos : NextPosition(position, direction);
    
    -   // Avoid complex NextPosition call for bytes that may not cause incorrect match
    +   // Avoid using MovePositionOutsideChar to check DBCS trail byte
        unsigned char maxSafeChar = 0xff;
        if (dbcsCodePage != 0 && dbcsCodePage != CpUtf8) {
    -       maxSafeChar = (direction >= 0) ? 0x80 : DBCSMinTrailByte() - 1;
    +       maxSafeChar = DBCSMinTrailByte() - 1;
        }
    
        while ((position >= 0) && (position < LengthNoExcept())) {
            const unsigned char chAtPos = CharAt(position);
            if (chAtPos == chBrace || chAtPos == chSeek) {
    -           if ((position > GetEndStyled()) || (StyleIndexAt(position) == styBrace)) {
    +           if (((position > GetEndStyled()) || (StyleIndexAt(position) == styBrace)) &&
    +               (chAtPos <= maxSafeChar || position == MovePositionOutsideChar(position, direction, false))) {
                    depth += (chAtPos == chBrace) ? 1 : -1;
                    if (depth == 0)
                        return position;
                }
            }
    -       if (chAtPos <= maxSafeChar) {
    -           position += direction;
    -       } else {
    -           const Sci::Position positionBeforeMove = position;
    -           position = NextPosition(position, direction);
    -           if (position == positionBeforeMove)
    -               break;
    -       }
    +       position += direction;
        }
        return - 1;
     }
    
    1. reduced maxSafeCharcompare to only when brace with same style is found, so should make UTF-8 and single byte code pages a lettle faster.
    2. use MovePositionOutsideChar() to check whether current brace is trail byte, if it is then skip it. This is faster (especially for backward match) than using NextPosition(), as there are only a few CJK characters using braces as trail byte, and in general programming text, when matching brace operators, it's unlikely preceding character is dual-byte. with [feature-requests:#1535] it's even faster, still needs to figure out why backward NextPosition() is much slow.

    Here is my script to make large JSON test file:

    import json
    
    path = r'D:\Program Files\Microsoft Visual Studio\Packages\_Instances\953ef33b\catalog.json'
    doc = open(path, encoding='utf-8').read()
    catalog = json.loads(doc)
    
    big = {}
    for i in range(51):
        big[i] = catalog
    pretty = json.dumps(big, ensure_ascii=False, indent='\t')
    pretty = '\n[\n(\n<\n' + pretty + '\n>\n)\n]\n'
    with open('big.json', 'w', encoding='utf-8') as fd:
        fd.write(pretty)
    
    pretty = pretty.encode('gbk', 'backslashreplace')
    with open('big-gbk.json', 'wb') as fd:
        fd.write(pretty)
    
     

    Related

    Feature Requests: #1535

  • Neil Hodgson

    Neil Hodgson - 2024-11-28

    A slightly different approach is to separate the search for the terminating match chSeek from counting the depth-increasing chBrace as this allows use of the optimized memchr through SplitFindChar or reverse case std::string_view::find_last_of. This reduces forward searches from 2.8 to 2.0 seconds for the unmatched '{' example above. The reverse search in the same file takes approximately 4 seconds but this varies depending on some implementation details I'm unclear on (from 3.4 to 4.4 seconds).

    First define a reverse version of SplitFindChar. This takes an end argument instead of a length as there were negative-length termination condition issues at one time but that may be worked on more as the functions should match.

    ptrdiff_t SplitFindCharReverse(const SplitView &view, size_t start, size_t end, int ch) noexcept {
        if (start > end) {
            return -1;
        }
        const char chAsChar = static_cast<char>(ch);
        const size_t length = end - start;
        size_t range1Length = 0;
        if (start < view.length1) {
            range1Length = std::min(length, view.length1 - start);
        }
        if (range1Length != length) {
            const std::string_view v2(view.segment2 + start + range1Length, length - range1Length);
            const size_t match2 = v2.find_last_of(chAsChar);
            if (match2 != std::string_view::npos) {
                return match2 + start + range1Length;
            }
        }
        const std::string_view v1(view.segment1 + start, range1Length);
        const size_t match = v1.find_last_of(chAsChar);
        if (match != std::string_view::npos) {
            return match + start;
        }
        return -1;
    }
    

    The main loop was unfused into forwards and backwards variants as they use different calls.

        if (direction > 0) {
            while (position < LengthNoExcept()) {
                // Find next chSeek
                Sci::Position posSeek = position;
                for (;;) {
                    posSeek = SplitFindChar(cbView, posSeek, LengthNoExcept() - posSeek, chSeek);
                    if (posSeek < 0) { // No matching chSeek
                        return -1;
                    }
                    if ((posSeek > GetEndStyled()) || (StyleIndexAt(posSeek) == styBrace)) {
                        break;
                    }
                }
                for (Sci::Position posBrace = position + 1; posBrace < posSeek; posBrace++) {
                    posBrace = SplitFindChar(cbView, posBrace, posSeek - posBrace, chBrace);
                    if (posBrace < 0) {
                        break;
                    }
                    if ((posBrace > GetEndStyled()) || (StyleIndexAt(posBrace) == styBrace)) {
                        depth++;
                    }
                }
                depth--;
                if (depth == 0) {
                    return posSeek;
                }
                position = posSeek + 1;
            }
        } else {
            while (position >= 0) {
                // Find next chSeek
                Sci::Position posSeek = position;
                for (;;) {
                    posSeek = SplitFindCharReverse(cbView, 0, posSeek, chSeek);
                    if (posSeek < 0) { // No matching chSeek
                        return -1;
                    }
                    if ((posSeek > GetEndStyled()) || (StyleIndexAt(posSeek) == styBrace)) {
                        break;
                    }
                }
                for (Sci::Position posBrace = posSeek + 1; posBrace < position; posBrace++) {
                    posBrace = SplitFindChar(cbView, posBrace, position - posBrace, chBrace);
                    if (posBrace < 0) {
                        break;
                    }
                    if ((posBrace > GetEndStyled()) || (StyleIndexAt(posBrace) == styBrace)) {
                        depth++;
                    }
                }
                depth--;
                if (depth == 0) {
                    return posSeek;
                }
                position = posSeek;
            }
        }
    

    This is just the implementation when there is no potential for character fragment confusion: for DBCS seeking non-safe braces, a variant of the current code would be used.

    Since this is quite long, a lambda was tried for the counting part. This penalized the forwards case 5% but it improved the reverse case 20%. Its better than inlining the same code which I don't understand unless there is a mistake.

        auto countBrace = [this, &cbView, &depth, chBrace, styBrace](Sci::Position start, Sci::Position end) {
            for (Sci::Position posBrace = start; posBrace < end; posBrace++) {
                posBrace = SplitFindChar(cbView, posBrace, end - posBrace, chBrace);
                if (posBrace < 0) {
                    break;
                }
                if ((posBrace > GetEndStyled()) || (StyleIndexAt(posBrace) == styBrace)) {
                    depth++;
                }
            }
        };
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-28

    Find next chSeek block may entered into infinite loop when the found chSeek has different style. Alos it may be worth to cache GetEndStyled() and LengthNoExcept() on stack.

    main loop with DBCS and infinite loop fixed:

    int depth = 1;
    position = useStartPos ? startPos : position + direction;
    const Sci::Position endStylePos = GetEndStyled();
    const Sci::Position length = LengthNoExcept();
    const SplitView cbView = cb.AllView();
    
    if (dbcsCodePage == 0 || dbcsCodePage == CpUtf8 || chBrace < DBCSMinTrailByte()) {
        if (direction > 0) {
            while (position < length) {
                // Find next chSeek
                Sci::Position posSeek = position;
                for (;;) {
                    posSeek = SplitFindChar(cbView, posSeek, length - posSeek, chSeek);
                    if (posSeek < 0) { // No matching chSeek
                        return -1;
                    }
                    if ((posSeek > endStylePos) || (StyleIndexAt(posSeek) == styBrace)) {
                        break;
                    }
                    posSeek++;
                    if (posSeek == length) {
                        return -1;
                    }
                }
                for (Sci::Position posBrace = position + 1; posBrace < posSeek; posBrace++) {
                    posBrace = SplitFindChar(cbView, posBrace, posSeek - posBrace, chBrace);
                    if (posBrace < 0) {
                        break;
                    }
                    if ((posBrace > endStylePos) || (StyleIndexAt(posBrace) == styBrace)) {
                        depth++;
                    }
                }
                depth--;
                if (depth == 0) {
                    return posSeek;
                }
                position = posSeek + 1;
            }
        } else {
            position += useStartPos ? 0 : 1;
            while (position >= 0) {
                // Find next chSeek
                Sci::Position posSeek = position;
                for (;;) {
                    posSeek = SplitFindCharReverse(cbView, 0, posSeek, chSeek);
                    if (posSeek < 0) { // No matching chSeek
                        return -1;
                    }
                    if ((posSeek > endStylePos) || (StyleIndexAt(posSeek) == styBrace)) {
                        break;
                    }
                    posSeek--;
                    if (posSeek < 0) {
                        return -1;
                    }
                }
                for (Sci::Position posBrace = posSeek + 1; posBrace < position; posBrace++) {
                    posBrace = SplitFindChar(cbView, posBrace, position - posBrace, chBrace);
                    if (posBrace < 0) {
                        break;
                    }
                    if ((posBrace > endStylePos) || (StyleIndexAt(posBrace) == styBrace)) {
                        depth++;
                    }
                }
                depth--;
                if (depth == 0) {
                    return posSeek;
                }
                position = posSeek;
            }
        }
    } else {
        while ((position >= 0) && (position < length)) {
            const unsigned char chAtPos = cbView.CharAt(position);
            if (chAtPos == chBrace || chAtPos == chSeek) {
                if (((position > endStylePos) || (StyleIndexAt(position) == styBrace)) &&
                    (position == MovePositionOutsideChar(position, direction, false))) {
                    depth += (chAtPos == chBrace) ? 1 : -1;
                    if (depth == 0)
                        return position;
                }
            }
            position += direction;
        }
    }
    
    return -1;
    
     
  • Neil Hodgson

    Neil Hodgson - 2024-11-29

    That variant has trouble with {{}}. It's easy to have off-by-one problems with empty constructs. My test file is:

    if (1) {
        if () {}
        if (1) {
            if (1) {
                if () {{}}
            }
        }
    }
    

    This went back to working with the setup of position returning to

    position = useStartPos ? startPos : position;
    

    for the main part and the position += adjustment removed from reverse. The initially-adjusted case is moved into the DBCS hard case.

    position = useStartPos ? startPos : position + direction;
    
     
  • Zufu Liu

    Zufu Liu - 2024-11-29

    It still has off-by-one. following is my change fixed off-by-one:

    - position = useStartPos ? startPos : position;
    + position = useStartPos ? startPos : position + direction;
    
    - for (Sci::Position posBrace = position + 1; posBrace < posSeek; posBrace++) {
    + for (Sci::Position posBrace = position; posBrace < posSeek; posBrace++) {
    
    - position += useStartPos ? 0 : 1;
    + position += 1; // include the position to find chSeek
    

    test case for BraceMatchNext() is not good, probably can be removed.

     
  • Neil Hodgson

    Neil Hodgson - 2024-11-30

    Committed test with [4b011b].

     

    Related

    Commit: [4b011b]

  • Neil Hodgson

    Neil Hodgson - 2024-12-18
    • status: open --> closed
     

Log in to post a comment.