Split from [feature-requests:#1500] as a bug report.
`RESearch` substitution failed for backward search because `bopat` and `eopat` are cleared inside `search.Execute()`.
`bopat` and `eopat` may contain positions in the middle of a character for `RESearch` and `ByteIterator`. This can be fixed by calling `MovePositionOutsideChar()` to ensure only whole characters are selected (already done for the first `search.eopat[0]`).
`RESearch` backward search does not return the longest match (e.g. for `\w+` it only finds one letter/byte). This can be fixed in `MatchOnLines()`: change the code to `Execute(di, search.eopat[0], endOfLine)`, then fix the infinite loop for empty matches like `\<`.
[...] `eopat[0]`, then the first one is the longest). It is better for the code to use `search.Execute(di, doc->NextPosition(pos, 1), endOfLine)` to avoid searching from the middle of a character (e.g. for `\w+`).
[...] `#ifdef REGEX_MULTILINE` block inside an `if` for forward search (infinite for backward search) to make it usable.
[...] `if (match[0].first == match[0].second)` is doing, as `match` is not changed by the above `std::regex_search()`.
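To illustrate the "last match on the line" idea for backward search, here is a minimal sketch that uses `std::regex` instead of `RESearch`. The helper name `lastMatchInLine` is hypothetical and is not Scintilla code. It continues each search from the end of the previous match (rather than from one position after its start), and steps forward by one position after an empty match so the loop always makes progress.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not Scintilla code): returns {start, length} of the
// last match in `line`, or {std::string::npos, 0} when nothing matches.
std::pair<std::size_t, std::size_t> lastMatchInLine(const std::string &line, const std::regex &re) {
    std::pair<std::size_t, std::size_t> last{std::string::npos, 0};
    std::size_t pos = 0;
    while (pos <= line.size()) {
        auto flags = std::regex_constants::match_default;
        if (pos > 0) {
            // Let assertions such as \b look at the character before pos.
            flags |= std::regex_constants::match_prev_avail;
        }
        std::smatch match;
        if (!std::regex_search(line.cbegin() + pos, line.cend(), match, re, flags)) {
            break;
        }
        const std::size_t start = pos + static_cast<std::size_t>(match.position(0));
        const std::size_t length = static_cast<std::size_t>(match.length(0));
        last = {start, length};
        // Continue after the whole match; an empty match advances by one
        // position so the loop cannot spin forever.
        pos = start + (length ? length : 1);
    }
    return last;
}
```

For `\w+` on `foo bar` this sketch reports the whole word `bar` rather than a single trailing letter. Stepping by one byte after an empty match is only safe for single-byte text; for multi-byte encodings the step would have to move to the next character boundary, as suggested above with `doc->NextPosition(pos, 1)`.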
This is related to [bugs:#2157]: changing `bol` to the line start position, or changing `BOW` to ignore `bol`, would fix the infinite loop.
Related
Bugs:
#2157
Backward search for the empty match `\b` failed.
I pushed other fixes to https://github.com/zufuliu/notepad2/commit/396089067d63af1dd947801dbbeb9c3560ad1c5d
Diff:
Related
Bugs:
#2244
First patch (split out from SubstituteByPosition-1011-2.diff) to fix the substitution failure for `RESearch` backward search, adding a test case.
Probably worthwhile using `std::array` for `bopat` and `eopat` so they can be copied easily: `bopat = search.bopat`.
There is a warning for `bopat = search.bopat`: `warning C4701: potentially uninitialized local variable 'bopat' used`
Updated patch to use `std::array`.
Fixed header order for `<array>`.
Committed with [82822d] and preparatory [c5acbb].
`std::array::fill` isn't `noexcept`, so that propagated to `RESearch::Clear`.
Related
Commit: [82822d]
Commit: [c5acbb]
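As a rough sketch of why `std::array` helps here: arrays of tag positions can be copied with plain assignment before a later search clears them. The struct `TagPositions`, the constant `notFound`, and the size of 10 are hypothetical stand-ins for illustration, not the actual `RESearch` members.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sentinel for "tag not matched".
constexpr std::ptrdiff_t notFound = -1;

// Hypothetical stand-in for the tag-position storage discussed above.
struct TagPositions {
    static constexpr std::size_t maxTags = 10;
    std::array<std::ptrdiff_t, maxTags> bopat{};
    std::array<std::ptrdiff_t, maxTags> eopat{};

    // Not marked noexcept: std::array::fill is not declared noexcept.
    void Clear() {
        bopat.fill(notFound);
        eopat.fill(notFound);
    }
};

// Keep a copy of the positions from a successful search before the next
// search attempt resets them.
TagPositions snapshot(const TagPositions &search) {
    TagPositions saved = search; // member-wise copy of both std::array members
    return saved;
}
```

With raw C arrays the same copy would need `std::copy` or `memcpy` for each array; with `std::array` the compiler-generated copy is enough.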
There is no `bugprone-exception-escape` warning from clang-tidy or VS Code Analysis.
Moving `[[fallthrough]];` out of the braces (a compound statement) fixed the VS Code Analysis warning: `warning C26819: Unannotated fallthrough between switch labels (es.78).`
VS Code Analysis before removing `noexcept`:
No warnings from 17.7.5, maybe because I use C++latest.
This (current hg state) fails on Ubuntu Linux 23.10 with g++ 13.2:
It passed on Windows msys2 GCC 13.2 with both UTF8Iterator.
It appears to be a `locale` issue with only ASCII characters in the `alnum` set.
Workaround based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63776
OK, [b11098].
Related
Commit: [b11098]
If CRLF is still an issue then this really is just for experimenters not users.
OK, I have not fully tested the three engines.
Yes, each tagged part should be extended to whole characters.
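A minimal sketch of extending a tagged byte range to whole characters, assuming UTF-8 text. The names `isUtf8Trail` and `extendToWholeChars` are hypothetical; this is not Scintilla's `MovePositionOutsideChar()`, only an illustration of the idea of moving the start back and the end forward until neither sits inside a character.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// UTF-8 continuation (trail) bytes have the bit pattern 10xxxxxx.
inline bool isUtf8Trail(unsigned char ch) {
    return (ch & 0xC0) == 0x80;
}

// Hypothetical helper: widen [start, end) so that both ends fall on
// character boundaries in UTF-8 encoded `text`.
std::pair<std::size_t, std::size_t> extendToWholeChars(const std::string &text, std::size_t start, std::size_t end) {
    // Move the start backwards off any trail byte.
    while (start > 0 && start < text.size() && isUtf8Trail(static_cast<unsigned char>(text[start]))) {
        --start;
    }
    // Move the end forwards past any remaining trail bytes.
    while (end < text.size() && isUtf8Trail(static_cast<unsigned char>(text[end]))) {
        ++end;
    }
    return {start, end};
}
```

UTF-8 makes this easy because trail bytes are recognisable on their own. In DBCS encodings a trail byte can fall in the ASCII range, so a boundary generally cannot be found by looking at a single byte.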
Add `MovePositionOutsideChar()` for `RESearch` and `ByteIterator`.
This is a more significant and dangerous change than I first thought. It isn't covered by examples or tests. Moving backwards has a potential for no-progress infinite loops.
`RESearch` and `ByteIterator` are byte-string oriented and are unlikely to be made DBCS-safe with minor tweaks. DBCS-compliant regex would be a larger project.
There are simple cases with a bracketed regular expression where the visual match doesn't match the tag values so can lose text: replacing `(!.)` with `[\1]`, for example. Extending the end of tag ranges (`eopat`s) similarly to the main match is safer.
The change for `ByteIterator` seems safe, as `Pos()` is only used to construct the final `eopat`, not to change matching behavior. Can `RESearch.bopat` be changed at the end of `Execute()`?
Changed `BOT` to `bopat[static_cast<int>(*ap++)] = ci.MovePositionOutsideChar(lp, -1);` so that `lp` is not decreased but the tag range includes the whole character. Added a test (including a failure example for DBCS).
I don't understand when replacing `(!.)` with `[\1]` would cause data loss, as dot now matches a whole character instead of a single byte.
Not sure whether it is worth fixing the DBCS problem (match started from a trailing ASCII byte). A simple fix/improvement is to run the `RESearch::Execute()` switch block in a loop: when a match is found but `lp` is in the middle of a character, try to match from the next character. This is just a few lines of change though.
It's actually much simpler: