Win32: ListBoxX::GetDesiredRect should use correct technology to measure width
Fixed in v5.5.6.
Update for endStyled += length; isn't consistent: one before the notification one after.
OK.
truncation can be used to simplify lexer: StyleContext could use styler.ColourTo(currentPos - 1, state); instead of styler.ColourTo(currentPos - ((currentPos > lengthDocument) ? 2 : 1), state);. trailing word classify block (e.g. in LexHTML) could be omitted by increase end position/length like StyleContext does. for LexHTML it's only needed for PHP where ?> can be omitted and is preferred to avoid extra content.
(position < 0) || (length <= 0) || ((position + length) > style.Length()) assuming initial (after call Document::StartStyling()) endStyled is within document position range, this changed previously truncation behavior, though updated endStyled may larger then document length. Setting styles for positions outside the range of the buffer is safe and has no effect.
Source code with your CopyBytes is shorter (msvc cl /utf-8 /W4 /c /EHsc /std:c++20 /O2 /GS- /GR- /FAcs /DNDEBUG /I../include CellBuffer.cxx not inline CopyBytes), but msvc generated asm with my SetStylesRange is shorter. changes compared to SetStyles0319Range.diff. - ++length; - if (index < length) { + if (index <= length) { + ++length; template <bool segment1> void SetStyleRange(char *data, char styleValue, Sci::Position length, StyleChangeRange &range, Sci::Position position) noexcept { Sci::Position...
parallelized (also find_first_not_of and find_last_not_of) looks overkill for a 4KB range.
segment1 template argument is attempt to make compiler inline the function (as it only instantiated once), without it msvc will generate a function call for second segment (though removed range.Empty() check for first segment).
Return style changed range version of above changes.
It can be changed to return a pair/range to simplify function signature, if the range is empty (end position is zero), then no style changes.
A different implementation for precise range changes: bool SetStyleRange(char *data, char styleValue, Sci::Position length, Sci::Position &startMod, Sci::Position &endMod) noexcept { Sci::Position index = 0; while (index < length && data[index] == styleValue) { ++index; } --length; while (length > index && data[length] == styleValue) { --length; } ++length; if (index < length) { memset(data + index, static_cast<unsigned char>(styleValue), length - index); startMod = index; endMod = length; return...
The result for https://dumps.wikimedia.org/enwiki/20260301/enwiki-20260301-pages-articles-multistream11.xml-p6899367p7054859.bz2 (unpack, then add empty line at begging to avoid brace match) on my system: SciTE Old SciTE this patch Notepad4 2940 2720 1850 Styling performance for SciTE is measured with following changes: SciTEGlobal.properties file.size.large=2147483647 file.size.no.styles=2147483647 diff -r 1bab1e108d98 src/SciTEIO.cxx --- a/src/SciTEIO.cxx +++ b/src/SciTEIO.cxx @@ -56,6 +56,9 @@...
Styling time is reduced by 1/3 (measured with Notepad4) with the patch, not measured byte comparison/updating/precise range version.
Styling time is reduced by 1/3 with the patch, not measured byte comparison/updating/precise range version.
I think simplest fix is just using for (Sci::Position pos = 0; pos < LengthNoExcept(); pos++) complex fix is update length = LengthNoExcept(); at end of DeleteChars() / InsertString() block.
Document::ConvertLineEnds() does not convert all line-ends
if (dragDropEnabled && !sel.Range(selectionPart).Empty()) { inDragDrop = DragDrop::initial; }
It's same as Document::SetStyleFor(), which report entail range as changed.
Optimize `Document::SetStyles()` and `Document::SetStyleFor()`
cppcheckgui.exe crashed (without messages, where can I find crash logs?) when open and check https://github.com/zufuliu/notepad4/blob/main/Notepad4.cppcheck but command line cppcheck.exe --project="D:\notepad4\notepad4\build\VisualStudio\Notepad4.sln" --project-configuration="Release|x64" finished without crash and shows some [ctuOneDefinitionRuleViolation] warnings at end. It seems command line cppcheck.exe not support Cppcheck own project file? cppcheck.exe --project="D:\notepad4\notepad4\Notepad4.cppcheck"...
The background for cache indicator values is that I want (for URL highlighting) a hotspot style like indicator that changes text color and draw underline on mouse hover, this is link/URL style in many websites, e.g. https://github.com/ScintillaOrg/lexilla/issues, https://en.cppreference.com/w/cpp.html, https://learn.microsoft.com/en-us/windows/win32/apiindex/windows-api-list So IsDynamic() and OverridesTextFore() will getting complex, currently I added bool hoverUnderline into Indicator class (code...
No, only merge the two std::any_of() loops doesn't reduce binary size. bool flagDynamic = false; bool flagSetFore = false; for (const Indicator &indicator : indicators) { flagDynamic |= indicator.IsDynamic(); flagSetFore |= indicator.OverridesTextFore(); } indicatorsDynamic = flagDynamic; indicatorsSetFore = flagSetFore;
Cache IsDynamic() and OverridesTextFore() for Indicator
Check monospaced and average character width
Rectangular selection indent incorrect selection modes and positions
0x80 can be omitted from CP932/Shift_JIS, as it just maps to U+0080 C1 control character in CP932, and unsupported in other Japanese encodings: >>> pages = ['cp932', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213'] >>> [b'\x80'.decode(page, 'backslashreplace') for page in pages] ['\x80', '\\x80', '\\x80', '\\x80', '\\x80', '\\x80', '\\x80'] >>> though omit it will cause a visual change: rendered by platform as invisible/box/question block vs rendered by Scintilla...
OK.
off-topic the single byte range for CP932/Shift-JIS seems contains EUDC (end user defined character?). https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt vs https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT EUDC from https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/ Code Page EUDC Control 932 0xa0, 0xfd - 0xff 936 0xff 949, 950 0xff 0x80 1361 0xd4 - 0xff 0x80 - 0x83 Should these EUDC also be included?
not sure whether it will cause problem on earlier or non-Windows systems. Tested following code on XP, Vista and Win7: #include <windows.h> #include <stdio.h> int main(void) { char chars[2] = {'\x80'}; wchar_t code[2] = {0}; int len = MultiByteToWideChar(936, 0, chars, 1, code, 2); return printf("len=%d, code=%04X\n", len, code[0]); } the output is len=1, code=20AC as on Win 10 and 11.
cp936/GBK treat 0x80 as valid single byte
The goal without change how aveCharWidth is measured is change how monospaceASCII is computed. Patch changed monospaceASCII to (maxWidth - minWidth)/minWidth < monospaceWidthEpsilon => (maxWidth - minWidth) < minWidth * monospaceWidthEpsilon, as minWidth is used to render ASCII characters.
It seems system (checked wine) also use English letters to measure average width. See GdiGetCharDimensions() at https://gitlab.winehq.org/wine/wine/blob/master/dlls/gdi32/text.c which is used by dialog creation at https://gitlab.winehq.org/wine/wine/blob/master/dlls/user32/dialog.c
OK, just fell code with SetWindowSubclass is clean.
SciTEWin::AboutMessage() (in SciTEWinDlg.cxx) uses GWLP_WNDPROC and GWLP_USERDATA.
I think use GWLP_USERDATA on lb is fine as it's system ListBox control which private to and used by ListBoxX (the wid container). Update win32/makefile.
Looking source code for SetWindowSubclass() at https://gitlab.winehq.org/wine/wine/blob/master/dlls/comctl32/commctrl.c, it's just wrap for SubclassWindow() and CallWindowProc(), maybe inefficient (use GetProp() to retrieve callback function and data). The change also added comctl32.dll dependency for Scintilla.dll, may not a problem for GUI app/lib. With old approach, ListBoxX::ControlWndProc can still be eliminate (to avoid PointerFromWindow(::GetParent(hWnd))) with something like SetWindowLongPtr(lb,...
Change ListBoxX to using modern control subclassing
Modularising the HTML lexer Will do some experiments later, as I already made a separated PHP lexer (with complete PHP styling, more JavaScript styles, also basic CSS highlighting) at https://github.com/zufuliu/notepad4/blob/main/scintilla/lexers/LexPHP.cxx Rewrite the lexer or new lexers with StyleContext would improve readability and maintainability, but this may slow down loading large program generated HTML/XML file.
SCE_HB_OPERATOR isn't needed if client-side VBScript is removed. keep sync client/server side VBScript styles just to make the lexer simple. Reclaiming the client-side VBScript and client-side Python style IDs needs a plan and possibly a separate new lexer. for backward compatibility reason, they probably needs to be kept for a while. possibly a separate new lexer. Current HTML lexer can be separated by server side languages or template: 1. A base HTML/XML lexer that contains SGML, tag, JavaScript...
Maybe this (adding SCE_HB_OPERATOR and SCE_HBA_OPERATOR) can be implemented now as more styles can be used and VBScript is dead, so it's no longer need to reserve styles.
Yeah, pass function is good. not sure whether it has security or related problem, so it's declared as reference in the patch. using CreateFoldMapFunc = void (&)(int codePage, FoldMap *foldingMap);
It may be possible to expose FoldMap to the DBCS branch of Document::FindText in a way that the compiler will be able to inline. I tried following in Notepad4 but abandoned due to binary size increment (not figured out why). 1. move CaseFolderDBCS into CaseFolder.h. 2. cast pcf.get() to CaseFolderDBCS pointer. 3. use following method to replace bytes folding: const char *FoldDualByte(unsigned char lead, unsigned char trail) const noexcept { const uint16_t ind = DBCSIndex(lead, trail); const char...
The condition could be simplified by ensure mixed is NUL-terminated (const char *search parameter) or larger than lenMixed (local const char bytes).
not the more complex cases. Some code pages (949, 1361) contains complex cases, e.g. ß to ss, but the output is still same 2-bytes length.
BothFold() and FindText() loops don't check DBCS trail byte (unlike UTF-8), filling foldingMap with any value for invalid bytes should not mater, invalid bytes rarely match. A more correct/robust implement would like UTF-8 CaseConvertString(): Character has no conversion so copy the input to output.
It looks a bug in the Unifont font itself, it's tmAveCharWidth is same as tmMaxCharWidth.
will decrease the value and may lead to unexpected changes that are visible to users. Only happens for proportional font, so does SurfaceImpl::AverageCharWidth() in PlatCocoa.mm. That's a little strange. Are you trying to handle bi-width fonts where the Chinese characters should be exactly double the width of Roman characters? Unifont is ASCII monospaced, but GDI SurfaceGDI::AverageCharWidth() value is 2x wider than ASCII letter, use it (instead of monospaceCharacterWidth) inside PositionCache::MeasureWidths()...
Since the loop is 1:1 input:output, and the output is commonly conservatively over-allocated, the bounds checks can probably be moved outside the loop. There's a minor hassle with an odd final lead byte but that could possibly be handled after the loop. pcf->Fold() is only used inside Document::FindText(), and is over-allocated, boundary checks inside Fold() can all be omitted. constexpr size_t maxFoldingExpansion = 4; for DBCS can also be omitted. for UTF-8, maybe it can be reduced, e.g. use maxExpansionCaseConversion...
both are slow than a plain loop like following: auto it = positions.begin(); XYPOSITION prev = *it; XYPOSITION maxWidth = prev; XYPOSITION minWidth = prev; while (++it != positions.end()) { const XYPOSITION current = *it; const XYPOSITION value = current - prev; prev = current; maxWidth = std::max(maxWidth, value); minWidth = std::min(minWidth, value); }
std::max_element() + std::min_element() can be merged into single std::minmax_element() as we don't care which one is min/max.
Check monospaced and average character width
std::map is not nothrow default constructable Seems only has problem for MSVC, see https://godbolt.org/z/3EGa5TMaT, the default constructor (dynamic initializer for cpToFoldMap_V0) does a 64 KB allocation, while libc++ and libstdc++ just zero some bytes. it's some sort of cost for app not even use DBCS encoding. In CaseFolderDBCS() constructor: if (!DBCSHasFoldMap(cp)) { CreateFoldMap(cp, DBCSGetMutableFoldMap(cp)); } foldingMap = DBCSGetFoldMap(cp); The three lookups can be merged into one, e.g....
Change CodePageToFoldMap to std::vector (but with custom structure other than std::pair to avoid create large temporary on stack). Other changes have minor speed or code size optimization, CreateFoldMap(): https://github.com/zufuliu/notepad4/blob/main/scintilla/win32/ScintillaWin.cxx#L3076 CaseFolderDBCS::Fold(): https://github.com/zufuliu/notepad4/blob/main/scintilla/win32/ScintillaWin.cxx#L3122 and when ch2 not trail byte: const char *pair = foldingMap.at(ind).chars; if (pair[0]) { folded[lenOut++]...
it only succeeds for low surrogates and lets high surrogates through. copy & paste error, correct would be 0xD800 <= ch <= 0xDFFF (merge the two tests or from UniConversion.h). The surrogate test isn't needed Indeed, result is same. There are some uses of more explicit formatting These (string concatenation or percent formatting) could be found with pylint, e.g. C0209: Formatting a regular string which could be an f-string (consider-using-f-string)
Sorry, I'm not work on this bug. I just feel both the changes have Implicit assumption about selections and selType retains during Indent() operation, but origin code does not require the assumption.
Unicode 16.0
Unicode 16.0
Oh, not tested that.
Just needs ThinRectangularRange();, no need to change selType.
Just needsThinRectangularRange();, no need to change selType.
That works, changes with less code also works: @@ -4155,6 +4155,10 @@ void Editor::Indent(bool forwards, bool lineIndent) { UndoGroup ug(pdoc); + const Selection::SelTypes selType = sel.selType; + if (sel.IsRectangular()) { + sel.selType = Selection::SelTypes::stream; + } for (size_t r=0; r<sel.Count(); r++) { const Sci::Line lineOfAnchor = pdoc->SciLineFromPosition(sel.Range(r).anchor.Position()); @@ -4231,6 +4235,8 @@ } } } + sel.selType = selType; + ThinRectangularRange(); ContainerNeedsUpdate(Update::Selection);...
without SetRectangularRange, both caret and anchor positions moved to right, and keeps no text selected. with SetRectangularRange, caret positions sticked at zero column, anchor positions moved to right, keeps first column selected, which is unexpected. SetRectangularRange | Editor::ButtonUpWithModifiers(class Scintilla::Internal::Point,unsigned int,enum Scintilla::KeyMod):5269 ToString: R#2,0-2; 0,1,2 | Editor::SetRectangularRange(const struct std::source_location):581 SetRectangularRange range:...
Not find a fix, the bug can be reproduced in SciTE with similar code, e.g.: case IDM_HELP: { wEditor.SetTabIndents(false); wEditor.SetUseTabs(true); wEditor.Tab(); wEditor.SetUseTabs(props.GetInt("use.tabs", 1)); wEditor.SetTabIndents(props.GetInt("tab.indents", 1)); return;
Remove unnecessary `InvalidateStyleRedraw()` for `Message::SetUseTabs`
Wonder how YX Hao identified the bug and fix. Apply 1567-Notepad4-debug-1013.diff onto today's Noetpad4 code, build (e.g. click batch file inside build\VisualStudio\ folder) and run Notepad4.exe from terminal, open a 2 bytes file (\n\n, just two new-lines), hold Alt key to make rectangular selection for the three lines, press Ctrl + Tab twice. log without InvalidateStyleRedraw() for Message::SetUseTabs: SetRectangularRange | Editor::ButtonUpWithModifiers(class Scintilla::Internal::Point,unsigned...
The mentioned code block inserts raw tab (\t) character at caret position: case CMD_CTRLTAB: SciCall_SetTabIndents(false); SciCall_SetUseTabs(true); SciCall_Tab(); SciCall_SetUseTabs(!fvCurFile.bTabsAsSpaces); SciCall_SetTabIndents(fvCurFile.bTabIndents); break; 1567before.gif is screenshot with InvalidateStyleRedraw(), 1567after.gif is screenshot without InvalidateStyleRedraw().
As here are only five DBCS code pages, for most application only one (the system DBCS code page for East Asian) is active used, I think CodePageToFoldMap can be changed to std::vector<std::pair<int, FoldMap>>. std::vector is noexcept default constructible but std::map is not, using std::vector may benefit load/unload Scintilla.dll (avoid running complex constructor/destructor?). For Notepad4, I just put FoldMap into CaseFolderDBCS, I think 64KB extra memory per document is not a thing, open 1000...
Monospace font should be the default for hex files
useTabs field is only used inside Document::SetLineIndentation() and Editor::Indent() methods, call to InvalidateStyleRedraw() can be eliminated.
InvalidateStyleRedraw() inside Editor::SetSelectionFromSerialized() can be replaced with just Redraw().
Redraw after restore selection
Looks related to [bugs:#2295] and [feature-requests:#1485].
Conflict between SC_MULTIAUTOC_EACH and SCI_AUTOCSETDROPRESTOFWORD
Add noexcept to COM methods
Optimize CreateFoldMap
Updated test code, removed redundant res.ptr == sv.data() check in Selection-0629.diff (std::from_chars()returns errc::invalid_argument when no digit is found) . sv.empty() check for mainRange block can be omitted since here should be at least one digit after selType or mainRange.
Updated test code. sv.empty() check for mainRange block can be omitted since here should be at least one digit after selType or mainRange.
Patch without forward lookup, it makes Scintilla.dll smaller.
Putting the main range first may be OK but it adds an extra ','. It went at the end as it's not needed for some uses so could be optional. Here is at least one SelectionRange, the comma is needed to separate numbers. mainRange is decoded first and used for both branches inside Selection(std::string_view sv), so it looks good to encode it first. using sv.find() (memchr like, should has not performance problem) to separate numbers / ranges looks to me just decode from random position and discard any...
Optimize selection serialization
It works the same way as EditModel::CurrentSurfaceMode() for Editor::CreateDrawingSurface(), code page 0 is CP_ACP. unicodeMode filed and parameter can be removed.
Or ListOptions could be extended to add codePage property like SurfaceMode.
Change ListBox to use SetFont(std::shared_ptr<Font> font)
A possible fix is directly passing code page (instead of Unicode mode) to list box, but this maybe a broken change on platform interface. ac.Start(wMain, idAutoComplete, sel.MainCaret(), PointMainCaret(), - lenEntered, lineHeight, IsUnicodeMode(), technology, options); + lenEntered, lineHeight, CodePage(), technology, options);
(Win32) Sourcing autocompletion candidates from DBCS files has regressed since 5.5.6
earlier return inside LineLayout::SubLineFromPosition() is still not fixed.
D2D bug with font family name and font face name
Close as fixed: font stretch is supported since 5.5.2, auto-completion list box is rendered with same technology as editor window since 5.5.6.
Autocomplete list in 5.5.6 (Windows)
OK. Notepad4 implemented custom Wrap::Word to allow break between ASCII punctuation, CJK character and other character with different CharacterClass ([feature-requests:#1223]), optimize for most common ASCII character make sense (avoid call CharacterBoundary(), CharacterAfter() and WordCharacterClass()), I just see the optimization also applies for current Scintilla.
This change makes LineLayout::WrapLine() for Wrap::WhiteSpace and Wrap::Word 2x~3x faster for ASCII long lines, so reduce total time for EditView::LayoutLine() and Editor::WrapLines().
LineLayout::WrapLine() could be optimized further: 1. For Wrap::WhiteSpace, since only C0 control characters are checked, CharacterBoundary(pos, -1) can be replaced with just pos--, this still yield valid character position. Using separator loops for Wrap::WhiteSpace and Wrap::Word improve speed for both. 2. For Wrap::Word, as last byte for previous character is known (chars[pos - 1], already loaded into register), Document::BraceMatch() like maxSafeChar can be used to reduce call to CharacterBoundary(pos,...
Simplify and optimize WrapLine()
Changed to posInLine -= FlagSet(pe, PointEnd::subLineEnd) ? 1 : 0;.
swap comparison to posInLine < lineStarts[line] looks would fix the bug. or lineStarts[line] > posInLine.
swap comparison to posInLine < lineStarts[line] looks would fix the bug.
Rewrite LineLayout::SubLineFromPosition() to avoid out of bound access when line = lines - 1.
Crash inside Editor::Paint()
The crash is due to styling inside StyleAreaBounded(). Paint before StyleAreaBounded 00000223964009A0 StyleAreaBounded scrolling=0, area=5723, max=5723 StyleToPositionInView pos=5723, endWindow=5723, endStyled=295 FoldChanged 18 FoldExpand 18, expanding=1 FoldExpand 85 SetScrollBars SetScrollBars ChangeSize DropGraphics SetScrollBars FoldChanged 19 FoldChanged 19 SetVisible SetScrollBars FoldChanged 20 FoldChanged 20 SetVisible ... FoldChanged 214 FoldChanged 215 FoldChanged 216 Paint after StyleAreaBounded...