See these code snippets:
// LexerBase::WordListSet()
WordList wlNew;
wlNew.Set(wl);
if (*keyWordLists[n] != wlNew) {
    keyWordLists[n]->Set(wl);
    return 0;
}
// LexerCPP::WordListSet()
WordList wlNew;
wlNew.Set(wl);
if (*wordListN != wlNew) {
    wordListN->Set(wl);
After calling SCI_SETKEYWORDS, the keyword string is split and sorted twice if the list was initially empty (the default) or is being set to a different value.
Copying wlNew's internal state into the target with a method like wordListN->Reset(wlNew), or calling wordListN->Set(wl) directly without the comparison, would be more efficient, especially for a large set of unsorted keywords.
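To make the cost concrete, here is a minimal sketch of the two patterns. ToyWordList, parseCount, SetTwice and SetOnce are hypothetical names made up for illustration; this is not Scintilla's WordList, which stores its words differently. The counter only records how many split-and-sort passes run.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a keyword list; NOT Scintilla's WordList.
class ToyWordList {
public:
    static int parseCount;  // number of split-and-sort passes performed

    // Split on whitespace and sort: the expensive step.
    void Set(const std::string &s) {
        ++parseCount;
        words.clear();
        std::istringstream in(s);
        for (std::string w; in >> w;) {
            words.push_back(w);
        }
        std::sort(words.begin(), words.end());
    }

    bool operator!=(const ToyWordList &other) const {
        return words != other.words;
    }

    std::vector<std::string> words;
};

int ToyWordList::parseCount = 0;

// Pattern from the snippets: parse into a temporary, compare, then parse again.
bool SetTwice(ToyWordList &target, const std::string &s) {
    ToyWordList wlNew;
    wlNew.Set(s);
    if (target != wlNew) {
        target.Set(s);  // second split-and-sort of the same input
        return true;
    }
    return false;
}

// Alternative: parse once and move the temporary's state into the target.
bool SetOnce(ToyWordList &target, const std::string &s) {
    ToyWordList wlNew;
    wlNew.Set(s);
    if (target != wlNew) {
        target.words = std::move(wlNew.words);  // no second parse
        return true;
    }
    return false;
}
```

In this toy version, setting an empty list through SetTwice runs the split-and-sort twice, while SetOnce runs it once and ends with the same contents.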
Patches added:
1305-strlen.diff
Avoids strlen() in ArrayFromWordList() since the length is already known.
1305-Reset.diff
Adds WordList::Reset(WordList &other) and applies it in LexerBase::WordListSet(). Applying Reset() to the other files in the lexers folder will need individual commits. Replaced nullptr with 0.
Another patch: move the equality comparison into bool WordList::Reset(const char *s), replacing void Reset(WordList &other).
From the point of view of the client code, is there a benefit to having both Set and Reset, or should Set's implementation work like Reset and return a bool?
I find the name Reset confusing; having Set return bool is good.
Possibly, the following (without the complication of comparing every word in the set) would be good enough for most applications:
1. Most keywords are fixed (hard-coded in the application or its configuration file) when a lexer is applied.
2. Words that are dynamically collected and set by the application tend to change on each set.
list differs from s in that it has NULs replacing the spaces after each word (WordList.cxx line 53), so a simple strcmp won't work.
An easy implementation is to always call ArrayFromWordList, sort the input, and then compare the result.
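The "always split, sort, then compare" idea can be sketched as follows. KeywordList and its members are hypothetical and simplified for illustration; the real WordList keeps a NUL-separated buffer and a pointer array rather than a vector of strings, so this only shows the shape of a Set that returns bool.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified keyword list; NOT the actual Scintilla WordList.
class KeywordList {
public:
    // Split and sort the input once, then compare with the stored words.
    // Returns true only when the stored list actually changed.
    bool Set(const std::string &s) {
        std::vector<std::string> fresh = Split(s);
        std::sort(fresh.begin(), fresh.end());
        if (fresh == words) {
            return false;  // same set of words: nothing to update
        }
        words = std::move(fresh);
        return true;
    }

    // Binary search works because the words are kept sorted.
    bool InList(const std::string &w) const {
        return std::binary_search(words.begin(), words.end(), w);
    }

private:
    // Split on spaces, tabs and line ends, skipping empty words.
    static std::vector<std::string> Split(const std::string &s) {
        std::vector<std::string> out;
        std::string current;
        for (char ch : s) {
            if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') {
                if (!current.empty()) {
                    out.push_back(current);
                    current.clear();
                }
            } else {
                current += ch;
            }
        }
        if (!current.empty()) {
            out.push_back(current);
        }
        return out;
    }

    std::vector<std::string> words;
};
```

Because the comparison happens on the sorted result, a caller passing the same words in a different order or with different spacing gets false back and can skip re-lexing.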
BTW, onlyLineEnds was for SciTE's use when WordList was shared with SciTE. It should probably be deprecated then removed in the next major release.
OK, I updated the patch. onlyLineEnds and the other lexers are changed.
Committed with a minor formatting change and some unit tests as [f08486].
If there is one lexer where you think this is strongly beneficial then that can be updated. Please do not send patches for other lexers.
Related
Commit: [f08486]
I will not update the other lexers (those that inherit from DefaultLexer) because I do not use them: they yield a bigger binary than the old stateless ColouriseDoc/FoldDoc lexers.
Maybe LexSQL (many keywords), LexCPP (used by many languages), and LexHTML (many keywords) can be changed to use the new method.
Since adopting the new Set method doesn't break a lexer's functionality, I think LexPython and other lexers not maintained by external developers can also adopt the change. When people write new lexers or update existing ones, they will most likely follow these well-written lexers. This is an opportunity to remove all the double-set code.
Which is what I do not want to see. Making multiple similar changes across a project leads to mistakes as the developer stops looking closely and testing each instance. Global changes of technique are one of the biggest causes of instability.
The possibility of new bugs has to be balanced against a small increase in performance that is likely not noticeable to users.