Menu

#1305 Optimize Lexer's WordList.Set() method

Committed
closed
5
2019-10-24
2019-07-16
Zufu Liu
No

See these code snippets

// LexerBase::WordListSet()
WordList wlNew;
wlNew.Set(wl);
if (*keyWordLists[n] != wlNew) {
    keyWordLists[n]->Set(wl);
    return 0;
}
// LexerCPP::WordListSet()
WordList wlNew;
wlNew.Set(wl);
if (*wordListN != wlNew) {
    wordListN->Set(wl);

after calling SCI_SETKEYWORDS, the keywords is split and sorted twice if it's initialized empty (default) or set to a different.
Copy wlNew's internal state into target with a method like wordListN->Reset(wlNew), or directly call wordListN->Set(wl) without comparison is more efficient, especially for large set of unsorted keywords.

Discussion

  • Zufu Liu

    Zufu Liu - 2019-07-16

    Patches added:

    1305-strlen.diff
    avoid strlen() in ArrayFromWordList() since the length is already known.

    1305-Reset.diff
    Added WordList::Reset(WordList &other) and applied in LexerBase::WordListSet(). Apply Reset() to other files in lexers folder will need individual commits.

     
  • Zufu Liu

    Zufu Liu - 2019-07-16

    Replaced nullptr with 0.

     
  • Zufu Liu

    Zufu Liu - 2019-07-16

    Another patch: move equal comparison into bool WordList::Reset(const char *s), replace void Reset(WordList &other).

     
  • Neil Hodgson

    Neil Hodgson - 2019-09-27

    From the point of view of the client code, is there a benefit from having both Set and Reset or should Set's implementation work like Reset and return a bool?

     
  • Zufu Liu

    Zufu Liu - 2019-09-28

    I found the name Reset is just confusing, let Set return bool is good.

    Possible, the following (without the complicated to compare every word in the set) would good enough for most application:
    1. most keywords are fixed (hard-coded in application or it's configuration file) when applying a lexer.
    2. words that dynamic collected and set by application trend to changed on each set

    bool WordList::Set(const char *s) {
        if (list != nullptr && strcmp(list, s) == 0) {
            return false;
        }
        // rest code
        return true;
    }
    
     
    • Neil Hodgson

      Neil Hodgson - 2019-09-28

      list differs from s in that it has NULs replacing spaces after each word (WordList.cxx line 53) so a simple strcmp wont work.

      An easy implementation is to always call ArrayFromWordList and sort the input and then compare the result.

      BTW, onlyLineEnds was for SciTE's use when WordList was shared with SciTE. It should probably be deprecated then removed in the next major release.

       
  • Zufu Liu

    Zufu Liu - 2019-09-29
    • summary: Optimize Lexer's WordListSet() method --> Optimize Lexer's WordList.Set() method
     
  • Zufu Liu

    Zufu Liu - 2019-09-29

    OK, update the patch. onlyLineEnds and other lexers is changed.

     
  • Neil Hodgson

    Neil Hodgson - 2019-09-30
    • labels: lexer --> lexer, wordlist, optimization
    • Group: Completed --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2019-09-30

    Committed with a minor formatting change and some unit tests as [f08486].

    If there is one lexer where you think this is strongly beneficial then that can be updated. Please do not send patches for other lexers.

     

    Related

    Commit: [f08486]

    • Zufu Liu

      Zufu Liu - 2019-09-30

      Other lexers (that inherited from DefaultLexer) will not updated by me, I not use these lexers: they yield bigger binary than old stateless ColouriseDoc/FoldDoc lexers.

       
      • Zufu Liu

        Zufu Liu - 2019-09-30

        Maybe LexSQL (many keywords), LexCPP (used by many language), LexHTML (many keywords) can be changed to use the new method.

         
  • Zufu Liu

    Zufu Liu - 2019-09-30

    Since adopting the new Set method doesn't break lexer's functionality, I think LexPython and other lexers not maintained by external developers can also adopting the changes. When people write new lexers or update existing lexers, they are most likely follow these well-written lexers. This is an opportunity to remove all double-set codes.

     
    • Neil Hodgson

      Neil Hodgson - 2019-09-30

      This is an opportunity to remove all double-set codes.

      Which is what I do not want to see. Making multiple similar changes over a project leads to mistakes as the developer stops looking closely and testing each instance. Global changes of technique are one of the biggest causes of instability.

      The possibility of new bugs has to be balanced against a small increase in performance that is likely not noticeable to users.

       
  • Neil Hodgson

    Neil Hodgson - 2019-10-24
    • status: open --> closed
     
  • Neil Hodgson

    Neil Hodgson - 2019-10-24

    Committed with minor formatting change and some unit tests as [f08486].

    If there is one lexer where you think this is strongly beneficial then that can be updated. Please do not send patches for other lexers.

     

    Related

    Commit: [f08486]


Log in to post a comment.