Menu

#1312 Use binary search for WordList::InList() method

Completed
closed
None
5
2019-12-07
2019-09-30
Zufu Liu
No

If we change starts field to record ranges for each initial character, InList() method can be changed to use binary search, which is useful for words that have common prefix, e.g. keywordclass.cpp has 18 words (constinit is a new keyword in n4830) starts with c.

The patch also simplified other while (words[j][0] == firstChar) loop.

2 Attachments

Discussion

  • Zufu Liu

    Zufu Liu - 2019-10-01

    The comment need change to about uninitialized ranges

     
  • Neil Hodgson

    Neil Hodgson - 2019-10-12

    There is no explanation of why this change is beneficial. Feature requests should include a rationale for why they are wanted.

    The current code works when there are common prefixes. The code does not appear to unify common prefixes to save space as is done, for example, with a trie.

    The standard library implementation of binary search should always be used instead of reimplementing it as it is easy to make mistakes when transcribing to a new situation. The standard functions are std::lower_bound and std::upper_bound to

     
  • Zufu Liu

    Zufu Liu - 2019-10-12

    benefit: reduced comparison to detect whether an identifier is a keyword or not.
    e.g. in LexCPP, current inList("ch") compares all keyword starts with c every time.

    A hand make binary search is used because call strcmp in lower_bound or upper_bound doesn't help, possible because:
    (1) the first character is already compared after take the range.
    (2) string in list starts at random position, while argument s is normal well aligned. strcmp will fallback to byte-wise comparison.

     
    • Neil Hodgson

      Neil Hodgson - 2019-10-24

      "reduced comparison" is not a metric that is relevant to users. If you are claiming improved performance then say so and present evidence. Binary searches are often worse than linear searches on short arrays (which is the common case here) due to memory locality issues.

       
      • Zufu Liu

        Zufu Liu - 2019-10-24

        It's only use binary search when the range has more than four words, which can be changed.

         
  • Neil Hodgson

    Neil Hodgson - 2019-12-07
    • status: open --> closed
     

Log in to post a comment.