Use binary search for WordList::InList() method

Brought to you by: antoniolinares, johnsoonj, kapix93, nyamatongwe, and 2 others

#1312 Use binary search for WordList::InList() method

Milestone: Completed

Status: closed

Owner: Neil Hodgson

Labels: None

Priority: 5

Updated: 2019-12-07

Created: 2019-09-30

Creator: Zufu Liu

Private: No

If we change starts field to record ranges for each initial character, InList() method can be changed to use binary search, which is useful for words that have common prefix, e.g. keywordclass.cpp has 18 words (constinit is a new keyword in n4830) starts with c.

The patch also simplified other while (words[j][0] == firstChar) loop.

2 Attachments

WordList-InList.diff

cpp.properties.diff

Discussion

Zufu Liu - 2019-10-01

The comment need change to about uninitialized ranges

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-10-12

There is no explanation of why this change is beneficial. Feature requests should include a rationale for why they are wanted.

The current code works when there are common prefixes. The code does not appear to unify common prefixes to save space as is done, for example, with a trie.

The standard library implementation of binary search should always be used instead of reimplementing it as it is easy to make mistakes when transcribing to a new situation. The standard functions are std::lower_bound and std::upper_bound to

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zufu Liu - 2019-10-12

benefit: reduced comparison to detect whether an identifier is a keyword or not.
e.g. in LexCPP, current inList("ch") compares all keyword starts with c every time.

A hand make binary search is used because call strcmp in lower_bound or upper_bound doesn't help, possible because:
(1) the first character is already compared after take the range.
(2) string in list starts at random position, while argument s is normal well aligned. strcmp will fallback to byte-wise comparison.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Neil Hodgson - 2019-10-24
  
  "reduced comparison" is not a metric that is relevant to users. If you are claiming improved performance then say so and present evidence. Binary searches are often worse than linear searches on short arrays (which is the common case here) due to memory locality issues.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Zufu Liu - 2019-10-24
    
    It's only use binary search when the range has more than four words, which can be changed.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-12-07

status: open --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.