If we change starts field to record ranges for each initial character, InList() method can be changed to use binary search, which is useful for words that have common prefix, e.g. keywordclass.cpp has 18 words (constinit is a new keyword in n4830) starts with c.
The patch also simplified other while (words[j][0] == firstChar) loop.
The comment need change to
about uninitialized rangesThere is no explanation of why this change is beneficial. Feature requests should include a rationale for why they are wanted.
The current code works when there are common prefixes. The code does not appear to unify common prefixes to save space as is done, for example, with a trie.
The standard library implementation of binary search should always be used instead of reimplementing it as it is easy to make mistakes when transcribing to a new situation. The standard functions are std::lower_bound and
std::upper_boundtobenefit: reduced comparison to detect whether an identifier is a keyword or not.
e.g. in LexCPP, current
inList("ch")compares all keyword starts with c every time.A hand make binary search is used because call strcmp in lower_bound or upper_bound doesn't help, possible because:
(1) the first character is already compared after take the range.
(2) string in list starts at random position, while argument s is normal well aligned. strcmp will fallback to byte-wise comparison.
"reduced comparison" is not a metric that is relevant to users. If you are claiming improved performance then say so and present evidence. Binary searches are often worse than linear searches on short arrays (which is the common case here) due to memory locality issues.
It's only use binary search when the range has more than four words, which can be changed.