There is one thing about Scintilla's word definition that I don't like very much: it cannot take context into account.
My main issue is selecting floating-point numbers in C++ source files. For example, the number "-1.23e+4f" consists of the six words "-", "1", ".", "23e", "+" and "4f", which makes selecting it by double-clicking impossible and navigating with Ctrl+Left/Right slower. It might not be worth including the leading "-" (that would require distinguishing unary from binary plus/minus), but the rest of the number would be fairly easy to identify as one "word".
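To make the split concrete, here is a minimal sketch of class-based word splitting. It is a simplified stand-alone model, not Scintilla's actual code: Classify and SplitIntoWords are hypothetical names, and it assumes that letters, digits and '_' are the word characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class CharClass { Space, Word, Punctuation };

// Assumption: letters, digits and '_' are word characters; everything else
// that is not whitespace counts as punctuation.
CharClass Classify(unsigned char ch) {
    if (std::isspace(ch)) return CharClass::Space;
    if (std::isalnum(ch) || ch == '_') return CharClass::Word;
    return CharClass::Punctuation;
}

// Splits text into maximal runs of characters that share one class.
// Whitespace runs are dropped; word and punctuation runs are returned.
std::vector<std::string> SplitIntoWords(const std::string &text) {
    std::vector<std::string> words;
    std::size_t start = 0;
    while (start < text.size()) {
        const CharClass cc = Classify(static_cast<unsigned char>(text[start]));
        std::size_t end = start + 1;
        while (end < text.size() &&
               Classify(static_cast<unsigned char>(text[end])) == cc)
            ++end;
        if (cc != CharClass::Space)
            words.push_back(text.substr(start, end - start));
        start = end;
    }
    return words;
}
```

Under these assumptions, SplitIntoWords("-1.23e+4f") returns the six pieces "-", "1", ".", "23e", "+" and "4f".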
My first thought was to take the contiguous range that has the same style as the current position, but that wouldn't work well in comments and strings. Therefore I propose adding a new method to the lexer interface that could be called instead of Document::ExtendWordSelect(). The default implementation in LexerBase would either call that method (which needs access to the document) or reproduce its behaviour (which needs to know the defined word characters). The override in the C++ lexer would only have to check for a number and calculate the positions accordingly. If the position is not inside a number, it would call the default implementation.
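A rough sketch of the number check the C++ lexer could perform is below. Everything in it is an assumption for illustration: Range, MatchNumber and ExtendNumberSelect are hypothetical names, it works on a single line held in a std::string rather than on the document, and it only handles simple decimal literals (digits, optional dot, optional exponent, trailing suffix characters). A return value of false stands for "not a number, fall back to the default implementation".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Range { std::size_t start; std::size_t end; };  // half-open [start, end)

static bool IsDigit(char ch) {
    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
}
static bool IsIdentChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
}

// Returns the end of a decimal literal starting at i, or i if none starts there.
std::size_t MatchNumber(const std::string &s, std::size_t i) {
    std::size_t p = i;
    std::size_t digits = 0;
    while (p < s.size() && IsDigit(s[p])) { ++p; ++digits; }
    if (p < s.size() && s[p] == '.') {
        ++p;
        while (p < s.size() && IsDigit(s[p])) { ++p; ++digits; }
    }
    if (digits == 0) return i;  // a lone "." is not a number
    // Optional exponent: e or E, optional sign, at least one digit.
    if (p < s.size() && (s[p] == 'e' || s[p] == 'E')) {
        std::size_t q = p + 1;
        if (q < s.size() && (s[q] == '+' || s[q] == '-')) ++q;
        if (q < s.size() && IsDigit(s[q])) {
            while (q < s.size() && IsDigit(s[q])) ++q;
            p = q;
        }
    }
    // Simplification: accept any trailing identifier characters as a suffix.
    while (p < s.size() && IsIdentChar(s[p])) ++p;
    return p;
}

// Scans the line from the left so that digits inside identifiers are skipped.
// Returns true and fills out if pos lies inside a number (the sign is excluded).
bool ExtendNumberSelect(const std::string &line, std::size_t pos, Range &out) {
    assert(pos <= line.size());
    std::size_t i = 0;
    while (i < line.size() && i <= pos) {
        if (IsIdentChar(line[i]) && !IsDigit(line[i])) {
            while (i < line.size() && IsIdentChar(line[i])) ++i;  // skip identifier
            continue;
        }
        const std::size_t end = MatchNumber(line, i);
        if (end > i) {
            if (pos < end) { out = {i, end}; return true; }
            i = end;
        } else {
            ++i;
        }
    }
    return false;  // caller would use the default word selection instead
}
```

For the line "x = -1.23e+4f;", any position from the "1" to the "f" yields the range covering "1.23e+4f", while a position on "x" or on the leading "-" returns false.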
While other lexers might find more uses for this than better number selection, I can't think of any right now. And since numbers look much alike in many languages, there might be a general way that doesn't touch the lexers. But there are always exceptions. For example, while "12." is a valid double in C++ and some other languages, the dot shouldn't be included when a sentence in normal text ends with an integer. That's why I think lexers are the right place.
I want to give it a try, but after all the pitfalls I ran into with my first feature patch, I have to ask a few questions first. Do external lexers have to be compiled against the Scintilla version they are used with, or must they remain binary compatible (without recompilation, their vtable would not be extended properly)? Would this word identification mechanism have to be optional, or could it replace the current one? Is there perhaps a better approach? Are there any reasons not to implement this at all?