#1009 Language specific word selection

Milestone: Completed

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2013-08-17

Created: 2013-08-17

Creator: Neomi

Private: No

Hi,
there is one thing about the word definition in Scintilla that I don't like very much: it cannot use context.

My main issue is the selection of floating point numbers in C++ source files. As an example, the number "-1.23e+4f" consists of the 6 words "-", "1", ".", "23e", "+" and "4f", which makes selection by double clicking impossible and navigation by Ctrl+Left/Right slower. It might not be worth including the leading "-" (that would need differentiation of unary/binary plus/minus), but the rest of the number would be pretty easy to identify as one "word".
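To make the six-word split concrete, here is a minimal sketch of a class-based word splitter. It is a simplified model written for illustration, not Scintilla's actual code: the names `CharClass`, `Classify` and `SplitByCharClass` are hypothetical, and the word class is assumed to be letters, digits, underscore and bytes of 0x80 and above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified character classes (an assumption for illustration only).
enum class CharClass { Space, Word, Punctuation };

CharClass Classify(unsigned char ch) {
    if (std::isspace(ch)) return CharClass::Space;
    if (std::isalnum(ch) || ch == '_' || ch >= 0x80) return CharClass::Word;
    return CharClass::Punctuation;
}

// Split text into maximal runs of one character class, dropping whitespace runs.
std::vector<std::string> SplitByCharClass(const std::string &text) {
    std::vector<std::string> words;
    std::size_t start = 0;
    while (start < text.size()) {
        const CharClass cc = Classify(static_cast<unsigned char>(text[start]));
        std::size_t end = start + 1;
        while (end < text.size() &&
               Classify(static_cast<unsigned char>(text[end])) == cc) {
            ++end;
        }
        if (cc != CharClass::Space) {
            words.push_back(text.substr(start, end - start));
        }
        start = end;
    }
    return words;
}
```

Calling `SplitByCharClass("-1.23e+4f")` yields the six pieces listed above, because every switch between a word character and a punctuation character ends the current word.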

My first thought was to take a connected range of the defined style of the current position, but that wouldn't work well in comments and strings. Therefore I propose adding a new method to the lexer interface that could be called instead of Document::ExtendWordSelect(). The default implementation in LexerBase would either call that method (needs access to the document) or behave the same (needs to know defined word characters). The overload in the C++ lexer would only have to check for a number and calculate positions accordingly. If it is no number, it would call the default implementation.
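The proposed split between a default implementation and a C++ override could look roughly like the sketch below. Everything here is hypothetical: `WordLocatorBase`, `CppWordLocator` and `WordRange` are illustration names, not part of Scintilla's lexer interface, and the code works on a plain string rather than a document. The number scanner is deliberately rough (decimal digits, optional fraction, optional exponent, trailing suffix characters) and scans from the start of the text, which a real implementation would limit to the current line.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [start, end)

static bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool IsWordChar(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) || u == '_' || u >= 0x80;
}
// 0 = whitespace, 1 = word character, 2 = punctuation (simplified classes).
static int Classify(char c) {
    if (std::isspace(static_cast<unsigned char>(c))) return 0;
    return IsWordChar(c) ? 1 : 2;
}

// Hypothetical stand-in for the default implementation in a lexer base class.
class WordLocatorBase {
public:
    virtual ~WordLocatorBase() = default;
    // Default behaviour: extend over a run of characters of the same class.
    virtual Range WordRange(const std::string &text, std::size_t pos) const {
        if (pos >= text.size()) return {pos, pos};
        const int cc = Classify(text[pos]);
        std::size_t start = pos, end = pos + 1;
        while (start > 0 && Classify(text[start - 1]) == cc) --start;
        while (end < text.size() && Classify(text[end]) == cc) ++end;
        return {start, end};
    }
};

// Hypothetical C++ override: try a numeric literal first, else use the default.
class CppWordLocator : public WordLocatorBase {
public:
    Range WordRange(const std::string &text, std::size_t pos) const override {
        std::size_t i = 0;
        while (i < text.size() && i <= pos) {
            if (StartsNumber(text, i)) {
                const std::size_t end = ScanNumber(text, i);
                if (pos < end) return {i, end};  // pos lies inside this literal
                i = end;
            } else if (IsWordChar(text[i])) {
                while (i < text.size() && IsWordChar(text[i])) ++i;  // skip identifier
            } else {
                ++i;
            }
        }
        return WordLocatorBase::WordRange(text, pos);
    }

private:
    static bool StartsNumber(const std::string &s, std::size_t i) {
        return IsDigit(s[i]) || (s[i] == '.' && i + 1 < s.size() && IsDigit(s[i + 1]));
    }
    // Returns the index one past the end of the literal starting at i.
    static std::size_t ScanNumber(const std::string &s, std::size_t i) {
        auto digits = [&]() { while (i < s.size() && IsDigit(s[i])) ++i; };
        digits();
        if (i < s.size() && s[i] == '.') { ++i; digits(); }
        if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
            std::size_t j = i + 1;
            if (j < s.size() && (s[j] == '+' || s[j] == '-')) ++j;
            if (j < s.size() && IsDigit(s[j])) { i = j; digits(); }  // accept exponent
        }
        while (i < s.size() && IsWordChar(s[i])) ++i;  // suffix such as f, L, u
        return i;
    }
};
```

With the text `x = -1.23e+4f;`, the override returns the range covering `1.23e+4f` for any position inside the literal, and falls back to the default class-based range for `x` or the leading `-`. The leading sign is left out on purpose, matching the unary/binary concern raised above.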

While there might be more uses than better number selection in other lexers, I can't think of any right now. And since numbers look much alike in many languages, maybe there could be a general way without touching the lexers. But there are always exceptions. And while e.g. "12." is a valid double in C++ and some other languages, the dot shouldn't be included when a sentence in normal text ends with an integral number. That's why I think lexers are the right place.

I want to give it a try, but after all the pitfalls I ran into in my first feature patch, I have to ask a few questions first. Do external lexers have to be compiled against the Scintilla version they are used in or must they remain binary compatible (their vtable would not be extended properly without recompilation)? Would that word identification mechanism have to be optional or could it replace the current one? Is there maybe a better approach for this? Are there any reasons against implementing this at all?

Discussion

Neil Hodgson - 2013-08-17

Lexers may be built as a DLL or .so and used against arbitrary Scintilla releases, so the interface between Scintilla and lexers must remain binary compatible. Versioning this interface is a big deal. There are over a hundred lexers, and any new feature is likely to be implemented only slowly.

Lexing is a progressive action and portions of the file may contain incorrect lexical information. Word-based features, such as 'highlight all occurrences', may work on this incorrect information and give different results if the definition of words changes based on lexical class. Extra lexing can be done to ensure accuracy but that will cost performance.

It may be possible to improve word selection of numbers, but it's more complex than it appears initially, so there may have to be options to decide whether to use it and how it would operate.

Neomi - 2013-08-18

Seems pretty risky in that case, too risky in fact. I guess my best bet is to update the selection in a SCN_DOUBLECLICK notification by way of an optional editor function or plugin.
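
A container-side handler along those lines might be sketched as follows. This is only an outline under stated assumptions: `EditorFacade`, `RangeFinder` and `OnDoubleClick` are hypothetical names, and the two callbacks stand in for whatever the host application uses to read text and set the selection (in a real container, presumably messages such as SCI_SETSEL). No Scintilla headers are used, so the sketch compiles on its own.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [start, end)
// Hypothetical hook: maps (text, clicked position) to the range to select.
using RangeFinder = std::function<Range(const std::string &, std::size_t)>;

// Hypothetical editor facade; a real container would forward these calls
// to its Scintilla instance.
struct EditorFacade {
    std::function<std::string()> getText;
    std::function<void(std::size_t, std::size_t)> setSelection;
};

// Intended to be called from the container's SCN_DOUBLECLICK handling with
// the clicked position. Returns true if the selection was replaced.
bool OnDoubleClick(const EditorFacade &editor, std::size_t clickPos,
                   const RangeFinder &findRange) {
    const std::string text = editor.getText();
    const Range r = findRange(text, clickPos);
    if (r.first >= r.second || r.second > text.size()) {
        return false;  // nothing usable found; keep the default selection
    }
    editor.setSelection(r.first, r.second);
    return true;
}
```

Because the range finder is passed in, the feature stays optional and lives entirely outside the lexer interface, so binary compatibility is not affected. Fetching the whole text on every double click is wasteful; a real handler would likely read only the clicked line.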
  