Scintilla / Feature Requests / #1108 Autoindenting wordwrap for Markdown

Neil Hodgson - 2015-05-31

Have you looked at the wrapping modes and options http://www.scintilla.org/ScintillaDoc.html#LineWrapping ?


Peter Klenner - 2015-05-31

None of the available options ('same', 'fixed', 'indent') seem to accomplish this feature:

Mode 'same' yields the same indent for all sublines beginning with the first line.

Mode 'fixed' works for one particular indentation, but not for others. All but the first line are indented by a fixed number of characters. Won't work for different indentation levels.

Mode 'indent' is suspiciously close to such a feature, and I recognise accumulated indentation levels for '-', followed by '--', etc. But the actual indentation is NOT equivalent to what is set via SCI_SETWRAPSTARTINDENT. (BTW, mode 'fixed' is reacting perfectly fine on SCI_SETWRAPSTARTINDENT.)

It seems to be a matter of getting SCI_SETWRAPSTARTINDENT and SC_WRAPINDENT_INDENT to work together. Any pointers?
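
To make the difference between the three modes concrete, here is a simplified model in character columns, based only on the behaviour described above. The function names and the column-based arithmetic are assumptions for illustration; Scintilla itself works in pixel positions.

```cpp
#include <string>

// Values mirror the SC_WRAPINDENT_* constants.
enum class WrapIndentMode { Fixed = 0, Same = 1, Indent = 2 };

// Columns of leading whitespace, expanding tabs to the next tab stop.
int LeadingColumns(const std::string &line, int tabWidth) {
    int col = 0;
    for (char ch : line) {
        if (ch == ' ')
            col++;
        else if (ch == '\t')
            col += tabWidth - (col % tabWidth);
        else
            break;
    }
    return col;
}

// Column at which wrapped sublines start (hypothetical helper, not Scintilla API).
// startIndent stands in for the value set via SCI_SETWRAPSTARTINDENT.
int ContinuationColumns(WrapIndentMode mode, const std::string &line,
                        int startIndent, int indentSize, int tabWidth) {
    const int lineIndent = LeadingColumns(line, tabWidth);
    switch (mode) {
    case WrapIndentMode::Fixed:  return startIndent;             // ignores the line's own indent
    case WrapIndentMode::Same:   return lineIndent;              // ignores startIndent
    case WrapIndentMode::Indent: return lineIndent + indentSize; // also ignores startIndent
    }
    return lineIndent;
}
```

For the list item `  - item` the text begins at column 4. In this model 'same' yields 2 and 'indent' yields 2 plus the indent size, so neither tracks the width of the bullet, and 'fixed' matches only when the start indent happens to equal that column.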

- Neil Hodgson - 2015-06-02
  
  Start at EditView.cxx line 516.
  

Peter Klenner - 2015-06-03

I came up with the following solution. I added a new wrapIndentMode SC_WRAPINDENT_LIST to count the number of '-', '+', '*', tabs and spaces at the start of a line and to use a multiple of this number as wrapAddIndent.
Works like a charm for me ;)

if (vstyle.wrapIndentMode == SC_WRAPINDENT_INDENT) {
    wrapAddIndent = model.pdoc->IndentSize() * vstyle.spaceWidth;
} else if (vstyle.wrapIndentMode == SC_WRAPINDENT_FIXED) {
    wrapAddIndent = vstyle.wrapVisualStartIndent * vstyle.aveCharWidth; // Line 516!!
} else if (vstyle.wrapIndentMode == SC_WRAPINDENT_LIST) { // New Branch
    int i = 0, j = 0;
    while (ll->chars[i] == '-' || ll->chars[i] == '+' || ll->chars[i] == '*' || IsSpaceOrTab(ll->chars[i])) {
        if (ll->chars[i] != '\t')
            j++;
        i++;
    }
    wrapAddIndent = j*vstyle.aveCharWidth;
}
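
The counting loop in the new branch can be sketched in isolation as follows. This is a standalone restatement, not the patch itself: it takes a `std::string` instead of `ll->chars`, the function name is made up, and the loop stops at the end of the line, which the snippet above does not check explicitly.

```cpp
#include <string>

// Counts the non-tab characters in the leading run of list markers
// ('-', '+', '*') and whitespace. Hypothetical helper for illustration.
int ListPrefixChars(const std::string &line) {
    int counted = 0;
    for (char ch : line) {
        const bool marker = ch == '-' || ch == '+' || ch == '*';
        const bool blank = ch == ' ' || ch == '\t';
        if (!marker && !blank)
            break;          // first character of the actual text
        if (ch != '\t')
            counted++;      // tabs are skipped, as in the branch above
    }
    return counted;
}
```

Multiplying the result by an average character width gives the extra indent; that multiplication is only exact for monospaced fonts.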
- Neil Hodgson - 2015-06-04
  
  That doesn't look like it will work well with proportional fonts. The set of ignored characters is arbitrary.
  

Neil Hodgson - 2018-02-20

assigned_to: Neil Hodgson

Group: Committed --> Completed

Vic - 2018-08-29

As I understand it, one problem is that the set of prefix texts (the possible bullets and spacing before the main text) is arbitrary, and you (the Scintilla developers) can't know in advance which prefixes a user may need for different languages, personal styles, etc.

So, let the user define the possible prefix texts that Scintilla has to skip when aligning the wrapped sublines to the text of the first subline, while also taking into account how much to indent past the indentation of the first subline. The user can of course specify that per filetype (language/extension).
For example, let's say that in my own language Mylang (not just Markdown), I need to use the following possible prefixes: "* ", "** ", "1. " and "$ $ ".
Then I would specify in my property file for Mylang:
prefixes_to_skip="* ";"** ";"1. "; "$ $ " or, easier,
prefixes_to_skip=* ;** ;1. ;$ $ ;
In addition, I would also specify (in SciTE) wrap.indent.mode=1. With such settings, it should wrap like this:

$ $ asld;fkja sljaf fak jk asdfasdfasl;kfja;lsdfkas;lfdkj * asdfasdfasdf asf asfd asf asddf asdfa sdfasdf asdf asfdasdf Asdfasdf asdfasfd asdf asdf asdf asdfa sdf adsdf

And with wrap.indent.mode=2 it should instead wrap like this (assuming indent width = 4):

$ $ asld;fkja sljaf fak jk asdfasdfasl;kfja;lsdfkas;lfdkj * asdfasdfasdf asf asfd asf asddf asdfa sdfasdf asdf asfdasdf Asdfasdf asdfasfd asdf asdf asdf asdfa sdf adsdf

The specified prefix, such as "$ $ ", would tell Scintilla how much extra space to add to the indentation of the sublines. That extra space can be computed in average character widths. I guess it can't be computed in pixels, because that depends on the font family in use, but you know such details better.
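
A minimal sketch of how the unquoted `prefixes_to_skip` form could be handled, assuming ';' as the separator and byte offsets rather than pixels. Both function names are hypothetical; nothing like this exists in Scintilla or SciTE.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a value such as "* ;** ;1. ;$ $ ;" on ';', keeping the spaces
// inside each prefix and dropping empty entries.
std::vector<std::string> SplitPrefixes(const std::string &spec) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start < spec.size()) {
        std::size_t end = spec.find(';', start);
        if (end == std::string::npos)
            end = spec.size();
        if (end > start)
            out.push_back(spec.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

// Byte offset where the main text starts: after the leading blanks and
// the longest listed prefix that follows them (if any).
std::size_t TextStart(const std::string &line, const std::vector<std::string> &prefixes) {
    std::size_t pos = 0;
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t'))
        pos++;
    std::size_t best = 0;
    for (const std::string &p : prefixes) {
        if (p.size() > best && line.compare(pos, p.size(), p) == 0)
            best = p.size();
    }
    return pos + best;
}
```

With wrap.indent.mode=1 the sublines would align at `TextStart`; with wrap.indent.mode=2 one indent width would be added on top of that.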

Last edit: Vic 2018-08-30
- Colomban Wendling - 2018-08-30
  
  This sounds flexible, but a little tedious on the part of the consumer, as it has to list all possible prefixes.
  
  It could be more interesting to have a way of asking the lexer where the visual indent for a line is, as the lexer will have a lot more information to make that decision than arbitrary prefixes: it knows the syntax and has access to current style information.
  
  You could imagine the lexer being able to answer something like SCI_GETLINECONTENTSPOSITION(line) -> pos, which the wrapping code could then use to get perfect visual indentation (as it has access to actual offset positions on screen). Such a method could even be used by applications to e.g. try and move the cursor there rather than to the SCI_GETLINEINDENTPOSITION position (or even smart home could be made to use that via a setting).
  The method name is hypothetical and can be discussed, but the idea would be to have a way to query where the user content starts rather than some syntactic stuff. It would give the position after bullet points for markup lists, after comment delimiters, etc.
  
  However, one thing this doesn't solve is optimal wrapping of XML, as that would depend on where the wrap actually happens. Ideally, XML would wrap like this:
  
  <foo>lorem ipsum dolor sit ⏎ amet</foo><bar>lorem ipsum</bar>
  <foo>lorem ipsum dolor sit amet</foo><bar>lorem ⏎ ipsum</bar>
  <foo>lorem ipsum dolor sit amet</foo> ⏎ <bar>lorem ipsum</bar>
  
  With the proposition above, the first example would be easy to implement, but the second one not so much. To support all of it, the wrapping code would have to ask for the alignment point at a specific position, not a line: something like SCI_GETVISUALWRAPALIGNPOSITION(pos) -> pos or similar which, given a position that would be the start of the wrapped line, returns a position (before the input argument) with which to align.
  
  - Neil Hodgson - 2018-08-31
    
    While this seems complex, it may help even with code like
    
    for (TickReason tr = tickCaret; tr ⏎ <= tickDwell; tr = static_cast⏎ <TickReason>(tr + 1)) {
    
  - Neil Hodgson - 2018-09-04
    
    Can there be a single GetVisualWrapAlignPosition method where simple implementations return the same position for any input position on a line and complex implementations examine the input position? Or should there be 2 methods with GetVisualWrapAlignPosition tried and if it isn't implemented or returns failure, GetLineContentsPosition is called?
    
    In the layout code, there also needs to be rules about what to do if the results are difficult. For example, if the align position is too close to the right hand side, it may need to back off to provide some minimum space.
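    
    One way to picture the two-method variant with a fallback and a back-off rule is sketched here. Everything in it is hypothetical: the class, both method names, the "negative means no answer" convention and the pixel-based clamp are assumptions, not existing ILexer or IDocument API.
    
    ```cpp
    #include <algorithm>
    
    // Hypothetical provider interface. A negative result means
    // "not implemented" or "no answer for this input".
    class WrapAlignProvider {
    public:
        virtual ~WrapAlignProvider() = default;
        virtual int GetVisualWrapAlignPosition(int /*pos*/) { return -1; }
        virtual int GetLineContentsPosition(int /*line*/) { return -1; }
    };
    
    // A "simple" provider that only answers per line, e.g. with the
    // position just after a bullet.
    class PerLineProvider : public WrapAlignProvider {
        int contentsPos;
    public:
        explicit PerLineProvider(int contentsPos_) : contentsPos(contentsPos_) {}
        int GetLineContentsPosition(int /*line*/) override { return contentsPos; }
    };
    
    // Try the position-specific query first, then the per-line query,
    // then fall back to the ordinary line indent position.
    int ResolveAlignPosition(WrapAlignProvider &provider, int line, int wrapPos, int lineIndentPos) {
        int pos = provider.GetVisualWrapAlignPosition(wrapPos);
        if (pos < 0)
            pos = provider.GetLineContentsPosition(line);
        if (pos < 0)
            pos = lineIndentPos;
        return pos;
    }
    
    // Back off so that at least minTextWidth pixels remain for text.
    int ClampAlignX(int alignX, int wrapWidth, int minTextWidth) {
        const int maxX = std::max(0, wrapWidth - minTextWidth);
        return std::min(alignX, maxX);
    }
    ```
    
    A single-method design would drop `GetLineContentsPosition` and let simple implementations ignore the input position inside `GetVisualWrapAlignPosition`.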
    

Vic - 2018-08-30

It could be more interesting to have a way of asking the lexer...

Not all languages have their own lexer, thus it may not know the prefixes for the new language.
For example, I use .txt files with lexer of haskell or python (to get folding based on indentation), and that lexer has no idea of what kind of prefixes I want to use.
Also, even for langs with their own lexer, you may want to define your own prefixes and have scintilla wrap nicely.

Last edit: Vic 2018-08-31


Neil Hodgson - 2018-08-30

For example, I use .txt files with lexer of haskell or python (to get folding based on indentation),

There is an "indent" language (SCLEX_INDENT) for that case.

- Vic - 2018-08-31
  
  Thanks. I don't see that in Geany though...
  
  - Colomban Wendling - 2018-08-31
    
    Geany doesn't currently build or give access to all lexers from Scintilla, only the ones that have a built-in usage. This could be changed, but that's a Geany issue.
    
    Last edit: Colomban Wendling 2018-08-31
    

Vic - 2018-08-31

One way to speed up user specification of lots of similar prefixes is to accept both literal text and regex patterns (or just regex).
Example (assuming { and } signify a regex):
possible_prefix_text=* ;$$ ;{[-]+ };{<[a-zA-Z]+>[ ]*};
The third would cover -, --, --- etc., which is what Peter Klenner (creator of this ticket) was after.
The fourth would cover XML/HTML tags.
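
A rough sketch of how such a mixed list could be turned into one regular expression, assuming ';' separates entries, `{`...`}` marks a regex entry, and every entry ends with ';'. The function names are hypothetical, and a regex entry that itself contains `};` would break this simple parser.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Escapes ECMAScript regex metacharacters so a literal prefix matches itself.
std::string EscapeLiteral(const std::string &text) {
    static const std::string meta = R"re(\^$.|?*+()[]{})re";
    std::string out;
    for (char ch : text) {
        if (meta.find(ch) != std::string::npos)
            out += '\\';
        out += ch;
    }
    return out;
}

// Converts "* ;$$ ;{[-]+ };" style input into "(?:\* )|(?:\$\$ )|(?:[-]+ )".
std::string PrefixSpecToRegex(const std::string &spec) {
    std::string pattern;
    std::size_t pos = 0;
    while (pos < spec.size()) {
        std::string alt;
        std::size_t next = 0;
        if (spec[pos] == '{') {
            // Regex entry: take everything up to the closing "};" unchanged.
            std::size_t close = spec.find("};", pos);
            if (close == std::string::npos)
                close = spec.size();
            alt = spec.substr(pos + 1, close - pos - 1);
            next = close + 2;
        } else {
            // Literal entry: take everything up to ';' and escape it.
            std::size_t semi = spec.find(';', pos);
            if (semi == std::string::npos)
                semi = spec.size();
            alt = EscapeLiteral(spec.substr(pos, semi - pos));
            next = semi + 1;
        }
        if (!alt.empty()) {
            if (!pattern.empty())
                pattern += '|';
            pattern += "(?:" + alt + ")";
        }
        pos = next;
    }
    return pattern;
}
```

The result can be compiled once with `std::regex` and matched with `std::regex_constants::match_continuous` so that a prefix only counts at the start of the examined text.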

- Colomban Wendling - 2018-08-31
  
  I guess regexes could work, but this would move the burden of maintaining appropriate rules onto each Scintilla consumer. My theory was that wrapping positions are basically language-specific, and as such would be handled by the layer that understands the language, which is the lexer code in Scintilla.
  
  I agree that a customisable list would be flexible for the user, but as mentioned above it's also a burden. A simple example is your pattern for covering XML: it's actually not accurate, and should probably rather be <[-_[:alpha:]][-_[:alnum:]]*([ \t]+[-_[:alpha:]][-_[:alnum:]]*=('[^']*'|"[^"]*"|[^ >]*))*[ \t]*>[ \t]*> or something along those lines. I'm not quite certain off the top of my head what a valid tag or attribute name is, for example, but you get the point that it's not trivial. [1]
  Also, it would be very tricky to support my second XML wrapping example if wanted (which might or might not be very interesting in practice, but still shows that some cases are tricky), as well as some corner cases in various languages (I can think of aligning on * continuation prefixes in multiline C comments versus a leading multiplication operator; I agree it's highly unlikely and not a very dramatic false-positive example, but well).
  
  Another reason for relying on the lexer would be performance, especially as wrapping is already quite expensive IIUC.
  Also (not that it'd be a bad thing), I don't think Scintilla currently has any configuration option using regexes, so this would add a new concept.
  
  But in any case, I guess the best solution for all cases would be combining both, e.g. having the lexer able to answer, possibly a property to disable this if wanted, and a means of adding custom prefixes. Whether it's necessary or worth it is another question.
  
  [1] To be fair if it's only applied to XML documents, a simple <[^/][^>]*> would probably do, but that won't be the case for all cases.
  
- Neil Hodgson - 2018-09-01
  
  Regexes are likely adequate for most cases. If Unicode prefixes are to be supported (and they probably should be) then both the list of prefixes and the document text should be decoded into wide character strings as is done for searching in Document.cxx. This is a fairly large block of code that will have to be extracted into something that can be used from multiple clients.
  
  The performance cost of running regex code should only be paid when a set of prefixes has been set and wrap mode is on.
  
  If this feature is to be added into lexers then there will be additions to the ILexer and/or IDocument interfaces to ask for and set wrap align positions. IDocument is easier to extend as there is really only one implementation (unless an external lexer wants to work with multiple major releases of Scintilla) but ILexer is more difficult as it is implemented on every lexer which may mean deferring changing it to the next major release.
  

Vic - 2018-09-01

Colomban Wendling:

But in any case, I guess the best solution for all cases would be combining both, e.g. having the lexer able to answer, possibly a property to disable this if wanted, and a means of adding custom prefixes.

I totally agree with this. In addition to tags and keywords, another obvious prefix for lexers is comment symbol(s).

The reasons I defend the custom prefixes option are that:

It will work when the lexer does not know about the prefixes one needs. Examples:

Using bulleted lists inside a simple text document,

Bulleted lists inside a programming language's multiline comment blocks

Since it is very general, if it is easier to implement than the in-lexer option, then you could use it in the meantime (even if more tedious), until the lexer option is implemented.

It seems to me that, in practice, one would only really need to specify custom prefixes via regexes when there are very many prefixes following a pattern, and that case would be covered by an in-lexer option, if available.
If that is so, and if regex adds that much performance overhead, perhaps regex should be only a temporary solution until the in-lexer implementation is ready (while still keeping the option of literal (non-regex) custom prefixes permanently).

Last edit: Vic 2018-09-01

Neil Hodgson - 2018-09-03

If someone wants to implement regular expressions here, it's probably reasonable to avoid the iterator code from Document.cxx and just convert the whole line to a std::wstring for UTF-8 mode before checking against a prepared std::wregex starting at the indent position. For non-UTF-8 do the same with std::string and std::regex. For UTF-8, the returned position will have to be converted back to bytes. The regex/wregex can be created when the regex is set and should be emptied when the encoding is changed. It's a little clumsy having both a regex and a wregex variable, but compiling a regular expression object from a string for each line is likely expensive.

Since regular expressions allow alternates "a|b|zz", the API can accept a single regular expression simplifying the code (no splitting the input then looping over the parts) and avoiding specifying a separator along with a quoting mechanism if the separator is otherwise significant.
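
A minimal sketch of the non-UTF-8 (`std::string` / `std::regex`) path described above: the pattern is compiled once when it is set, and each line is matched starting at its indent position. The class and method names are hypothetical, and the `std::wstring` / `std::wregex` path for UTF-8 is left out.

```cpp
#include <cstddef>
#include <regex>
#include <string>

class WrapPrefixMatcher {
    std::regex re;
    bool active = false;
public:
    // Compile once when the pattern is set; an empty pattern disables matching.
    // std::regex::assign throws std::regex_error for an invalid pattern.
    void SetPattern(const std::string &pattern) {
        active = !pattern.empty();
        if (active)
            re.assign(pattern, std::regex::ECMAScript | std::regex::optimize);
    }

    // Byte offset just past the leading blanks and any prefix matched there.
    std::size_t ContentStart(const std::string &line) const {
        std::size_t indent = 0;
        while (indent < line.size() && (line[indent] == ' ' || line[indent] == '\t'))
            indent++;
        if (!active)
            return indent;
        std::smatch m;
        const auto first = line.begin() + static_cast<std::string::difference_type>(indent);
        // match_continuous: the prefix must begin exactly at the indent position.
        if (std::regex_search(first, line.end(), m, re, std::regex_constants::match_continuous))
            return indent + static_cast<std::size_t>(m.length(0));
        return indent;
    }
};
```

Because alternation is part of the pattern, a value such as `[*+-]+ |[0-9]+\. ` covers several prefix styles without any list splitting or quoting rules.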

- Vic - 2018-09-14
  
  I only know very little of C/C++, but maybe I could help with research/ideas/pseudocode/algorithm/discussion/... . (For any of the feature requests I submit)
  
  ... and just convert the whole line to a std::wstring ...
  
  Maybe the first 50 or 100 characters after the indent position would suffice; after all, it should be a "prefix", while a whole line can be a whole page or more. That might help performance.
  
  ... "a|b|zz" ...
  
  And it's very clear to the end user too.
  
  Last edit: Vic 2018-09-14
  
  - Neil Hodgson - 2018-09-14
    
    It's actually non-trivial to convert a line to a std::wstring since encoding conversion is mostly handled in platform layers and they don't currently expose a method to the platform-independent code to do this.
    
    - Vic - 2018-09-15
      
      I just read a bit from this https://stackoverflow.com/questions/402283/stdwstring-vs-stdstring ,
      and what I got is that std::string can hold all ASCII characters (codes 0 to 127, if I understand correctly) on any platform without problems.
      I looked at https://www.rapidtables.com/code/text/ascii-table.html , and these ASCII characters are what we are using most of the time anyway.
      So maybe implement just ASCII-based prefixes for now? It will still be very useful. The doc can have an explanation line like "Currently only ASCII-based prefix texts are supported".
      
      - Neil Hodgson - 2018-09-15
        
        It would be reasonable for an initial version to just allow ASCII characters.
        

Vic - 2018-09-15

Another possible use of a prefix_text variable, at least for itemized lists, is having the editor automatically insert the same prefix on the next line when Enter is pressed, if the current line has that prefix. If Enter is then pressed again without typing any other characters, the editor would erase the prefix and just move to the next line, at the indent position.
I.e., just like in more advanced word processors.
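
The Enter behaviour can be sketched as a small decision function, written here for a single literal prefix. The struct, its fields and the function name are invented for illustration; an application would still have to apply the result to the document itself.

```cpp
#include <cstddef>
#include <string>

struct EnterAction {
    bool erasePrefix;            // true: the line held only the prefix, so remove it
    std::string insertOnNewLine; // text to place at the start of the new line
};

// Decides what pressing Enter should do on a line, given one literal prefix.
EnterAction OnEnter(const std::string &line, const std::string &prefix) {
    std::size_t indent = 0;
    while (indent < line.size() && (line[indent] == ' ' || line[indent] == '\t'))
        indent++;
    const std::string lead = line.substr(0, indent);
    if (prefix.empty() || line.compare(indent, prefix.size(), prefix) != 0)
        return {false, lead};          // no prefix: plain auto-indent
    if (indent + prefix.size() == line.size())
        return {true, lead};           // empty item: erase prefix, stay at indent
    return {false, lead + prefix};     // continue the list on the next line
}
```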

- Neil Hodgson - 2018-09-15
  
  This is really a separate feature which would be implemented in SciTE, not Scintilla.
  
