#106 use context annotations from the highlighter for ^G

Joe Allen

Right now, when you hit ^G, utomatch tries to ignore comments and strings, but its parsing is limited. It doesn't always work properly, as in bug 2220030. This patch allows utomatch to query the syntax highlighter to determine what is a comment or string.

Each highlighter needs context annotations in order to work for that language. I updated c.jsf, python.jsf, and sh.jsf with the context annotations. Eventually, the other languages could use context annotations, but it is not urgent. The others will still use -pound_comment et al.

In addition to the functional changes, I consolidated code from utomatch and tomatch_word in order to reduce a substantial amount of code duplication.

This patch is on top of patch 3459248.


  • By the way, in this patch, I distinguish between "double_quoted" strings and "single_quoted" strings. Do you think that is desirable, or should they just be "string"s?

    Also in the patch, single-quoted strings are only ignored if the -single_quoted option is on, and similarly for double-quoted strings. If -highlighter_context is on, should ^G ignore all strings independent of whether the -single_quoted or -no_double_quoted options are on?

  • test cases

  • John J. Jordan

    Thanks for the patch, Charles. I've committed some of those fixes in [e41dd4] and then some more that I missed in [034935]. For a change of this magnitude (file formats), I'd like to get jhallen's approval before proceeding.

    But, I've looked at this a bit and come up with a few points...

    • Generally, I think the approach and implementation are solid. Joe alluded to the possibility of creating a special state machine for this sort of thing in [#245]. I think that it would be highly redundant with the syntaxes and that would be a bad thing.
    • I found the "prgetc(); ++col;" and "pgetc(); --col;" to be confusing and counterintuitive at first but eventually figured it out. Comments would help in those spots (for those entire if blocks).
    • I don't think that the extra option is really necessary and certainly not in the ^T menu. The highlighter should be able to detect whether there are any states that have matching context information on them. JOE ought to just always use that since the syntax knows better than the builtin heuristics (which are a fallback in any case).
    • Should we go all in on this and remove some of the heuristics (and even options) from the current implementation once the syntaxes are up to date?
    • Not sure about whether to handle ^G inside comments and strings. I think the current behavior is acceptable in any case.
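
    The auto-detection idea in the third point could look roughly like the sketch below. It is not taken from the patch: `hl_state`, `hl_syntax`, and `syntax_has_context` are hypothetical names, and the real highlighter structures in JOE are more involved. The only point is that one pass over the loaded states is enough to decide whether the syntax carries any context information, so no separate option would be needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context tags a syntax state might carry (not JOE's names). */
enum state_ctx { ST_NONE, ST_COMMENT, ST_STRING };

/* Hypothetical, simplified view of one highlighter state. */
struct hl_state {
    const char *name;        /* state name from the syntax file */
    enum state_ctx context;  /* ST_NONE when the state is not annotated */
};

/* Hypothetical, simplified view of a loaded syntax. */
struct hl_syntax {
    const struct hl_state *states;
    size_t nstates;
};

/* Return 1 if any state carries a context annotation, else 0.
 * A NULL syntax (no syntax loaded) counts as "no annotations". */
int syntax_has_context(const struct hl_syntax *syn)
{
    size_t i;

    if (syn == NULL)
        return 0;
    assert(syn->nstates == 0 || syn->states != NULL);

    for (i = 0; i < syn->nstates; i++) {
        if (syn->states[i].context != ST_NONE)
            return 1;
    }
    return 0;
}
```

    With a check like this, ^G could use the highlighter whenever it returns 1 and keep the built-in heuristics as the fallback otherwise.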

    Lastly, I've attached the updated patch sans what's been committed (through [034935]), plus a small (related) fix to sh.jsf.



    Bugs: #245
    Commit: [034935]
    Commit: [e41dd4]

  • John J. Jordan

    • assigned_to: Joe Allen
    • Group: --> CVS
  • Joe Allen

    I applied this patch to Mercurial.

    I found one bug: it would segfault if highlight_context is enabled and a syntax is defined but highlighting is disabled (there is then no line attribute database, so you get a NULL pointer dereference). I fixed this by reverting to the built-in heuristics if highlighting is disabled, but maybe it would be better to parse the file the first time you hit Ctrl-G.
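
    For illustration only, the shape of that fallback is roughly as follows. This is a sketch, not the committed code: `struct buffer`, `line_ctx`, `use_highlighter`, `heuristic_ctx`, and `context_at` are hypothetical names, context is tracked per line here purely to keep the example small, and the heuristic is a stub.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context kinds; these names are illustrative, not JOE's. */
enum ctx { CTX_CODE, CTX_COMMENT, CTX_STRING };

/* Hypothetical stand-in for the per-buffer highlighter state. */
struct buffer {
    const enum ctx *line_ctx; /* per-line context; NULL when highlighting is off */
    size_t nlines;            /* number of entries in line_ctx */
    int use_highlighter;      /* nonzero when the context option is enabled */
};

/* Stub for the built-in heuristics: always reports plain code here. */
static enum ctx heuristic_ctx(const struct buffer *b, size_t line)
{
    (void)b;
    (void)line;
    return CTX_CODE;
}

/* Look up the context of a line. The highlighter data is consulted only
 * when it actually exists; otherwise fall back to the heuristics instead
 * of dereferencing a NULL database. *used_highlighter reports which path
 * was taken. */
enum ctx context_at(const struct buffer *b, size_t line, int *used_highlighter)
{
    assert(b != NULL && used_highlighter != NULL);

    if (b->use_highlighter && b->line_ctx != NULL && line < b->nlines) {
        *used_highlighter = 1;
        return b->line_ctx[line];
    }
    *used_highlighter = 0;
    return heuristic_ctx(b, line);
}
```

    The alternative mentioned above (parsing on the first Ctrl-G) would replace the NULL branch with a step that builds the missing data before the lookup.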

    I'm also still thinking about whether it's a good idea to automatically enable highlight_context if the syntax file is annotated.