#444 Error in syntax highlighting with sub-pattern

Milestone: release
Status: closed-fixed
Owner: Eddy De Greef
Labels: Program (402)
Priority: 5
Updated: 2004-12-02
Created: 2004-11-16
Creator: Killer Rex
Private: No

The highlighting pattern becomes completely greedy and
matches all the text until the end of the file if a
sub-pattern is created.

For example:

Pattern: ".*?" (?#This is to match a string)

On its own, this works fine:
Lalala "this is matched" this not "but this is matched
again" fvmds
ddd -this is not matched

But if I add the sub-pattern:

Sub-Pattern: \\x (?#This will match the \x chars)

then the example text above is matched all the way to the end of the file.
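
For reference, the expected behaviour of the lazy pattern on its own can be reproduced with a minimal sketch. It assumes Python's re module as a stand-in for NEdit's regex engine (the two are not identical), and it joins the wrapped example text onto one line:

```python
import re

# Same lazy pattern as in the report; Python's re is only a stand-in
# for NEdit's own regex engine (an assumption for illustration).
STRING_RE = re.compile(r'".*?"')

sample = 'Lalala "this is matched" this not "but this is matched again" fvmds'

# Each quoted string is matched separately because .*? stops at the
# first closing quote instead of the last one.
matches = STRING_RE.findall(sample)
print(matches)
```

Both quoted strings come back as separate matches, which is the behaviour the pattern shows in NEdit before the sub-pattern is added.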

The version information is:
NEdit 5.5
Sep 30, 2004

Built on: Solaris, Sparc, Forte C
Built at: Sep 30 2004, 15:07:28
With Motif: 1.2.3 [@(#)OSF/Motif Version 1.2.3]
Running Motif: 1.2 [unknown]
Server: Sun Microsystems, Inc. 6410
Visual: 8-bit PseudoColor (ID 0x22, Default)
Locale: C

On a Sun Ultra 5 running Solaris 5.8.

Discussion

  • Thorsten Haude
    2004-11-16

    Logged In: YES
    user_id=119143

    Could you attach the part of your nedit.rc with the pattern
    in question?

     
  • Killer Rex
    2004-11-17

    Logged In: YES
    user_id=604420

    I've created a .pats file with two languages, one that works
    and one that does not.

    I hope it helps!

    KillerRex

     
  • Killer Rex
    2004-11-17

    Gzipped tar with a .pats file and an example

     
  • Eddy De Greef
    2004-11-17

    • milestone: --> release
    • labels: --> Program
    • assigned_to: nobody --> edg
     
  • Eddy De Greef
    2004-11-17

    Logged In: YES
    user_id=73597

    Subpatterns are only supposed to work for patterns that have
    a starting and an ending expression. It has never worked for
    patterns consisting of only a single expression.

    I'm not sure why this is; there may be technical reasons.
    Anyway, it should either work, or constructing this kind of
    pattern should be disallowed. I will look into it.

    The proper way to define string matching patterns is to use
    a pattern with starting and ending expressions. Your simple
    pattern will fail if the string contains an escaped quote.
    Look at the string and string escape patterns for C, for
    instance, to see how string matching can be made more robust.
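
The escaped-quote failure mentioned above can be shown with a small sketch. It uses Python's re module rather than NEdit's engine, and the single-expression "robust" pattern is an illustrative assumption, not the start/end pattern pair that NEdit's C mode actually uses:

```python
import re

# The simple lazy pattern from the report.
SIMPLE_RE = re.compile(r'".*?"')

# Hypothetical more robust variant: inside the quotes, accept either a
# character that is neither a quote nor a backslash, or a backslash
# followed by any character (an escape sequence).
ROBUST_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

sample = r'say "a \"quoted\" word" end'

simple_matches = SIMPLE_RE.findall(sample)
robust_matches = ROBUST_RE.findall(sample)
print(simple_matches)
print(robust_matches)
```

The simple pattern stops at the first escaped quote and splits the string into two wrong pieces, while the escape-aware pattern returns the whole string literal as one match.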

     
  • Killer Rex
    2004-11-18

    Logged In: YES
    user_id=604420

    I found it a little strange that sub-patterns don't work for
    patterns with only one RE (if I have selected some text with an
    RE, it seems logical to look inside it for another RE).

    So I debugged the code a little and found where the
    problem is:
    At line 1741 of highlight.c, the selected text is marked up to
    the end of the first RE, and after that the sub-pattern is searched for.
    If the pattern has start and end REs this works fine, but if it does
    not have an end RE, it selects all the text until the end.
    I have added a branch to the if that checks for the existence of
    endRE; if no endRE is present, the sub-pattern is searched for
    only in the selected subtext.
    It works fine for me with all the files and syntaxes I have tried,
    but I can't properly verify that all the syntax highlighting works
    well (I'm not familiar enough with the NEdit source code to
    be sure).

    I don't know how to submit these changes to the source, or whether
    they would be of interest at all (maybe the style is not correct,
    maybe the correction is a bigger bug), so I attach the new
    highlight.c file to help you. If something like this ends up being
    done, please let me know :-)

    The bug at line 1746 is still alive, as shown by
    the RareMode language that I added and the example_2.txt file,
    but it now works in nearly all cases, except when a
    sub-expression of a pattern with begin and end REs matches text
    across the text matched by the endRE.

    KillerRex

     
  • Killer Rex
    2004-11-18

    Patch to partially solve this bug

     
  • Killer Rex
    2004-11-18

    Logged In: YES
    user_id=604420

    Sorry for the repetition, bad browser behaviour.

    KillerRex

     
  • Scott Tringali
    2004-11-18

    Logged In: YES
    user_id=11321

    I think if it's not supposed to work, then the UI should
    check for it and flag it as an error, so at least the user
    won't be confused.

    But if we can make it work, that sounds better!

     
  • Eddy De Greef
    2004-11-18

    Logged In: YES
    user_id=73597

    I fully agree, but I'm not yet convinced that we can make it
    work under all circumstances. If we allow subpatterns of
    single-expression patterns, I can see potential conflicts or
    ambiguities with coloring-only subpatterns. We must make
    sure that the behaviour remains predictable.

    Moreover, this extension would introduce an inconsistency:
    for begin/end patterns, sub-patterns can be used to postpone
    the matching of the end expression, which is a vital feature
    for many pattern sets. For single expression patterns, it is
    technically almost impossible to add such a feature. But
    that is a limitation that I could live with.

    KillerRex, thanks for the debugging. It saves me the trouble
    of pinpointing the problem. Your fix won't work for
    lookahead expressions, though: cutting off the string by
    temporarily inserting a 0 will cause them to fail.
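
Why truncation breaks lookahead can be seen in a small sketch. Python's re module stands in for NEdit's regex engine here, and the pattern and text are made up for illustration: a lookahead has to inspect characters beyond the end of the match, so cutting the subject string at the match boundary hides exactly the text it depends on.

```python
import re

# Hypothetical pattern: "foo" only when it is followed by "bar".
LOOKAHEAD_RE = re.compile(r'foo(?=bar)')

full_text = 'foobar'
# Simulates terminating the string right after "foo", as a temporarily
# inserted 0 byte would do in C.
truncated = full_text[:3]

found_in_full = LOOKAHEAD_RE.search(full_text) is not None
found_in_truncated = LOOKAHEAD_RE.search(truncated) is not None
print(found_in_full, found_in_truncated)
```

The match succeeds on the full text and fails on the truncated copy, even though the characters that would be highlighted are identical in both.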

    Regarding line 1746: There is NO bug. What you see is the
    postponing of the end expression matching. "Fixing" that
    would break many patterns. Sub-patterns must have precedence
    over end or error expressions.
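
The postponing effect can be modelled with a deliberately simplified scanner. This is only a sketch in Python under stated assumptions, not the logic of highlight.c: the function name scan_region is hypothetical, there is a single sub-pattern, and error expressions are ignored.

```python
import re

def scan_region(text, start_re, end_re, sub_re):
    """Return the (begin, end) span of the first start/end region.

    Sub-pattern matches take precedence over the end expression, so a
    sub-pattern that consumes the end text postpones the end match.
    """
    start = re.search(start_re, text)
    if start is None:
        return None
    # The sub-pattern alternative comes first, so it wins when both
    # could match at the same position.
    combined = re.compile('(?P<sub>' + sub_re + ')|(?P<end>' + end_re + ')')
    pos = start.end()
    while True:
        m = combined.search(text, pos)
        if m is None:
            # No end expression found: the region runs to the end of text.
            return (start.start(), len(text))
        if m.group('end') is not None:
            return (start.start(), m.end())
        # A sub-pattern matched: skip over it and keep looking for the end.
        pos = m.end() if m.end() > m.start() else m.end() + 1

text = r'x = "a \" b" + y'
with_escape = scan_region(text, '"', '"', r'\\.')
without_escape = scan_region(text, '"', '"', r'(?!)')  # sub-pattern never matches
print(text[with_escape[0]:with_escape[1]])
print(text[without_escape[0]:without_escape[1]])
```

With the escape sub-pattern, the backslash-quote pair is consumed before the end expression gets a chance, so the region closes at the real closing quote. Without it, the region ends early at the escaped quote, which is the kind of breakage that "fixing" the precedence would cause.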

    I will try to make it work, but the syntax highlighting code
    is probably the most delicate code in NEdit, and as
    illustrated above, there are several issues that have to be
    resolved or sorted out. And that will take some time...

     
  • Killer Rex
    2004-11-19

    Second attempt to find a solution

     
  • Killer Rex
    2004-11-19

    Logged In: YES
    user_id=604420

    OK, second attempt.
    As you pointed out, my solution was a bigger bug :-)

    I've examined again exactly what the problem is:
    - When the pattern is given by a start and an end RE, it normally
    works fine (the fact that a sub-match can move the end RE
    match sounded strange to me, but then I realized that
    some of my own languages use that feature; I just hadn't
    associated it with sub-pattern precedence).

    - When the pattern is given by only one RE and I want to
    search inside it, I can't do it by searching between
    beginning and ending REs (they don't exist), and using the
    sub-expression makes the RE eat up all the text.

    So in this attempt I only split off the case where no endRE is
    present, and in that case I search for the beginning of the
    sub-patterns only up to the end of the selected text.
    It seems to work. I found only one catch: the sub-pattern can
    actually match text beyond the end of the selected text, but I
    don't know whether to classify this as an error, a feature, or a
    point to take into account...
    The biggest problems I can imagine are the decision point I
    have used (checking for the presence of an endRE to deduce the
    kind of RE) and the calculation of the length of the RE match, which
    tells parseString how far it can look.
    Again, I can only test it with a few languages (my own ones
    and C/C++, ksh, Fortran and a few more). Is there some
    language with especially greedy syntax highlighting that I
    can use as a check?

    KillerRex

     
  • Eddy De Greef
    2004-11-26

    • status: open --> pending-fixed
     
  • Eddy De Greef
    2004-11-26

    Logged In: YES
    user_id=73597

    A subpattern that can extend beyond the boundaries of the
    parent pattern can result in highlighting instabilities, so
    that solution is unacceptable if it allows that. (I've seen
    these instabilities while experimenting)

    The only way to solve this is to extend the regular
    expression engine such that it allows imposing a
    boundary up to which an expression is allowed to match. This
    is what I have done in CVS: the regular expression engine is
    extended (which required a considerable amount of changes),
    and the boundary information of the parent pattern is now
    used to limit the matching range of the child pattern in
    this particular case. It appears to work fine.
    There are still some issues with the precedence of
    coloring-only sub-patterns and normal sub-patterns, but
    those are only minor and I've explained how this is resolved
    in the on-line help.
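
The idea of confining a child pattern to its parent's range can be sketched with Python's re module, whose compiled patterns accept pos and endpos arguments. This is only an analogy for illustration: it is not the NEdit change itself, the helper name child_matches is hypothetical, and Python's endpos behaves as if the string ended at that position.

```python
import re

def child_matches(text, parent_re, child_re):
    """Return spans of child matches that lie inside parent matches."""
    parent = re.compile(parent_re)
    child = re.compile(child_re)
    spans = []
    for p in parent.finditer(text):
        pos = p.start()
        while True:
            # endpos=p.end() stops the child from matching past the parent.
            c = child.search(text, pos, p.end())
            if c is None:
                break
            spans.append(c.span())
            # Always advance, even after a zero-length match.
            pos = c.end() if c.end() > c.start() else c.end() + 1
    return spans

# The string pattern and the \x sub-pattern from the original report.
text = r'"a\x" b\x c'
spans = child_matches(text, r'".*?"', r'\\x')
print(spans)
```

Only the \x inside the quoted string is reported; the second \x lies outside the parent match and is never considered.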

    Can you confirm that it works for you?

     
  • Killer Rex
    2004-12-02

    • status: pending-fixed --> open-fixed
     
  • Killer Rex
    2004-12-02

    Logged In: YES
    user_id=604420

    I have been testing it for some time and it seems to work well.
    Searching for a sub-pattern within a pattern now works fine.
    One detail arises when the pattern and sub-pattern are both given
    as a begin and end RE. If the beginRE is found but not the endRE,
    the sub-pattern matches text beyond the end of the parent pattern
    until it finds the endRE or the end of the text.
    If the endRE is found outside the parent pattern, it is matched
    and the parent pattern is extended until the next match of its end
    condition or the end of the text.
    If the endRE is not found, the sub-pattern fills all the
    remaining text.
    This effect of displacing the parent pattern's match is
    useful, but maybe it should be explained better in the
    documentation (I haven't reviewed the latest versions; maybe it is
    already explained).
    A mechanism to enable/disable this feature could also be quite
    useful (a new kind of sub-expression, or a reference to the parent
    endRE, like & or \1, to put in the errorRE to force the
    sub-pattern to stop if it finds the parent endRE). Maybe the
    idea is beautiful but implies too many changes or instabilities.

    Thanks!
    KillerRex

     
  • Eddy De Greef
    2004-12-02

    • status: open-fixed --> closed-fixed
     
  • Eddy De Greef
    2004-12-02

    Logged In: YES
    user_id=73597

    I have tried to explain it a bit better in the
    documentation: I have explicitly mentioned the fact that the
    child pattern can postpone matching of the parent's end
    pattern. I hope it's clear enough.

    The selective enabling of this feature that you propose is
    probably not that difficult to implement now that the regex
    engine allows confined matches, but I doubt that it is worth
    the trouble.
    To some extent, the same effect can be obtained by copying
    the parent's ending pattern to the child's error pattern. It's
    not exactly the same, because the child would be highlighted
    even if only its starting pattern is found, but at least it
    won't overrun the parent's end.

    I'll think about it, but I have my doubts about the
    practical usefulness, and it may clutter the user interface.

    I'll close this bug report, since the bug has been fixed.
    Thanks.