#592 Negated escape sequences misinterpreted in character class

open
nobody
None
5
2007-07-25
2007-07-25
Joerg Fischer
No

The negated escape sequences \L, \S, \D, \W, ... are misinterpreted inside a character class. For example, [\S] matches the same characters as \s, when it should match the opposite set (everything \s does not match).

(Perl does it right.)
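As a point of comparison, here is a minimal sketch of the expected behaviour using Python's `re` module, which follows Perl's convention for negated escapes inside a class. Python is only a stand-in here; the helper name `class_matches` is made up for this illustration and nothing in it exercises NEdit's own regex engine.

```python
import re


def class_matches(pattern, ch):
    """Return True if the single character ch is matched by pattern."""
    return re.fullmatch(pattern, ch) is not None


# Inside a class, \S should still mean "not whitespace".
space_matches = class_matches(r"[\S]", " ")   # a space must NOT match
letter_matches = class_matches(r"[\S]", "a")  # a letter must match

# The bracketed and bare forms should agree with each other.
agrees_with_bare_form = all(
    class_matches(r"[\S]", c) == class_matches(r"\S", c)
    for c in " \tab1_-"
)

print(space_matches, letter_matches, agrees_with_bare_form)
```

The bug described above corresponds to the first check coming out the other way round: the space matching and the letter not.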

Discussion

  • Tony Balinski
    2007-07-26

    Logged In: YES
    user_id=618141
    Originator: NO

    I have a patch for this. It adds a few more character tables so that the negated charset's characters can be added to the []-bracketed custom charset, as is the case for the positive charset escapes.

    Since I can't attach it here directly, you can find it here: http://ajbj.free.fr/nedit/nedit5.5dev/patches/NegatedEscapesInClassesFix.diff

    Interestingly, I notice that (?n\W) does not match newlines, whereas my patch allows (?n[\W]) to do so, which is rather inconsistent. The same is true of \L and \D. Also, \y without (?n ) around it will match a newline. I believe these to be faults. What about you?
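The idea behind the patch, as described in this comment, is to keep a character table for each negated escape so that its members can be added to the bracketed class. The sketch below illustrates that idea only; it is not the patch itself. The table contents, the 8-bit character range, and the names `TABLES`, `ALL_CHARS` and `expand_class` are assumptions for illustration, and newline handling is deliberately left out.

```python
import string

# Assumed 8-bit character universe, for illustration only.
ALL_CHARS = {chr(c) for c in range(256)}

# Hypothetical positive tables, keyed by the lowercase escape letter.
TABLES = {
    "s": set(" \t\r\f\v\n"),
    "d": set(string.digits),
    "l": set(string.ascii_letters),
    "w": set(string.ascii_letters + string.digits + "_"),
}


def expand_class(body):
    """Expand the body of a [...] class into the set of characters it accepts.

    Lowercase escapes add their table; uppercase escapes add the complement
    of that table, which is the step the reported bug gets wrong.
    """
    members = set()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            code = body[i + 1]
            table = TABLES.get(code.lower())
            if table is None:
                members.add(code)               # escaped literal
            elif code.islower():
                members |= table                # positive escape
            else:
                members |= ALL_CHARS - table    # negated escape
            i += 2
        else:
            members.add(ch)                     # ordinary literal
            i += 1
    return members


non_space = expand_class(r"\S")
print("a" in non_space, " " in non_space)
```

Precomputing the complements as separate tables, as the comment suggests, avoids the set difference at expansion time; the sketch computes it on the fly only to stay short.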

     
  • Thorsten Haude
    2007-12-27

    Logged In: YES
    user_id=119143
    Originator: NO

    Jörg's summary from an on-list discussion:

    1. (?N ) grouping by default, meaning treat \n as special char by
    never matching it unless \n is given explicitly. This implies
    that dot, [^...], and all the escape sequences can't match \n.

    2. (?n ) grouping, meaning drop NEdit's convention to treat \n
    specially and do in effect a Perl-like matching of newlines. That
    is, dot does not match \n, [^...] does match \n if it is not
    listed, and escape sequences do what they stand for: \s, \y, \D,
    \L, and \W match newlines, \S, \Y, \d, \l and \w do not.

    3. For the sake of completeness, invent a (?s ) grouping named after
    Perl's s(ingle line) modifier. This matches newlines like (?n )
    and in addition forces dot to match \n, too.
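The Perl-like rules in items 2 and 3 can be checked against Python's `re` module, which treats newlines the same way for the escapes it shares with NEdit. This is a sketch under that assumption: Python's default mode stands in for the proposed (?n ) grouping and `re.DOTALL` for the proposed (?s ) grouping. The (?N ) default and the NEdit-specific escapes \y, \Y, \l and \L have no Python counterpart and are not covered. The helper name `matches_newline` is made up for this example.

```python
import re


def matches_newline(pattern, flags=0):
    """Return True if pattern matches a lone newline character."""
    return re.fullmatch(pattern, "\n", flags) is not None


# Item 2: Perl-like matching (Python's default mode as a stand-in).
perl_like = {
    r".":    matches_newline(r"."),      # dot does not match \n
    r"[^a]": matches_newline(r"[^a]"),   # negated class matches unlisted \n
    r"\s":   matches_newline(r"\s"),     # whitespace includes \n
    r"\D":   matches_newline(r"\D"),     # non-digit includes \n
    r"\W":   matches_newline(r"\W"),     # non-word includes \n
    r"\S":   matches_newline(r"\S"),     # non-whitespace excludes \n
    r"\d":   matches_newline(r"\d"),     # digit excludes \n
    r"\w":   matches_newline(r"\w"),     # word character excludes \n
}

# Item 3: single-line mode additionally lets dot match \n.
dot_in_single_line = matches_newline(r".", re.DOTALL)

for pattern, result in perl_like.items():
    print(pattern, result)
print("dot with DOTALL", dot_in_single_line)
```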