Regexp problem - what am I missing, or is this a bug?

R Bauer
2014-03-26
2014-03-28
  • R Bauer

    R Bauer - 2014-03-26

    I am using a simple regular expression to match the first space and all characters following to the end of the line. The regexp is:
    \s.*$
    I have the regular expression search mode checked, and the ". matches newline" option unchecked. However, this expression still matches the whitespace at the end of the line and wraps around to match the entire next line.

    Thanks,
    Robert

     
  • dail8859

    dail8859 - 2014-03-26

    It's not actually matching the whitespace at the end of the line; it's matching the line ending itself, since \s also matches newline characters.
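
    The behaviour can be reproduced outside Notepad++. The following is a minimal sketch using Python's re module as a stand-in (an assumption: it is not the engine Notepad++ uses, but \s, the dot and $ behave the same way for this case). The sample text and the variable names are made up for illustration.

```python
import re

# Two lines; the first one contains no space at all.
text = "abc\ndef ghi"

# re.MULTILINE lets $ match at each line end; the dot still excludes "\n".
broad = re.search(r"\s.*$", text, re.MULTILINE)

# \s accepts the "\n" after "abc", then .* consumes the whole next line.
print(repr(broad.group()))

# Restricting the first character to a space or tab keeps the match on one line.
narrow = re.search(r"[ \t].*$", text, re.MULTILINE)
print(repr(narrow.group()))
```

    Python's re has no \h, so the sketch spells the horizontal blanks out as [ \t].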

     
    • R Bauer

      R Bauer - 2014-03-26

      Thank you!

       
  • THEVENOT Guy

    THEVENOT Guy - 2014-03-26

    Hello Robert,

    dail8859 is quite right about the \s regex symbol.

    To sum up, with the PCRE ( Perl Compatible Regular Expressions ) engine, used in Notepad++ :

    • \s represents ONE character of the class [\t\n\x0B\f\r\x20\xA0] ( any BLANK/SPACE character )

    • \h represents ONE character of the class [\t\x20\xA0] ( any HORIZONTAL BLANK character )

    • \v represents ONE character of the class [\n\x0B\f\r] ( any VERTICAL BLANK character )

    REMINDERS :

    By ascending order of code-point character :

    • [\b] = \x08, represents the BACKSPACE character
    • \t = \x09, represents the HORIZONTAL TABULATION character
    • \n = \x0A, represents the LINE FEED character ( UNIX/OS X End of Line )
    • [\v] = \x0B, represents the VERTICAL TABULATION character
    • \f = \x0C, represents the FORM FEED character
    • \r = \x0D, represents the CARRIAGE RETURN character ( OLD MAC End of Line)
    • \x20 represents the usual SPACE character
    • \xA0 represents the NO BREAK SPACE character ( NBSP )

    IMPORTANT :

    • The \s class, of BLANK characters, is the UNION of the two classes \h and \v.

    • Difference between \b and [\b] : \b represents an ASSERTION ( PARTICULAR position in text), but [\b] = \x08 represents the BACKSPACE character.

    • Difference between \v and [\v] : \v = [\n\x0B\f\r] represents ONE character of the VERTICAL BLANK class, but [\v] = \x0B represents the VERTICAL TABULATION character ONLY.
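
    The relations above can be checked mechanically. This is a small sketch in Python, assuming the three character lists given in this post; the set names are invented for the example. Note that Python's re module is only a stand-in here: it does not support \h, and its \v means the VERTICAL TABULATION character only, so the classes are written out as plain sets.

```python
import re

# The three classes exactly as listed in the post above.
HORIZONTAL = set("\t\x20\xa0")
VERTICAL = set("\n\x0b\f\r")
BLANK = set("\t\n\x0b\f\r\x20\xa0")

# \s is the union of \h and \v.
union_holds = (HORIZONTAL | VERTICAL) == BLANK

# Python's \s (on str patterns) accepts every character in the BLANK list.
all_blank_match = all(re.fullmatch(r"\s", ch) for ch in BLANK)

# Inside brackets \b is the BACKSPACE character;
# outside brackets it is a word-boundary assertion.
backspace_found = re.search(r"[\b]", "a\x08b") is not None
whole_words = re.findall(r"\bcat\b", "cat concat")

print(union_holds, all_blank_match, backspace_found, whole_words)
```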


    Let's go back to your problem !

    From above, you can easily notice that your regex isn't precise enough !

    If I understand you correctly, you would like to select the WHOLE of each line that begins with a SPACE character, WITHOUT the EOL characters, wouldn't you ?

    If so, I suggest the simple regex : ^ .*, with a SPACE, before the DOT symbol.

    NOTES :

    • If, in addition, you want to select any EOL characters, this regex becomes :

    ^ .*\R with, again, a SPACE, before the DOT symbol.

    • If you also want to allow the TABULATION character, just replace the SPACE with [\t ], with a SPACE, before the ENDING square bracket.

    • If each line MUST contain characters, after the FIRST space, at position 1, replace the STAR symbol with the + symbol.

    Oh, I forgot to say that the . matches newline option must, of course, be unchecked.
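
    For readers who want to try these variants outside Notepad++, here is a hedged sketch in Python. Two assumptions: Python's re module stands in for the Notepad++ engine, and since it does not support \R, the hypothetical EOL fragment below spells out the usual line endings instead. The sample text is made up. Leaving out re.DOTALL corresponds to keeping the ". matches newline" option unchecked.

```python
import re

# Four sample lines: leading space, no leading blank, a lone space, leading tab.
text = " indented line\nplain line\n \n\ttabbed\n"

# Stand-in for \R, which Python's re module does not support.
EOL = r"(?:\r\n|\n|\r)"

# ^ .*   : lines beginning with a SPACE, without the EOL characters.
space_lines = re.findall(r"^ .*", text, re.MULTILINE)

# ^ .*\R : the same lines, EOL characters included.
space_lines_eol = re.findall(r"^ .*" + EOL, text, re.MULTILINE)

# ^[\t ].* : also accept a leading TABULATION character.
blank_lines = re.findall(r"^[\t ].*", text, re.MULTILINE)

# ^ .+   : require at least one character after the leading SPACE.
non_empty = re.findall(r"^ .+", text, re.MULTILINE)

for result in (space_lines, space_lines_eol, blank_lines, non_empty):
    print(result)
```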

    Cheers,

    guy038

    P.S. :

    You will find good documentation, about the new Perl Compatible Regular Expressions (PCRE), used by N++, since the 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    The FIRST link concerns the syntax of regular expressions in SEARCH

    The SECOND link concerns the syntax of regular expressions in REPLACEMENT

     
    Last edit: THEVENOT Guy 2014-03-26
    • GerdB

      GerdB - 2014-03-27

      Hello THEVENOT Guy,

      just to clarify: Notepad++ uses the boost::regex library in Perl-compatible Mode. There is also the PCRE-Library hosted at http://www.pcre.org which provides Regular Expressions with almost the same syntax, but is otherwise a completely different beast.

       
  • THEVENOT Guy

    THEVENOT Guy - 2014-03-28

    Hello GerdB,

    Thank you for this additional technical point. I was mistaken and used a shortcut :-((

    So, N++ doesn't include the true PCRE library, but rather the boost::regex library, in Perl-compatible mode, if I fully understood what you said !

    This obviously explains the small differences with the true PCRE library.

    From now on, when I speak about Notepad++ S/R, in regex mode, I'll try not to forget this distinction !

    Cheers,

    guy038

     
    Last edit: THEVENOT Guy 2014-03-28
