Regexp problem - what am I missing, or is this a bug?

2. Help
R Bauer
2014-03-26
2014-03-28
  • R Bauer
    2014-03-26

    I am using a simple regular expression to match the first space and all characters following to the end of the line. The regexp is:
    \s.*$
    I have the regular expression search mode checked, and the ". matches newline" option unchecked. However, this expression still matches the whitespace at the end of the line and wraps around to match the entire next line.

    Thanks,
    Robert

     
  • dail8859
    2014-03-26

    It's not actually matching the whitespace at the end of the line; it's matching the line ending, since \s also matches newlines.
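
    As a rough illustration of that behaviour, here is a minimal sketch using Python's re module as a stand-in for the Notepad++ engine (an assumption: the two engines are not identical, but \s matches "\n" in both). The sample text is made up.

```python
import re

# Hypothetical sample text: three lines, no trailing whitespace anywhere.
text = "one two\nthree\nfour five"

# Same pattern as in the question. re.DOTALL is NOT set, so "." stops at "\n",
# which corresponds to leaving ". matches newline" unchecked.
pattern = re.compile(r"\s.*$", re.MULTILINE)
matches = pattern.findall(text)

for m in matches:
    print(repr(m))
```

    The first match is ' two'. The next two are '\nthree' and '\nfour five': each starts at the line ending itself, because \s accepts the "\n", and then runs to the end of the following line.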

     
    • R Bauer
      2014-03-26

      Thank you!

       
  • THEVENOT Guy
    2014-03-26

    Hello Robert,

    dail8859 is quite right about the \s regex symbol.

    To sum up, with the PCRE ( Perl Compatible Regular Expressions ) engine, used in Notepad++ :

    • \s represents ONE character of the class [\t\n\x0B\f\r\x20\xA0] ( any BLANK/SPACE character )

    • \h represents ONE character of the class [\t\x20\xA0] ( any HORIZONTAL BLANK character )

    • \v represents ONE character of the class [\n\x0B\f\r] ( any VERTICAL BLANK character )

    REMINDERS :

    By ascending order of code-point character :

    • [\b] = \x08, represents the BACKSPACE character
    • \t = \x09, represents the HORIZONTAL TABULATION character
    • \n = \x0A, represents the LINE FEED character ( UNIX/OS X End of Line )
    • [\v] = \x0B, represents the VERTICAL TABULATION character
    • \f = \x0C, represents the FORM FEED character
    • \r = \x0D, represents the CARRIAGE RETURN character ( OLD MAC End of Line)
    • \x20 represents the usual SPACE character
    • \xA0 represents the NO BREAK SPACE character ( NBSP )

    IMPORTANT :

    • The \s class, of BLANK characters, is the UNION of the two classes \h and \v.

    • Difference between \b and [\b] : \b represents a zero-width ASSERTION ( a word boundary, i.e. a PARTICULAR position in the text, not a character ), but [\b] = \x08 represents the BACKSPACE character.

    • Difference between \v and [\v] : \v = [\n\x0B\f\r] represents ONE character of the VERTICAL BLANK class, but [\v] = \x0B represents the VERTICAL TABULATION character ONLY.
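
    The classes above can be checked with a small sketch. It uses Python's re module, which is an assumption and not the Notepad++ engine: Python has no \h, and its \v means only VERTICAL TAB, so the two classes are written out explicitly. The helper name classify is hypothetical.

```python
import re

# The classes as listed above, spelled out explicitly because Python's re
# has no \h and treats \v as the single VERTICAL TAB character.
H_BLANK = re.compile(r"[\t\x20\xA0]")   # horizontal blanks
V_BLANK = re.compile(r"[\n\x0B\f\r]")   # vertical blanks
ANY_BLANK = re.compile(r"\s")           # Python's own \s


def classify(ch):
    """Hypothetical helper: say which class a single character falls into."""
    if H_BLANK.fullmatch(ch):
        return "horizontal"
    if V_BLANK.fullmatch(ch):
        return "vertical"
    return "not blank"


listed = "\t\n\x0b\f\r\x20\xa0"

# Every listed character is accepted by \s and lands in one of the two sub-classes.
all_in_s = all(ANY_BLANK.fullmatch(ch) for ch in listed)
kinds = {ch: classify(ch) for ch in listed}

# \b alone is a zero-width word-boundary assertion; [\b] is the BACKSPACE character.
boundary = re.search(r"\b", "word")
backspace = re.search(r"[\b]", "a\x08b")
```

    Note that Python's \s accepts a few more characters than the seven listed here, so the sketch only shows that the listed ones are included, not that the classes are identical across engines.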


    Let's go back to your problem !

    From above, you can easily notice that your regex isn't precise enough !

    If I understand you correctly, you would like to select the TOTALITY of each line that begins with a SPACE character, WITHOUT its EOL characters, wouldn't you ?

    If so, I suggest the simple regex : ^ .*, with a SPACE, before the DOT symbol.

    NOTES :

    • If, in addition, you want to select any EOL characters, this regex becomes :

    ^ .*\R with, again, a SPACE, before the DOT symbol.

    • If you also want to consider the TABULATION character, just replace the SPACE with [\t ], with a SPACE, before the ENDING square bracket.

    • If each line MUST contain characters, after the FIRST space, at position 1, replace the STAR symbol with the + symbol.

    Oh, I forgot to say that the . matches newline option must, of course, be unchecked.
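
    The variants above can be tried out with a minimal sketch in Python's re module (an assumption, since Notepad++ uses a different engine). Python's re has no \R, so an explicit alternation stands in for it, and the sample text is made up.

```python
import re

# Hypothetical sample: a line starting with a space, a plain line,
# a line starting with a tab, a line holding a single space, and a last line.
text = " indented line\nplain line\n\ttabbed line\n \nlast"

# re.MULTILINE lets ^ match at every line start; leaving out re.DOTALL
# corresponds to keeping ". matches newline" unchecked.
whole_lines = re.findall(r"^ .*", text, re.MULTILINE)

# Stand-in for \R, which Python's re does not support.
EOL = r"(?:\r\n|\n|\r)"
with_eol = re.findall(r"^ .*" + EOL, text, re.MULTILINE)

# Leading SPACE or TABULATION.
space_or_tab = re.findall(r"^[\t ].*", text, re.MULTILINE)

# At least one character required after the leading SPACE.
non_empty = re.findall(r"^ .+", text, re.MULTILINE)

# With the dot allowed to match newlines, the first match swallows everything.
dot_all = re.findall(r"^ .*", text, re.MULTILINE | re.DOTALL)
```

    Here whole_lines holds the two lines that start with a space, non_empty drops the line that contains only a space, and dot_all collapses into a single match covering the whole text, which is why the option has to stay unchecked.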

    Cheers,

    guy038

    P.S. :

    You will find good documentation, about the new Perl Compatible Regular Expressions (PCRE), used by N++, since the 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    The FIRST link concerns the syntax of regular expressions in SEARCH.

    The SECOND link concerns the syntax of regular expressions in REPLACEMENT.

     
    Last edit: THEVENOT Guy 2014-03-26
    • GerdB
      2014-03-27

      Hello THEVENOT Guy,

      just to clarify: Notepad++ uses the boost::regex library in Perl-compatible Mode. There is also the PCRE-Library hosted at http://www.pcre.org which provides Regular Expressions with almost the same syntax, but is otherwise a completely different beast.

       
  • THEVENOT Guy
    2014-03-28

    Hello GerdB,

    Thank you for this additional technical point. I was mistaken and used a shortcut :-((

    So, N++ doesn't include the true PCRE library, but rather the boost::regex library, in Perl-compatible mode, if I fully understood what you said !

    This obviously explains the small differences with the true PCRE library.

    So, when I speak about Notepad++ S/R, in regex mode, I'll try not to forget this distinction !

    Cheers,

    guy038

     
    Last edit: THEVENOT Guy 2014-03-28