BUG: Finding blank lines with RE ^\s*$ doesn't work

delbert
2014-02-11
2014-02-27
  • delbert
    2014-02-11

    Hi,
    I'm using the Find dialog to count lines in a file.
    I'm using the following RE to find blank lines: "^\s*$".
    The lines can contain 0 or more whitespace characters,
    so this should be the correct RE.

    My file looks like this (4 lines, 2 of them blank):
    First Line
    (blank line)
    (blank line)
    Fourth Line

    It behaves inconsistently, and wrongly, in Notepad++ 6.5.3. For example:

    1. In the Find tab, enter the RE in "Find what" and hit Count.
       It only counts 1 (not 2). (WRONG)

    2. Go to the Replace tab, enter some string in "Replace with"
       and hit Replace All.
       It replaces 1 line and deletes 1 line. (WRONG)

    3. Change the RE to "^$", enter it in "Find what" in the Find tab
       and hit Count.
       It only counts 0. (WRONG)

    4. Hit Find Next multiple times.
       It finds only 1. (WRONG)

    5. Go to the Replace tab, enter some string in "Replace with"
       and hit Replace All.
       It replaces 2 lines. (Correct)

    It looks like some overly greedy matching is consuming newline characters.
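    A minimal sketch reproducing the symptom outside Notepad++, using Python's re module as a stand-in for the N++ (Boost-based) engine. This is an assumption: the engines differ in details, e.g. Python's $ under re.M only recognises \n, so the sample file uses Unix line endings. It needs Python 3.7+ for the empty-match behaviour shown.

```python
import re

# Hypothetical sample: the 4-line file from the post, with Unix (\n) endings.
text = "First Line\n\n\nFourth Line"

# re.M makes ^ and $ match at line boundaries.
spans = [m.span() for m in re.finditer(r"^\s*$", text, re.M)]
print(spans)  # [(11, 12), (12, 12)]
# First match: \s* swallowed the newline ending line 2.
# Second match: zero length, at the start of line 3.

replaced = re.sub(r"^\s*$", "X", text, flags=re.M)
print(repr(replaced))  # 'First Line\nXX\nFourth Line'
```

    The replacement shows the same "replaces 1 line and deletes 1 line" result: one match ate a newline, the other was empty.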

     
  • dail8859
    2014-02-12

    I'm using v6.4.3 and it looks like it's doing the same thing. For now, I guess you can use "^\s*?\r\n", assuming you are using Windows line endings. Note that this will not catch the case where the very last line in the file is blank.
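    A small check of this workaround, again with Python's re standing in for the N++ engine (an assumption). The sample text and variable names are hypothetical. The lazy \s*? stops at the first \r\n it reaches, so each blank line is matched once, including its EOL.

```python
import re

# Hypothetical Windows file: line 2 is empty, line 3 holds a space and a tab.
text = "First Line\r\n\r\n \t\r\nFourth Line\r\n"
pattern = r"^\s*?\r\n"  # dail8859's workaround

blank_count = len(re.findall(pattern, text, re.M))
print(blank_count)  # 2

# Limitation: a blank LAST line has no \r\n after it, so it is not found.
tail = "First Line\r\n   "
tail_count = len(re.findall(pattern, tail, re.M))
print(tail_count)  # 0
```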

     
  • THEVENOT Guy
    2014-02-27

    Hello Delbert, Dail8859 and All,

    Sorry, I have been "absent" from the N++ forums for about 3 weeks. In the meantime, I hope you have been able to solve your problem. However, I would like to add a few clarifications:

    1)

    The problem is that the escape sequence \s actually represents any one of the characters in the set [\t\n\x0B\f\r\x20\xA0]. Since that set includes the EOL characters \r and \n, the regex ^\s*$ sometimes matches more than one line! As blank lines generally contain only spaces and/or tabs, the \s class can be restricted to [\t ], with a space before the closing square bracket!
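    A quick way to see the difference between the two classes, sketched with Python's re (an assumption: in Python 3 str patterns \s is Unicode-aware, so it matches every character in the list above, though its full set is larger than Boost's).

```python
import re

# The characters listed above as matched by \s.
listed = "\t\n\x0b\x0c\r \xa0"

# Every one of them matches \s, including the EOL characters \n and \r.
print(all(re.fullmatch(r"\s", ch) for ch in listed))  # True

# The restricted class [\t ] (tab, then a space) keeps only tab and space.
matched = [ch for ch in listed if re.fullmatch(r"[\t ]", ch)]
print(matched)  # ['\t', ' ']
```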

    2)

    When you search for ^\s*$, the matched string may be a zero-length string between the two anchors ^ and $. In that case, thanks to improvements to the S/R engine introduced by Dave Brotherstone and François-R Boyer in N++ version 6.3.0, clicking the Find Next button shows a "zero length match" calltip each time it reaches a truly empty line: NO characters, followed by the EOL character(s).
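    To make "zero-length match" concrete, here is a sketch with Python's re (standing in for the N++ engine, an assumption) using the restricted class [\t ]: each empty line is found, but every match starts and ends at the same position.

```python
import re

# Hypothetical sample: the 4-line file, Unix (\n) endings, lines 2 and 3 empty.
text = "First Line\n\n\nFourth Line"

# [\t ]* cannot run past a line end, so each empty line gives start == end.
spans = [m.span() for m in re.finditer(r"^[\t ]*$", text, re.M)]
print(spans)  # [(11, 11), (12, 12)]
print(all(start == end for start, end in spans))  # True
```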

    3)

    In addition, if you count the occurrences of ^\s*$ with the Count button, these zero-length matches are NOT counted. (BTW, don't forget that Count always covers the WHOLE file, not just the part from the cursor to the end of the file, even if the "Wrap around" checkbox is unchecked!)
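    The counting rule described here can be emulated by skipping empty matches. The helper below is hypothetical (it is not N++ code) and uses Python 3.7+ re as a stand-in engine; on the sample file it reproduces the counts Delbert reported.

```python
import re

def count_like_find_dialog(pattern: str, text: str) -> int:
    """Hypothetical helper: count matches over the whole text,
    ignoring zero-length ones, as the Count button is described to do."""
    return sum(1 for m in re.finditer(pattern, text, re.M) if m.end() > m.start())

text = "First Line\n\n\nFourth Line"
counts = [count_like_find_dialog(p, text) for p in (r"^\s*$", r"^$", r"^[\t ]*$")]
print(counts)  # [1, 0, 0]
```

    ^\s*$ yields 1 (only the match that ate a newline), while ^$ and ^[\t ]*$ yield 0 because all their matches are empty.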


    So, it is always better to search for or count non-zero-length matches! If you do so, the number of occurrences found and the number of replacements will be exact :-) But how do you match empty lines with a non-zero-length regex? Well, you just have to include the EOL characters, with the regex ^[\t ]*\R. Remember that \R matches the two characters \r\n in a Windows file, the single character \n in a Unix/OS X file, or the single character \r in an old Mac file!
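    A sketch of counting with ^[\t ]*\R. Python's re has no \R, so it is approximated here by its three EOL forms, with \r\n tried first. Python's ^ only treats \n as a line break, so this sketch covers Windows and Unix files but not old Mac (\r-only) files.

```python
import re

# Approximation of \R: the three EOL forms, \r\n first so it is one unit.
EOL = r"(?:\r\n|\n|\r)"
blank_line = re.compile(r"^[\t ]*" + EOL, re.M)

# Hypothetical files: line 2 empty, line 3 holds a space and a tab.
unix = "First Line\n\n \t\nFourth Line"
windows = unix.replace("\n", "\r\n")

unix_count = len(blank_line.findall(unix))
windows_count = len(blank_line.findall(windows))
print(unix_count, windows_count)  # 2 2
```

    Every match now contains at least the EOL, so none of them is zero-length and all of them are counted.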

    Of course, as the EOL is part of the search regex, it must be re-written as \r\n (or \n, or \r) in the replacement part.

    To sum up, if you want to count, or replace with a line of ten dashes, every EMPTY line or line containing ONLY spaces and/or tabs, use the S/R below:

    SEARCH : ^[\t ]*\R (with a SPACE before the closing square bracket)

    REPLACE : ----------\r\n
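    The same S/R applied with Python's re, as a sketch under the same assumptions (\R approximated by its three EOL forms, hypothetical sample file). Because the search consumes the EOL, the replacement string has to put \r\n back.

```python
import re

# Hypothetical Windows file: line 2 is empty, line 3 holds a space and a tab.
text = "First Line\r\n\r\n \t\r\nFourth Line\r\n"

# SEARCH ^[\t ]*\R  ->  REPLACE ----------\r\n
result = re.sub(r"^[\t ]*(?:\r\n|\n|\r)", "----------\r\n", text, flags=re.M)
print(result.split("\r\n"))
# ['First Line', '----------', '----------', 'Fourth Line', '']
```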

    NOTE : If the LAST line of your file ends with EOL character(s), so that only the cursor can sit on the NEXT (empty) line, that zero-length line cannot be counted, but it can still be replaced with the S/R below, as \z represents the very end of the file:

    SEARCH : ^[ \t]*\z

    REPLACE : ----------
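    A sketch of this end-of-file replacement in Python's re (a stand-in engine, as above). Python spells "very end of the string" as \Z; unlike Perl's \Z, it does not match before a final newline, so it behaves like the \z used here. The sample file is hypothetical.

```python
import re

# Hypothetical file whose last line ends with \r\n, leaving an empty
# final "line" where only the cursor can be placed.
text = "First Line\r\nLast Line\r\n"

# With re.M, ^ matches right after the final \n; \Z anchors to the very end.
result = re.sub(r"^[ \t]*\Z", "----------", text, flags=re.M)
print(repr(result))  # 'First Line\r\nLast Line\r\n----------'
```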

    I hope this post helps you a bit :)

    guy038

     
    Last edit: THEVENOT Guy 2014-02-27