
Does N++ Support REGEX With Multiple Lines?

2009-02-13
2012-11-13
  • Speedy Cheng

    Speedy Cheng - 2009-02-13

    Hi.

    I wanted to search for all occurrences of /* ... */ and replace with a blank.

    I can do this if /* ... */ is on one line, but not if it's spread over multiple lines; e.g.

    /*
    ...
    */

    Searching the forum, I found a post from 2 years ago saying that REGEX can't handle newline characters. Is that true?

    Is there a way to achieve the above?
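For comparison only: in a regex engine where `.` can be told to match line breaks and lazy quantifiers are available, the whole task is a single replacement. This is a minimal sketch in Python's `re` module (an assumption; it is not the Notepad++ engine discussed in this thread), using `re.DOTALL` and the lazy `.*?`. The name `strip_block_comments` is hypothetical. Like any plain regex, it would also remove `/*` sequences that sit inside string literals.

```python
import re

# Lazy ".*?" stops at the FIRST "*/" after each "/*"; re.DOTALL lets "."
# match CR and LF, so one match can span several lines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(text: str) -> str:
    """Replace every /* ... */ block, single- or multi-line, with nothing."""
    return BLOCK_COMMENT.sub("", text)


sample = "TEXT1\n/*\n... 1 ...\n*/\nTEXT2 /* inline */ TEXT3\n"
print(repr(strip_block_comments(sample)))  # 'TEXT1\n\nTEXT2  TEXT3\n'
```

The line break that followed a removed multi-line block stays behind, which is why an empty line remains in the output.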

     
    • Fool4UAnyway

      Fool4UAnyway - 2009-02-13

      So why would you start yet another thread about it?

      I wanted to make a suggestion, but it wouldn't work, because it would involve recursively creating linebreak markers in regex mode and then replacing these linebreak markers in non-regex mode.

      So I started a small but thorough investigation to see whether I could trick the regex engine. On some occasions, the CR of the CR LF linebreak got included in the found match, and I succeeded in reproducing this behavior. I then had to extend this "greedy" behavior to the LF character as well, to get a little further.

      Currently, I can turn the comment block into a single line, which you can easily remove by replacing it in non-regex mode.

      Here are the golden instructions with the magical outcome.

      Move the cursor to the top of the document: press Ctrl+Home.

      Press Ctrl+R to open the Advanced Find/Replace Dialog.

      Check the Regular Expr checkbox.
      (Uncheck the Selection (NR) checkbox.)
      (Uncheck the Wrap checkbox.)
      Check the Recurse Repl checkbox.

      In the Find field, enter:
      ^/\*[^/*]*[^*/][^/]$

      In the Replace field, enter:
      /* ____ without the underscore, but with an additional space

      If you replace all occurrences, your comment blocks will turn into lines containing the following text: "/* */" without the quotes. The additional space character is just to make those lines more clear.

      Now you can remove all those lines, again in regex mode.

      Move the cursor to the top of the document: press Ctrl+Home.

      Press Ctrl+R to open the Advanced Find/Replace Dialog.

      In the Find field, enter:
      ^/\* *\*/*[^/][^/]$
      (There may be additional space characters between /* and */.)

      Clear the Replace field.

      Replace all occurrences.

      This should do the trick!
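To see why the Recurse Repl option matters, the two steps can be emulated outside Notepad++. This is a sketch in Python's `re`, under stated assumptions: it works on the whole text instead of line by line, so the "magic" CR/LF classes are written out explicitly as `\r?\n`, and "Recurse Repl" is modeled as "repeat Replace All until a pass changes nothing". The names below are hypothetical.

```python
import re

# Step 1: a line starting with /*, whose rest holds no / or *, plus its
# line break. Replacing it with "/* " glues the next line onto the comment
# start. CR and LF are excluded from the class so each match eats exactly
# one line break, mimicking one line at a time.
JOIN_NEXT_LINE = re.compile(r"^/\*[^/*\r\n]*\r?\n", re.MULTILINE)

# Step 2: a collapsed "/* */" line together with its line break.
COLLAPSED_BLOCK = re.compile(r"^/\* *\*/\r?\n", re.MULTILINE)


def strip_comments_recursively(text: str) -> str:
    # Emulated "Recurse Repl": repeat until nothing is replaced. Every
    # replacement removes one line break, so the loop always terminates.
    while True:
        text, count = JOIN_NEXT_LINE.subn("/* ", text)
        if count == 0:
            break
    # Every comment block is now a single "/* */" line; delete those lines.
    return COLLAPSED_BLOCK.sub("", text)


sample = "TEXT1\n/*\n... 1 ...\n*/\nTEXT2\n"
print(repr(strip_comments_recursively(sample)))  # 'TEXT1\nTEXT2\n'
```

As with the original regexes, comment text containing `/` or `*` stops the joining, and a comment that starts after other text on the same line is not handled.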

      Here is the magic of the first regular expression:

      ^/\* ____ searches for lines starting the comment block

      [^/*]* __ searches for any character not being / or *

      So this possibly won't work with lines containing separate occurrences of those characters.

      For all other lines, the effect will be a match of the complete rest of the line (up to the linebreak).

      [^*/] ___ this is some magic: it selects yet another character

      Because we are at the end of the line, it selects the CR symbol. The / character should also be excluded here to avoid wrong matches.

      [^/] ____ this is some more magic: select the LF symbol also

      $ _______ we don't want to replace the comment block ending!

      Just sit back, be amazed for a while, and figure out how the second regex works with the same charm as the first.
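The "magic" rests on a property most regex engines share: a negated character class such as `[^*/]` matches any character it does not list, including the invisible CR and LF, whereas `.` normally does not match a line break. A quick check in Python's `re` (an assumption: Notepad++'s engine is not Python, although the behavior described above suggests it treated these classes the same way):

```python
import re

# A negated class matches every character it does not list,
# and that includes the invisible CR ("\r") and LF ("\n").
print(re.fullmatch(r"[^*/]", "\r") is not None)  # True
print(re.fullmatch(r"[^/]", "\n") is not None)   # True

# "." does not match a line break unless re.DOTALL is given.
print(re.fullmatch(r".", "\n") is not None)      # False

# The first regex from the instructions, applied to a line ending in
# CR LF (re.match anchors at the start, standing in for ^):
# [^*/] ends up on the CR and [^/] on the LF.
m = re.match(r"/\*[^/*]*[^*/][^/]$", "/* comment start\r\n")
print(repr(m.group()))  # '/* comment start\r\n'
```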

      Fool4UAnyway, thank me VERY much!!!

       
    • Speedy Cheng

      Speedy Cheng - 2009-02-14

      Thanks Fool4UAnyway.

      Unfortunately, that does seem to work for me.

      1. ^/\*[^/*]*[^*/][^/]$

         Ends up finding the " /* "

         It doesn't

      2. ^/\* *\*/*[^/][^/]$

         Finds nothing at all.

      I did manage to find a tedious way of doing this ... kinda going along your first suggestion.

      a. Replace all line breaks with a dummy word 'dummy' [using extended search]
          I now have a single line of text
      b. Replace "*/" with "*/" + linebreak [using extended search]
         So now each occurrence of /* */ displays on one line
      c. Use regex to get rid of /* ... */
      d. Replace 'dummy' with line breaks [using extended search]

      Though I was hoping to find a less tedious, convoluted solution like yours!!! :-P

       
      • Fool4UAnyway

        Fool4UAnyway - 2009-02-14

        > 1. ^/\*[^/*]*[^*/][^/]$

        > Ends up finding the " /* "

        > It doesn't

        What do you mean to say? Is the last line (a) complete(d) (sentence)?

        Of course, if it (does or) does not find the match correctly, you can try to reduce the regex to figure out what might be causing the mismatch. (I assume you didn't include the space character added to each line of messages on these forums.)

        I tried it again.
        It works for me.
        Do you have any white space preceding your comment blocks?
        My regex doesn't take those into account, but it wouldn't be hard to do:

        ^[ \t]*/\*.....

        Did you (un)check all options in the Ctrl+R Advanced Find/Replace Dialog (not the standard Ctrl+F/H dialog!)?

        > 2. ^/\* *\*/*[^/][^/]$

        > Finds nothing at all.

        It should find comment blocks turned into single lines. If step 1 didn't work, it's not strange that step 2 won't work either.

        ^/\* +\*/*[^/][^/]$ might be a better regex, because I added the additional space character to the /* comment block start anyway.

        If you keep having difficulties getting this to work, perhaps you could post an example of your comment block(s). Feel free to replace any words you don't want to be public.

         
        • Fool4UAnyway

          Fool4UAnyway - 2009-02-14

          > Check the Recurse Repl checkbox.

          This is an essential part of the magic!
          This option is not in the standard Ctrl+F/H Find/Replace dialog box.

          Your "dummy" solution is better than what I had in mind, because it's a one-pass process. I had in mind doing that iteratively with _only_ the comment block lines. The number of cycles would depend on the maximum number of lines of the comment blocks... That explains the "Fool" part of my name, I guess.

          If you can't get it to work, perhaps you could describe in some more detail what trying the above _does_ do and results in.

           
    • Speedy Cheng

      Speedy Cheng - 2009-02-15

      Hi Fool4UAnyway.

      I did manage to find one tedious solution ... I was hoping for something more elegant like what you are trying to show me!
      Maybe you can simplify the below?

      Assume I have:

      TEXT1
      /*
      ... 1 ...
      */
      TEXT2
      TEXT3  /*
                    ... 3 ...
                 */

      1. I first replace all newlines with a 'dummy' word (replacing expression \r\n with 'dummy'):

      TEXT1dummy/*dummy...1...dummy*/dummyTEXT2dummyTEXT3  /*dummy...3...dummy*/dummy

      2. Then replace all '*/' with '*/' + newline (replacing expression */ with */\r\n):

      TEXT1dummy/*dummy...1...dummy*/
      dummyTEXT2dummyTEXT3  /*dummy...3...dummy*/
      dummy

      3. Now I've got EACH '/* ... */' to display in a single row.

      4. I can now use REGEX to remove '/* ... */' for they no longer include newlines: [ ]*/\*.**/
          I added [ ]* to account for leading spaces

      TEXT1dummy
      dummyTEXT2dummyTEXT3
      dummy

      5. Put the newlines back in (replace dummy with \r\n)
      TEXT1

      TEXT2

      TEXT3

      6. Get rid of blank lines by replacing \r\n\r\n with \r\n

      It's convoluted, it's tedious, but it works!
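The same marker workflow can be sketched in Python's `re` to make each step explicit. Assumptions: the word `dummy` never occurs in the real text, line breaks are written back as `\n`, and in step 4 the closing `*/` is escaped as `\*/`. The function name is hypothetical.

```python
import re

MARKER = "dummy"  # assumption: this word never occurs in the real text


def strip_comments_via_marker(text: str) -> str:
    # 1. Replace every line break with the marker: one long line.
    flat = re.sub(r"\r?\n", MARKER, text)
    # 2. Put a line break after every */, so each line holds at most one
    #    */ and it sits at the very end of the line.
    split = flat.replace("*/", "*/\n")
    # (Step 3 is only an observation, so it needs no code.)
    # 4. Delete optional leading spaces plus /* ... */ on each line.
    #    "." does not cross "\n", so the greedy ".*" stays on one line.
    no_comments = re.sub(r" */\*.*\*/", "", split)
    # 5. Turn the markers back into line breaks.
    restored = no_comments.replace(MARKER, "\n")
    # 6. Collapse runs of empty lines. Like step 6 above, this also drops
    #    blank lines that were already in the original text.
    return re.sub(r"\n{2,}", "\n", restored)


sample = "TEXT1\n/*\n... 1 ...\n*/\nTEXT2\nTEXT3  /*\n   ... 3 ...\n*/\n"
print(repr(strip_comments_via_marker(sample)))  # 'TEXT1\nTEXT2\nTEXT3\n'
```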

       
      • Fool4UAnyway

        Fool4UAnyway - 2009-02-15

        Could you please answer the questions in my previous mail?

        I am trying to get a grip on the results you (do not) get.

        I am not sure if there is an easy way to improve your solution.
        There is a Delete Blank Lines option in the TextFX Edit submenu. This may just be a little easier.

        So, currently, it is not clear to me what happens when you try my method.

        I am using the Notepad++ 5.0 ANSI version. I am not aware of any changes made to the TextFX plugin.

         
        • Speedy Cheng

          Speedy Cheng - 2009-02-16

          Yes, in my original reply to your first response I meant to say it did *not* work for me :-(

          I used Ctrl+R for your examples ...

           
        • Fool4UAnyway

          Fool4UAnyway - 2009-02-15

          You say you are using Extended search mode in your method.

          Are you using the Standard Find/Replace Dialog for my method, as well? That is, are you using Ctrl+H (or Ctrl+F) instead of Ctrl+R for the Advanced Find/Replace Dialog?

          I just tried the Standard Find/Replace Dialog.
          This doesn't work with the method I described.

          However, it is possible to achieve the same.

          Be sure to put the cursor on top of the document.

          Open the Ctrl+H Standard Replace Dialog.
          Choose the regular expression search mode.

          In the Find Field, enter:
          ^/\*[^/*]*[^*/][^/]$

          In the Replace Field, enter:
          /* _ (do not include the underscore used to show the space)

          Now there are two options:

          1. You press the Replace All button until no more occurrences are found.

          2. You check Wrap mode and press the Replace button until no more occurrences are found.

          I recommend using option 1.

          Now you should be able to remove all single line block comments using the other regular expression.

           
          • Fool4UAnyway

            Fool4UAnyway - 2009-02-15

            > Now there are two options:

            > 1. You press the Replace All button until no more occurrences are found.

            > 2. You check Wrap mode and press the Replace button until no more occurrences are found.

            > I recommend using option 1.

            I don't know why Wrap mode isn't effectively applied when using the Replace All button. I guess it should concatenate the first two lines of every comment block on each cycle, until all comment blocks are on one line.

            I consider this a bug.

             
            • Fool4UAnyway

              Fool4UAnyway - 2009-02-15

              I just tried the Advanced Find/Replace Dialog (Ctrl+R) with Wrap Mode checked and Replace Recursive unchecked. This does (not do) two things I expect.

              1. When replacing each single match manually using Replace & Find Again, at the end of the document the search stops. It should start again at the top of the document. The Standard Replace Dialog Ctrl+H does do this.

              2. When replacing all matches using Replace Rest, the results are the same as with the standard dialog: again, the search isn't restarted from the start of the document until no more matches are found.

               
          • Speedy Cheng

            Speedy Cheng - 2009-02-16

            Hi.

            I was wondering if you can show me what your code is trying to accomplish?

            It seems to search for each occurrence of '/*', continue to the end of the line, and replace with '/*'?
            Plus, in N++ 5.2, it seems to skip the first line of the document?

            I wanted to remove ALL occurrences of '/* ... */', whether this all appeared on one line, or spread across multiple lines.

            My regex,   [ ]*/\*.**/   works only if '/* ... */' is all on one line.

            I also noticed that my regex will have the following.

            Given: TEXT1 /* ... 1 ... */ TEXT2 /* ... 2 ... */
            My regex selects:  /* ... 1 ... */ TEXT2 /* ... 2 ... */

            i.e. It does not select each '/* ... */' pair individually, but as one long string from the first '/*' in the line to the last '*/'!!!
                  This is no good as TEXT2 would also be deleted, but I want to keep it.

            So in my workaround, it was a good idea to create a newline after each '*/', resulting in at most one '/* ... */'  per line.

             
            • Fool4UAnyway

              Fool4UAnyway - 2009-02-16

              > My regex,
              > [ ]*/\*.**/
              > works only if
              > '/* ... */' is
              > -all- _the only text_
              > on one line.

              Try the following regex:
              [ ]*/\*[^*]**/

              This is a non-greedy regular expression: it doesn't accept "anything" (= everything) after the comment block start /*, but _only_ all characters as long as they are not *.

              Like I said before, there could be *'s and /'s around in your comment text, but I guess it's easier to use this non-greedy method, than to have an extensive method working around these occurrences.
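The greedy problem and this fix can be checked in Python's `re` (an assumption; not the Notepad++ engine), using the escaped form `\*/` for the closing delimiter. Strictly speaking, `[^*]*` is still a greedy quantifier; it simply cannot cross a `*`. A truly lazy `.*?` is shown for comparison, assuming the engine supports it (Python's does; the 2009 Notepad++ engine may not have).

```python
import re

line = "TEXT1 /* ... 1 ... */ TEXT2 /* ... 2 ... */"

# Greedy ".*" runs on to the LAST "*/" in the line, swallowing TEXT2.
greedy = re.findall(r"[ ]*/\*.*\*/", line)
# "[^*]*" cannot step over a "*", so each match ends at the first "*/".
negated = re.findall(r"[ ]*/\*[^*]*\*/", line)
# The lazy ".*?" also stops at the first "*/", and unlike "[^*]*" it
# copes with a lone "*" inside the comment text.
lazy = re.findall(r"[ ]*/\*.*?\*/", line)

print(greedy)   # [' /* ... 1 ... */ TEXT2 /* ... 2 ... */']
print(negated)  # [' /* ... 1 ... */', ' /* ... 2 ... */']
print(lazy)     # [' /* ... 1 ... */', ' /* ... 2 ... */']
print(re.sub(r"[ ]*/\*[^*]*\*/", "", line))  # TEXT1 TEXT2
```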

              However, I still don't have any clue about what goes wrong when you try to use my method. All I know now is that you confirmed that it doesn't work.

               
    • Speedy Cheng

      Speedy Cheng - 2009-02-16

      Hi.

      I finally got the following regex, which is a slightly modified version of your original regex,  to work:

      ^[ ]*/\*[^[/]*+]*[^*/][^/]$

      Previously, it would ignore '/' + multiple '*', e.g. /******* .... ***/

      I think the reason it did not work for me the first time was because I missed the part about setting the 'Recurse Repl' checkbox (D'oh!)

      I'm still not sure how it's able to span across the linebreaks, but hey - it works!

      Thanks for all your help!

       
      • Fool4UAnyway

        Fool4UAnyway - 2009-02-16

        > I think the reason it did not work for me the first time was
        > because I missed the part about setting the 'Recurse Repl'
        > checkbox (D'oh!)

        I guess I must have forgotten about that, never mentioned it again, and never asked you about it to make sure you used all the right options. Did I mention that this is an essential part of the magic work?

        > I'm still not sure how it's able to span across the
        > linebreaks, but hey - it works!

        This Recursive Replace option simply says: if I did replace anything, should I start looking again from the point I started?

        So, after removing the linebreak at the end, it looks again at the start of the line. Does it (still) start with /*? Does it end with */? If not, remove the new linebreak. Etc.

        > I finally got the following regex, which is a slightly
        > modified version of your original regex, to work:

        > ^[ ]*/\*[^[/]*+]*[^*/][^/]$

        ^ __________ look from the start of the line
        [ ]* _______ accept any space characters, " *" would do also
        / __________ match /
        \* _________ match *
        [^[/]*+]* __ AFAIK, accept anything but [, / ] * or +
        ____________ this usually will "eat up" the rest of the line
        [^*/][^/]$ _ magic to include the linebreak characters
        ____________ AND assure that the line does not end on */

        > Previously, it would ignore '/' + multiple '*',
        > e.g. /******* .... ***/

        I had no clue about multiple *'s following the /.
        You never gave any to me!

        > Thanks for all your help!

        SHOW ME THE MONEY!
        Don't send it - I won't accept it.
        Just show it.

         
        • Fool4UAnyway

          Fool4UAnyway - 2009-02-16

          Some corrections:

          Earlier, I suggested:

          > Try the following regex:
          > [ ]*/\*[^*]**/

          > This is a non-greedy regular expression: it doesn't
          > accept "anything" (= everything) after the comment block
          > start /*, but _only_ all characters
          > as long as they are not *.

          To be totally transparent and correct, the regex should have been:
          [ ]*/\*[^*]*\*/
          with the _character_ * at the end entered as \* instead of *.
          This makes a difference.
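How much the missing backslash matters depends on the engine. As an illustration in Python's `re` (an assumption; Notepad++'s own engine may react differently), the unescaped form is rejected outright, because `**` reads as a repetition applied to a repetition, while the escaped form compiles and treats the final `*` as a literal. `compiles` is a hypothetical helper.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as err:
        print(f"{pattern!r} rejected: {err}")
        return False
    return True


print(compiles(r"[ ]*/\*[^*]**/"))   # False: "**" is a quantifier on a quantifier
print(compiles(r"[ ]*/\*[^*]*\*/"))  # True: the final \* is a literal *
print(re.sub(r"[ ]*/\*[^*]*\*/", "", "A /* x */ B"))  # A B
```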

          Your regex:

          ^[ ]*/\*[^[/]*+]*[^*/][^/]$

          > [^[/]*+]* __ AFAIK, accept anything but [, / ] * or +
          > ____________ this usually will "eat up" the rest of the line

          This probably does NOT do what you expect.

          It's even DANGEROUS. Right now, it made Notepad++ crash. I tried it on my sample text (also a while before), and noticed it found for example two occurrences, highlighting the match. When looking for the next match, it would say "Found!" but highlight nothing. I now pushed the Find button a couple of times in a row... Notepad++ got CPU-hungry. I killed it.

          So, what _does_ it do?

          [^[/]*+]*

          The pain in the ass may be the final part:
          ]* _______ This on its own would mean: any ], but none is OK as well.

          Does it work like that? Let's change * into +, meaning: at least one is required.

          OK, now I find occurrences in a sample text that make sense. It matched just line endings or line parts with the *.

          So, if this final part is a regex itself, what does the rest do?

          [^[/]*+]*

          From end to start:

          ]* _______ any string of consecutive ]'s or none will also be matched
          + ________ this does not require a + character, it turns out
          __________ so it must require "at least one match" of the preceding part
          * ________ this does not require a * character, it turns out
          __________ so it must require "any number of matches" of the preceding part

          [^[/]_____ is the first part of the regex.

          ] ________ is just the closing of the opening [, grouping characters that may be entered literally
          ^ ________ means: match none of the following characters
          [/ _______ these are the _only_ characters to not match!

          So, this regex says:

          Accept any character that is not [ or /,
          Accept any consecutive character that is not [ or /,
          "Accept/Require at least one match of this kind of string"
          Followed by any consecutive string of ] characters or none

          I guess I can "accept" that this regex also matches... nothing. But I'd rather not have Notepad++ crash again.

          Try this regex on your document to verify this behavior:
          [^[/]*+]+

          Getting a little dizzy here?