For each htm file in a folder, remove everything before keyword1 and after keyword2?

2. Help
David
2014-03-15
2014-05-24
  • David
    David
    2014-03-15

    Hello,

    First thanks for creating this great freeware. ;)

    I have many htm files in a folder.
    I would like Notepad++ to open one file, delete everything before keyword1 and after keyword2, then save and close the file,
    and repeat the same process for all the other htm files in the folder.

    Is it possible ?
    Many thanks in advance ;)
    David
    Win8.1 64bits

     
    Last edit: David 2014-03-15
  • David
    David
    2014-03-16

    Hello,

    Here are the regexes that seem to work for me:

    1) Cut everything after keyword2:
    with "Regular expression" and ". matches newline" checked,
    replace
    keyword2.*
    with
    nothing
    then hit Replace.

    2) Cut everything before keyword1:
    with "Regular expression" and ". matches newline" checked (this works as long as the cursor is at the beginning of the file),
    replace
    .*?keyword1
    with
    nothing
    then hit Replace.
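For readers who want to try these two replacements outside Notepad++, here is a minimal sketch in plain Python. It assumes Python's `re` module behaves like the Notepad++ regex engine for these simple patterns, with `re.DOTALL` standing in for ". matches newline"; the function name `trim_two_steps` and the sample text are made up for illustration. Note that both replacements also delete the keywords themselves.

```python
import re

def trim_two_steps(text, keyword1="keyword1", keyword2="keyword2"):
    """Apply the two replacements in the order described above."""
    # Step 1: delete keyword2 and everything after it.
    # re.DOTALL lets "." match newlines, like ". matches newline".
    text = re.sub(re.escape(keyword2) + r".*", "", text, count=1, flags=re.DOTALL)
    # Step 2: delete everything from the start up to and including
    # the first keyword1 (lazy match, searched from the beginning).
    text = re.sub(r".*?" + re.escape(keyword1), "", text, count=1, flags=re.DOTALL)
    return text

# Hypothetical sample standing in for one htm file.
sample = "<html>\nheader keyword1 body line 1\nbody line 2 keyword2 footer\n</html>\n"
print(repr(trim_two_steps(sample)))
```

If a keyword is missing from the text, the corresponding step simply leaves the text unchanged.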

    My computer is not powerful enough to open all the files in Notepad++ at once.
    Any ideas?
    Thanks in advance ;)

     
  • David
    David
    2014-03-16

    Hello again,
    I have found a way to do what I want with Dngrep http://code.google.com/p/dngrep/
    So no need to help ! ;)

     
  • THEVENOT Guy
    THEVENOT Guy
    2014-03-16

    Hi David,

    Well, you didn't say how many html files your folder contains, but:

    1)

    For fewer than about fifteen files, and with the latest version of N++ (6.5.5), you just have to:

    • In a DOS command window, type: X:\.....\Notepad++.exe Y:\....\*.html

    • Perform one of the four search/replace operations proposed below on each opened file

    • Save all these files when closing them with the File - Close All menu command

    2)

    For more files, it would be preferable to use the Python Script plugin of N++, which could do the job in a few lines. But as I'm not yet familiar with that plugin, I can't help you with that part :-(. Luckily, I'm sure that others here will easily be able to help you :-)
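To give an idea of what such a batch job might look like, here is a rough sketch in plain, standalone Python. It is not a script for the N++ Python Script plugin, whose API is not discussed in this thread. The names `trim_file` and `trim_folder` are made up, the default pattern is the "last Keyword1 to last Keyword2" regex proposed further down, and the files are handled as raw bytes on the assumption that the keywords are plain ASCII. It overwrites files in place without any backup, so try it on a copy of the folder first.

```python
import re
from pathlib import Path

# Assumed default: keep the text from the last Keyword1 to the last Keyword2.
# Bytes pattern, so the file encoding and line endings are left untouched.
PATTERN = re.compile(rb".*(Keyword1.*Keyword2).*", re.DOTALL)

def trim_file(path, pattern=PATTERN):
    """Keep only group 1 of the pattern in one file. Return True if it changed."""
    original = path.read_bytes()
    trimmed = pattern.sub(rb"\1", original, count=1)
    if trimmed == original:
        return False  # keywords not found (or nothing to cut): leave the file alone
    path.write_bytes(trimmed)
    return True

def trim_folder(folder, glob="*.htm", pattern=PATTERN):
    """Process every matching file in the folder, one at a time."""
    changed = []
    for path in sorted(Path(folder).glob(glob)):
        if path.is_file() and trim_file(path, pattern):
            changed.append(path.name)
    return changed
```

A call such as `trim_folder(r"C:\some\folder")` (a hypothetical path) would return the names of the files that were rewritten. Because each file is read, trimmed and written in turn, only one file is held in memory at a time.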

    I'll just point out the right regular expression to use!

    As you didn't say whether your html files may contain several occurrences of Keyword1 and/or Keyword2, or whether these may be interleaved, I'll assume that the contents of a file have the general form below:

    .......Keyword1.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword2.......
    .......Keyword2.......

    Whichever of the four regexes below you choose:

    • Go back to the beginning of the current file ( CTRL + Home )

    • Open the Replace dialog ( CTRL + H )

    • Select the Regular expression radio button

    • Uncheck the Wrap around check box

    • Type the chosen regex in the Find what: field

    • Type \1 in the Replace with: field

    • Click on the Replace All button

    Then :

    • Use the regex (?s).*?(Keyword1.*Keyword2).* if you want to keep ONLY the bold text below

    ( From the first Keyword1 to the last Keyword2 )

    .......**Keyword1.......**
    **.......Keyword1.......**
    **.......Keyword2.......**
    **.......Keyword1.......**
    **.......Keyword2.......**
    **.......Keyword1.......**
    **.......Keyword2.......**
    **.......Keyword2.......**
    **.......Keyword2**.......

    • Use the regex (?s).*(Keyword1.*?Keyword2).* if you want to keep ONLY the bold text below

    ( From the last Keyword1 to the first Keyword2 after that Keyword1 )

    .......Keyword1.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword1.......
    .......Keyword2.......
    .......**Keyword1.......**
    **.......Keyword2**.......
    .......Keyword2.......
    .......Keyword2.......

    • Use the regex (?s).*?(Keyword1.*?Keyword2).* if you want to keep ONLY the bold text below

    ( From the first Keyword1 to the first Keyword2 after that Keyword1 )

    .......**Keyword1.......**
    **.......Keyword1.......**
    **.......Keyword2**.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword2.......
    .......Keyword2.......

    • Use the regex (?s).*(Keyword1.*Keyword2).* if you want to keep ONLY the bold text below

    ( From the last Keyword1 to the last Keyword2 )

    .......Keyword1.......
    .......Keyword1.......
    .......Keyword2.......
    .......Keyword1.......
    .......Keyword2.......
    .......**Keyword1.......**
    **.......Keyword2.......**
    **.......Keyword2.......**
    **.......Keyword2**.......
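The difference between the four regexes is easier to see by running them. The sketch below uses Python's `re` module rather than Notepad++, on the assumption that greedy (`.*`) and lazy (`.*?`) quantifiers behave the same way in both for these patterns. The sample text mirrors the nine-line layout above, with the dots replaced by made-up letter pairs so that each line can be told apart; `keep_group` is a hypothetical helper name.

```python
import re

# Same layout as the nine-line sample above, with distinct filler text.
SAMPLE = "\n".join([
    "aa Keyword1 bb",
    "cc Keyword1 dd",
    "ee Keyword2 ff",
    "gg Keyword1 hh",
    "ii Keyword2 jj",
    "kk Keyword1 ll",
    "mm Keyword2 nn",
    "oo Keyword2 pp",
    "qq Keyword2 rr",
])

REGEXES = {
    "first Keyword1 to last Keyword2": r"(?s).*?(Keyword1.*Keyword2).*",
    "last Keyword1 to first following Keyword2": r"(?s).*(Keyword1.*?Keyword2).*",
    "first Keyword1 to first following Keyword2": r"(?s).*?(Keyword1.*?Keyword2).*",
    "last Keyword1 to last Keyword2": r"(?s).*(Keyword1.*Keyword2).*",
}

def keep_group(regex, text):
    """Replace the whole match with group 1, like typing \\1 in Replace with."""
    return re.sub(regex, r"\1", text, count=1)

for label, regex in REGEXES.items():
    print(label)
    print(keep_group(regex, SAMPLE))
    print("-" * 20)
```

In each regex, a lazy `.*?` stops at the first place where the rest of the pattern can match, while a greedy `.*` runs as far as it can and backs up only as much as needed, which is what selects the first or the last occurrence of a keyword.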

    Notes :

    • If each file contains ONLY ONE occurrence of each keyword ( Keyword1 and Keyword2 ), just use the last regex

    • If the case of your two keywords matters, change the modifier (?s) at the beginning of the regex into (?s-i)

    • If you also want to delete the two keywords Keyword1 and Keyword2 themselves:

    Place the opening parenthesis right after Keyword1 and the closing parenthesis right before Keyword2
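The last two notes can be sketched in Python as well, with two caveats that are assumptions of this example: Python's `re` is case-sensitive by default (the opposite of an N++ search with Match case unchecked), and it does not accept a leading `(?s-i)`, so the flag `re.IGNORECASE` is used to switch between the two behaviours instead. The helper name `text_between` and the sample string are made up.

```python
import re

def text_between(text, ignore_case=False):
    """Keep only what lies between the last Keyword1 and the last Keyword2."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    # The parentheses now open after Keyword1 and close before Keyword2,
    # so the keywords themselves fall outside group 1 and are deleted.
    return re.sub(r".*Keyword1(.*)Keyword2.*", r"\1", text, count=1, flags=flags)

sample = "head Keyword1 body KEYWORD1 more Keyword2 tail"
print(repr(text_between(sample)))                    # exact case only
print(repr(text_between(sample, ignore_case=True)))  # any case
```

With exact-case matching, the upper-case KEYWORD1 is treated as ordinary text and stays in the result; when case is ignored, it counts as the last Keyword1 and the kept text becomes shorter.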

    Hope this explanation will be valuable to you, anyway !

    Cheers,

    guy038

     
  • THEVENOT Guy
    THEVENOT Guy
    2014-03-16

    My previous post, although correctly submitted, shows the SourceForge message below:

    Post awaiting moderation! So I'm trying to post it again. I hope it works this time!


     
  • THEVENOT Guy
    THEVENOT Guy
    2014-03-16

    Hi, David and All,

    My two previous posts, although correctly submitted, show the SourceForge message below:

    Post awaiting moderation!

    So I'm going to make one last try! If it doesn't work, I'll create a new topic to post my answer, with a link to your original post!

    Best Regards

    guy038

     
  • THEVENOT Guy
    THEVENOT Guy
    2014-03-16

    Hi, David and All,

    Some of my previous posts to you, although correctly submitted, show the SourceForge message below:

    Post awaiting moderation. I don't understand at all what is happening!!!

    In the meantime, two hours later, David, I now see your two other posts and that you solved your problem on your own. Fine!

    I also notice my several identical attempts to answer you but, unfortunately, I can't edit these posts to make corrections or delete the duplicate answers!

    So, one more time, and just for information, I'm trying to post it after deleting some text. Maybe the original post was too long??


    Well, you didn't say how many html files your folder contains, but:

    1)

    For fewer than about fifteen files, and with the latest version of N++ (6.5.5), you just have to:

    • In a DOS command window, type:

    X:\.....\Notepad++.exe Y:\....\*.html

    • Perform one of the four search/replace operations proposed below on each opened file

    • Save all these files when closing them with the File - Close All menu command

    2)

    For more files, it would be preferable to use the Python Script plugin of N++, which could do the job in a few lines. But as I'm not yet familiar with that plugin, I can't help you with that part :-(. Luckily, I'm sure that others here will easily be able to write a script for your needs :-)

    I'll just point out the right regular expression to use!

    As you didn't say whether your html files may contain several occurrences of Keyword1 and/or Keyword2, or whether these may be interleaved, I'll consider the general case, with several interleaved keywords.

    Whichever of the four regexes below you choose:

    • Go back to the beginning of the current file ( CTRL + Home )

    • Open the Replace dialog ( CTRL + H )

    • Select the Regular expression radio button

    • Uncheck the Wrap around check box

    • Type the chosen regex in the Find what: field

    • Type \1 in the Replace with: field

    • Click on the Replace All button

    So :

    • Use the regex (?s).*?(Keyword1.*Keyword2).* if you want to keep the text from the first Keyword1 to the last Keyword2

    • Use the regex (?s).*(Keyword1.*?Keyword2).* if you want to keep the text from the last Keyword1 to the first Keyword2 after that Keyword1

    • Use the regex (?s).*?(Keyword1.*?Keyword2).* if you want to keep the text from the first Keyword1 to the first Keyword2 after that Keyword1

    • Use the regex (?s).*(Keyword1.*Keyword2).* if you want to keep the text from the last Keyword1 to the last Keyword2

    Notes :

    • If each file contains ONLY ONE occurrence of each keyword ( Keyword1 and Keyword2 ), just use the last regex

    • If the case of your two keywords matters, change the modifier (?s) at the beginning of the regex into (?s-i)

    • If you also want to delete the two keywords Keyword1 and Keyword2 themselves:
      Place the opening parenthesis right after Keyword1 and the closing parenthesis right before Keyword2

    Hope this explanation will be valuable to you, anyway !

    Cheers,

    guy038

     
    Last edit: THEVENOT Guy 2014-03-16
  • cchris
    cchris
    2014-03-22

    I haven't touched a thing and never saw that "awaits moderation" message, and all the duplicates do appear as posted on the forum.
    If you wish to clean up a bit but can't, I can help.

    CChris