search and copy

Vanad1um3
2011-12-03
2012-11-13
  • Vanad1um3
    2011-12-03

    Good day!

    Is it possible to search a text for a particular html tag and copy every instance of it to another file? Either the tag + content or only its content.

    Vladislav

     
  • Vera
    2011-12-11

    Hello,
    as no expert has jumped in to answer, you might like to try my manual approach ;-)

    I would:
    - work on a copied file
    a- separate the searched items so that each is on its own line
    b- and delete the rest.
    c- extract the contents from the tags

    E.g. say you want to grab all link tags like

    <a href="address" ...> content </a>
    

    a- search and replace using regular expression mode

    find: <a(.*?)</a>
    replace: \n<a\1</a>\n
    

    b- now use the mark tab to search and bookmark those lines:

    find: <a
    

    and make sure the bookmark option of the Mark tab is selected!

    go to bookmark-menu:
    1.- Invert all bookmarked lines
    2.- Delete bookmarked lines

    c- to remove the tags and keep the contents

    search and replace using regular expression mode

    find: <a(.*?)>(.*?)</a>
    replace: \2
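
    For readers who would rather script this than click through the dialogs, here is a minimal sketch of the same three steps in Python. It only illustrates the logic: the sample string and the variable names are made up for the example, and Python's re module stands in for the Notepad++ regex engine.

```python
import re

# Hypothetical sample input; any HTML text would do.
html = '<p>See <a href="one.html">first</a> and <a href="two.html">second</a>.</p>'

# Step a: put every <a ...>...</a> on a line of its own.
step_a = re.sub(r'<a(.*?)</a>', r'\n<a\1</a>\n', html)

# Step b: keep only the lines containing "<a"
# (the equivalent of bookmark, invert, delete bookmarked lines).
step_b = [line for line in step_a.splitlines() if '<a' in line]

# Step c: strip the tags and keep only the contents.
step_c = [re.sub(r'<a(.*?)>(.*?)</a>', r'\2', line) for line in step_b]

print(step_b)
print(step_c)
```

    Note that a bare <a also matches other tags starting with "a" (such as <abbr>); writing <a\b instead is one way to narrow it down.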
    

    Hope this can be a way for the moment ~ greetings

     
  • THEVENOT Guy
    2012-10-08

    Hello vanad1um3,

    I think I have found a general SEARCH and REPLACEMENT expression,
      using regular expressions, that can do the WHOLE job, without
      the use of bookmarks or anything else !

    Let's suppose, as perquant said, you need to extract all link-tags
      like :    <a href="address"……>contents</a>

    To begin, we build the regular expression to match this expression :

      So we get :   RegExp =  <a.*?</a>

       .*? means the SMALLEST string between '<a' and '</a>'

    Then, we construct the SEARCH regular expression  (?s).*?(Re)(\R)?|.*\z
      where Re represents the RegExp above

    Finally, we obtain :  (?s).*?(<a.*?</a>)(\R)?|.*\z

    Some explanations :

      -  the (?s) modifier means that the dot (.) also matches END of LINE characters
           ( It's exactly the same as ticking the '. matches newline' box )

      -  .*? stands for ALL the characters BEFORE the first <a…>…..</a>, or
           between TWO occurrences of <a…>…..</a>

      -  the ORIGINAL RegExp <a.*?</a> is surrounded with parentheses to store it as \1

      -  (\R)? stands for an optional 'END of LINE', stored as \2
                 ( \r\n if WINDOWS file, \n if UNIX file and \r if MAC file )

      -  .*\z represents ALL the characters AFTER the last <a….>….</a> of a file
                up to the VERY END of a file

      -  |  is the ALTERNATION symbol
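
    To see what each part of the SEARCH expression captures, here is a small Python sketch. It is an approximation, not the N++ engine: Python's re module has no \R or \z, so the example assumes (\r\n|\n|\r) and \Z as stand-ins, and the sample text is invented.

```python
import re

# Assumption: (\r\n|\n|\r) approximates \R and \Z approximates \z,
# because Python's re module does not provide the Boost spellings.
SEARCH = re.compile(r'(?s).*?(<a.*?</a>)(\r\n|\n|\r)?|.*\Z')

# Hypothetical sample: two tags, the second one followed by a line ending.
sample = 'intro <a href="x">one</a> middle\n<a href="y">two</a>\ntail'

for m in SEARCH.finditer(sample):
    # group(1) is the tag, group(2) the optional line ending after it.
    # Both are None when the second alternative (the trailing text) matched.
    print(repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))

# Only the matches of the first alternative carry a tag.
tags = [m.group(1) for m in SEARCH.finditer(sample) if m.group(1) is not None]
```

    Each match swallows the text in front of a tag together with the tag itself, which is why a single 'Replace All' can throw the surrounding text away.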

    Whatever the form of the initial RegExp, the REPLACEMENT regular expression
      is ALWAYS  (?1\1(?2\2:\r\n))

    It's a bit more complicated to explain :

      - As long as <a…>….</a> is NOT found, or AFTER the LAST <a…>….</a>, NOTHING
          is written, so the matched text is simply deleted, because there's no
          ELSE part in the outer (?1……) expression

      - If a <a…>…..</a> tag is found, we just re-write it (\1), then :

          - If an END of LINE (\2) is found, we re-write it.

          - If an END of LINE is NOT found, we just write an 'END of LINE' \r\n,
              the part after the COLON in the nested (?2…:….) expression
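
    The conditional replacement (?1\1(?2\2:\r\n)) is Boost syntax and has no direct equivalent in Python, but its logic can be sketched with a replacement function. The function name keep_tag and the sample text are hypothetical, and \R and \z are again approximated by (\r\n|\n|\r) and \Z.

```python
import re

# Same SEARCH expression, with the Python stand-ins for \R and \z.
SEARCH = re.compile(r'(?s).*?(<a.*?</a>)(\r\n|\n|\r)?|.*\Z')

def keep_tag(match):
    # Mirrors the nested conditional of the Boost replacement string.
    tag, eol = match.group(1), match.group(2)
    if tag is None:
        return ''           # group 1 unset and no ELSE part: write nothing
    if eol is not None:
        return tag + eol    # tag followed by a line ending: re-write both
    return tag + '\r\n'     # tag not followed by one: append \r\n

# Hypothetical Windows-style sample text.
sample = 'intro <a href="x">one</a> middle\r\n<a href="y">two</a>\r\ntail'
result = SEARCH.sub(keep_tag, sample)
print(repr(result))
```

    Everything that is not a tag disappears, and every tag ends up on its own line.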

    Notes :

      - You can store just a part of the original RegExp. For example, if you don't need
          the boundaries '<a' and '</a>', just change the SEARCH regular expression to :

                 (?s).*?<a(.*?)</a>(\R)?|.*\z

      - The SEARCH-REPLACEMENT is still OK if the <a…..>…….</a> construction spans
          MORE than ONE line
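
    Both notes can be checked with a short Python sketch along the same lines. The names CONTENT_ONLY and keep_inner and the sample text are made up for the example, the \R and \z stand-ins are assumptions as before, and the sketch appends \n instead of \r\n simply because the sample uses Unix line endings.

```python
import re

# Variant that stores only what lies between '<a' and '</a>' as group 1.
CONTENT_ONLY = re.compile(r'(?s).*?<a(.*?)</a>(\r\n|\n|\r)?|.*\Z')

def keep_inner(match):
    inner, eol = match.group(1), match.group(2)
    if inner is None:
        return ''                                   # trailing text: drop it
    return inner + (eol if eol is not None else '\n')

# Hypothetical sample: the first tag spans two lines.
sample = 'x <a href="p">spans\ntwo lines</a> y <a href="q">short</a>\nz'
result = CONTENT_ONLY.sub(keep_inner, sample)
print(repr(result))
```

    Because of (?s), the tag that spans two lines is captured whole, line break included. Group 1 still contains the attributes and the closing '>' of the opening tag, since only the '<a' and '</a>' boundaries are left out.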

    To sum up :

    FIND                      :     (?s).*?(<a.*?</a>)(\R)?|.*\z

    REPLACEMENT  :     (?1\1(?2\2:\r\n))

    Tick the 'Regular expression' radio button

    Tick the 'Match case' box, if necessary

    Go to the BEGINNING of the file ( CTRL + Home )

    Click on the 'Replace All' button  :   Whaooooooooooou !!

      we get ALL the <a…….>…….</a> constructions, ONE per LINE !

     
    I hope it'll be useful to you

    guy038

    P.S. You can find some documentation about the new PCRE Regular Expressions, used by N++, at the
           two addresses below :

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html