Total Newb Needs Help With Notepad++

2012-10-18
2012-11-13
  • Trevor Zacek
    2012-10-18

    Hi everyone,

    I have a large text file open in Notepad++ that contains many opening <Title> and closing </Title> tags, about 23 thousand of them.

    I need to copy the text between these tags. As output I need 23 thousand lines, each containing the text that appeared between one pair of title tags in the original file.

    Can anyone tell me how to do that?

    Thanks!

     

  • Anonymous
    2012-10-18

    You can replace all occurrences of <Title> and </Title> with an empty string.

     
  • I think you can combine "Macro" tool with regexp search. Something like:
    -Start macro recording
    -Ctrl+F
    -Write a regular expression that finds the next text block between <Title> and </Title>
    -Cut it
    -Go to the end of the file
    -Paste it
    -Stop macro recording
    -Replay the macro as many times as you want (or "to the end of file")
    -Replace <Title> and </Title> with an empty string.
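The macro steps above can be sketched outside the editor as well. This is only a rough Python illustration, not the macro itself: the function name `move_titles_to_end` and the sample string are made up, and the cut blocks are collected separately instead of being pasted back into the same buffer.

```python
import re

# Matches one whole <Title>...</Title> block; DOTALL lets a block span lines.
BLOCK_RE = re.compile(r"<Title>.*?</Title>", re.DOTALL)

def move_titles_to_end(text):
    """Cut every <Title>...</Title> block and return (remainder, titles)."""
    blocks = BLOCK_RE.findall(text)      # what the macro would cut, in order
    remainder = BLOCK_RE.sub("", text)   # what is left after cutting
    # Last macro step: replace the tags themselves with an empty string.
    titles = [b.replace("<Title>", "").replace("</Title>", "") for b in blocks]
    return remainder, "\n".join(titles)

sample = "a<Title>X</Title>b<Title>Y</Title>c"
remainder, titles = move_titles_to_end(sample)
print(titles)
```

Keeping the titles apart from the remainder avoids searching the pasted blocks a second time.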

     
  • srbs
    2012-10-19

    My suggestion is to use Textcrawler's Extract tool with a regular expression.

    This regex should work:
    (?<=<Title>).*?(?=</Title>)
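For anyone who prefers a script to a separate tool, the same lookaround regex can be tried in Python. This is a minimal sketch, not Textcrawler's behaviour: `extract_titles` and the sample text are invented for illustration, and without a dot-matches-newline flag a title that spans several lines is skipped.

```python
import re

# Same pattern as above: the lookbehind and lookahead keep the tags out of the match.
TITLE_RE = re.compile(r"(?<=<Title>).*?(?=</Title>)")

def extract_titles(text):
    """Return the text found between each <Title>...</Title> pair."""
    return TITLE_RE.findall(text)

sample = "<Title>First</Title> filler <Title>Second</Title>\n<Title>Third</Title>"
print("\n".join(extract_titles(sample)))
```

Because the tags are only asserted, never consumed, no second clean-up pass is needed.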

     
  • Jan Schreiber
    2012-10-19

    On the Mark tab of the find dialog, check "Regular expression" and "Mark line". Enter "<Title>(.+)</Title>" (w/o the quotes) as search term, then click "Find All." This step will add bookmarks to all the lines that match the regEx.
    Then do Search -> Bookmark -> Copy Bookmarked Lines and paste to a new document. Finally, on the Replace tab of the Find and Replace dialog reuse the above regEx and use "$1" (without quotes) as replace term. This step will remove "<Title>" and "</Title>".
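The two steps above (keep the matching lines, then replace each match with its first group) can be mimicked in Python to see what the result looks like. This is just a sketch under assumptions: `titles_from_marked_lines` and the sample text are hypothetical, and Python writes the replacement as `\1` where the dialog uses `$1`.

```python
import re

# Greedy pattern from the post, with one capture group.
LINE_RE = re.compile(r"<Title>(.+)</Title>")

def titles_from_marked_lines(text):
    """Keep the lines that contain a title, then replace each match with group 1."""
    marked = [line for line in text.splitlines() if LINE_RE.search(line)]
    return [LINE_RE.sub(r"\1", line) for line in marked]

sample = "header\n  <Title>Alpha</Title>\nbody text\n<Title>Beta</Title> trailing\n"
print(titles_from_marked_lines(sample))
```

Note that anything outside the tags on a marked line survives the replacement, so this approach fits best when each title sits alone on its line.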

     
  • THEVENOT Guy
    2012-10-28

    Hello tekamolo,

    I think we can do the job with ONLY ONE search/replacement !

      1) COPY ALL the text BETWEEN the TWO lines '------------', below, into a NEW file

    --------------------------------------------------------------------
    <Title>THIS IS A </Title>very small <Title>TEXT TO SEE</Title>67890
    <Title></Title>
    12345<Title>IF ALL</Title>
    <Title>SEEMS</Title>
    -------<Title>OK.
    NICE !
    IT WORKS FINE !</Title>………..
    no good text
    --------------------------------------------------------------------

    Just notice TWO facts :

      - ALL the text you need to extract, in this example, is UPPERCASE text !

      - The LAST block <Title>……</Title> is a MULTI-line BLOCK  ( It doesn't matter ! )

      2) In this NEW tab, type CTRL-H to open the SEARCH-REPLACEMENT dialog

      3) SELECT the radio button 'Regular expression'

      4) SELECT the box 'Wrap around'
     
     
      5) Do the SEARCH-REPLACEMENT, below, on the text of the NEW file :

    SEARCH :         (?s).*?<Title>(.*?)</Title>(\R)?|.*\z

    REPLACE :       (?1\1(?2\2:\r\n))

      => Finally, we obtain, below, ALL UPPERCASE text which is INSIDE the ZONES  <Title>…..</Title>

    THIS IS A
    TEXT TO SEE

    IF ALL
    SEEMS
    OK.
    NICE !
    IT WORKS FINE !

    Once again, notice TWO facts :

      - The EMPTY forms  <Title></Title>  generate a BLANK line

      - The cursor, BEFORE the SEARCH-REPLACEMENT, can be at ANY POSITION of the file !
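The single search/replacement can also be checked outside Notepad++. Python's re module has no conditional replacement and no \R or \z, so the sketch below is an adaptation, not the original expression: \R is written as (\r\n|\n|\r), \z as \Z, the (?s) prefix as re.DOTALL, and the conditional replacement is emulated by a hypothetical helper function. It also inserts "\n" rather than "\r\n" when a line break has to be added.

```python
import re

# Adapted search pattern (assumed equivalents: \R -> (\r\n|\n|\r), \z -> \Z).
PATTERN = re.compile(r".*?<Title>(.*?)</Title>(\r\n|\n|\r)?|.*\Z", re.DOTALL)

def replacement(match):
    # Emulates (?1\1(?2\2:\r\n)): keep group 1, then the original line break
    # if one followed the closing tag, otherwise a new one.
    if match.group(1) is None:   # second alternative: trailing text is dropped
        return ""
    return match.group(1) + (match.group(2) or "\n")

def extract(text):
    """Keep only the text inside <Title>...</Title>, one block per line."""
    return PATTERN.sub(replacement, text)

sample = (
    "<Title>THIS IS A </Title>very small <Title>TEXT TO SEE</Title>67890\n"
    "<Title></Title>\n"
    "12345<Title>IF ALL</Title>\n"
    "<Title>SEEMS</Title>\n"
    "-------<Title>OK.\n"
    "NICE !\n"
    "IT WORKS FINE !</Title>...\n"
    "no good text\n"
)
print(extract(sample), end="")
```

The explicit check for a missing group 1 plays the role of the outer (?1...) test in the replacement string.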

    I've made a TUTORIAL, about the PCRE Regular Expressions ( Perl Compatible Regular Expressions ),
      used in Notepad++ since version 6.0.

    As I'm French, the whole manual is written in French, but you can still pick up some tricks or
      explanations from the lists and examples throughout the tutorial.

    Christian Cuvier ( cchris ), a very well-known contributor, allowed me to put my tutorial
      on his personal site.

    So, you can download this TUTORIAL, in 3 versions, (.txt .pdf .html), at the address below :

           http://oedoc.free.fr/Regex/TutorielRegex.zip

    I hope it'll be useful to you

    Cheers !

    guy038

    P.S. You can also find some documentation, about the new PCRE Regular Expressions, used by N++, at the
           two addresses below :

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

         The FIRST one concerns the syntax of regular expressions in the SEARCH part

         The SECOND one concerns the syntax of regular expressions in the REPLACEMENT part