I have a large text file open in Notepad++ that contains opening <Title> and closing </Title> tags. There are many of them, about 23 thousand.
I need to copy the text between these tags. As output I need 23 thousand lines, each containing the text that appeared between the Title tags in the original file.
Can anyone tell me how to do that?
You can replace all appearances of <Title> and </Title> with an empty string.
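Keep in mind that this removes only the tags themselves; any text outside the tags stays in the file. A small Python sketch of that effect, using a made-up sample string (not taken from the question):

```python
# Hypothetical sample line: one Title pair surrounded by other text.
sample = "intro <Title>Alpha</Title> filler"

# Replace both tags with an empty string, as suggested above.
stripped = sample.replace("<Title>", "").replace("</Title>", "")

print(stripped)  # the surrounding words are still present
```

So this is only enough when the file holds nothing but Title blocks, one per line.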
I think you can combine the "Macro" tool with a regexp search. Something like:
-Start macro recording
-Write a regular expression that finds the next text block between <Title> and </Title>
-Go to the end of the file
-Stop macro recording
-Replay the macro as many times as you want (or "to the end of file")
-Replace <Title> and </Title> with an empty string.
My suggestion is to use Textcrawler's Extract tool with a regular expression.
This regex should work:
On the Mark tab of the find dialog, check "Regular expression" and "Mark line". Enter "<Title>(.+)</Title>" (without the quotes) as search term, then click "Find All." This step will add bookmarks to all the lines that match the regex.
Then do Search -> Bookmark -> Copy Bookmarked Lines and paste to a new document. Finally, on the Replace tab of the Find and Replace dialog, reuse the above regex and use "$1" (without quotes) as replace term. This step will remove "<Title>" and "</Title>".
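If a scripting language is at hand, the same extraction can be sketched outside Notepad++. This is only a minimal Python sketch, not part of the steps above; the function name extract_titles and the sample text are made up for illustration. It uses the non-greedy form (.+?) so that two Title blocks on the same line are captured separately, and it assumes each block sits on a single line.

```python
import re

# Non-greedy capture so several <Title> blocks on one line stay separate.
TITLE_RE = re.compile(r"<Title>(.+?)</Title>")

def extract_titles(text: str) -> list[str]:
    """Return the text found between each <Title>...</Title> pair."""
    return TITLE_RE.findall(text)

# Hypothetical sample input, not taken from the original file.
sample = (
    "abc<Title>First</Title>def<Title>Second</Title>\n"
    "no tags here\n"
    "<Title>Third</Title>"
)

for title in extract_titles(sample):
    print(title)  # one extracted title per output line
```

To run it on a real file, read the file into a string first and write the joined results to a new file.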
I think we can do the job with ONLY ONE search/replacement !
1) COPY ALL the text BETWEEN the TWO lines '------------', below, in a NEW file
<Title>THIS IS A </Title>very small <Title>TEXT TO SEE</Title>67890
IT WORKS FINE !</Title>………..
no good text
Just notice TWO facts :
- ALL the text you need to extract, in this example, is UPPERCASE text !
- The LAST block <Title>……</Title> is a MULTI-line BLOCK ( It doesn't matter ! )
2) In this NEW tab, type CTRL-H to open the SEARCH-REPLACEMENT dialog
3) SELECT the radio button 'Regular expression'
4) SELECT the box 'Wrap around'
5) Do the SEARCH-REPLACEMENT, below, on the text of the NEW file :
SEARCH : (?s).*?<Title>(.*?)</Title>(\R)?|.*\z
REPLACE : (?1\1(?2\2:\r\n))
=> Finally, we obtain, below, ALL UPPERCASE text which is INSIDE the ZONES <Title>…..</Title>
THIS IS A
TEXT TO SEE
IT WORKS FINE !
Once again, notice TWO facts :
- The EMPTY forms <Title></Title> generate a BLANK line
- The cursor, BEFORE the SEARCH-REPLACEMENT, can be at ANY POSITION of the file !
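For comparison, here is a rough Python sketch of the same idea: keep only what is inside the Title zones and drop everything else. It is an approximation under stated assumptions, not a port of the Notepad++ replacement syntax: re.DOTALL stands in for (?s), the function name titles_one_per_block is made up, the sample text is only modeled on the example above, and each zone is simply followed by a line break.

```python
import re

# re.DOTALL plays the role of (?s): '.' also matches line breaks,
# so a <Title> zone may span several lines.
TITLE_RE = re.compile(r"<Title>(.*?)</Title>", re.DOTALL)

def titles_one_per_block(text: str) -> str:
    """Join the content of every <Title>...</Title> zone with line breaks."""
    return "\n".join(m.group(1) for m in TITLE_RE.finditer(text))

# Hypothetical sample: two zones on one line, an empty zone,
# and a zone that spans two lines.
sample = (
    "12345<Title>THIS IS A </Title>very small <Title>TEXT TO SEE</Title>67890\n"
    "<Title></Title>junk<Title>IT WORKS\nFINE !</Title>no good text"
)

result = titles_one_per_block(sample)
print(result)
```

As in the search/replacement above, the empty zone produces a blank line, and the multi-line zone keeps its inner line break.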
I've made a TUTORIAL about the PCRE Regular Expressions ( Perl Compatible Regular Expressions )
used in Notepad++ since version 6.0.
As I'm French, the whole manual is written in French, but you can still pick up some tricks or
explanations from the lists and examples throughout the tutorial.
Christian Cuvier ( cchris ), a very well-known contributor, allowed me to put my tutorial
on his personal site.
So, you can download this TUTORIAL, in 3 versions, (.txt .pdf .html), at the address below :
I hope it'll be useful to you
P.S. You can also find some documentation about the new PCRE Regular Expressions used by N++ at the
two addresses below :
The FIRST one concerns the syntax of regular expressions in the SEARCH part
The SECOND one concerns the syntax of regular expressions in the REPLACEMENT part