Is it possible to search a text for a particular HTML tag and copy every instance of it to another file? Either the tag + content or only its content.
Since no expert has jumped in to answer, you might like to try my manual approach ;-)
- work on a copied file
a- separate the searched items so each sits on its own line
b- and delete the rest.
c- extract the contents from the tags
e.g. you want to grab all link tags like
<a href="address" ...> content </a>
a- search and replace using regular expression mode
b- now use the mark tab to search and bookmark those lines:
and mark-option is selected!
go to bookmark-menu:
1.- Invert all bookmarked lines
2.- Delete bookmarked lines
c- to remove the tags and keep the contents
search and replace using regular expression mode
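For anyone who prefers a script, the three steps a/b/c can be sketched in Python roughly like this. It is only a sketch: the function name and the patterns are my own assumptions (the expressions used in the steps above are not shown), and it assumes each link tag fits on a single line.

```python
import re

def extract_links_line_based(text, contents_only=False):
    """Mimic steps a, b and c; all names and patterns here are hypothetical."""
    # Step a: surround every <a ...>...</a> with line breaks so it sits alone on a line.
    separated = re.sub(r"(<a\b.*?</a>)", r"\n\1\n", text)
    # Step b: keep only the lines that consist of exactly one link tag, drop the rest.
    lines = [ln for ln in separated.splitlines()
             if re.fullmatch(r"<a\b.*?</a>", ln)]
    # Step c: optionally strip the opening and closing tags and keep the contents.
    if contents_only:
        lines = [re.sub(r"^<a\b.*?>(.*)</a>$", r"\1", ln) for ln in lines]
    return lines
```

Calling it with contents_only=True returns only the text between the tags, otherwise the full tags.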
Hope this works as a stopgap for now ~ greetings
I think I've found a general SEARCH and REPLACEMENT expression,
using regular expressions, that could do ALL the job, without
the use of bookmarks or anything else !
Let's suppose, as perquant said, you need to extract all link-tags
like : <a href="address"……>contents</a>
To begin, we build the regular expression to match this expression :
So we get : RegExp = <a.*?</a>
.*? means the SMALLEST string between '<a' and '</a>'
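To see why the ? after .* matters, here is a tiny Python check (Python's re module is used only for illustration, it is not the N++ engine; the sample string is made up):

```python
import re

sample = '<a href="1">x</a> and <a href="2">y</a>'

# Lazy .*? stops at the FIRST </a> it meets, so each link is matched separately.
lazy = re.findall(r"<a.*?</a>", sample)

# Greedy .* runs to the LAST </a>, so both links end up in one single match.
greedy = re.findall(r"<a.*</a>", sample)

print(lazy)
print(greedy)
```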
Then, we construct the SEARCH regular expression (?s).*?(Re)(\R)?|.*\z
where Re represents the RegExp above
Finally, we obtain : (?s).*?(<a.*?</a>)(\R)?|.*\z
Some explanations :
- the (?s) modifier means that the dot (.) also matches the END of LINE characters
( It's exactly the same as ticking the '. matches newline' box )
- .*? stands for ALL the characters between TWO occurrences of <a…>…..</a> ( or BEFORE the first one )
- the ORIGINAL RegExp <a.*?</a> is surrounded with parentheses to store it as \1
- (\R)? stands for any optional 'END of LINE', stored as \2
( \r\n if WINDOWS file, \n if UNIX file and \r if MAC file )
- .*\z represents ALL the characters AFTER the last <a….>….</a> of a file
up to the VERY END of a file
- | means the ALTERNATION symbol
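The pieces listed above can be inspected with a short Python sketch. Two assumptions to keep in mind: Python's re module has no \R, and \z is not available in all versions, so I substitute (\r\n|\n|\r) and \Z here; build_search and describe_matches are hypothetical helper names, not anything from N++.

```python
import re

def build_search(inner):
    # Wrap any inner RegExp "Re" in the template (?s).*?(Re)(\R)?|.*\z,
    # with \R and \z replaced by Python equivalents (an assumption of this sketch).
    return re.compile(r"(?s).*?(" + inner + r")(\r\n|\n|\r)?|.*\Z")

def describe_matches(text, inner=r"<a.*?</a>"):
    rows = []
    for m in build_search(inner).finditer(text):
        if m.group(0) == "":
            continue  # skip the empty match that can occur at the very end
        # (whole match, group 1 = the tag, group 2 = the optional end of line)
        rows.append((m.group(0), m.group(1), m.group(2)))
    return rows
```

Each row shows what a single match swallows: the junk before a tag plus the tag itself, or, for the last row, whatever remains after the final tag (groups 1 and 2 are then empty).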
Whatever the form of the initial RegExp, the REPLACEMENT regular expression
is ALWAYS (?1\1(?2\2:\r\n))
It's a bit more complicated to explain :
- As long as <a…>….</a> is NOT found, or AFTER the LAST <a…>….</a>, the matched text
is simply DELETED, because there's NO 'else' part in the outer (?1……) expression
- If a <a…>…..</a> tag is found, we just re-write it (\1), then :
  - If an END of LINE (\2) follows it, we re-write it too.
  - If NO END of LINE follows it, we just write an 'END of LINE' \r\n : that's the
    part after the COLON in the nested (?2…:….) expression
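Python's re module has no conditional replacement like (?1…), so the logic above can only be imitated with a replacement function. A minimal sketch, assuming the same Python stand-ins for \R and \z as elsewhere in this thread's Python notes (the names SEARCH, conditional_replacement and keep_only_links are mine):

```python
import re

# Python stand-in for (?s).*?(<a.*?</a>)(\R)?|.*\z  (assumption: \R and \z adapted)
SEARCH = re.compile(r"(?s).*?(<a.*?</a>)(\r\n|\n|\r)?|.*\Z")

def conditional_replacement(match):
    # Outer (?1...) has no 'else' part: without group 1 the match is deleted.
    if match.group(1) is None:
        return ""
    # Inner (?2\2:\r\n): re-write the end of line if one was captured ...
    if match.group(2) is not None:
        return match.group(1) + match.group(2)
    # ... otherwise append \r\n after the tag.
    return match.group(1) + "\r\n"

def keep_only_links(text):
    return SEARCH.sub(conditional_replacement, text)
```

On a text without any link tag the result is an empty string, which matches the "everything else is deleted" behaviour described above.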
- You can store just a part of the original RegExp. For example, if you don't need
the boundaries '<a' and '</a>', just change the SEARCH regular expression to :
- The SEARCH-REPLACEMENT is still OK if the <a…..>…….</a> construction spans
MORE than ONE line
To sum up :
FIND : (?s).*?(<a.*?</a>)(\R)?|.*\z
REPLACEMENT : (?1\1(?2\2:\r\n))
Tick the 'Regular expression' radio button
Tick the 'Match case' box, if necessary
Go to the BEGINNING of the file ( CTRL + Home )
Click on the 'Replace All' button : Whaooooooooooou !!
we get ALL the <a…….>…….</a> constructions, ONE per LINE !
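If the goal is really to copy the tags into ANOTHER file, as asked at the top, a small script can do it without touching the original. This is a different route from Replace All (it uses findall), and the function name and file paths are purely illustrative:

```python
import re
from pathlib import Path

# Same inner RegExp as above, with (?s) so a tag may span several lines.
LINK = re.compile(r"(?s)<a.*?</a>")

def copy_links(source_path, target_path):
    # Read the whole source file; it is left unchanged.
    text = Path(source_path).read_text(encoding="utf-8")
    links = LINK.findall(text)
    # Write every link found to the target file, each followed by a line break.
    Path(target_path).write_text("".join(link + "\n" for link in links),
                                 encoding="utf-8")
    return len(links)
```

For example, copy_links("page.html", "links.txt") would return the number of tags written, assuming both paths are valid on your machine.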
I hope it'll be useful to you
P.S. You can find some documentation, about the new PCRE Regular Expressions, used by N++, at the
two addresses below :