Re: [Htmlparser-user] regarding using not filter
From: Derrick O. <Der...@Ro...> - 2006-01-16 23:10:11
The filter isn't going to re-arrange your page for you. The best you can
do is get the list of tags and then remove them from their parent's
child list:

    for (int i = 0; i < arr.length; i++) {
        NodeFilter filter = new TagNameFilter (arr[i]);
        // true = search recursively, so tags nested in table/td are found too
        NodeList x = n.extractAllNodesThatMatch (filter, true);
        for (int j = 0; j < x.size (); j++) {
            Node tag = x.elementAt (j);
            Node parent = tag.getParent ();
            if (parent == null)
                n.remove (tag);    // top-level node: remove it from the list itself
            else if (parent.getChildren () != null)
                parent.getChildren ().remove (tag);
        }
    }

or something like that. No guarantees about removing something that's
already been removed though.

Srikrishna Swaminathan wrote:

> hi,
> my name is Srikrishna; I am a college student.
> I am quite new to HTML Parser.
> I am trying to use HTML Parser to filter out certain tags in an HTML page.
> With NotFilter I am able to take out tags like script and img,
> but the problem is that the tags are removed only if they are outside any
> other tags.
> E.g. if there is an img tag inside a table or td tag, the img tag is not
> removed.
> I don't want to remove the table or td tag, but I want to remove the
> img and script tags inside the table tags.
> The initial code I wrote was:
>
> String[]arr=new
> String[]{"img","input","br","span","script","noscript","b","a href",};
> try{
>
>     Parser parser = new Parser ();
>     parser.setURL ("targetin.html");
>     NodeList n = parser.parse(null);
>     NodeFilter filter1 = null;
>     NotFilter filter2 = null;
>     for(int i=0;i<arr.length;i++){
>         filter1 = new TagNameFilter (arr[i]);
>         filter2 = new NotFilter(filter1);
>         n = n.extractAllNodesThatMatch(filter2);
>     }
>     String s1=n.toHtml();
>
> This code removes all the listed tags outside table, tr and td tags, but if
> they are inside a table tag, it is not able to do it.
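
The underlying issue in this thread is that a filter pass over the top-level list never reaches tags nested inside table or td, so the removal has to walk down into each kept node's children. A minimal sketch of that idea, using only the Java standard library: SimpleNode, PruneSketch and prune are hypothetical names made up for illustration and are not part of the htmlparser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class PruneSketch {
    // Hypothetical stand-in for a parsed tag; NOT an htmlparser class.
    static class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, SimpleNode... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }

        // Serialize the subtree so the effect of pruning is easy to inspect.
        String render() {
            StringBuilder sb = new StringBuilder("<" + name + ">");
            for (SimpleNode child : children) {
                sb.append(child.render());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Remove every node whose name is in `unwanted`, at any depth.
    static void prune(List<SimpleNode> nodes, Set<String> unwanted) {
        Iterator<SimpleNode> it = nodes.iterator();
        while (it.hasNext()) {
            SimpleNode node = it.next();
            if (unwanted.contains(node.name)) {
                it.remove();                    // drops the node and its subtree
            } else {
                prune(node.children, unwanted); // descend only into kept nodes
            }
        }
    }

    public static void main(String[] args) {
        SimpleNode td = new SimpleNode("td", new SimpleNode("img"), new SimpleNode("p"));
        SimpleNode table = new SimpleNode("table", new SimpleNode("tr", td));
        List<SimpleNode> page = new ArrayList<>();
        page.add(new SimpleNode("script"));
        page.add(table);

        prune(page, new HashSet<>(Arrays.asList("img", "script")));
        System.out.println(page.get(0).render());
        // prints <table><tr><td><p></p></td></tr></table>
    }
}
```

Because the walk only descends into nodes it keeps, it never tries to remove a tag whose ancestor is already gone, which sidesteps the "already been removed" caveat mentioned in the reply.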