[Htmlparser-user] regarding using not filter
From: Srikrishna S. <swa...@gm...> - 2006-01-16 06:52:50
Hi, my name is Srikrishna and I am a college student. I am quite new to HTML Parser. I am trying to use HTML Parser to filter out certain tags in an HTML page. With NotFilter I am able to take out tags like SCRIPT and IMG, but the problem is that the tags are removed only if they are outside any other tags. For example, if there is an IMG tag inside a TABLE or TD tag, the IMG tag is not removed. I don't want to remove the TABLE or TD tags, but I do want to remove the IMG and SCRIPT tags inside them.

The initial code I wrote was:

    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NotFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    // tag names to strip; TagNameFilter matches on the tag name only,
    // so the anchor tag is "a", not "a href"
    String[] arr = new String[] {"img", "input", "br", "span", "script", "noscript", "b", "a"};
    try {
        Parser parser = new Parser();
        parser.setURL("targetin.html");
        NodeList n = parser.parse(null);
        NodeFilter filter1 = null;
        NotFilter filter2 = null;
        for (int i = 0; i < arr.length; i++) {
            filter1 = new TagNameFilter(arr[i]);
            filter2 = new NotFilter(filter1);
            // collect the nodes that are NOT of the current tag type
            n = n.extractAllNodesThatMatch(filter2);
        }
        String s1 = n.toHtml();
    } catch (ParserException e) {
        e.printStackTrace();
    }

This code removes the listed tags when they are outside TABLE, TR and TD tags, but when they are inside a TABLE tag, it does not remove them.
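One likely reason for this behaviour: a NotFilter wrapping a TagNameFilter for "img" accepts a TABLE node (TABLE is not IMG), so the whole table is kept together with the IMG that is still one of its children. Removing nested tags means walking the tree and deleting matches from each parent's child list. Below is a minimal sketch of that recursive removal. To keep it runnable without the library, it uses a hypothetical stripped-down Node class; Node, removeTags and toHtml here are illustrative names, not HTML Parser's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StripNestedTags {

    /** Hypothetical minimal stand-in for an HTML node (NOT HTML Parser's Node). */
    static final class Node {
        final String tag;   // lower-case tag name, or null for a text node
        final String text;  // content of a text node, null for tags
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node element(String tag, Node... kids) {
            Node n = new Node(tag.toLowerCase(Locale.ROOT), null);
            for (Node k : kids) {
                n.children.add(k);
            }
            return n;
        }

        static Node text(String s) {
            return new Node(null, s);
        }

        /** Serialises back to simplified HTML (no attributes, every tag closed). */
        String toHtml() {
            if (tag == null) {
                return text;
            }
            StringBuilder sb = new StringBuilder("<").append(tag).append(">");
            for (Node c : children) {
                sb.append(c.toHtml());
            }
            return sb.append("</").append(tag).append(">").toString();
        }
    }

    /**
     * Removes every descendant of parent whose tag name is in unwanted
     * (names must be lower-case), at any depth, along with its subtree.
     * Other tags such as table or td are kept and searched recursively.
     */
    static void removeTags(Node parent, Set<String> unwanted) {
        // drop matching children of this node
        parent.children.removeIf(c -> c.tag != null && unwanted.contains(c.tag));
        // then recurse into the survivors so nested matches are found too
        for (Node c : parent.children) {
            removeTags(c, unwanted);
        }
    }

    public static void main(String[] args) {
        Node page = Node.element("body",
                Node.element("img"),
                Node.element("table",
                        Node.element("tr",
                                Node.element("td",
                                        Node.text("cell"),
                                        Node.element("img"),
                                        Node.element("script", Node.text("x()"))))));
        removeTags(page, Set.of("img", "script"));
        // prints <body><table><tr><td>cell</td></tr></table></body>
        System.out.println(page.toHtml());
    }
}
```

The key design point is that filtering happens at every level of the tree rather than only on the top-level list, so TABLE, TR and TD survive while the IMG and SCRIPT inside them are dropped. With HTML Parser itself, the same walk would presumably go over each node's getChildren() NodeList; which removal methods are available depends on the version, so check its NodeList Javadoc.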