Hi,
My name is Srikrishna, and I am a college student.
I am quite new to HTML Parser.
I am trying to use HTML Parser to filter certain tags out of an HTML page.
With NotFilter I am able to take out tags such as script and img.
The problem is that the tags are removed only if they are outside any other
tags.
E.g., if there is an img tag inside a table or td tag, the img tag is not
removed.
I don't want to remove the table or td tag, but I do want to remove the
img and script tags inside the table tags.
The initial code I wrote was:
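import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.NotFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Tag names to strip from the page (links are matched by the tag name "a").
String[] arr = new String[] {"img", "input", "br", "span", "script", "noscript", "b", "a"};
try {
    Parser parser = new Parser();
    parser.setURL("targetin.html");
    NodeList n = parser.parse(null);
    for (int i = 0; i < arr.length; i++) {
        // Keep only the nodes in the list whose tag name is not arr[i].
        NodeFilter filter1 = new TagNameFilter(arr[i]);
        NotFilter filter2 = new NotFilter(filter1);
        n = n.extractAllNodesThatMatch(filter2);
    }
    String s1 = n.toHtml();
} catch (ParserException e) {
    e.printStackTrace();
}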
This code removes all of the listed tags that are outside table, tr, and td
tags, but it does not remove them when they are inside a table tag.
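
The behavior described above is what one would expect if the filter is applied only to the top-level nodes of the list and never to their children. The sketch below illustrates that difference with a small stand-in tree. It does not use HTML Parser at all: the Node class and the removeTopLevel, removeRecursively, sampleTree, and render names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NestedTagRemoval {

    // Hypothetical stand-in for a parsed tag; not an HTML Parser class.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }

        String render() {
            StringBuilder sb = new StringBuilder("<" + name + ">");
            for (Node child : children) {
                sb.append(child.render());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Filters only the given list; children of the surviving nodes are untouched.
    static void removeTopLevel(List<Node> nodes, Set<String> unwanted) {
        nodes.removeIf(n -> unwanted.contains(n.name));
    }

    // Filters the given list, then repeats the same step inside each surviving node.
    static void removeRecursively(List<Node> nodes, Set<String> unwanted) {
        nodes.removeIf(n -> unwanted.contains(n.name));
        for (Node n : nodes) {
            removeRecursively(n.children, unwanted);
        }
    }

    // A script tag at the top level and an img tag nested inside table/tr/td.
    static List<Node> sampleTree() {
        Node td = new Node("td", new Node("img"), new Node("p"));
        Node table = new Node("table", new Node("tr", td));
        return new ArrayList<>(Arrays.asList(new Node("script"), table));
    }

    static String render(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            sb.append(n.render());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> unwanted = new HashSet<>(Arrays.asList("img", "script"));

        List<Node> flat = sampleTree();
        removeTopLevel(flat, unwanted);
        // prints <table><tr><td><img></img><p></p></td></tr></table>
        System.out.println(render(flat));

        List<Node> deep = sampleTree();
        removeRecursively(deep, unwanted);
        // prints <table><tr><td><p></p></td></tr></table>
        System.out.println(render(deep));
    }
}
```

In the first call the nested img survives because only the outer list is examined, which matches the symptom in the question. Whether the HTML Parser version in use offers a recursive way to apply a filter is worth checking in its NodeList Javadoc; otherwise the child lists would have to be walked by hand, as removeRecursively does here.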