[Htmlparser-user] Newbie Problem with HasChildFilter

Hi

I'm probably doing something stupid here, but I can't get
HasChildFilter to work properly. I am trying to get all the <a> tags
that occur inside the <body> tag so I can rewrite them. I don't want
the JavaScript-generated tags that occur inside the <head> tag. My
test case is below.

String testHtml = "<html><head><script><a href=JAVASCRIPT:openProc('\""
    + " + parent.contents.procUID[i] + \"','main')>"
    + "</script><body><table><tr><td>Cell Content</td></tr><tr><td>"
    + "<a target=\"main\" href=\"findXml.jsp?XMLFile=G455051\">Control Mechanism</a>"
    + "</td></tr></table></body></html>";

Parser parser = new Parser(testHtml);
NodeList originalPage = parser.parse(null);
NodeFilter filter = new AndFilter(new TagNameFilter("body"),
    new HasChildFilter(new TagNameFilter("a"), true));
NodeList extract = originalPage.extractAllNodesThatMatch(filter, true);

This fails to find any of the <a> tags: extract.size() is zero. Can
someone point out what I'm doing wrong, please?

Regards