Re: [Htmlparser-user] Newbie Problem with HasChildFilter
From: Roger V. <rog...@go...> - 2009-07-07 12:11:48
> If you want the A tags inside the BODY tag it would be:
>   NodeFilter filter = new AndFilter(new TagNameFilter("A"),
>       new HasParentFilter(new TagNameFilter("BODY"),true));

Thanks Derek, that worked perfectly.

I've now got another problem that I think might be a bug. With the test case (I'm not making this up - I've actually got to work with this sort of stuff!)

    String testHtml = "<html><head><script><a href=JAVASCRIPT:openProc('\" + parent.contents.procUID[i] + \"','main')>"
        + "</script><body><table><tr><td><img src=/666.jpg\"></td></tr><tr><td>"
        + "document.write(\"<a href=JAVASCRIPT:openProc('\" + parent.contents.procUID[i] + \"','main')>\" + parent.contents.procDisplay[i] + \"</a>\"</a></td></tr></table></body></html>";
    Parser parser = new Parser(testHtml);
    NodeList originalPage = parser.parse(null);
    NodeFilter filter = new AndFilter(new TagNameFilter("a"),
        new HasParentFilter(new TagNameFilter("body"), true));
    NodeList extract = originalPage.extractAllNodesThatMatch(filter, true);

This picks up the second JAVASCRIPT LinkTag - the one outside the <head> tag, but inside the document.write(). When I try to evaluate LinkTag.getLinkTag() against this, HtmlParser is reporting the text as

    JAVASCRIPT:openProc('"

which is not correct. Any ideas?

Regards