[Htmlparser-user] Can't use extractAllNodesThatMatch back-to-back for same Parser instance
From: Daniel D. <me...@cr...> - 2008-03-11 14:03:57
Hello,
Does anyone know why I can't call extractAllNodesThatMatch(filter)
twice in a row on the same Parser instance?
More specifically, I have this code:
========================================
Parser parser = new Parser(google);
NodeList titleList = parser.extractAllNodesThatMatch(titleFilter);
NodeList summaryTableList = parser.extractAllNodesThatMatch(summaryTableFilter);
========================================
The Google search results page I'm parsing has a series of these:
<a href="blah">Title</a>
<table><tr><td>.....Summary info....</td></tr></table>
Each of the two filters above works fine when run on its own. Run them
back-to-back and the second comes up empty. I don't see anywhere that
extractAllNodesThatMatch literally removes nodes from the captured
source, which is the only way I can imagine the first call affecting
the second. Here are my filters:
========================================
// filter to pull out titles (all links that are next to a table)
NodeFilter titleFilter = new AndFilter (
    new NodeClassFilter (LinkTag.class),
    new HasSiblingFilter (new NodeClassFilter(TableTag.class))
);
// filter to pull out summaries (all tables that are next to a title link)
NodeFilter summaryTableFilter = new AndFilter (
    new NodeClassFilter (TableTag.class),
    new NodeClassFilterOnPreviousSibling (LinkTag.class) // custom filter
);
========================================
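One guess at the cause, stated as an assumption and not a confirmed diagnosis: if extractAllNodesThatMatch reads the page through a one-shot cursor, the first call would leave nothing for the second call to read, even though no nodes are removed from the source. The stdlib-only sketch below models that behaviour with a plain Iterator. OneShotSource, extract and rewind are hypothetical names for illustration, not HTMLParser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parser that walks its input exactly once.
public class OneShotSource {
    private final List<String> nodes;   // the "captured source"; never modified
    private Iterator<String> cursor;    // read position, shared by all extract() calls

    public OneShotSource(List<String> nodes) {
        this.nodes = nodes;
        this.cursor = nodes.iterator();
    }

    // Collects matches from wherever the cursor currently is to the end.
    public List<String> extract(Predicate<String> filter) {
        List<String> matches = new ArrayList<>();
        while (cursor.hasNext()) {
            String node = cursor.next();
            if (filter.test(node)) {
                matches.add(node);
            }
        }
        return matches;
    }

    // Moves the cursor back to the start of the unchanged source.
    public void rewind() {
        cursor = nodes.iterator();
    }

    public static void main(String[] args) {
        OneShotSource page = new OneShotSource(Arrays.asList("a", "table", "a", "table"));
        List<String> titles = page.extract(n -> n.equals("a"));
        List<String> summaries = page.extract(n -> n.equals("table"));
        System.out.println(titles.size() + " " + summaries.size()); // prints "2 0"

        page.rewind();
        System.out.println(page.extract(n -> n.equals("table")).size()); // prints "2"
    }
}
```

If the real Parser behaves the same way, the analogous thing to try would be calling parser.reset() between the two extractions. That is untested against the code above.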
Thanks for the help. I've already tried subclassing the Parser so
that I could implement the clone() method, but got the same result.
-Daniel