Re: [Htmlparser-user] Can't use extractAllNodesThatMatch back-to-back for same Parser instance
From: Daniel D. <me...@cr...> - 2008-03-11 22:57:49
Was able to fix the problem using this code, pulled from the extractAllNodesThatMatch method itself:

=========================
// Both result lists start empty and are filled in a single pass over the page.
NodeList titleList = new NodeList();
NodeList summaryTableList = new NodeList();
NodeIterator e;
for (e = parser.elements (); e.hasMoreNodes (); )
{
    Node currentNode = e.nextNode();
    // collectInto() adds the node, and any child nodes, that pass the filter
    currentNode.collectInto(titleList, titleFilter);
    currentNode.collectInto(summaryTableList, summaryTableFilter);
}
=========================

-Daniel

On Tue, Mar 11, 2008 at 9:03 AM, Daniel Dixon <me...@cr...> wrote:
> Hello,
>
> Anyone know why I can't use two extractAllNodesThatMatch(filter)
> methods back-to-back on the same Parser instance?
>
> More specifically I have this code:
>
> ========================================
> Parser parser = new Parser(google);
>
> NodeList titleList = parser.extractAllNodesThatMatch(titleFilter);
> NodeList summaryTableList = parser.extractAllNodesThatMatch(summaryTableFilter);
> ========================================
>
> The Google search results page I'm parsing has a series of these:
>
> <a href="blah">Title</a>
> <table><tr><td>.....Summary info....</td></tr></table>
>
> The two filters above, when independent, work fine. Run them
> back-to-back and the second will come up empty. I don't see where the
> extractAllNodesThatMatch method literally pulls the nodes out of the
> captured source, thus affecting the second filter. Here are my
> filters:
>
> ========================================
> // filter to pull out titles (all links that are next to a table)
> NodeFilter titleFilter = new AndFilter (
>     new NodeClassFilter (LinkTag.class),
>     new HasSiblingFilter (new NodeClassFilter(TableTag.class))
> );
> // filter to pull out summaries (all tables that are next to a title link)
> NodeFilter summaryTableFilter = new AndFilter (
>     new NodeClassFilter (TableTag.class),
>     new NodeClassFilterOnPreviousSibling (LinkTag.class) // custom filter
> );
> ========================================
>
> Thanks for the help. I've already tried subclassing the Parser so
> that I could implement the clone() method, but got the same result.
>
> -Daniel
>

--
-------------------------------
Daniel
me...@da...
www.OneDanShow.com
-------------------------------
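Why the second call comes back empty: extractAllNodesThatMatch appears to walk parser.elements() to the end of the input, and that iterator reads from the Parser's one underlying lexer, so a second call on the same instance starts at the end and finds nothing. The self-contained Java sketch below models this with a hypothetical NodeCursor class (a stand-in written for illustration, not the HTMLParser API) whose read position only moves forward. It compares the back-to-back calls with two workarounds: rewinding before the second pass, which is what HTMLParser's Parser.reset() is documented to do (worth confirming against the version in use), and the single pass from the reply, which tests each node against both filters. Nodes are reduced to tag names and children are ignored, whereas the real collectInto() also descends into child nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SinglePassDemo {

    // HYPOTHETICAL stand-in for org.htmlparser.Parser: a forward-only cursor
    // over a page's top-level nodes, each reduced to its tag name.
    static final class NodeCursor {
        private final List<String> nodes;
        private int position = 0; // moves forward on every read, never back on its own

        NodeCursor(List<String> nodes) {
            this.nodes = nodes;
        }

        boolean hasMoreNodes() {
            return position < nodes.size();
        }

        String nextNode() {
            return nodes.get(position++);
        }

        // Models rewinding the underlying source (cf. Parser.reset()).
        void reset() {
            position = 0;
        }

        // Models extractAllNodesThatMatch: reads until the cursor is exhausted.
        List<String> extractAllNodesThatMatch(Predicate<String> filter) {
            List<String> matches = new ArrayList<>();
            while (hasMoreNodes()) {
                String node = nextNode();
                if (filter.test(node)) {
                    matches.add(node);
                }
            }
            return matches;
        }
    }

    static final Predicate<String> TITLE_FILTER = "a"::equals;
    static final Predicate<String> SUMMARY_FILTER = "table"::equals;

    // Simplified results page: link, table, link, table.
    static NodeCursor newPage() {
        return new NodeCursor(Arrays.asList("a", "table", "a", "table"));
    }

    // The original problem: the second call starts where the first one stopped.
    static List<List<String>> backToBack() {
        NodeCursor parser = newPage();
        List<String> titles = parser.extractAllNodesThatMatch(TITLE_FILTER);
        List<String> summaries = parser.extractAllNodesThatMatch(SUMMARY_FILTER);
        return Arrays.asList(titles, summaries);
    }

    // Workaround 1: rewind the cursor between the two calls.
    static List<List<String>> withReset() {
        NodeCursor parser = newPage();
        List<String> titles = parser.extractAllNodesThatMatch(TITLE_FILTER);
        parser.reset();
        List<String> summaries = parser.extractAllNodesThatMatch(SUMMARY_FILTER);
        return Arrays.asList(titles, summaries);
    }

    // Workaround 2 (the reply's approach): one pass, each node tested against both filters.
    static List<List<String>> singlePass() {
        NodeCursor parser = newPage();
        List<String> titles = new ArrayList<>();
        List<String> summaries = new ArrayList<>();
        while (parser.hasMoreNodes()) {
            String node = parser.nextNode();
            if (TITLE_FILTER.test(node)) {
                titles.add(node);
            }
            if (SUMMARY_FILTER.test(node)) {
                summaries.add(node);
            }
        }
        return Arrays.asList(titles, summaries);
    }

    public static void main(String[] args) {
        System.out.println("back-to-back: " + backToBack());
        System.out.println("with reset:   " + withReset());
        System.out.println("single pass:  " + singlePass());
    }
}
```

Running main prints [[a, a], []] for the back-to-back case and [[a, a], [table, table]] for both workarounds. The single pass reads the page only once, which may matter when the Parser was built from a live URL such as the google connection, since rewinding relies on the underlying source supporting a reset.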