Re: [Htmlparser-developer] Nonrecursive FilterBean.setFilters()
From: Derrick O. <Der...@Ro...> - 2005-09-02 11:59:28
It was an oversight. There probably needs to be an explicit setter and getter for a recursion flag on the bean.

The reason for having a recursion flag on NodeList is to gain some control over the process; otherwise, just asking the nodes in the list to do their own filtering would automatically give recursive behaviour, and that may not be what is desired when processing the list.

Martin Hudson wrote:

> Having a great time with the tool. I ran into the following behaviour
> and wanted insight into the design decision and/or an alternative to
> my fix.
>
> I had previously been manually creating a parser, creating a filter,
> copying into a NodeList and then processing that NodeList with a new
> filter. Basically a two step filter. The first pass utilized relative
> context to grab chunks of html while the second did a quick filter on
> the resulting elements to pull from that reduced list.
>
> I was refactoring code to use the FilterBean class as this seemed to
> offer an opportunity to simplify the code and handle the two filters
> in series automagically. The unexpected result (for me :) ) was that
> the behavior was not identical. It turns out that the filter bean
> explicitly does not recurse on the subsequent filter applications.
>
> From FilterBean, Line 166:
>
> ret = ret.extractAllNodesThatMatch (getFilters ()[i], false);
>
> as a result the second filter can’t find <A>s that are within <SPAN>s,
> for instance. My short term hack was to set the recursion flag to true.
>
> Finally my question! Why is non-recursion the intended behavior as
> this behaves differently than manually applying subsequent filters? Is
> my fix OK or will it break some intended behavior elsewhere?
>
> Martin N. Hudson
>
> devIS - Development InfoStructure
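
For anyone following the thread, here is a rough sketch of why the flag changes the result of the second pass. It does not use the HTMLParser classes: Node and extract() below are hypothetical stand-ins that only mimic the recursive versus non-recursive matching discussed above, with a first-pass result holding a SPAN that contains an A, plus a top-level A.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class RecursionDemo {
    // Hypothetical stand-in for a parsed tag; not the HTMLParser Node interface.
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();

        Node(String tag, Node... kids) {
            this.tag = tag;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Hypothetical analogue of a list-level filter with a recursion flag:
    // with recursive == false only the listed nodes are tested,
    // with recursive == true their descendants are tested as well.
    static List<Node> extract(List<Node> list, Predicate<Node> filter, boolean recursive) {
        List<Node> ret = new ArrayList<>();
        for (Node n : list) {
            if (filter.test(n)) {
                ret.add(n);
            }
            if (recursive) {
                ret.addAll(extract(n.children, filter, true));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        // Pretend this is what the first filter returned.
        List<Node> firstPass = Arrays.asList(
                new Node("SPAN", new Node("A")),
                new Node("A"));
        Predicate<Node> isLink = n -> n.tag.equals("A");

        // Non-recursive second pass misses the A nested inside the SPAN.
        System.out.println(extract(firstPass, isLink, false).size()); // prints 1
        // Recursive second pass finds both.
        System.out.println(extract(firstPass, isLink, true).size());  // prints 2
    }
}
```

With the flag off, the second pass only sees the top-level nodes, which matches the missing <A>s inside <SPAN>s reported above. A settable flag on the bean would let the caller choose either behaviour.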