[Htmlparser-developer] Nonrecursive FilterBean.setFilters()
From: Martin H. <Mh...@de...> - 2005-09-01 14:22:34
Having a great time with the tool. I ran into the following behaviour and wanted some insight into the design decision and/or an alternative to my fix.

I had previously been manually creating a parser, creating a filter, copying the results into a NodeList and then processing that NodeList with a new filter. Basically a two-step filter: the first pass used relative context to grab chunks of HTML, while the second did a quick filter on the resulting elements to pull from that reduced list.

I was refactoring the code to use the FilterBean class, as it seemed to offer an opportunity to simplify the code and handle the two filters in series automagically. The unexpected result (for me :-) ) was that the behaviour was not identical. It turns out that FilterBean explicitly does not recurse when applying the subsequent filters.

From FilterBean, line 166:

    ret = ret.extractAllNodesThatMatch (getFilters ()[i], false);

As a result, the second filter can't find <A>s that are nested within <SPAN>s, for instance. My short-term hack was to set the recursion flag to true.

Finally, my question! Why is non-recursion the intended behaviour, given that it behaves differently from applying the subsequent filters manually? Is my fix OK, or will it break some intended behaviour elsewhere?

Martin N. Hudson
devIS - Development InfoStructure
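The difference the question turns on can be reproduced with a small self-contained Java model. Everything below is a hypothetical stand-in: `Node`, `tag`, `extract` and `applyInSeries` are not HTMLParser's `Node`, `NodeList` or `FilterBean`. `extract` only mimics the semantics of the `extractAllNodesThatMatch (filter, recursive)` call quoted above. The assumption that the first filter is applied recursively to the whole document, and only the later filters use the flag, follows the description in the post and may not match the actual FilterBean source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterChainDemo {

    // Hypothetical stand-in for a parsed HTML node; NOT an HTMLParser class.
    static final class Node {
        final String tag;
        final List<Node> children;

        Node(String tag, Node... children) {
            this.tag = tag;
            this.children = Arrays.asList(children);
        }
    }

    // Hypothetical filter: accepts nodes with the given tag name (case-insensitive).
    static Predicate<Node> tag(String name) {
        return n -> n.tag.equalsIgnoreCase(name);
    }

    // Models extractAllNodesThatMatch(filter, recursive): with recursive == false only
    // the nodes in the list itself are tested; with true, every descendant is tested too.
    static List<Node> extract(List<Node> nodes, Predicate<Node> filter, boolean recursive) {
        List<Node> out = new ArrayList<>();
        for (Node n : nodes) {
            if (filter.test(n)) {
                out.add(n);
            }
            if (recursive) {
                out.addAll(extract(n.children, filter, true));
            }
        }
        return out;
    }

    // Applies filters in series (assumption: first filter recursive over the whole
    // document, each later filter applied to the previous result with the given flag).
    static List<Node> applyInSeries(List<Node> document, List<Predicate<Node>> filters,
                                    boolean recurseLaterFilters) {
        List<Node> ret = document;
        for (int i = 0; i < filters.size(); i++) {
            boolean recursive = (i == 0) || recurseLaterFilters;
            ret = extract(ret, filters.get(i), recursive);
        }
        return ret;
    }

    // Models: <body><span><a/><b/></span><p><span><a/></span></p></body>
    static List<Node> sampleDocument() {
        Node body = new Node("body",
                new Node("span", new Node("a"), new Node("b")),
                new Node("p", new Node("span", new Node("a"))));
        return Arrays.asList(body);
    }

    public static void main(String[] args) {
        // First pass grabs <SPAN> chunks, second pass looks for <A> inside them.
        List<Predicate<Node>> filters = Arrays.asList(tag("span"), tag("a"));
        int flat = applyInSeries(sampleDocument(), filters, false).size();
        int deep = applyInSeries(sampleDocument(), filters, true).size();
        System.out.println("non-recursive second pass: " + flat + " <A> nodes");
        System.out.println("recursive second pass:     " + deep + " <A> nodes");
    }
}
```

Running `main` prints 0 for the non-recursive second pass and 2 for the recursive one: the first pass returns the two <SPAN> nodes, and a non-recursive second pass only tests those <SPAN>s themselves, so the <A> children are never seen. One trade-off worth noting in this model (not necessarily the reason behind the original design): if an earlier filter keeps both an element and one of its ancestors, a recursive later pass visits the shared descendants twice and can return duplicates.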