[Htmlparser-developer] Nonrecursive FilterBean.setFilters()
From: Martin H. <Mh...@de...> - 2005-09-01 14:22:34
Having a great time with the tool. I ran into the following behaviour
and wanted insight into the design decision and/or an alternative to my
fix.

I had previously been manually creating a parser, creating a filter,
copying the results into a NodeList, and then processing that NodeList with a
new filter. Basically a two-step filter. The first pass utilized relative context
to grab chunks of html while the second did a quick filter on the
resulting elements to pull from that reduced list.

I was refactoring code to use the FilterBean class as this seemed to
offer an opportunity to simplify the code and handle the two filters in
series automagically. The unexpected result (for me :-) ) was that the
behavior was not identical. It turns out that the filter bean
explicitly does not recurse on the subsequent filter applications.

From FilterBean, Line 166:
ret = ret.extractAllNodesThatMatch (getFilters ()[i], false);

As a result the second filter can't find <A>s that are within <SPAN>s,
for instance. My short-term hack was to set the recursion flag to true.
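To make the difference concrete, here is a minimal, self-contained sketch of what the recursion flag changes. It deliberately does not use HTMLParser: ToyNode and extract are hypothetical stand-ins invented for this illustration, and extract only approximates the behaviour described above for extractAllNodesThatMatch, so treat it as a model of the problem rather than the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterRecursionSketch {

    /** Toy stand-in for a parsed tag (hypothetical; not an HTMLParser class). */
    static final class ToyNode {
        final String tag;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String tag, ToyNode... kids) {
            this.tag = tag;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /**
     * Collects the nodes that satisfy the filter. With recursive == false only
     * the entries of the list itself are tested; with recursive == true their
     * descendants are tested as well.
     */
    static List<ToyNode> extract(List<ToyNode> nodes, Predicate<ToyNode> filter,
                                 boolean recursive) {
        List<ToyNode> out = new ArrayList<>();
        for (ToyNode node : nodes) {
            if (filter.test(node)) {
                out.add(node);
            }
            if (recursive) {
                out.addAll(extract(node.children, filter, true));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend the first filter returned one <SPAN> chunk wrapping an <A>.
        List<ToyNode> chunks = Arrays.asList(new ToyNode("SPAN", new ToyNode("A")));
        Predicate<ToyNode> isLink = node -> node.tag.equals("A");

        // Second filter, non-recursive: only the <SPAN> is tested, so no match.
        System.out.println(extract(chunks, isLink, false).size()); // prints 0
        // Second filter, recursive: the nested <A> is reached.
        System.out.println(extract(chunks, isLink, true).size());  // prints 1
    }
}
```

In this model the non-recursive second pass only ever sees the top-level chunks returned by the first pass, which matches the symptom of missing <A>s nested inside <SPAN>s.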

Finally my question! Why is non-recursion the intended behavior, given that
it behaves differently from manually applying subsequent filters? Is my
fix OK, or will it break some intended behavior elsewhere?

Martin N. Hudson
devIS - Development InfoStructure