Hi,
Is it possible to start searching for nodes in an HTML file from a given string position?
Say I know that the HTML code and text around the part of the file where the text of interest appears is unique. Is there a way to start looking forward or backward from that string index for a given node type, rather than looping over all nodes of that type and checking each one for a given string (which may also appear elsewhere)?
thanks.
You could use a stateful filter.
The filter would subclass StringFilter or RegexFilter and add a state flag that is set the first time the string is encountered. This only works forward from that point, though.
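import org.htmlparser.Node;
import org.htmlparser.filters.StringFilter;

public class StatefulStringFilter extends StringFilter
{
    boolean mTriggered; // goes true when pattern seen

    // ... usual constructors delegating to superclass;
    // only the one used in the example below is shown
    public StatefulStringFilter (String pattern)
    {
        super (pattern);
        mTriggered = false;
    }

    // must be public because it overrides NodeFilter.accept
    public boolean accept (Node node)
    {
        boolean ret;
        // return true if pattern already seen or node matches
        ret = mTriggered || super.accept (node);
        mTriggered = ret;
        return (ret);
    }
}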
Then you can use this filter 'AND'ed with your 'other' filter that wants to work only after the string has been found...
parser.extractAllNodesThatMatch (
    new AndFilter (
        new StatefulStringFilter ("<pattern>"),
        new DoThisAfterSeeingPatternFilter (yadda)));
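To see the latching behaviour in isolation, here is a minimal, self-contained sketch that does not use HTML Parser at all. Plain strings stand in for nodes and java.util.function.Predicate stands in for NodeFilter; the names LatchingPredicate, LatchDemo and afterTrigger are made up for illustration. It assumes the items are visited in document order and that the latch is evaluated before the other filter, so the latch sees every item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the stateful filter: strings play the role of nodes.
class LatchingPredicate<T> implements Predicate<T> {
    private final Predicate<T> trigger;
    private boolean triggered; // goes true once the trigger has matched

    LatchingPredicate(Predicate<T> trigger) {
        this.trigger = trigger;
    }

    @Override
    public boolean test(T item) {
        // true if already triggered, or if this item is the trigger itself
        triggered = triggered || trigger.test(item);
        return triggered;
    }
}

class LatchDemo {
    // Keep items starting with 'wanted', but only from the marker onward.
    static List<String> afterTrigger(List<String> items, String marker, String wanted) {
        Predicate<String> latch = new LatchingPredicate<>(s -> s.contains(marker));
        Predicate<String> other = s -> s.startsWith(wanted);
        List<String> out = new ArrayList<>();
        for (String s : items) {
            // the latch is tested first so that it is consulted for every item
            if (latch.test(s) && other.test(s)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
            "link:a", "text:intro", "link:b", "text:UNIQUE", "link:c", "link:d");
        System.out.println(afterTrigger(doc, "UNIQUE", "link:")); // [link:c, link:d]
    }
}
```

Note that the item containing the marker passes the latch as well, so whether it ends up in the result depends only on the second filter.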
Hi,
In the example you gave, the pattern (as in new StatefulStringFilter ("<pattern>")) can only be text (as in the text of a text node) and cannot be made up of partial HTML code and text, right? It extends StringFilter, which only searches for text.
By the way, I think HtmlParser is great! I've looked at other parsers and found this one easiest to use.
thanks.
True, the example uses text matching.
One generalization would be for the stateful filter to take any other filter and, once that subordinate filter 'trips', always return true.
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;

public class StatefulFilter implements NodeFilter
{
    NodeFilter mSubordinate; // sub filter to check
    boolean mTriggered; // goes true when subordinate filter goes true

    public StatefulFilter (NodeFilter subordinate)
    {
        mTriggered = false;
        mSubordinate = subordinate;
    }

    // must be public because it implements NodeFilter.accept
    public boolean accept (Node node)
    {
        boolean ret;
        // return true if triggered or subordinate filter matches
        ret = mTriggered || mSubordinate.accept (node);
        mTriggered = ret;
        return (ret);
    }
}
Then you could use any other filter matching various text and HTML as the trigger:
parser.extractAllNodesThatMatch (
    new AndFilter (
        new StatefulFilter (
            <complex filter>
        ),
        new DoThisAfterTriggeringFilter (yadda)));
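One thing to watch with a filter like this is that the flag stays set once it has tripped, so an instance that is reused for a second document would accept everything from the start. A rough sketch of one way around that, again with strings standing in for nodes and Predicate standing in for NodeFilter; ResettableLatch, ResetDemo, select and reset() are hypothetical names, not part of HTML Parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for StatefulFilter with an added reset() method.
class ResettableLatch<T> implements Predicate<T> {
    private final Predicate<T> subordinate;
    private boolean triggered; // goes true when the subordinate matches

    ResettableLatch(Predicate<T> subordinate) {
        this.subordinate = subordinate;
    }

    @Override
    public boolean test(T item) {
        triggered = triggered || subordinate.test(item);
        return triggered;
    }

    // clear the flag before scanning another document
    void reset() {
        triggered = false;
    }
}

class ResetDemo {
    // Collect the items accepted by the filter, in order.
    static <T> List<T> select(List<T> items, Predicate<T> filter) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (filter.test(item)) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ResettableLatch<String> latch = new ResettableLatch<>(s -> s.equals("MARK"));
        List<String> first = List.of("a", "MARK", "b");
        List<String> second = List.of("c", "MARK", "d");
        System.out.println(select(first, latch));  // [MARK, b]
        System.out.println(select(second, latch)); // flag still set: [c, MARK, d]
        latch.reset();
        System.out.println(select(second, latch)); // [MARK, d]
    }
}
```

Creating a fresh filter instance for each parse would avoid the problem just as well.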
You can use the FilterBuilder application to build the complex filter, but the StatefulFilter obviously won't be available within that program.
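The stateful filters above only look forward. For the backward direction asked about in the original question, a different approach (not a filter, and not something HTML Parser provides as far as this thread shows) would be to collect the nodes in document order into a list first, find the index of the trigger, and then walk backward from there. A small sketch under those assumptions, with strings standing in for nodes; BackwardSearch and nearestBefore are made-up names:

```java
import java.util.List;
import java.util.function.Predicate;

class BackwardSearch {
    // Return the nearest item before the first trigger that satisfies 'wanted',
    // or null if there is no trigger or no such item.
    static <T> T nearestBefore(List<T> items, Predicate<T> trigger, Predicate<T> wanted) {
        int start = -1;
        for (int i = 0; i < items.size(); i++) {
            if (trigger.test(items.get(i))) {
                start = i; // index of the first item matching the trigger
                break;
            }
        }
        // walk backward from just before the trigger; skipped when start is -1
        for (int i = start - 1; i >= 0; i--) {
            if (wanted.test(items.get(i))) {
                return items.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
            "link:a", "text:intro", "link:b", "text:UNIQUE", "link:c");
        String hit = nearestBefore(doc, s -> s.contains("UNIQUE"), s -> s.startsWith("link:"));
        System.out.println(hit); // link:b
    }
}
```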