[Htmlparser-user] Can we extract links matching a substring in StringFilter or even better matching
From: Kamdar, D. \(MLITS\) <Dev...@ml...> - 2008-02-27 17:06:15
Hi,
I am trying to parse an HTML page and extract all the links that are A
tags and have the substring "#Entry" in their href attribute.
For example, here is the HTML file:
<html>
<body>
<A href="#Entry1">1) First Line</A>
<A href="#Entry2">2) Second Line</A>
<A href="#Entry3">3) Third Line</A>
<A href="#Entry4">4) Fourth Line</A>
</body>
</html>
I need to extract a list of links that each Entry represents.
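To make the expected result concrete, here is a minimal sketch in plain Java of the matching I am after. It uses java.util.regex rather than HTMLParser filters, and the class name EntryLinks and the method extractEntryLinks are made up for illustration. Regex matching of HTML is fragile (it assumes double-quoted href values here), so this only shows the desired output and is not meant as the way to do it with the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: illustrates the desired result only.
public class EntryLinks {
    // Matches an opening A tag and captures the value of a double-quoted href attribute.
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href values of all A tags whose href contains the given substring.
    static List<String> extractEntryLinks(String html, String substring) {
        List<String> links = new ArrayList<>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.contains(substring)) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<A href=\"#Entry1\">1) First Line</A>"
                + "<A href=\"#Entry2\">2) Second Line</A>"
                + "<A href=\"other.html\">Unrelated</A>"
                + "</body></html>";
        // Only the two hrefs containing "#Entry" should be listed.
        System.out.println(extractEntryLinks(html, "#Entry"));
    }
}
```

The substring check is kept separate from the tag pattern so that the same helper works for "#Entry" or for a full value such as "#Entry1".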
I tried using a combination of TagNameFilter("A") and StringFilter(")") in
an AndFilter as follows:
NodeList list = parser.extractAllNodesThatMatch(
    new AndFilter(new TagNameFilter("A"), new StringFilter(")")));
But I think StringFilter only matches whole strings (an exact match)
and not substrings, so it does not match the ")" that is only part of
the text in
<A href="#Entry1">1) First Line</A>
for example.
Or is there a filter that matches on the value of an attribute, such as
#Entry1, or even better on a substring of that value, such as #Entry, to
select the tags?
Is there a way I can achieve what I am trying to do here?
Any help is much appreciated.
Thanks
Devang Kamdar