[Htmlparser-user] Can we extract links matching a substring in StringFilter or even better matching
From: Kamdar, D. \(MLITS\) <Dev...@ml...> - 2008-02-27 17:06:15
Hi,

I am trying to parse an HTML page and extract all the links that have an A tag and the substring "#Entry" in their href attribute. For example, here is the HTML file:

    <html>
    <body>
    <A href="#Entry1">1) First Line</A>
    <A href="#Entry2">2) Second Line</A>
    <A href="#Entry3">3) Third Line</A>
    <A href="#Entry4">4) Fourth Line</A>
    </body>
    </html>

I need to extract the list of links that the entries represent. I tried combining TagNameFilter("A") and StringFilter(")") in an AndFilter, as follows:

    NodeList list = parser.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("A"), new StringFilter(")")));

But I think StringFilter searches for whole strings, i.e. an exact match, and not for substrings, i.e. a partial match such as the ")" in the link <A href="#Entry1">1) First Line</A>.

Is there a filter where I can use the content of the attribute, like #Entry1, or even better a substring of the content, like #Entry, to pick out the tags? Is there a way to achieve what I am trying to do here?

Any help is much appreciated.

Thanks,
Devang Kamdar
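
For reference, the substring test described above can be sketched without HTMLParser at all, using only java.util.regex from the standard library. This is a swapped-in technique, not the library's filter API: the class name HrefSubstring and the method linksContaining are hypothetical, and the regex assumes double-quoted href values and non-nested A tags, as in the sample page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: a plain-regex substring match on href.
final class HrefSubstring {
    // Matches <a ... href="value" ...>text</a>, case-insensitively.
    // Group 1 is the href value, group 2 is the link text.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns "href -> text" for every anchor whose href contains the given substring.
    static List<String> linksContaining(String html, String needle) {
        List<String> hits = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            if (m.group(1).contains(needle)) {
                hits.add(m.group(1) + " -> " + m.group(2).trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The sample page from the message, inlined as a string.
        String html = "<html><body>"
            + "<A href=\"#Entry1\">1) First Line</A>"
            + "<A href=\"#Entry2\">2) Second Line</A>"
            + "<A href=\"#Entry3\">3) Third Line</A>"
            + "<A href=\"#Entry4\">4) Fourth Line</A>"
            + "</body></html>";
        for (String link : linksContaining(html, "#Entry")) {
            System.out.println(link);
        }
    }
}
```

A regex is adequate for markup as regular as the sample, but it is not a general HTML parser. For real pages, a filter inside the parsing library would be the more robust route.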