Re: [Htmlparser-user] iterate through node list
From: Mattia T. <mat...@gm...> - 2007-09-17 11:44:59
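Hi, try this:
In a new class, after importing:
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.tags.*;
import org.htmlparser.util.*;
import org.htmlparser.visitors.ObjectFindingVisitor;
insert the following method: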
protected URL[] extractLinks(String url) throws ParserException {
    // Parse the page and collect every LinkTag (<A HREF=...>) node.
    Parser parser = new Parser(url);
    ObjectFindingVisitor visitor = new ObjectFindingVisitor(LinkTag.class);
    parser.visitAllNodesWith(visitor);
    Node[] nodes = visitor.getTags();

    // Iterate over the nodes one by one and extract each href.
    Vector vector = new Vector();
    for (int i = 0; i < nodes.length; i++) {
        LinkTag link = (LinkTag) nodes[i];
        try {
            System.out.println(link.getLink() + " " + link.getLinkText());
            vector.add(new URL(link.getLink()));
        } catch (MalformedURLException murle) {
            // Links that do not form a valid URL are reported and skipped.
            murle.printStackTrace();
        }
    }

    // Copy the collected URLs into an array for the caller.
    URL[] ret = new URL[vector.size()];
    vector.copyInto(ret);
    return ret;
}
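If it helps, the collecting step at the end of the method can be tried on its own, without HTMLParser. This is only a minimal sketch using the standard library: the class name LinkCollector and the method toUrls are made up for illustration (they are not part of HTMLParser), and it assumes the href values are already available as plain strings.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of HTMLParser.
final class LinkCollector {

    // Converts href strings to URL objects, skipping any that are malformed.
    static URL[] toUrls(List<String> hrefs) {
        List<URL> urls = new ArrayList<URL>();
        for (String href : hrefs) {
            try {
                urls.add(new URL(href));
            } catch (MalformedURLException e) {
                // Same policy as the method above: report the bad link and move on.
                System.err.println("Skipping malformed link: " + href);
            }
        }
        return urls.toArray(new URL[0]);
    }

    public static void main(String[] args) {
        // Sample hrefs are assumptions; the second one has no protocol and is skipped.
        URL[] urls = toUrls(Arrays.asList(
                "http://example.com/link1",
                "not a url",
                "http://example.com/link2"));
        for (URL u : urls) {
            System.out.println(u);
        }
    }
}
```

Using a typed List instead of a raw Vector avoids the cast and the copyInto call, but the behaviour is meant to be the same as in the method above.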
Hope this helps.
Cheers
Mattia
2007/9/17, Nic Soltani <oo...@gm...>:
>
> Hi,
> I created a NodeList which contains hyperlinks extracted from an HTML
> web page. I need to iterate through every single node and extract its
> href.
> Wondering if anyone can help me with:
>
> 1. how to iterate over the nodes one by one
> 2. how to extract the href from each node
>
>
> NodeList URLs = ExtractHyperLinks(HTML);
> /*
> * at this stage we have all:
> * <A HREF="link1">something1</A>
> * <A HREF="link2">something2</A>
> * <A HREF="link3">something3</A>
> * <A HREF="link4">something4</A>
> * <A HREF="link5">something5</A>
> */
>
>
>