Anonymous - 2004-05-04
Hey all,
OK, I'm not asking much from this parser :) All I want to do is parse an HTML page and process some tags, as in the code below:
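// Assumes the HTMLParser 1.x API of the time (StringNode was renamed TextNode in later releases)
import java.net.URL;
import java.net.URLConnection;
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.StringNode;
import org.htmlparser.tags.FormTag;
import org.htmlparser.tags.ImageTag;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeIterator;

Parser parser;
URL url;
URLConnection connection;
parser = new Parser ();
url = new URL ("http://page.ca");
connection = url.openConnection ();
parser.setConnection (connection);
Node tag;
// parser.elements () iterates over the top-level nodes of the page
for (NodeIterator iterator = parser.elements (); iterator.hasMoreNodes (); )
{
    tag = iterator.nextNode ();
    if (tag != null)
    {
        if (tag instanceof StringNode) {
            StringNode sn = (StringNode) tag;
            System.out.println (sn.getText ());
        }
        else if (tag instanceof ImageTag) {
            ImageTag it = (ImageTag) tag;
            System.out.println ("Img: " + it.getImageURL ());
        }
        else if (tag instanceof LinkTag) {
            LinkTag lt = (LinkTag) tag;
            System.out.println (lt.getLinkText () + "\n ~~~>> " + lt.getLink ());
        }
        else if (tag instanceof FormTag) {
            System.out.println ("FORM Tag located");
        }
        ............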
This isn't all the code, but whatever I try here doesn't work.
I'd like to pick out some tags and handle them in the order in which they appear on the page, not as collections of all nodes of a given type.
Please, please help me out here!
cheers
You're missing recursion.
parser.elements () only walks the top-level nodes of the page. If a node is an instance of CompositeTag, it can have children (a link inside a table, for example), and your loop never looks at them. By iterating through the children recursively you can reach every node. Wrap the contents of your "if" block in a method taking a Node, e.g. doSomething (Node node), then add this to the method:
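// Recurse into the children of composite tags (e.g. a link nested inside a table).
// NodeIterator.nextNode () can throw ParserException, so doSomething must declare or catch it.
if (node instanceof CompositeTag && null != node.getChildren ())
    for (NodeIterator iterator = node.getChildren ().elements (); iterator.hasMoreNodes (); )
        doSomething (iterator.nextNode ());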
If you add it at the end of your method, you get the nodes in page order: the current node is processed first, then its children. You can get end tags separately if you need them.
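To make the traversal order concrete, here is a minimal, self-contained sketch of the recursive approach described above. It deliberately does not use HTMLParser: SketchNode, doSomething, walk and samplePage are hypothetical stand-ins, where a node with a children list plays the role of a CompositeTag and the outer loop over the top-level list plays the role of parser.elements (). Each node is handled before its children (pre-order), and an optional end-tag line is emitted once its children are done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed HTML node, NOT an HTMLParser class.
// A node with children plays the role of a CompositeTag.
class SketchNode {
    final String name;               // tag name, or "#text" for text nodes
    final String text;               // non-null only for text nodes
    final List<SketchNode> children; // null for leaf nodes

    SketchNode(String name, String text, SketchNode... children) {
        this.name = name;
        this.text = text;
        this.children = children.length == 0 ? null : Arrays.asList(children);
    }
}

public class PageOrderWalk {
    // Handle the node itself first, then recurse into its children (pre-order),
    // then optionally emit an end tag once all children are done.
    static void doSomething(SketchNode node, List<String> out, boolean endTags) {
        if (node.text != null) {
            out.add("Text: " + node.text);
        } else {
            out.add("<" + node.name + ">");
        }
        if (node.children != null) {
            for (SketchNode child : node.children) {
                doSomething(child, out, endTags);
            }
            if (endTags) {
                out.add("</" + node.name + ">");
            }
        }
    }

    // The outer loop over top-level nodes stands in for parser.elements ().
    static List<String> walk(List<SketchNode> topLevel, boolean endTags) {
        List<String> out = new ArrayList<>();
        for (SketchNode node : topLevel) {
            doSomething(node, out, endTags);
        }
        return out;
    }

    // Models the markup: <table><a>Home</a><img></table>Footer
    static List<SketchNode> samplePage() {
        return Arrays.asList(
            new SketchNode("table", null,
                new SketchNode("a", null, new SketchNode("#text", "Home")),
                new SketchNode("img", null)),
            new SketchNode("#text", "Footer"));
    }

    public static void main(String[] args) {
        for (String line : walk(samplePage(), true)) {
            System.out.println(line);
        }
    }
}
```

Running main prints <table>, <a>, Text: Home, </a>, <img>, </table> and Text: Footer, one per line, so the nested link text comes out between its start and end tags, exactly where it sits in the markup. A loop over the top-level list alone, like the original one, would only ever see the table and the Footer text.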
Thanks :) I'll give that a try right away.
cheers mate!