
further help needed with parser

Help
Anonymous
2004-05-03
2004-05-04
  • Anonymous

    Anonymous - 2004-05-03

    Hey all,
    OK, I'm not asking much from this parser :) All I want to do is parse an HTML page and process some tags, as in the code below:

    Parser parser;
    URL url;
    URLConnection connection;
    parser = new Parser ();
    url = new URL ("http://page.ca");
    connection = url.openConnection ();
    parser.setConnection (connection);

    Node tag;

    for (NodeIterator iterator = parser.elements (); iterator.hasMoreNodes (); )
    {
        tag = (Node) (iterator.nextNode ());

        if (tag != null)
        {
            if (tag instanceof StringNode) {
                StringNode sn = (StringNode) tag;
                System.out.println (sn.getText ());
            }
            else if (tag instanceof ImageTag) {
                ImageTag it = (ImageTag) tag;
                System.out.println ("Img: " + it.getImageURL ());
            }
            else if (tag instanceof LinkTag) {
                LinkTag lt = (LinkTag) tag;
                System.out.println (lt.getLinkText () + "\n   ~~~>> " + lt.getLink ());
            }
            else if (tag instanceof FormTag) {
                System.out.println ("FORM Tag located");
            }

    ............

    This isn't all of the code, but what I'm trying to do here doesn't work when I run it.
    I'd like to isolate certain tags and handle them in the order in which they appear on the page, rather than as a collection of all nodes of one type.

    Please, please help me out here!

    cheers

    • Derrick Oswald

      Derrick Oswald - 2004-05-03

      You're missing recursion. The iterator from parser.elements () only walks the top-level nodes of the page.
      If a node is an instance of CompositeTag, it can have children, and by iterating through those children recursively you can reach every node. Wrap the contents of your "if" block in a method that takes a Node, e.g. doSomething (Node node), then add this to the method:

      if (node instanceof CompositeTag && null != node.getChildren ())
          for (NodeIterator iterator = node.getChildren ().elements (); iterator.hasMoreNodes (); )
              doSomething (iterator.nextNode ());

      If you add it at the end of your method, you should get the nodes in page order: the current node is processed first, then its children. You can get end tags separately if you need them.
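      The traversal described above is a pre-order, depth-first walk. Below is a minimal, self-contained Java sketch of the idea. SketchNode is a hypothetical stand-in for a parser node (it is not the HTMLParser Node or CompositeTag API), and doSomething here simply records each node's name instead of running the instanceof checks from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageOrderWalk {

    // Hypothetical stand-in for a parser node; NOT the org.htmlparser API.
    static class SketchNode {
        final String name;               // e.g. "body", "a", or "#text"
        final List<SketchNode> children; // empty for leaf nodes such as text

        SketchNode(String name, SketchNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order walk: handle the node itself first, then recurse into its
    // children one by one, so nodes are visited in the order they appear.
    static void doSomething(SketchNode node, List<String> out) {
        out.add(node.name); // stand-in for the instanceof handling
        for (SketchNode child : node.children) {
            doSomething(child, out);
        }
    }

    public static void main(String[] args) {
        // Roughly <html><body><a><img></a>text<form></form></body></html>
        SketchNode page = new SketchNode("html",
            new SketchNode("body",
                new SketchNode("a", new SketchNode("img")),
                new SketchNode("#text"),
                new SketchNode("form")));

        // A flat loop over the top level alone would only see "html".
        List<String> visited = new ArrayList<>();
        doSomething(page, visited);
        System.out.println(visited); // prints [html, body, a, img, #text, form]
    }
}
```

      Because each node is recorded before its children, the output follows the order of the page. With the real parser, the loop over children would use node.getChildren () as in the snippet above.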

    • Anonymous

      Anonymous - 2004-05-04

      Thanks :) I'll give that a try right away.

      cheers mate!

