Re: [Htmlparser-user] Help needed

Dipesh,

With the unadulterated Parser, the NodeIterator only returns the top-level nodes. For example:
<html>
  <head>
  </head>
  <body>
  </body>
</html>

yields only one top-level node, the <HTML> node. All other nodes are children of that top-level node.
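
To illustrate why iterating the top level gives a single node, here is a minimal sketch of a nested tree and a depth-first walk that flattens it. TreeNode, FlattenDemo and flatten are made-up names for this illustration and are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

class FlattenDemo {
    // Hypothetical minimal tree node; NOT the library's Node interface.
    static class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name) { this.name = name; }

        // Adds a child and returns this node so calls can be chained.
        TreeNode add(TreeNode child) { children.add(child); return this; }
    }

    // Depth-first, pre-order walk: the node itself first, then its children.
    static void flatten(TreeNode node, List<String> out) {
        out.add(node.name);
        for (TreeNode child : node.children) {
            flatten(child, out);
        }
    }

    public static void main(String[] args) {
        // Same shape as the example page: one top-level node with two children.
        TreeNode html = new TreeNode("html")
                .add(new TreeNode("head"))
                .add(new TreeNode("body"));
        List<String> flat = new ArrayList<>();
        flatten(html, flat);
        System.out.println(flat); // prints [html, head, body]
    }
}
```

The top level holds only the html node; head and body only show up once the children are visited.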
The equals() method is unlikely to work in any case. I don't believe it's implemented anywhere in the node class hierarchy.
Maybe it should be.
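
If equals() really is not overridden, as I suspect, Java falls back to Object.equals(), which only checks whether both references point to the same object. A small sketch of the consequence, using a made-up PlainNode class as a stand-in (it is not a library class):

```java
class EqualsDemo {
    // Hypothetical stand-in for a node class that does NOT override equals().
    static class PlainNode {
        private final String html;

        PlainNode(String html) { this.html = html; }

        String toHtml() { return html; }
    }

    // Uses the inherited Object.equals(), i.e. reference identity.
    static boolean sameByEquals(PlainNode a, PlainNode b) {
        return a.equals(b);
    }

    // Compares the HTML text instead.
    static boolean sameByHtml(PlainNode a, PlainNode b) {
        return a.toHtml().equals(b.toHtml());
    }

    public static void main(String[] args) {
        PlainNode n1 = new PlainNode("<p>");
        PlainNode n2 = new PlainNode("<p>");
        System.out.println(sameByEquals(n1, n2)); // prints false (two distinct objects)
        System.out.println(sameByHtml(n1, n2));   // prints true (same HTML text)
    }
}
```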

If your intent is just to check element-for-element equality, I would instead suggest the Lexer class
(which is what you get if you set the NodeFactory property on the parser to new PrototypicalNodeFactory (true)).
The Lexer has a nextNode() method that will retrieve the nodes in a flat sequence.
Then, with e1 and e2 being the current node from each page, I would use:
  if (e1.toHtml ().equals (e2.toHtml ()))
to compare the original HTML strings.
But then, you are responsible for syncing up in case of injected nodes, which may be what you were trying to avoid.
Otherwise you could just do a string comparison of the two entire HTML pages.
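
The lock-step comparison described above can be sketched roughly as follows. This is only an illustration: FlatCompare and firstMismatch are made-up names, and plain strings stand in for the toHtml() text of each node that nextNode() would hand back, so the sketch runs without the library. It does not attempt any syncing up after a mismatch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class FlatCompare {
    // Walks both flat sequences in lock-step and returns the index of the
    // first position where they differ, or -1 if they match throughout.
    // If one sequence is a prefix of the other, the mismatch is reported
    // at the length of the shorter one.
    static int firstMismatch(Iterable<String> page1, Iterable<String> page2) {
        Iterator<String> it1 = page1.iterator();
        Iterator<String> it2 = page2.iterator();
        int index = 0;
        while (it1.hasNext() && it2.hasNext()) {
            // Advance BOTH iterators on every pass, then compare the text.
            if (!it1.next().equals(it2.next())) {
                return index;
            }
            index++;
        }
        return (it1.hasNext() || it2.hasNext()) ? index : -1;
    }

    public static void main(String[] args) {
        // Stand-ins for the flat node sequences of two pages.
        List<String> a = Arrays.asList("<html>", "<body>", "<p>", "hi", "</p>", "</body>", "</html>");
        List<String> b = Arrays.asList("<html>", "<body>", "<div>", "hi", "</div>", "</body>", "</html>");
        System.out.println(firstMismatch(a, b)); // prints 2
    }
}
```

Returning the index rather than printing Yes/No for each pair makes it easier to decide where to resume if you later add resynchronisation.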

In the case of a mismatch, you could submit a suitable portion of the page to the parser and see if it can figure out the nesting for you, but that sounds inefficient. It depends on how many pages you need to process.

Derrick

----- Original Message ----
From: Dipesh Sharma <dip...@re...>
To: der...@ro...
Sent: Tuesday, April 24, 2007 10:53:31 PM
Subject: Help needed

Hi Derrick,

A few days ago I mailed for help, but none of the replies really helped me. Please tell me if this can be achieved at all, and if so, how? I'll be grateful to you. I'm trying to compare the HTML tag nodes of 2 different web pages by taking one node at a time. Hence, I need to compare the 1st node of the 2 web pages, then go to the 2nd nodes and compare, and so on. Could you please help me with how I can achieve this? I've tried to use NodeIterator but haven't been successful. Attached is my code.

import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.*;
import org.htmlparser.*;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasParentFilter;

class Test
{
    public static void main (String[] args)
    {
        try
        {
            Parser parser1 = new Parser ("http://www.deals2buy.com");
            Parser parser2 = new Parser ("http://www.deals.com");
            NodeIterator e1 = parser1.elements ();
            NodeIterator e2 = parser2.elements ();
            while (e1.hasMoreNodes() && e2.hasMoreNodes())
            {
                if (e1.equals(e2))
                    System.out.println ("Yes");
                else
                    System.out.println ("No");
            }
        }
        catch (ParserException pe)
        {
            pe.printStackTrace ();
        }
    }
}