Thread: [Htmlparser-developer] NodeIterator and SimpleNodeIterator
From: Joseph R. <jmr...@tg...> - 2003-04-02 19:37:38
After finding myself poking through more and more of the code for the HTML parser, and needing to fix bugs, I've decided to join this list. And I figure that there's no better way to join the list than with a question. :-) (I've looked through the archives, and I don't see the answer in there. Sorry if this was discussed before and I missed it.)

Is there a reason that the iterators returned by Parser.elements() and CompositeTag.children() are different classes, and are incompatible? I wanted to write some code along the lines of:

--------------------------------------------------------------

Parser parser = new Parser(url);
NodeIterator iter = parser.elements();
doParse(iter);

...

private void doParse(NodeIterator iter) {
    while (iter.hasMoreNodes()) {
        Node node = iter.nextNode();
        doStuff();
        if (node instanceof CompositeTag) {
            // Does not compile: children() is not a NodeIterator.
            doParse(((CompositeTag) node).children());
        }
    }
}

--------------------------------------------------------------

Unfortunately, because the iterators are different (and don't even share a superclass), I can't do this, and have to duplicate my doParse method with two different signatures.

This seems like a natural thing to want to do. For example, when parsing a page, a form tag might contain a lot of other elements (text, links, etc.) that we want to get, and the only way to do that is to iterate inside it.

_____________________________________________________________
Joe Robins                        Tel:   212-918-5057
Thaumaturgix, Inc.                Fax:   212-918-5001
19 W. 44th St., 18th Floor        Email: jmr...@tg...
New York, NY 10036                http://www.tgix.com

thau'ma-tur-gy, n.  the working of miracles.
From: Somik R. <so...@ya...> - 2003-04-03 04:08:50
Hi Joseph,

The difference is subtle: NodeIterator throws exceptions; SimpleNodeIterator does not. This is because a SimpleNodeIterator used inside a collection does not need to throw a parser exception, as it is not parsing, just iterating. A NodeIterator, however, has to let its implementations throw exceptions that arise during the parse.

One solution to your problem could be to have NodeList return a NodeIterator as well. What do you think?

Regards,
Somik
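The distinction Somik describes can be sketched in a few lines of Java. This is only an illustration under assumptions, not the library's source: the two interfaces below are simplified stand-ins that iterate over plain strings, and ParserException, asNodeIterator, over, and count are hypothetical names introduced for the example. It relies on the fact that a Java method implementing an interface method may declare fewer checked exceptions than the interface does, so a non-throwing iterator can be viewed through the throwing interface.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorContracts {
    // Stand-in for the parser's checked exception (assumed, simplified).
    static class ParserException extends Exception {
        ParserException(String message) { super(message); }
    }

    // Parsing iterator: every step may run into a parse failure.
    interface NodeIterator {
        boolean hasMoreNodes() throws ParserException;
        String nextNode() throws ParserException;
    }

    // Collection iterator: the nodes are already parsed, nothing to throw.
    interface SimpleNodeIterator {
        boolean hasMoreNodes();
        String nextNode();
    }

    // Hypothetical adapter: expose a SimpleNodeIterator as a NodeIterator.
    static NodeIterator asNodeIterator(final SimpleNodeIterator simple) {
        return new NodeIterator() {
            public boolean hasMoreNodes() { return simple.hasMoreNodes(); }
            public String nextNode() { return simple.nextNode(); }
        };
    }

    // Build a non-throwing iterator over an in-memory list.
    static SimpleNodeIterator over(List<String> nodes) {
        final Iterator<String> it = nodes.iterator();
        return new SimpleNodeIterator() {
            public boolean hasMoreNodes() { return it.hasNext(); }
            public String nextNode() { return it.next(); }
        };
    }

    // Code written once against NodeIterator works for both kinds.
    static int count(NodeIterator iter) throws ParserException {
        int n = 0;
        while (iter.hasMoreNodes()) {
            iter.nextNode();
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws ParserException {
        System.out.println(count(asNodeIterator(over(Arrays.asList("a", "b", "c")))));
    }
}
```

The cost of this direction is that callers who only walk a collection still have to handle or declare ParserException once they go through the NodeIterator view.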
From: Joseph R. <jmr...@tg...> - 2003-04-04 19:10:20
Somik Raha wrote:
> The difference is subtle. NodeIterator throws
> exceptions. SimpleNodeIterator does not. This was because
> a SimpleNodeIterator used inside a collection would
> not need to throw parser exception, as it is not
> parsing - just iterating. However, a NodeIterator
> requires its implementations to throw exceptions
> depending on the parse.
> One solution to your problem could be - have
> NodeList return a NodeIterator as well. What do you
> think ?

You mean have a second method which would return a NodeIterator instead of a SimpleNodeIterator? That would work. Then I could just use that method instead of the existing one when I wanted to recurse, and all would be fine.
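A rough sketch of how the "second method" idea would let a single recursive doParse cover both the top level and nested children. Everything here is a simplified stand-in rather than the real htmlparser classes: Node, its elements() accessor, iterate, and walk are hypothetical names, and the sketch records node names where the original called doStuff().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RecursiveWalk {
    // Stand-in for the parser's checked exception (assumed).
    static class ParserException extends Exception {}

    interface NodeIterator {
        boolean hasMoreNodes() throws ParserException;
        Node nextNode() throws ParserException;
    }

    // Simplified node: a name plus zero or more children.
    static class Node {
        final String name;
        final List<Node> kids;

        Node(String name, Node... kids) {
            this.name = name;
            this.kids = Arrays.asList(kids);
        }

        // Hypothetical second accessor that hands back a NodeIterator.
        NodeIterator elements() { return iterate(kids); }
    }

    // Wrap an in-memory list of nodes as a NodeIterator.
    static NodeIterator iterate(List<Node> nodes) {
        final Iterator<Node> it = nodes.iterator();
        return new NodeIterator() {
            public boolean hasMoreNodes() { return it.hasNext(); }
            public Node nextNode() { return it.next(); }
        };
    }

    // One method, one signature: recurses depth-first into children.
    static void doParse(NodeIterator iter, List<String> visited) throws ParserException {
        while (iter.hasMoreNodes()) {
            Node node = iter.nextNode();
            visited.add(node.name);
            if (!node.kids.isEmpty()) {
                doParse(node.elements(), visited);
            }
        }
    }

    // Convenience entry point: returns node names in visit order.
    static List<String> walk(Node... topLevel) throws ParserException {
        List<String> visited = new ArrayList<String>();
        doParse(iterate(Arrays.asList(topLevel)), visited);
        return visited;
    }

    public static void main(String[] args) throws ParserException {
        Node form = new Node("form", new Node("text"), new Node("link"));
        System.out.println(walk(new Node("title"), form)); // [title, form, text, link]
    }
}
```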