Thread: RE: [Htmlparser-developer] HTMLParser 1.2 - 1.3
From: <dha...@or...> - 2003-05-14 10:35:43
Why does the NodeVisitor not have a method called visitNode() analogous to
visitTag()?

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Wednesday, May 14, 2003 3:09 AM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> The visitor paradigm is one way.
> See StringBean which implements NodeVisitor.
> You could pass all nodes to the same method:
>
>     public void visitTag (Tag tag) { do_something (tag); }
>     public void visitEndTag (Tag tag) { do_something (tag); }
>     etc.
>     void do_something (Node node) { <do something> }
>
> dha...@or... wrote:
>
> >Hi,
> >
> >I had started using HTMLParser version 1.2 sometime in August of last year. At
> >that time the parser had more of a flat structure unlike today's tree structure
> >with parents, children etc.
> >
> >At that time, I could find all the nodes irrespective of their depth in the
> >following manner:
> >
> >NodeIterator e = lHTMLParser.elements();
> >while (e.hasMoreNodes())
> >{
> >    Node lNode = (Node)e.nextNode();
> >    <do something>
> >}
> >
> >With the advent of 1.3 the tree structure came in, in which some nodes were
> >inside other nodes. I have registered the scanners whose tags I want. However
> >if these tags are nested within other tags that I have registered, then the
> >above scenario does not work. I need to go deeper. That is not always feasible.
> >Is there any mechanism in 1.3 like the one above using which I can get all the
> >nodes irrespective of their nested level.
> >
> >Regards,
> >
> >Dhaval Udani
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
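The suggestion quoted in this message, forwarding every visitor callback to one shared method, can be sketched roughly as follows. This is only an illustration: the Node, Tag, StringNode and Visitor types are hypothetical stand-ins written for the example, not the real HTMLParser classes, and visitStringNode is an assumed callback name.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch: every type here is a hypothetical stand-in,
// not the real HTMLParser API.
class SingleHandlerSketch {
    interface Node { String text(); }

    static class Tag implements Node {
        private final String text;
        Tag(String text) { this.text = text; }
        public String text() { return text; }
    }

    static class StringNode implements Node {
        private final String text;
        StringNode(String text) { this.text = text; }
        public String text() { return text; }
    }

    // One callback per node type, as in the visitor described in the thread.
    interface Visitor {
        void visitTag(Tag tag);
        void visitEndTag(Tag tag);
        void visitStringNode(StringNode node); // assumed name
    }

    // Every callback forwards to the same handler, so all nodes end up in one place.
    static class CollectingVisitor implements Visitor {
        final List<String> seen = new ArrayList<>();
        public void visitTag(Tag tag) { doSomething(tag); }
        public void visitEndTag(Tag tag) { doSomething(tag); }
        public void visitStringNode(StringNode node) { doSomething(node); }
        void doSomething(Node node) { seen.add(node.text()); }
    }

    public static void main(String[] args) {
        CollectingVisitor visitor = new CollectingVisitor();
        visitor.visitTag(new Tag("<b>"));
        visitor.visitStringNode(new StringNode("hello"));
        visitor.visitEndTag(new Tag("</b>"));
        System.out.println(visitor.seen);
    }
}
```

The cost of this approach is one forwarding line per callback, which is the repetition a visitNode() method would remove.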
From: <dha...@or...> - 2003-05-14 10:44:53
Considering that HTMLParser is a SAX-based parser, it should be possible to
have all the nodes available at the first level as a flat structure.
Additionally, the embedded nodes should also be referenced as children of
other nodes. Am I correct in this understanding, or is there something that I
have missed? I would definitely welcome some thoughts here.

Regards,

Dhaval Udani
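The flat view asked for here can be derived from a tree with a depth-first walk, independent of how the parser builds the tree. The sketch below assumes a hypothetical TreeNode type with a list of children; it does not use HTMLParser's own node classes or iterators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical TreeNode type for illustration; not an HTMLParser class.
class FlattenSketch {
    static class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
        // Adds a child and returns this node so calls can be chained.
        TreeNode add(TreeNode child) { children.add(child); return this; }
    }

    // Depth-first, pre-order walk: each node is appended before its children.
    static void flatten(TreeNode node, List<TreeNode> out) {
        out.add(node);
        for (TreeNode child : node.children) {
            flatten(child, out);
        }
    }

    // Convenience wrapper returning node names in document order.
    static List<String> flatNames(TreeNode root) {
        List<TreeNode> nodes = new ArrayList<>();
        flatten(root, nodes);
        List<String> names = new ArrayList<>();
        for (TreeNode n : nodes) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        TreeNode table = new TreeNode("TABLE")
            .add(new TreeNode("TR").add(new TreeNode("TD").add(new TreeNode("A"))));
        System.out.println(flatNames(table));
    }
}
```

With such a walk, nested nodes remain reachable as children of their parents while the flattened list restores the 1.2-style loop over every node.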
From: Somik R. <so...@ya...> - 2003-05-14 22:09:19
Dhaval Udani wrote:
> Considering that HTMLParser is a SAX-based parser, it should be possible to
> have all the nodes at the first level itself as a flat structure. Additionally
> the embedded nodes should also be referenced as children of other nodes. Am I
> correct in the understanding or is there something that I have missed out.

This is done with the HtmlScanner - registered when you called
registerDomScanners().

Regards,
Somik
From: <dha...@or...> - 2003-05-14 14:30:24
We already have a NodeVisitor abstract class with certain methods for specific
visitors. The default implementation of these methods could call the supertype
method.

Regards,

Dhaval Udani

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Wednesday, May 14, 2003 5:08 PM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> The visitor pattern that currently is in place has 'call backs' for each
> of the types of nodes.
> I guess you're suggesting that a 'supertype' callback be added.
> I don't think this would be a problem.
> The class implementing the visitor interface needs to add one more
> method, although it could be vacuous, and the logic behind the mechanics
> has to call two methods, the supertype method and the specific type.
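The idea in this message, default implementations that fall back to the supertype callback, might look like the sketch below. The Visitor base class and the Node, Tag and LinkTag types are hypothetical and written only for this example; they are not the library's NodeVisitor, and visitNode() is the proposed method, not an existing one.

```java
// Sketch under stated assumptions: all types are hypothetical stand-ins.
class SupertypeDefaultSketch {
    interface Node { String text(); }

    static class Tag implements Node {
        private final String text;
        Tag(String text) { this.text = text; }
        public String text() { return text; }
    }

    static class LinkTag extends Tag {
        LinkTag(String text) { super(text); }
    }

    static abstract class Visitor {
        // Catch-all callback; does nothing unless a subclass overrides it.
        public void visitNode(Node node) { }
        // Each specific callback defaults to the next more general one.
        public void visitTag(Tag tag) { visitNode(tag); }
        public void visitLinkTag(LinkTag link) { visitTag(link); }
    }

    // Overrides only what it cares about; everything else reaches visitNode().
    static class CountingVisitor extends Visitor {
        int nodes = 0;
        int links = 0;
        @Override public void visitNode(Node node) { nodes++; }
        @Override public void visitLinkTag(LinkTag link) {
            links++;
            super.visitLinkTag(link); // keep the fallback chain intact
        }
    }

    public static void main(String[] args) {
        CountingVisitor counter = new CountingVisitor();
        counter.visitTag(new Tag("<p>"));
        counter.visitLinkTag(new LinkTag("<a href=\"x\">"));
        System.out.println(counter.nodes + " nodes, " + counter.links + " links");
    }
}
```

A subclass that overrides a specific callback without calling super opts that node type out of the catch-all, which may or may not be the behaviour a caller expects.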
From: <dha...@or...> - 2003-05-15 03:59:18
> -----Original Message-----
> From: somik [mailto:so...@ya...]
> Sent: Thursday, May 15, 2003 3:37 AM
> To: htmlparser-developer
> Cc: somik
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> Dhaval Udani wrote:
> > Why does the NodeVisitor not have a method called visitNode() analogous to
> > visitTag()?
>
> I have the same question for you - why do you want visitNode() ? Simply bcos
> it can be there, or bcos it solves a problem.

I thought that since Tag extends from Node, a visitNode() method would be
appropriate. Apart from that if I want to search for some tags that I have not
registered (say a <HTML> tag or a <HEAD> tag) I could use the visitNode
mechanism to do it.

A similar argument can arise as to why have visitTag() at all when you have
specialized visitImageTag() and visitLinkTag() etc.

Dhaval
From: Somik R. <so...@ya...> - 2003-05-16 02:12:55
Dhaval Udani wrote:
> I thought that since Tag extends from Node, a visitNode() method would be
> appropriate. Apart from that if I want to search for some tags that I have not
> registered (say a <HTML> tag or a <HEAD> tag) I could use the visitNode
> mechanism to do it.

No - if you want to search for a "Tag" - you would use visitTag(). Everything
is a Tag. An EndTag is a Tag. All Tags are nodes. A StringNode and RemarkNode
are also nodes. What node is it that you wish to search for?

> A similar argument can arise as to why have visitTag() at all when you have
> specialized visitImageTag() and visitLinkTag() etc.

We have visitTag() so that you can visit all tags - that are not links, or
images (inclusive too, if you wish).

Regards,
Somik
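If visitTag() really is invoked for every tag, as this reply says, then finding an unregistered tag such as <HEAD> comes down to a name check inside that one callback. The sketch below shows only that check; the Tag class with a name field and the NameMatchingVisitor are hypothetical, and how the real library exposes a tag's name is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical types for illustration; not the HTMLParser API.
class TagNameFilterSketch {
    static class Tag {
        final String name;
        Tag(String name) { this.name = name; }
    }

    // Collects every tag whose name matches, ignoring case.
    static class NameMatchingVisitor {
        private final String wanted;
        final List<Tag> matches = new ArrayList<>();

        NameMatchingVisitor(String wanted) {
            this.wanted = wanted.toUpperCase(Locale.ROOT);
        }

        // Assumed to be called once per tag by whatever drives the traversal.
        public void visitTag(Tag tag) {
            if (tag.name.toUpperCase(Locale.ROOT).equals(wanted)) {
                matches.add(tag);
            }
        }
    }

    public static void main(String[] args) {
        NameMatchingVisitor visitor = new NameMatchingVisitor("head");
        visitor.visitTag(new Tag("HTML"));
        visitor.visitTag(new Tag("HEAD"));
        visitor.visitTag(new Tag("BODY"));
        System.out.println(visitor.matches.size() + " match(es)");
    }
}
```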
From: Derrick O. <Der...@ro...> - 2003-05-14 11:48:47
The visitor pattern that currently is in place has 'call backs' for each of
the types of nodes.
I guess you're suggesting that a 'supertype' callback be added.
I don't think this would be a problem.
The class implementing the visitor interface needs to add one more method,
although it could be vacuous, and the logic behind the mechanics has to call
two methods, the supertype method and the specific type.

dha...@or... wrote:

>Why does the NodeVisitor not have a method called visitNode() analogous to
>visitTag()?
>
>Regards,
>
>Dhaval Udani
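The "two methods" mechanics described in this message can be sketched as a double dispatch in which each node first invokes the supertype callback and then its specific one. Everything below is a hypothetical illustration: the accept() method, the Visitor interface and the node classes are assumptions for the example, not the library's actual traversal code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch under stated assumptions: hypothetical types, not the HTMLParser API.
class TwoCallDispatchSketch {
    interface Visitor {
        void visitNode(Node node);             // proposed supertype callback
        void visitTag(Tag tag);                // specific callback
        void visitStringNode(StringNode node); // specific callback (assumed name)
    }

    interface Node {
        void accept(Visitor visitor);
    }

    static class Tag implements Node {
        public void accept(Visitor visitor) {
            visitor.visitNode(this); // supertype method first
            visitor.visitTag(this);  // then the specific type
        }
    }

    static class StringNode implements Node {
        public void accept(Visitor visitor) {
            visitor.visitNode(this);
            visitor.visitStringNode(this);
        }
    }

    // Records the order in which callbacks arrive.
    static class LoggingVisitor implements Visitor {
        final List<String> log = new ArrayList<>();
        public void visitNode(Node node) { log.add("node"); }
        public void visitTag(Tag tag) { log.add("tag"); }
        public void visitStringNode(StringNode node) { log.add("string"); }
    }

    public static void main(String[] args) {
        LoggingVisitor visitor = new LoggingVisitor();
        new Tag().accept(visitor);
        new StringNode().accept(visitor);
        System.out.println(visitor.log);
    }
}
```

Here the visitor implementer must supply visitNode() even if its body is empty, which matches the "could be vacuous" remark above.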
From: Somik R. <so...@ya...> - 2003-05-14 22:07:28
Dhaval Udani wrote:
> Why does the NodeVisitor not have a method called visitNode() analogous to
> visitTag()?

I have the same question for you - why do you want visitNode() ? Simply bcos
it can be there, or bcos it solves a problem.

Regards,
Somik