htmlparser-developer Mailing List for HTML Parser (Page 11)
Brought to you by:
derrickoswald
From: Somik R. <so...@ya...> - 2003-05-16 02:14:44
|
> Slight problem here. Suppose I am parsing for some comments above the HTML, or
> say that I have a JSP; then it is quite likely to have values at the same level
> as <HTML>. Do you think it makes sense to have an implicit root level within the
> Parser whose one child would be everything under <HTML> and all other tags at
> the same level?

I'd suggest the user-story driven approach. Do we have a "real" scenario where someone would benefit from this? No speculations :)

Regards,
Somik |
From: Somik R. <so...@ya...> - 2003-05-16 02:12:55
|
Dhaval Udani wrote:
> I thought that since Tag extends from Node, a visitNode() method would be
> appropriate. Apart from that if I want to search for some tags that I have not
> registered (say a <HTML> tag or a <HEAD> tag) I could use the visitNode
> mechanism to do it.

No - if you want to search for a "Tag", you would use visitTag(). Everything is a Tag. An EndTag is a Tag. All Tags are nodes. A StringNode and a RemarkNode are also nodes. Which node is it that you wish to search for?

> A similar argument can arise as to why have visitTag() at all when you have
> specialized visitImageTag() and visitLinkTag() etc.

We have visitTag() so that you can visit all tags that are not links or images (inclusive too, if you wish).

Regards,
Somik |
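To make the distinction in the reply above concrete, here is a minimal sketch of a visitor where visitTag() receives general tags and a specialized callback can opt in by forwarding. The types below (Node, Tag, LinkTag, StringNode, Visitor) are simplified stand-ins written for this illustration; they are assumptions and not the htmlparser classes or their actual signatures.

```java
import java.util.ArrayList;
import java.util.List;

final class VisitorSketch {
    // Stand-in node types for illustration only; NOT the real htmlparser classes.
    interface Node { void accept(Visitor v); }

    static class Tag implements Node {
        final String name;
        Tag(String name) { this.name = name; }
        public void accept(Visitor v) { v.visitTag(this); }
    }

    static class LinkTag extends Tag {
        LinkTag() { super("A"); }
        @Override public void accept(Visitor v) { v.visitLinkTag(this); }
    }

    static class StringNode implements Node {
        final String text;
        StringNode(String text) { this.text = text; }
        public void accept(Visitor v) { v.visitStringNode(this); }
    }

    // Every callback defaults to doing nothing, so a visitor overrides only what it needs.
    static abstract class Visitor {
        void visitTag(Tag tag) { }
        void visitLinkTag(LinkTag tag) { }
        void visitStringNode(StringNode node) { }
    }

    // Collects the name of every tag; links are included because visitLinkTag forwards.
    static class TagNameCollector extends Visitor {
        final List<String> names = new ArrayList<>();
        @Override void visitTag(Tag tag) { names.add(tag.name); }
        @Override void visitLinkTag(LinkTag tag) { visitTag(tag); }
    }

    static List<String> collect(List<Node> nodes) {
        TagNameCollector collector = new TagNameCollector();
        for (Node n : nodes) {
            n.accept(collector);
        }
        return collector.names;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Tag("HTML"), new StringNode("hi"), new LinkTag(), new Tag("HEAD"));
        System.out.println(collect(nodes)); // prints [HTML, A, HEAD]
    }
}
```

In this sketch an unregistered tag such as HTML or HEAD still reaches visitTag(), which is the case the question was asking about; text nodes never do.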
From: <dha...@or...> - 2003-05-15 06:08:02
|
> Dhaval Udani wrote:
> > Considering that HTMLParser is a SAX-based parser, it should be possible to
> > have all the nodes at the first level itself as a flat structure. Additionally
> > the embedded nodes should also be referenced as children of other nodes. Am I
> > correct in the understanding or is there something that I have missed out.
>
> This is done with the HtmlScanner - registered when you called
> registerDomScanners().

Somik,

I am slightly confused here. Even if I register HtmlScanner, how will I get a flat structure of all the nodes? Will I not still get a tree-like representation, so that I have to walk through the children of every node?

Dhaval |
From: <dha...@or...> - 2003-05-15 05:50:27
|
Hi,

In the documentation of CompositeTagScanner there is mention of the ENDERS and END_TAG_ENDERS string arrays. Can someone tell me the difference between the two?

Also, the documentation makes a call to a four-argument constructor whose first argument is a string array. I don't see any such constructor in the code.

Dhaval |
From: <dha...@or...> - 2003-05-15 05:44:59
|
> Dhaval Udani wrote:
> > Yeah i understand that. The problem being that currently such a situation
> > cannot be envisaged. However it may prove beneficial to other scanner writers
> > if they ever come up with such scenarios.
>
> That is the classic definition of over-engineering.. Of course, you are not
> bound to not over-engineer, IMHO. :)

Well, the situation just came up. Assume a <HEAD> tag which is not closed. It needs to be closed when a <BODY> tag is encountered. Hence BODY would be in the STARTERS array for HEAD. |
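The HEAD/BODY case above (and the P/TABLE case raised earlier in the thread) can be sketched as a small table lookup. This is only an illustration of the idea under stated assumptions: the STARTERS map, the token format ("HEAD", "/HEAD") and the close() helper are hypothetical names invented here, not the CompositeTagScanner API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StartersSketch {
    // Hypothetical table: the open tag (key) is closed implicitly
    // when one of the listed start tags is encountered.
    static final Map<String, Set<String>> STARTERS = Map.of(
        "HEAD", Set.of("BODY"),
        "P", Set.of("TABLE"));

    // Takes tokens such as "HEAD" (start tag) and "/HEAD" (end tag)
    // and returns the same sequence with the missing end tags inserted.
    static List<String> close(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith("/")) {
                // Explicit end tag: pop only if it matches the innermost open tag.
                if (token.substring(1).equals(open.peek())) {
                    open.pop();
                }
                out.add(token);
                continue;
            }
            String innermost = open.peek();
            if (innermost != null && STARTERS.getOrDefault(innermost, Set.of()).contains(token)) {
                // A "starter" arrived, so emit the end tag the document left out.
                out.add("/" + open.pop());
            }
            open.push(token);
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("HTML", "HEAD", "TITLE", "/TITLE", "BODY", "/BODY", "/HTML");
        System.out.println(close(tokens));
        // prints [HTML, HEAD, TITLE, /TITLE, /HEAD, BODY, /BODY, /HTML]
    }
}
```

The sketch only looks at the innermost open tag, which is enough for the unclosed-HEAD example; a real scanner would also have to decide what to do with mismatched end tags.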
From: <dha...@or...> - 2003-05-15 04:05:52
|
Derrick,

Is it possible to bring some consistency into the APIs, as per my mail sent earlier (which I am attaching here as well), before making the final release?

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Thursday, May 15, 2003 7:56 AM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: [Htmlparser-developer] v1.3
>
> From the absence of bug reports, and to my knowledge, no pending code
> submissions, it looks like candidate release 3, 1.3-20030511 can be
> relabeled as the final release of version 1.3.
>
> Any objections, last minute code drops or documentation changes?
> Or can we move on and call the next integration build version 1.4?
>
> -------------------------------------------------------
> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
> The only event dedicated to issues related to Linux enterprise solutions
> www.enterpriselinuxforum.com
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer |
From: <dha...@or...> - 2003-05-15 04:03:07
|
> Use the HtmlScanner, to get the first level nodes under html. If you don't -
> there is no such thing as a "root node" to add children to.

Slight problem here. Suppose I am parsing for some comments above the HTML, or say that I have a JSP; then it is quite likely to have values at the same level as <HTML>. Do you think it makes sense to have an implicit root level within the Parser, whose children would be <HTML> (with everything under it) and all the other tags at the same level?

Dhaval |
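As a rough picture of what the implicit root proposed above might look like, here is a small sketch. The Node class, the "#root" label and wrapInRoot() are hypothetical and exist only for this illustration; nothing here is taken from the parser's code.

```java
import java.util.ArrayList;
import java.util.List;

final class RootSketch {
    // Minimal stand-in for a parsed node; not an htmlparser class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Wraps everything found at the first level under one synthetic root.
    static Node wrapInRoot(List<Node> firstLevel) {
        Node root = new Node("#root"); // hypothetical name for the implicit root
        root.children.addAll(firstLevel);
        return root;
    }

    public static void main(String[] args) {
        Node remark = new Node("<!-- header comment -->");
        Node html = new Node("HTML");
        html.children.add(new Node("BODY"));

        Node root = wrapInRoot(List.of(remark, html));
        // With a root in place, adding a sibling of <HTML> is an ordinary child insert.
        root.children.add(new Node("<%-- trailing JSP comment --%>"));

        for (Node child : root.children) {
            System.out.println(child.label);
        }
    }
}
```

Whether this is worth building is exactly the open question in the thread; the sketch only shows that a comment or JSP fragment beside <HTML> would become an ordinary child of the root.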
From: <dha...@or...> - 2003-05-15 03:59:18
|
> -----Original Message-----
> From: somik [mailto:so...@ya...]
> Sent: Thursday, May 15, 2003 3:37 AM
> To: htmlparser-developer
> Cc: somik
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> Dhaval Udani wrote:
> > Why does the NodeVisitor not have a method called visitNode() analogous to
> > visitTag()?
>
> I have the same question for you - why do you want visitNode() ? Simply bcos
> it can be there, or bcos it solves a problem.

I thought that since Tag extends from Node, a visitNode() method would be appropriate. Apart from that, if I want to search for some tags that I have not registered (say a <HTML> tag or a <HEAD> tag), I could use the visitNode mechanism to do it.

A similar argument can arise as to why have visitTag() at all when you have specialized visitImageTag() and visitLinkTag() etc.

Dhaval |
From: <dha...@or...> - 2003-05-15 03:56:34
|
> -----Original Message-----
> From: somik [mailto:so...@ya...]
> Sent: Thursday, May 15, 2003 3:36 AM
> To: htmlparser-developer
> Cc: somik
> Subject: Re: [Htmlparser-developer] CompositeTagScanner - Some comments
>
> Dhaval Udani wrote:
> > Yeah i understand that. The problem being that currently such a situation
> > cannot be envisaged. However it may prove beneficial to other scanner writers
> > if they ever come up with such scenarios.
>
> That is the classic definition of over-engineering.. Of course, you are not
> bound to not over-engineer, IMHO. :)

ha..ha...ha :) Guess I've fallen in the trap :) |
From: Derrick O. <Der...@ro...> - 2003-05-15 02:36:47
|
From the absence of bug reports and, to my knowledge, no pending code submissions, it looks like candidate release 3, 1.3-20030511, can be relabeled as the final release of version 1.3.

Any objections, last-minute code drops or documentation changes? Or can we move on and call the next integration build version 1.4? |
From: Somik R. <so...@ya...> - 2003-05-14 22:12:00
|
Dhaval Udani wrote:
> Another problem:
>
> One more thing that I noticed:
>
> At the base level Parser.elements() gives me a list of all the first-level
> nodes. Using any of the first level nodes I may obtain its next-level children
> using CompositeTag.getChildren(). This being a NodeList I can easily add new
> elements (nodes, tags etc) here. However I cannot add the same at the first
> level.

Use the HtmlScanner to get the first-level nodes under html. If you don't, there is no such thing as a "root node" to add children to.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2003-05-14 22:09:19
|
Dhaval Udani wrote:
> Considering that HTMLParser is a SAX-based parser, it should be possible to
> have all the nodes at the first level itself as a flat structure. Additionally
> the embedded nodes should also be referenced as children of other nodes. Am I
> correct in the understanding or is there something that I have missed out.

This is done with the HtmlScanner - registered when you called registerDomScanners().

Regards,
Somik |
From: Somik R. <so...@ya...> - 2003-05-14 22:07:28
|
Dhaval Udani wrote:
> Why does the NodeVisitor not have a method called visitNode() analogous to
> visitTag()?

I have the same question for you - why do you want visitNode()? Simply because it can be there, or because it solves a problem?

Regards,
Somik

----- Original Message -----
From: <dha...@or...>
To: <htm...@li...>
Sent: Wednesday, May 14, 2003 6:34 AM
Subject: RE: [Htmlparser-developer] HTMLParser 1.2 - 1.3

> Why does the NodeVisitor not have a method called visitNode() analogous to
> visitTag()?
>
> Regards,
>
> Dhaval Udani
> Senior Analyst
> M-Line, QPEG
> OrbiTech Solutions Ltd.
> +91-22-28290019 Extn. 1457
>
> > -----Original Message-----
> > From: DerrickOswald [mailto:Der...@ro...]
> > Sent: Wednesday, May 14, 2003 3:09 AM
> > To: htmlparser-developer
> > Cc: DerrickOswald
> > Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
> >
> > The visitor paradigm is one way.
> > See StringBean which implements NodeVisitor.
> > You could pass all nodes to the same method:
> >
> >     public void visitTag (Tag tag) { do_something (tag); }
> >     public void visitEndTag (Tag tag) { do_something (tag); }
> >     etc.
> >     void do_something (Node node) { <do something> }
> >
> > dha...@or... wrote:
> >
> > >Hi,
> > >
> > >I had started using HTMLParser version 1.2 sometime in August of last year. At
> > >that time the parser had more of a flat structure unlike today's tree structure
> > >with parents, children etc.
> > >
> > >At that time, I could find all the nodes irrespective of their depth in the
> > >following manner:
> > >
> > >NodeIterator e = lHTMLParser.elements();
> > >while (e.hasMoreNodes())
> > >{
> > >    Node lNode = (Node)e.nextNode();
> > >    <do something>
> > >}
> > >
> > >With the advent of 1.3 the tree structure came in, in which some nodes were
> > >inside other nodes. I have registered the scanners whose tags I want. However
> > >if these tags are nested within other tags that I have registered, then the
> > >above scenario does not work. I need to go deeper. That is not always feasible.
> > >Is there any mechanism in 1.3 like the one above using which I can get all the
> > >nodes irrespective of their nested level.
> > >
> > >Regards,
> > >
> > >Dhaval Udani
> > >Senior Analyst
> > >M-Line, QPEG
> > >OrbiTech Solutions Ltd.
> > >+91-22-28290019 Extn. 1457 |
From: Somik R. <so...@ya...> - 2003-05-14 22:06:15
|
Dhaval Udani wrote:
> Yeah i understand that. The problem being that currently such a situation
> cannot be envisaged. However it may prove beneficial to other scanner writers
> if they ever come up with such scenarios.

That is the classic definition of over-engineering. Of course, you are not bound to not over-engineer, IMHO. :)

Regards,
Somik |
From: <dha...@or...> - 2003-05-14 14:43:09
|
Another problem:

One more thing that I noticed:

At the base level Parser.elements() gives me a list of all the first-level nodes. Using any of the first-level nodes I may obtain its next-level children using CompositeTag.getChildren(). This being a NodeList, I can easily add new elements (nodes, tags etc) here. However, I cannot add the same at the first level.

Can anything be done about it?

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: Udani, Dhaval H.
> Sent: Wednesday, May 14, 2003 7:45 PM
> To: htmlparser-developer
> Cc: Udani, Dhaval H.
> Subject: [Htmlparser-developer] SimpleNodeIteraotr
>
> Hi,
>
> Since SimpleNodeIterator is basically the same as NodeIterator, I think it
> should just extend NodeIterator and redefine the methods without throwing the
> exceptions. Either that way or make NodeIterator extend SimpleNodeIterator. I
> think its better design and leads to more uniform code.
>
> Secondly, some anomalies that I noted. Not really serious but just from a more
> consistent viewpoint.
>
> Parser.elements() returns a NodeIterator
>
> while CompositeTag.getChildren() returns a NodeList.
>
> Ideally since both are returning the nodes, a common mechanism should be used.
>
> Also NodeList.elements() returns a SimpleNodeIterator. This should also be
> synchronized with the above so that one common class is returned.
>
> This is pretty typical of Java classes : Vector has add() method and Hashtable
> has put() method clearly showing that both were written separately. There are
> many more such examples actually. We should try and avoid it.
>
> Dhaval |
From: <dha...@or...> - 2003-05-14 14:30:24
|
We anyway have a NodeVisitor abstract class with certain methods for specific visitors. The default implementation of these methods could call the supertype method.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Wednesday, May 14, 2003 5:08 PM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> The visitor pattern that currently is in place has 'call backs' for each
> of the types of nodes.
> I guess you're suggesting that a 'supertype' callback be added.
> I don't think this would be a problem.
> The class implementing the visitor interface needs to add one more
> method, although it could be vacuous, and the logic behind the mechanics
> has to call two methods, the supertype method and the specific type.
>
> dha...@or... wrote:
>
> >Why does the NodeVisitor not have a method called visitNode() analogous to
> >visitTag()? |
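The suggestion above, that each default callback hand its argument to the next more general callback, can be sketched as follows. The classes are stand-ins invented for the illustration (they are not the NodeVisitor, Tag or LinkTag classes of the library), and dispatch() is a hypothetical helper standing in for whatever the parser does when it hands a node to a visitor.

```java
import java.util.ArrayList;
import java.util.List;

final class SupertypeCallbackSketch {
    // Stand-in types for illustration; not the htmlparser classes.
    static class Node {
        final String label;
        Node(String label) { this.label = label; }
    }
    static class Tag extends Node {
        Tag(String label) { super(label); }
    }
    static class LinkTag extends Tag {
        LinkTag(String label) { super(label); }
    }

    // Each default implementation forwards to the callback for its supertype.
    static abstract class Visitor {
        void visitNode(Node node) { }
        void visitTag(Tag tag) { visitNode(tag); }
        void visitLinkTag(LinkTag link) { visitTag(link); }
    }

    // Calls only the most specific callback; the defaults do the rest.
    static void dispatch(Node node, Visitor visitor) {
        if (node instanceof LinkTag) {
            visitor.visitLinkTag((LinkTag) node);
        } else if (node instanceof Tag) {
            visitor.visitTag((Tag) node);
        } else {
            visitor.visitNode(node);
        }
    }

    // A visitor that overrides nothing but visitNode() still sees every node.
    static List<String> labelsSeenByNodeOnlyVisitor(List<Node> nodes) {
        List<String> seen = new ArrayList<>();
        Visitor visitor = new Visitor() {
            @Override void visitNode(Node node) { seen.add(node.label); }
        };
        for (Node n : nodes) {
            dispatch(n, visitor);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("some text"), new Tag("HEAD"), new LinkTag("A"));
        System.out.println(labelsSeenByNodeOnlyVisitor(nodes)); // prints [some text, HEAD, A]
    }
}
```

The alternative described in the quoted message, where the dispatching code itself calls both the supertype method and the specific one, gives the same coverage but changes what happens when a subclass overrides a specific callback: with forwarding defaults, an override that does not call super hides the node from visitNode().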
From: <dha...@or...> - 2003-05-14 14:16:48
|
Hi,

Since SimpleNodeIterator is basically the same as NodeIterator, I think it should just extend NodeIterator and redefine the methods without throwing the exceptions. Either that way or make NodeIterator extend SimpleNodeIterator. I think it's better design and leads to more uniform code.

Secondly, some anomalies that I noted. Not really serious, but just from a more consistent viewpoint:

Parser.elements() returns a NodeIterator, while CompositeTag.getChildren() returns a NodeList. Ideally, since both are returning the nodes, a common mechanism should be used.

Also, NodeList.elements() returns a SimpleNodeIterator. This should also be synchronized with the above so that one common class is returned.

This is pretty typical of Java classes: Vector has an add() method and Hashtable has a put() method, clearly showing that both were written separately. There are many more such examples actually. We should try and avoid it.

Dhaval |
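Of the two directions suggested above, Java itself permits only one: an overriding method may drop checked exceptions from its throws clause but may not add them, so the non-throwing interface has to be the subtype. A minimal sketch follows, using hypothetical names (ThrowingNodeIterator, QuietNodeIterator, ParseFailure) and plain strings in place of nodes; it does not reproduce the library's interfaces.

```java
import java.util.Iterator;
import java.util.List;

final class IteratorSketch {
    // Hypothetical checked exception standing in for a parser error.
    static class ParseFailure extends Exception {
        ParseFailure(String message) { super(message); }
    }

    // The general iterator: every call may fail with a checked exception.
    interface ThrowingNodeIterator {
        boolean hasMoreNodes() throws ParseFailure;
        String nextNode() throws ParseFailure;
    }

    // The in-memory iterator narrows both methods so they throw nothing checked.
    // The reverse (a subtype adding "throws ParseFailure") would not compile.
    interface QuietNodeIterator extends ThrowingNodeIterator {
        @Override boolean hasMoreNodes();
        @Override String nextNode();
    }

    static QuietNodeIterator over(List<String> nodes) {
        Iterator<String> it = nodes.iterator();
        return new QuietNodeIterator() {
            @Override public boolean hasMoreNodes() { return it.hasNext(); }
            @Override public String nextNode() { return it.next(); }
        };
    }

    // Code written against the general type accepts either kind of iterator.
    static int count(ThrowingNodeIterator it) throws ParseFailure {
        int n = 0;
        while (it.hasMoreNodes()) {
            it.nextNode();
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws ParseFailure {
        QuietNodeIterator quiet = over(List.of("HTML", "HEAD", "BODY"));
        System.out.println(quiet.nextNode()); // no try/catch needed here; prints HTML
        System.out.println(count(quiet));     // the two remaining nodes; prints 2
    }
}
```

With this arrangement a method can return the quiet type where no parsing happens lazily, and callers that only know the general type keep working unchanged.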
From: Derrick O. <Der...@ro...> - 2003-05-14 11:48:47
|
The visitor pattern that currently is in place has 'call backs' for each of the types of nodes.
I guess you're suggesting that a 'supertype' callback be added.
I don't think this would be a problem.
The class implementing the visitor interface needs to add one more method, although it could be vacuous, and the logic behind the mechanics has to call two methods, the supertype method and the specific type.

dha...@or... wrote:

>Why does the NodeVisitor not have a method called visitNode() analogous to
>visitTag()?
>
>Regards,
>
>Dhaval Udani
>Senior Analyst
>M-Line, QPEG
>OrbiTech Solutions Ltd.
>+91-22-28290019 Extn. 1457 |
From: <dha...@or...> - 2003-05-14 10:46:58
|
No problemo!!!

Derrick, great job man fixing that problem with NodeReader. Things work so well I am amazed. This parser is super-stable compared to just 4 months back. I think this visitor pattern implementation and the Composite tag thing have done one helluva job!!!

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Sunday, May 11, 2003 10:41 AM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: [Htmlparser-developer] composite tag scanner
>
> Dhaval,
>
> When I fixed bug #735183 Problem in Label Scanning, I had to modify the
> SelectTagTest and LabelScannerTest. I know, that's cheating. Can you
> check that they still represent the spirit of the tests as you coded them.
>
> Derrick |
From: <dha...@or...> - 2003-05-14 10:44:53
|
Considering that HTMLParser is a SAX-based parser, it should be possible to have all the nodes at the first level itself as a flat structure. Additionally, the embedded nodes should also be referenced as children of other nodes. Am I correct in this understanding, or is there something that I have missed?

Would definitely welcome some thoughts out here.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Wednesday, May 14, 2003 3:09 AM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> The visitor paradigm is one way.
> See StringBean which implements NodeVisitor.
> You could pass all nodes to the same method:
>
>     public void visitTag (Tag tag) { do_something (tag); }
>     public void visitEndTag (Tag tag) { do_something (tag); }
>     etc.
>     void do_something (Node node) { <do something> } |
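One way to get the flat, depth-independent view asked for above without changing the tree is to walk it once and collect every node in document order. The sketch below assumes a simple stand-in Node class with a children list; flatten() and collect() are hypothetical helpers, not methods of the parser.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenSketch {
    // Minimal stand-in for a parsed node with optional children; not an htmlparser class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(List.of(kids));
        }
    }

    // Returns the labels of all nodes, parents before their children (pre-order).
    static List<String> flatten(List<Node> firstLevel) {
        List<String> out = new ArrayList<>();
        for (Node n : firstLevel) {
            collect(n, out);
        }
        return out;
    }

    private static void collect(Node node, List<String> out) {
        out.add(node.label);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Node form = new Node("FORM", new Node("LABEL", new Node("INPUT")), new Node("SELECT"));
        System.out.println(flatten(List.of(new Node("IMG"), form)));
        // prints [IMG, FORM, LABEL, INPUT, SELECT]
    }
}
```

The nodes stay referenced as children of their parents, so the tree and the flat list describe the same objects; the visitor approach from the quoted reply achieves the same effect by pushing each node to a callback instead of returning a list.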
From: <dha...@or...> - 2003-05-14 10:39:15
|
Yeah i understand that. The problem being that currently such a situation cannot be envisaged. However it may prove beneficial to other scanner writers if they ever come up with such scenarios.

What say Derrick? Should we go ahead with something like this?

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: somik [mailto:so...@ya...]
> Sent: Tuesday, May 13, 2003 6:11 PM
> To: htmlparser-developer
> Cc: somik
> Subject: Re: [Htmlparser-developer] CompositeTagScanner - Some comments
>
> One word of caution, ensure that you are not over-engineering. I didn't do
> it, because it wasn't needed. That has been the key to our approach - and
> enabled us to keep the parser really small.
>
> Regards
> Somik
>
> ----- Original Message -----
> From: "Somik Raha" <so...@ya...>
> To: <htm...@li...>
> Sent: Tuesday, May 13, 2003 7:46 AM
> Subject: Re: [Htmlparser-developer] CompositeTagScanner - Some comments
>
> > > Say something like this is there:
> > >
> > > <P> blah blah blah
> > > <TABLE>
> > >
> > > Now what I am saying that in the P scanner, if TABLE is provided as a member of
> > > the STARTERS array then a </P> will be put up before the beginning of <TABLE>
> > > tag. In essence the way the ENDERS array looks for a tag of type EndTag,
> > > similarly STARTERS array would look for a start tag of the type defined.
> > >
> > > I hope I've been clearer. Do let me know.
> >
> > I'm with you - initially I was checking for starters - changed that to
> > enders. But if we must have both, then we must have both. Go for it. But
> > also think deeply about the names - if it was confusing to you and me, it
> > would be for others too...
> >
> > Cheers,
> > Somik |
From: <dha...@or...> - 2003-05-14 10:35:43
|
Why does the NodeVisitor not have a method called visitNode() analogous to visitTag()?

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

> -----Original Message-----
> From: DerrickOswald [mailto:Der...@ro...]
> Sent: Wednesday, May 14, 2003 3:09 AM
> To: htmlparser-developer
> Cc: DerrickOswald
> Subject: Re: [Htmlparser-developer] HTMLParser 1.2 - 1.3
>
> The visitor paradigm is one way.
> See StringBean which implements NodeVisitor.
> You could pass all nodes to the same method:
>
>     public void visitTag (Tag tag) { do_something (tag); }
>     public void visitEndTag (Tag tag) { do_something (tag); }
>     etc.
>     void do_something (Node node) { <do something> } |
From: Derrick O. <Der...@ro...> - 2003-05-13 21:50:16
|
The visitor paradigm is one way. See StringBean which implements NodeVisitor. You could pass all nodes to the same method:

    public void visitTag (Tag tag) { do_something (tag); }
    public void visitEndTag (Tag tag) { do_something (tag); }
    etc.
    void do_something (Node node) { <do something> }

dha...@or... wrote:

>Hi,
>
>I had started using HTMLParser version 1.2 sometime in August of last year. At
>that time the parser had more of a flat structure unlike today's tree structure
>with parents, children etc.
>
>At that time, I could find all the nodes irrespective of their depth in the
>following manner:
>
>NodeIterator e = lHTMLParser.elements();
>while (e.hasMoreNodes())
>{
>    Node lNode = (Node)e.nextNode();
>    <do something>
>}
>
>With the advent of 1.3 the tree structure came in, in which some nodes were
>inside other nodes. I have registered the scanners whose tags I want. However
>if these tags are nested within other tags that I have registered, then the
>above scenario does not work. I need to go deeper. That is not always feasible.
>Is there any mechanism in 1.3 like the one above using which I can get all the
>nodes irrespective of their nested level.
>
>Regards,
>
>Dhaval Udani
>Senior Analyst
>M-Line, QPEG
>OrbiTech Solutions Ltd.
>+91-22-28290019 Extn. 1457 |
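For readers following the thread, here is a rough sketch of the "pass all nodes to the same method" idea, including a depth-first walk that reaches nested nodes. Every type in it (SketchNode, SketchTag, SketchEndTag, SketchText, SketchVisitor, CollectingVisitor, VisitorDemo) is a hypothetical stand-in written for this illustration, and the visitStringNode name is likewise an assumption; none of this is the parser's own Node, Tag, or NodeVisitor, so treat it as the pattern only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed node; NOT the library's Node class.
abstract class SketchNode {
    final List<SketchNode> children = new ArrayList<>();
    final String text;

    SketchNode(String text) { this.text = text; }

    SketchNode add(SketchNode child) { children.add(child); return this; }

    // Depth-first walk: report this node first, then recurse into its children.
    void accept(SketchVisitor visitor) {
        dispatch(visitor);
        for (SketchNode child : children) {
            child.accept(visitor);
        }
    }

    abstract void dispatch(SketchVisitor visitor);
}

class SketchTag extends SketchNode {
    SketchTag(String name) { super(name); }
    void dispatch(SketchVisitor visitor) { visitor.visitTag(this); }
}

class SketchEndTag extends SketchNode {
    SketchEndTag(String name) { super("/" + name); }
    void dispatch(SketchVisitor visitor) { visitor.visitEndTag(this); }
}

class SketchText extends SketchNode {
    SketchText(String text) { super(text); }
    void dispatch(SketchVisitor visitor) { visitor.visitStringNode(this); }
}

// Hypothetical visitor interface with one callback per node kind.
interface SketchVisitor {
    void visitTag(SketchTag tag);
    void visitEndTag(SketchEndTag tag);
    void visitStringNode(SketchText text);
}

// Every callback funnels into one method, so all nodes are handled alike.
class CollectingVisitor implements SketchVisitor {
    final List<String> seen = new ArrayList<>();

    public void visitTag(SketchTag tag) { doSomething(tag); }
    public void visitEndTag(SketchEndTag tag) { doSomething(tag); }
    public void visitStringNode(SketchText text) { doSomething(text); }

    private void doSomething(SketchNode node) { seen.add(node.text); }
}

class VisitorDemo {
    // Builds TABLE > TR > TD > "cell" and returns the nodes in visiting order.
    static List<String> run() {
        SketchNode table = new SketchTag("TABLE")
            .add(new SketchTag("TR")
                .add(new SketchTag("TD")
                    .add(new SketchText("cell"))));
        CollectingVisitor visitor = new CollectingVisitor();
        table.accept(visitor);
        return visitor.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the walk recurses into children before moving on, the caller sees nested nodes without writing per-level loops, which is the effect the flat 1.2 iterator used to give.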
From: Somik R. <so...@ya...> - 2003-05-13 12:42:09
|
One word of caution: ensure that you are not over-engineering. I didn't do it, because it wasn't needed. That has been the key to our approach, and it has enabled us to keep the parser really small.

Regards
Somik

----- Original Message -----
From: "Somik Raha" <so...@ya...>
To: <htm...@li...>
Sent: Tuesday, May 13, 2003 7:46 AM
Subject: Re: [Htmlparser-developer] CompositeTagScanner - Some comments

> > Say something like this is there:
> >
> > <P> blah blah blah
> > <TABLE>
> >
> > Now what I am saying that in the P scanner, if TABLE is provided as a member of
> > the STARTERS array then a </P> will be put up before the beginning of <TABLE>
> > tag. In essence the way the ENDERS array looks for a tag of type EndTag,
> > similarly STARTERS array would look for a start tag of the type defined.
> >
> > I hope I've been clearer. Do let me know.
>
> I'm with you - initially I was checking for starters - changed that to
> enders. But if we must have both, then we must have both. Go for it. But
> also think deeply about the names- if it was confusing to you and me, it
> would be for others too...
>
> Cheers,
> Somik |
From: Somik R. <so...@ya...> - 2003-05-13 11:46:56
|
> Say something like this is there:
>
> <P> blah blah blah
> <TABLE>
>
> Now what I am saying that in the P scanner, if TABLE is provided as a member of
> the STARTERS array then a </P> will be put up before the beginning of <TABLE>
> tag. In essence the way the ENDERS array looks for a tag of type EndTag,
> similarly STARTERS array would look for a start tag of the type defined.
>
> I hope I've been clearer. Do let me know.

I'm with you - initially I was checking for starters - changed that to enders. But if we must have both, then we must have both. Go for it. But also think deeply about the names - if it was confusing to you and me, it would be for others too...

Cheers,
Somik |
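The STARTERS/ENDERS proposal discussed above might look roughly like the following. This is only a sketch under stated assumptions, not CompositeTagScanner's code: the class ImplicitCloseSketch, the method closeParagraphs, the token-list representation, and the choice of BODY and HTML as enders are all invented for illustration. Only TABLE as a starter for P comes from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the idea in the thread; not the library's scanner.
class ImplicitCloseSketch {
    // Start tags that implicitly end an open <P> (TABLE is the thread's example).
    static final Set<String> P_STARTERS = new HashSet<>(Arrays.asList("TABLE"));
    // End tags that implicitly end an open <P> (assumed values for illustration).
    static final Set<String> P_ENDERS = new HashSet<>(Arrays.asList("BODY", "HTML"));

    // Takes well-formed tokens such as "<P>", "some text", "</BODY>" and
    // inserts "</P>" wherever a starter or ender implies it.
    static List<String> closeParagraphs(List<String> tokens) {
        List<String> out = new ArrayList<>();
        boolean inParagraph = false;
        for (String token : tokens) {
            boolean isEnd = token.startsWith("</");
            boolean isStart = !isEnd && token.startsWith("<");
            String name = "";
            if (isEnd) {
                name = token.substring(2, token.length() - 1).toUpperCase();
            } else if (isStart) {
                name = token.substring(1, token.length() - 1).toUpperCase();
            }

            boolean starterHit = isStart && P_STARTERS.contains(name);
            boolean enderHit = isEnd && P_ENDERS.contains(name);
            if (inParagraph && (starterHit || enderHit)) {
                out.add("</P>"); // the implicit end tag goes before the trigger
                inParagraph = false;
            }

            if (isStart && name.equals("P")) {
                inParagraph = true;
            } else if (isEnd && name.equals("P")) {
                inParagraph = false; // explicit end tag, nothing to insert
            }
            out.add(token);
        }
        if (inParagraph) {
            out.add("</P>"); // input ended with the paragraph still open
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(closeParagraphs(
            Arrays.asList("<P>", "blah blah blah", "<TABLE>", "</TABLE>")));
    }
}
```

Keeping the two sets separate mirrors the naming concern raised in the reply: one set is matched against start tags, the other against end tags, and the names should make that distinction obvious.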