htmlparser-developer Mailing List for HTML Parser (Page 14)
Brought to you by:
derrickoswald
From: Derrick O. <Der...@ro...> - 2003-05-02 10:59:57
|
I looked at this earlier too, regarding the org.apache.commons.logging package:
http://jakarta.apache.org/commons/logging.html
which provides a thin, logging-system-agnostic interface.

There didn't appear to be anything in the licence (http://jakarta.apache.org/commons/license.html) that precludes repackaging the code into the htmlparser tree. It isn't very large. We would have to say "This product includes software developed by the Apache Software Foundation (http://www.apache.org/)." somewhere in the documentation.

There is, however, the task of reworking every file in the source tree to use the logging wrapper mechanism, which is non-trivial (190 files, 83 of which are tests). I would suggest this be undertaken when version 1.3 is finished. I think we can arbitrarily set a cut-off point for 1.3 next week, unless a major show-stopper is discovered.

Of course, for backwards compatibility, we should just deprecate ParserFeedback (et al.) and provide an implementation for it in terms of the new logging code.

Derrick

dha...@or... wrote:

>Hi guys,
>
>I remember we had a discussion about the feedback mechanism earlier. I
>just wanted to restart it by suggesting use of the Logging Wrapper from
>Jakarta.
>
>I have noticed that if anyone wants to use the ParserFeedback to log,
>they will mostly need to extend the DefaultParserFeedback class and
>override the methods appropriately. If we can map the ParserFeedback
>class to the Logging Wrapper, applications can easily use the Feedback
>mechanism to log to Log4j and JDK 1.4 without having to do a thing. Most
>users, according to me, would be using one of these systems. I believe the
>argument then was coupling with a third-party library. But I believe the
>flexibility it offers outstrips the coupling drawback.
>
>Furthermore, imagine an application which is using some other logging
>tool. They have coded their entire logging framework using the Logging
>Wrapper and have used an adapter to log to their logging tool. If they
>use the parser and want to log its output as well, they will have to
>write one more adapter. If instead the parser provides a mechanism for
>using the Logging Wrapper, they would not need to do anything.
>
>We have actually had requests wherein different clients have asked for
>different logging tools to be used!!! Hence the request.
>
>We could simply extend from DefaultParserFeedback for LogWrapperFeedback
>and make it implement the commons logging interface.
>
>Do let me know your thoughts/opinions/suggestions on the same.
>
>Regards,
>Dhaval
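The backwards-compatible route discussed in this thread (keep the feedback interface, implement it in terms of a logging API) could look roughly like the sketch below. It is only an illustration under stated assumptions: `Feedback` is a hypothetical stand-in for ParserFeedback, whose real method signatures are not shown in this thread, and java.util.logging (JDK 1.4) stands in for the commons-logging wrapper so that the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical stand-in for the parser's feedback interface (assumed shape).
interface Feedback {
    void info(String message);
    void warning(String message);
    void error(String message, Exception cause);
}

// Adapter: forwards every feedback call to a logger, so the application's
// logging configuration decides where parser messages end up.
class LoggingFeedback implements Feedback {
    private final Logger log;

    LoggingFeedback(Logger log) {
        this.log = log;
    }

    public void info(String message) {
        log.log(Level.INFO, message);
    }

    public void warning(String message) {
        log.log(Level.WARNING, message);
    }

    public void error(String message, Exception cause) {
        log.log(Level.SEVERE, message, cause);
    }
}

class LoggingFeedbackDemo {
    // Routes three feedback calls through the adapter and records what the
    // logger received, using an in-memory handler.
    static List<String> capture() {
        final List<String> seen = new ArrayList<String>();
        Logger logger = Logger.getLogger("feedback.demo");
        logger.setUseParentHandlers(false); // keep the demo off the console
        Handler handler = new Handler() {
            public void publish(LogRecord record) {
                seen.add(record.getLevel() + ": " + record.getMessage());
            }
            public void flush() {
            }
            public void close() {
            }
        };
        logger.addHandler(handler);

        Feedback feedback = new LoggingFeedback(logger);
        feedback.info("parsing started");
        feedback.warning("unclosed tag");
        feedback.error("parse failed", new Exception("boom"));

        logger.removeHandler(handler);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```

With an adapter of this shape, code that already calls the feedback interface would not need to change; only the object handed to the parser would differ.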
From: <dha...@or...> - 2003-05-02 08:18:45
|
Hi guys,

I remember we had a discussion about the feedback mechanism earlier. I just wanted to restart it by suggesting use of the Logging Wrapper from Jakarta.

I have noticed that if anyone wants to use the ParserFeedback to log, they will mostly need to extend the DefaultParserFeedback class and override the methods appropriately. If we can map the ParserFeedback class to the Logging Wrapper, applications can easily use the Feedback mechanism to log to Log4j and JDK 1.4 without having to do a thing. Most users, according to me, would be using one of these systems. I believe the argument then was coupling with a third-party library. But I believe the flexibility it offers outstrips the coupling drawback.

Furthermore, imagine an application which is using some other logging tool. They have coded their entire logging framework using the Logging Wrapper and have used an adapter to log to their logging tool. If they use the parser and want to log its output as well, they will have to write one more adapter. If instead the parser provides a mechanism for using the Logging Wrapper, they would not need to do anything.

We have actually had requests wherein different clients have asked for different logging tools to be used!!! Hence the request.

We could simply extend from DefaultParserFeedback for LogWrapperFeedback and make it implement the commons logging interface.

Do let me know your thoughts/opinions/suggestions on the same.

Regards,
Dhaval
From: <dha...@or...> - 2003-05-02 08:09:59
|
> Doesn't sound good - maybe you can debug and tell me why.. (and hopefully
> provide a fix to Derrick :).

Well, the Node[] returned by getChildrenAsNodeArray() is a copy of the original children NodeList, so changes to the array never reach the tag. I used the NodeList obtained from getChildren() and changed its contents to get my work done. It worked!!!

> You can remove the nodes you wish - NodeList has remove() in it.

How about a removeAll()? I felt the need for that since I was replacing the entire child list with a single node. It will also be useful for others who want to change the number of child nodes. At present I had to remove each one individually. The only advantage, of course, is cleaner (not to mention easier) developer code.

> Feel free to add methods that you think make it more intuitive for your
> use - when we write a library, we really are doing a lot of guesswork about
> what will be useful. It's only when a real user like you comes along that we
> know for sure what's good and what's not.

Well, I am totally ga...ga over the CompositeTag class. Its introduction in 1.3 has made the parser so much more resilient and complete with parent...children....grandchildren...et al. Also, the ability to do things like getChildrenHTML() and the perfect toHtml() methods are absolutely amazing.

Now for the setLabel method of LabelTag. Considering all things (i.e. as much as my mind can think of), how about using a NodeList as a parameter? Developers can either use the original NodeList with some modifications or create an entirely new one and pass it to the method, which in turn will effectively replace the childTags variable in CompositeTag. An overloaded String-parameter version can also be provided, documented as possibly slow because of the internal parsing.

All this is only to shield users from inner-level code. Otherwise everything that is required can get done, but only after getting some knowledge from these mailing lists......thanks to Somik and Derrick.

Also, can we have a no-args constructor for LabelScanner? I think I had sent these files to Somik for updating in CVS (along with SelectTag, to use NodeList instead of List).

Regards,
Dhaval

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
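The difference between editing a copied array and editing the live child list can be shown with a small self-contained sketch. `ParentNode` and its methods are hypothetical stand-ins written for this illustration; they are not the htmlparser classes, and they only mirror the behaviour described in the message above (array accessor returns a copy, list accessor returns the live collection).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a composite tag; not the htmlparser API.
class ParentNode {
    private final List<String> children = new ArrayList<String>();

    ParentNode(String... initial) {
        for (String child : initial) {
            children.add(child);
        }
    }

    // Returns a snapshot: writes to the array never reach the parent.
    String[] getChildrenAsArray() {
        return children.toArray(new String[0]);
    }

    // Returns the live list: writes are visible to the parent.
    List<String> getChildren() {
        return children;
    }

    // Renders by concatenating the current children.
    String render() {
        StringBuilder out = new StringBuilder();
        for (String child : children) {
            out.append(child);
        }
        return out.toString();
    }
}

class CopyVersusLiveDemo {
    // Editing the copied array leaves the rendered output unchanged.
    static String editThroughArray() {
        ParentNode label = new ParentNode("<span>", "Jane", "</span>");
        label.getChildrenAsArray()[1] = "New Label";
        return label.render();
    }

    // Editing the live list changes the rendered output.
    static String editThroughList() {
        ParentNode label = new ParentNode("<span>", "Jane", "</span>");
        label.getChildren().set(1, "New Label");
        return label.render();
    }

    public static void main(String[] args) {
        System.out.println(editThroughArray());
        System.out.println(editThroughList());
    }
}
```

Returning a copy protects the parent from accidental modification, at the price of exactly the surprise reported in this thread; a mutating method on the parent avoids exposing either collection.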
From: Derrick O. <Der...@ro...> - 2003-05-02 01:24:48
|
If you don't specify a starting buffer size you get space for 16 characters. Then each time it exceeds the allocated amount it allocates twice the space and copies the characters appended so far, which is time consuming and memory intensive, perhaps many times (the spent buffers are reclaimed by garbage collection eventually, but till then they're gobbling memory).

Since the average html page exceeds 16 characters, it's speedier to specify a larger buffer to start. Maybe 4096 is a bit aggressive, but I bet it covers a majority of the pages out there, and a study would need to be done to find a more optimal size. This is only allocated once per StringBean so it's not a big deal either way.

Mr LING MA wrote:

>Hi:
>will the StringBean program use less memory if
>initialize StringBuffer as 0 bytes ? why 4096?
>
>Ling Ma
>
>__________________________________
>Do you Yahoo!?
>The New Yahoo! Search - Faster. Easier. Bingo.
>http://search.yahoo.com
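The reallocation behaviour described above can be observed directly with `StringBuffer.capacity()`. The sketch below counts how often the capacity changes while appending 4096 characters, once from the default size and once presized; the class and method names are made up for this illustration, and the exact growth factor is left to the JDK rather than asserted here.

```java
class BufferGrowthDemo {
    // Appends `chars` characters one at a time and counts how often the
    // buffer's capacity changed, i.e. how often it had to reallocate.
    static int countGrowths(StringBuffer buffer, int chars) {
        int growths = 0;
        int capacity = buffer.capacity();
        for (int i = 0; i < chars; i++) {
            buffer.append('x');
            if (buffer.capacity() != capacity) {
                growths++;
                capacity = buffer.capacity();
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + new StringBuffer().capacity());
        System.out.println("growths from default size: "
                + countGrowths(new StringBuffer(), 4096));
        System.out.println("growths when presized to 4096: "
                + countGrowths(new StringBuffer(4096), 4096));
    }
}
```

A capacity of 0 would not save memory for any non-empty page: the first append would force a reallocation straight away, and the discarded arrays add to the garbage produced.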
From: Derrick O. <Der...@ro...> - 2003-05-02 01:12:29
|
I got rid of one of these April 23, but obviously not all of them. It's removed now. It will be available in the next build, or now from CVS.

Marc Novakowski wrote:

>I just updated to the latest CVS and when I try to compile/run the tests, I get the following build error. The problem is that I'm using JDK1.3, and the replaceAll() method is JDK1.4 only. Are we going to maintain JDK1.3 compatibility in the tests, or should I start using JDK1.4 for compiling/running?
>
>Marc
>
>test:
>     [echo] **********************************
>     [echo] * Running unit tests.... *
>     [echo] **********************************
>    [javac] Compiling 66 source files
>    [javac] C:\htmlparser\src\org\htmlparser\tests\ParserTestCase.java:232: cannot resolve symbol
>    [javac] symbol  : method replaceAll (java.lang.String,java.lang.String)
>    [javac] location: class java.lang.String
>    [javac] expected = expected.replaceAll("\n"," ");
>    [javac] ^
>    [javac] C:\htmlparser\src\org\htmlparser\tests\ParserTestCase.java:233: cannot resolve symbol
>    [javac] symbol  : method replaceAll (java.lang.String,java.lang.String)
>    [javac] location: class java.lang.String
>    [javac] actual = actual.replaceAll("\n"," ");
>    [javac] ^
>    [javac] 2 errors
>
>BUILD FAILED
>file:c:/htmlparser/build.xml:123: Compile failed; see the compiler error output for details.
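For reference, one way to flatten newlines without `String.replaceAll()` is the character overload `String.replace(char, char)`, which has been part of the JDK since long before 1.4. This is a sketch of that option only; the thread does not say what the actual fix committed to CVS looks like, and the class name below is made up.

```java
class NewlineFlattenDemo {
    // Replaces every newline with a space using the char overload of
    // String.replace, which does not depend on the JDK 1.4 regex support.
    static String flatten(String text) {
        return text.replace('\n', ' ');
    }

    public static void main(String[] args) {
        System.out.println(flatten("first line\nsecond line\nthird line"));
    }
}
```

For a single-character pattern such as "\n" the two calls give the same result, so nothing is lost by avoiding the regex-based method.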
From: Mr L. MA <law...@ya...> - 2003-05-01 19:46:36
|
Hi:

Will the StringBean program use less memory if the StringBuffer is initialized with 0 bytes? Why 4096?

Ling Ma
From: Marc N. <ma...@ke...> - 2003-05-01 17:28:37
|
I just updated to the latest CVS and when I try to compile/run the tests, I get the following build error. The problem is that I'm using JDK1.3, and the replaceAll() method is JDK1.4 only. Are we going to maintain JDK1.3 compatibility in the tests, or should I start using JDK1.4 for compiling/running?

Marc

test:
     [echo] **********************************
     [echo] * Running unit tests.... *
     [echo] **********************************
    [javac] Compiling 66 source files
    [javac] C:\htmlparser\src\org\htmlparser\tests\ParserTestCase.java:232: cannot resolve symbol
    [javac] symbol  : method replaceAll (java.lang.String,java.lang.String)
    [javac] location: class java.lang.String
    [javac] expected = expected.replaceAll("\n"," ");
    [javac] ^
    [javac] C:\htmlparser\src\org\htmlparser\tests\ParserTestCase.java:233: cannot resolve symbol
    [javac] symbol  : method replaceAll (java.lang.String,java.lang.String)
    [javac] location: class java.lang.String
    [javac] actual = actual.replaceAll("\n"," ");
    [javac] ^
    [javac] 2 errors

BUILD FAILED
file:c:/htmlparser/build.xml:123: Compile failed; see the compiler error output for details.
From: Somik R. <so...@ya...> - 2003-05-01 14:08:09
|
> The mechanism suggested by you to replace the node is not working. I am
> attaching a piece of code I wrote in LabelScannerTest to test the same.

Doesn't sound good - maybe you can debug and tell me why.. (and hopefully provide a fix to Derrick :).

You can remove the nodes you wish - NodeList has remove() in it.

Feel free to add methods that you think make it more intuitive for your use - when we write a library, we really are doing a lot of guesswork about what will be useful. It's only when a real user like you comes along that we know for sure what's good and what's not.

Regards,
Somik
From: <dha...@or...> - 2003-05-01 11:18:34
|
Hi Somik,

The mechanism suggested by you to replace the node is not working. I am attaching a piece of code I wrote in LabelScannerTest to test the same.

    public void testSettingLabels() throws ParserException {
        createParser("<label><span>Jane <b> Doe </b> Smith</span></label>");
        parser.registerScanners();
        LabelScanner labelScanner = new LabelScanner("-l");
        parser.addScanner(labelScanner);
        parseAndAssertNodeCount(1);
        assertTrue(node[0] instanceof LabelTag);
        LabelTag labelTag = (LabelTag) node[0];
        assertStringEquals("Label","<LABEL><SPAN>Jane <B> Doe </B> Smith</SPAN></LABEL>",labelTag.toHtml());
        Node [] nodeArray = labelTag.getChildrenAsNodeArray();
        StringNode node = new StringNode(new StringBuffer("New Label"), 0, 0);
        nodeArray[0] = node;
        for(int i=1;i<nodeArray.length;i++) {
            nodeArray[i] = null;
        }
        assertEquals("Label value","New Label",labelTag.getChildrenHTML());
        assertEquals("Label value","New Label",labelTag.getLabel());
        assertStringEquals("Label","<LABEL>New Label</LABEL>",labelTag.toHtml());
    }

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Thursday, May 01, 2003 4:27 AM
To: htmlparser-developer
Cc: somik
Subject: RE: [Htmlparser-developer] Label Tag

--- dha...@or... wrote:
> Thanks a lot. Exactly what I needed. Pretty stupid of
> me to ask on the forum actually. Should have searched
> the javadocs first. Sorry about it.
> Will try to make sure that it does not happen again.

No, sorry won't do - you have to copy 10 essays (or programs) in detention :). Chill out!

> I was thinking of a setLabel wherein the user would
> give a single string (it might have many tags).
> Internally, setLabel() would parse it into
> its corresponding tags and assign it as the children
> of the Label tag.
> What do you guys think? Lemme know.

That would work - but would be costly (in performance). You really don't want to rig up an internal parser just to change the label tag data.

A simpler way is to find out which string node you wish to change (digupStringNode(), searchFor(), or quite simply, the exact index which you might happen to know). Then,

    StringNode newNode = new StringNode(newDataBuffer, 0, 0);

To replace the child, you can either do something as simple as,

    getChildrenAsNodeArray()[posOfOldNode] = newNode;

or

    NodeList nodeList = getChildren();

and make a similar modification in the nodelist.

Regards,
Somik
From: <dha...@or...> - 2003-05-01 10:27:44
|
Hi,

>> I was thinking of a setLabel wherein the user would
>> give a single string (it might have many tags).
>> Internally, setLabel() would parse it into
>> its corresponding tags and assign it as the children
>> of the Label tag.
>> What do you guys think? Lemme know.

> That would work - but would be costly (in
> performance). You really don't want to rig up an
> internal parser just to change the label tag data.
> A simpler way is to find out which string node you
> wish to change (digupStringNode(), searchFor(), or
> quite simply, the exact index which you might happen
> to know).
> Then,
>     StringNode newNode = new StringNode(newDataBuffer, 0, 0);
> To replace the child, you can either do something as
> simple as,
>     getChildrenAsNodeArray()[posOfOldNode] = newNode;
> or
>     NodeList nodeList = getChildren();
> and make a similar modification in the nodelist.

Understood. However, with this mechanism the number of new nodes I can put in is limited by the number of nodes already present. Furthermore, I can't even reduce the number of nodes. For example, there may have been 5 child nodes earlier but now there are only 2. I believe explicitly setting each leftover array element to null may be the only option.

Another aspect is that the API user needs to be aware of all these other classes (StringNode, NodeList, Node etc.). Can we have some mechanism by which an API user is isolated from all this and still gets the functionality I explained above?

Dhaval
From: Somik R. <so...@ya...> - 2003-04-30 22:56:35
|
--- dha...@or... wrote:
> Thanks a lot. Exactly what I needed. Pretty stupid of
> me to ask on the forum actually. Should have searched
> the javadocs first. Sorry about it.
> Will try to make sure that it does not happen again.

No, sorry won't do - you have to copy 10 essays (or programs) in detention :). Chill out!

> I was thinking of a setLabel wherein the user would
> give a single string (it might have many tags).
> Internally, setLabel() would parse it into
> its corresponding tags and assign it as the children
> of the Label tag.
> What do you guys think? Lemme know.

That would work - but would be costly (in performance). You really don't want to rig up an internal parser just to change the label tag data.

A simpler way is to find out which string node you wish to change (digupStringNode(), searchFor(), or quite simply, the exact index which you might happen to know). Then,

    StringNode newNode = new StringNode(newDataBuffer, 0, 0);

To replace the child, you can either do something as simple as,

    getChildrenAsNodeArray()[posOfOldNode] = newNode;

or

    NodeList nodeList = getChildren();

and make a similar modification in the nodelist.

Regards,
Somik
From: <dha...@or...> - 2003-04-30 14:43:58
|
Thanks a lot. Exactly what I needed. Pretty stupid of me to ask on the forum actually. Should have searched the javadocs first. Sorry about it. Will try to make sure that it does not happen again.

I was thinking of a setLabel wherein the user would give a single string (it might have many tags). Internally, setLabel() would parse it into its corresponding tags and assign it as the children of the Label tag. What do you guys think? Lemme know.

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Tuesday, April 29, 2003 9:17 PM
To: htmlparser-developer
Cc: somik
Subject: RE: [Htmlparser-developer] Label Tag

> I am parsing a tag as below:
>
> <label><span>Jane <b> Doe </b> Smith</span></label>
>
> How can I obtain the text : "<span>Jane <b> Doe </b>
> Smith</span>"

Seems like CompositeTag.getChildHtml() is what you need.

Regards,
Somik
From: Somik R. <so...@ya...> - 2003-04-29 15:47:09
|
> I am parsing a tag as below:
>
> <label><span>Jane <b> Doe </b> Smith</span></label>
>
> How can I obtain the text : "<span>Jane <b> Doe </b>
> Smith</span>"

Seems like CompositeTag.getChildHtml() is what you need.

Regards,
Somik
From: <dha...@or...> - 2003-04-29 08:44:28
|
Hi,

I am parsing a tag as below:

<label><span>Jane <b> Doe </b> Smith</span></label>

How can I obtain the text: "<span>Jane <b> Doe </b> Smith</span>"

Thanks in advance.

Regards,
Dhaval

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Monday, April 28, 2003 9:38 PM
To: htmlparser-developer
Cc: somik
Subject: RE: [Htmlparser-developer] Label Tag

> However since LabelTag is a composite tag and can
> have children, does it not mean that it will have to
> override the toHtml() method. I believe it
> can use the super() method and will need to add to it.

CompositeTag implements toHtml() uniformly to iterate over children. If you are inheriting from CompositeTag, there is no reason you need to change toHtml(). That is why you need to modify the contents of the children to affect the output of toHtml().

Regards,
Somik
From: Mr L. MA <law...@ya...> - 2003-04-28 17:02:37
|
Thanks guys,

Brilliant work! I checked the new release and it fixed the OutOfMemory error and the form tag bug I ran into 2 days ago.

Ling Ma
From: Somik R. <so...@ya...> - 2003-04-28 16:08:09
|
> However since LabelTag is a composite tag and can
> have children, does it not mean that it will have to
> override the toHtml() method. I believe it
> can use the super() method and will need to add to it.

CompositeTag implements toHtml() uniformly to iterate over children. If you are inheriting from CompositeTag, there is no reason you need to change toHtml(). That is why you need to modify the contents of the children to affect the output of toHtml().

Regards,
Somik
From: <dha...@or...> - 2003-04-28 15:55:46
|
> Can you tell me what you mean by "internal tag
> representation". What you are saying is that basically
> I need to write toHtml() in the LabelTag class the way
> it should be.

Oh no - the idea of an internal representation is that you should be insulated from it. So, you should never have to write toHtml(). We ought to make that method final sometime. By conforming to the representation, you will continue to get the benefits of toHtml().

[Dhaval]
However, since LabelTag is a composite tag and can have children, does it not mean that it will have to override the toHtml() method? I believe it can use the super() method and will need to add to it.
From: Somik R. <so...@ya...> - 2003-04-28 15:47:13
|
> Can you tell me what you mean by "internal tag
> representation". What you are saying is that basically
> I need to write toHtml() in the LabelTag class the way
> it should be.

Oh no - the idea of an internal representation is that you should be insulated from it. So, you should never have to write toHtml(). We ought to make that method final sometime. By conforming to the representation, you will continue to get the benefits of toHtml().

> Well so basically I will have to create StringNodes
> and try to set that as the label. Sounds interesting.
> Will definitely try it out.

Yup - you got it!

Cheers,
Somik
From: Derrick O. <Der...@ro...> - 2003-04-28 11:39:03
|
This release begins the process of wrapping up version 1.3 and proceeding to version 1.4. Several enhancements have been added based on feedback, and of course many bug fixes.

Integration Build 1.3 - 20030427
--------------------------------
[1] Fixed bug #722941 encoding not supported
[2] Update to build.xml to new naming scheme, clean up several javaDoc warnings
[3] Fixed bug #725420 NPE in StringBean.visitTag
[4] Fixed bug #717573 NullPointerException when unclosed HTML tag inside JSP tag
[5] Fixed bug #722870 OutOfMemory error
[6] Fixed bug #725374 NPE while parsing text
    This will lead to a change in behavior: empty angle brackets are no longer returned as nodes, i.e. "this text<> has angles" used to return three nodes and now returns one
[7] Fixed bug #725376 StringIndexOutOfBounds Exception
[8] Added setText() to StringNode to ameliorate bug #726913 toHtml() method incomplete
[9] Removed deprecated method nextHTMLNode() from NodeIterator
[10] Removed unused vector in FormScanner
[11] Added getChildren() to CompositeTag
[12] Added BulletListScanner
[13] Modified CompositeTagScanner scanning mechanism, to allow for end tag lists

The success of htmlparser, with over 15,000 downloads and 17 developers, has caused the splitting of the Admin role. Derrick Oswald has been added as a build-meister, meaning Somik Raha can concentrate on less mundane things.
From: <dha...@or...> - 2003-04-28 06:38:04
|
Hi Somik,

> You need to evolve the LabelTag interface, to have setLabel - which maps on
> to the children of the label tag. Setting the text involves creation of
> string nodes and adding them as children to the label (don't forget to
> delete the previous children). I've added getChildren() to CompositeTag to
> help you out.

Well, I think I am understanding the Parser better.

>> Also the following is required:
>> 1. Obviously toHtml() must consider the changed text.

> Nope. toHtml() considers the internal tag representation alone. You should
> map to that in the specific tag code.

Can you tell me what you mean by "internal tag representation"? What you are saying is that basically I need to write toHtml() in the LabelTag class the way it should be.

>> 2. LabelTag should have setLabel() method synonymous to getLabel() which
>> internally calls setText() of Tag class.

> Yes- it will be great if you can write setLabel() - and no, it should not
> call setText().

Well, so basically I will have to create StringNodes and try to set that as the label. Sounds interesting. Will definitely try it out.

Regards,
Dhaval
From: <dha...@or...> - 2003-04-28 06:33:27
|
Dhaval Udani wrote:
> I was checking out the code of form scanner and I saw that it contained
> a list of all the INPUT tags and all the TEXTAREA tags. In addition we
> need to add the list of SELECT tags also out here..
>
> Thanks for catching this. Done.
> [Udani, Dhaval H.]
> I yet don't see this in the latest code.

I meant this is now transferred to the FormTag instead of the FormScanner. Could you add the select tags to the FormTag, and make all of them lazy ?

[Dhaval]
I'll do this.
From: Somik R. <so...@ya...> - 2003-04-28 03:39:28
|
Hi Team,

First, Derrick - thanks for taking over. I cannot tell you how relieved I am.

Second, I've fixed a Stack Overflow bug - but not in time for this release. I thought I should share the solution as it could be important for future work in this area. It took me quite some time to fix this actually.

The problem:
**************
When faced with tags like:

    <ul>
    <li>
    <ul>
    <li>
    <li>
    <li>
    ... 200 more <li> tags
    </ul>
    </li>
    </ul>

we'd end up with a stack overflow.

There were multiple problems. The first big problem was the lack of any ability to close tags on encountering end tags. Prior to this, CompositeTagScanner was only tackling begin tags.

But the second and more dangerous problem was the correction algorithm itself. The decision to put in an end tag would happen after the next tag was parsed. This would cause recursion till the stack limit was reached - we got to see it because of a good bug report about a page with tons of li tags. After trying all sorts of ideas, I was about to settle on the necessity of a tree holding the stack trace, when I figured that the relationship can be simply represented within BulletScanner and BulletListScanner with a stack.

This was a special case as there were rules like:
[1] <ul> can have <li> children
[2] <li> can have <ul> children
[3] <li> cannot have <li> children

You can look at the code in BulletScanner, BulletListScanner.

Regards,
Somik
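The three rules above can be applied with an explicit stack instead of recursion. The sketch below is only an illustration of that idea on a flat token list; it is not the code in BulletScanner or BulletListScanner, and the class and method names are made up. An open `<li>` is closed as soon as a sibling `<li>` or the enclosing `</ul>` arrives, so the work per token is constant and the call depth never grows with the number of list items.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ImpliedEndTagDemo {
    // Walks the tokens once, inserting the </li> end tags that the rules imply.
    static List<String> closeImpliedItems(List<String> tokens) {
        Deque<String> open = new ArrayDeque<String>(); // currently open tags
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            if (token.equals("<li>")) {
                // Rule 3: <li> cannot contain <li>, so close the open one first.
                if ("li".equals(open.peek())) {
                    open.pop();
                    out.add("</li>");
                }
                open.push("li");
                out.add(token);
            } else if (token.equals("<ul>")) {
                // Rules 1 and 2: a list may open at top level or inside an item.
                open.push("ul");
                out.add(token);
            } else if (token.equals("</li>")) {
                // Only honour the end tag if an item is actually open.
                if ("li".equals(open.peek())) {
                    open.pop();
                    out.add(token);
                }
            } else if (token.equals("</ul>")) {
                // Closing a list also closes an item left open inside it.
                if ("li".equals(open.peek())) {
                    open.pop();
                    out.add("</li>");
                }
                if ("ul".equals(open.peek())) {
                    open.pop();
                    out.add(token);
                }
            } else {
                out.add(token); // text and other tags pass through unchanged
            }
        }
        return out;
    }

    // Builds the shape from the bug report: a nested list with n unclosed items.
    static List<String> sample(int n) {
        List<String> tokens = new ArrayList<String>();
        tokens.add("<ul>");
        tokens.add("<li>");
        tokens.add("<ul>");
        for (int i = 0; i < n; i++) {
            tokens.add("<li>");
        }
        tokens.add("</ul>");
        tokens.add("</li>");
        tokens.add("</ul>");
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(closeImpliedItems(sample(3)));
    }
}
```

Because the open tags live in a heap-allocated deque rather than on the call stack, a page with thousands of unclosed items costs memory proportional to the nesting depth only.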
From: Somik R. <so...@ya...> - 2003-04-27 22:54:13
|
Hi Dhaval,

> I want to parse a LABEL tag and replace the data between the start and end
> tags. I am able to obtain the data using the getLabel method. However to
> replace, firstly there is no synonymous setLabel() method. Hence I used the
> setText() method of Tag class. After that I printed my tag using the toHtml()
> method. However I received the previous text itself, not the one I replaced.

You cannot and should not rely on setText() to change tag contents. setText() is used by certain automata during the process of parsing itself, for special reasons (link tag modifying the url, image tag doing the same..).

In order to change contents within the tag, use setAttribute(). But.. that doesn't help you, does it? :)

You need to evolve the LabelTag interface, to have setLabel - which maps on to the children of the label tag. Setting the text involves creation of string nodes and adding them as children to the label (don't forget to delete the previous children). I've added getChildren() to CompositeTag to help you out.

> Also the following is required:
> 1. Obviously toHtml() must consider the changed text.

Nope. toHtml() considers the internal tag representation alone. You should map to that in the specific tag code.

> 2. LabelTag should have setLabel() method synonymous to getLabel() which
> internally calls setText() of Tag class.

Yes - it will be great if you can write setLabel() - and no, it should not call setText().

Regards,
Somik
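The approach outlined in this message (a setLabel() that deletes the previous children and adds a new string node, with toHtml() derived purely from the children) can be sketched as follows. All class names here are hypothetical miniatures written for this illustration; they are not LabelTag, CompositeTag or StringNode, and the real classes may differ in every detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature node type; not the htmlparser API.
interface MiniNode {
    String toHtml();
}

class MiniText implements MiniNode {
    private final String text;

    MiniText(String text) {
        this.text = text;
    }

    public String toHtml() {
        return text;
    }
}

class MiniComposite implements MiniNode {
    private final String name;
    private final List<MiniNode> children = new ArrayList<MiniNode>();

    MiniComposite(String name) {
        this.name = name;
    }

    MiniComposite add(MiniNode child) {
        children.add(child);
        return this;
    }

    // Rendering is derived from the children alone, so a subclass that only
    // changes its children never needs to override this method.
    public String toHtml() {
        StringBuilder out = new StringBuilder("<" + name + ">");
        for (MiniNode child : children) {
            out.append(child.toHtml());
        }
        return out.append("</" + name + ">").toString();
    }

    // Deletes the previous children, then adds a single text node.
    void setLabel(String label) {
        children.clear();
        children.add(new MiniText(label));
    }
}

class SetLabelDemo {
    static String relabel() {
        MiniComposite label = new MiniComposite("LABEL")
                .add(new MiniComposite("SPAN").add(new MiniText("Jane Doe Smith")));
        label.setLabel("New Label");
        return label.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(relabel());
    }
}
```

Keeping the mutation inside the tag class also answers the later request in this thread to shield API users from the node and list classes: callers pass a string and never touch the child collection.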
From: Somik R. <so...@ya...> - 2003-04-27 22:36:57
|
Dhaval Udani wrote:
> I was checking out the code of form scanner and I saw that it contained
> a list of all the INPUT tags and all the TEXTAREA tags. In addition we
> need to add the list of SELECT tags also out here..
>
> Thanks for catching this. Done.
> [Udani, Dhaval H.]
> I yet don't see this in the latest code.

I meant this is now transferred to the FormTag instead of the FormScanner. Could you add the select tags to the FormTag, and make all of them lazy ?

Thanks and Regards,
Somik
From: Somik R. <so...@ya...> - 2003-04-27 22:32:21
|
Hi Dhaval,

> 1. FormScanner contains an unused variable textAreaVector.

I've taken this out.

> 2. FormTag contains a NodeList instance variable for INPUT tags and TEXTAREA
> tags. However none for SELECT tags. It needs to be added.

Could you make this addition test-first and give it to Derrick ?

Thanks and Regards,
Somik