Thread: RE: [Htmlparser-developer] Label Tag
From: <dha...@or...> - 2003-04-24 14:43:50
Hi,

I want to parse a LABEL tag and replace the data between the start and end tags. I am able to obtain the data using the getLabel() method. However, there is no corresponding setLabel() method, so I used the setText() method of the Tag class instead. After that I printed my tag using the toHtml() method, but I received the previous text, not the text I had set. On closer inspection of the Tag class I realized that the newly set value is not considered by toHtml() at all, and there are no test cases for setText() in TagTest.

I have logged this as a bug (#726913) and attached a test condition as well.

Also, the following is required:
1. Obviously toHtml() must consider the changed text.
2. LabelTag should have a setLabel() method to match getLabel(), which internally calls setText() of the Tag class.

Dhaval

-----Original Message-----
From: Udani, Dhaval H.
Sent: Thursday, April 17, 2003 4:15 PM
To: htmlparser-developer
Cc: Udani, Dhaval H.
Subject: [Htmlparser-developer] Label Tag

Hi,

Is there a setLabel() method in LabelTag to go with the getLabel() method? I need functionality to change the label of a Label tag. How would I achieve this?

Dhaval

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
From: <dha...@or...> - 2003-04-28 06:38:04
Hi Somik,

> You need to evolve the LabelTag interface, to have setLabel - which maps on
> to the children of the label tag. Setting the text involves creation of
> string nodes and adding them as children to the label (don't forget to
> delete the previous children). I've added getChildren() to CompositeTag to
> help you out.

Well, I think I am understanding the Parser better.

>> Also the following is required:
>> 1. Obviously toHtml() must consider the changed text.
> Nope. toHtml() considers the internal tag representation alone. You should
> map to that in the specific tag code.

Can you tell me what you mean by "internal tag representation"? Are you saying that I basically need to write toHtml() in the LabelTag class the way it should be?

>> 2. LabelTag should have setLabel() method synonymous to getLabel() which
>> internally calls setText() of Tag class.
> Yes - it will be great if you can write setLabel() - and no, it should not
> call setText().

Well, so basically I will have to create StringNodes and try to set them as the label. Sounds interesting. Will definitely try it out.

Regards,
Dhaval
From: Somik R. <so...@ya...> - 2003-04-28 15:47:13
> Can you tell me what you mean by "internal tag representation". What you are
> saying is that basically I need to write toHtml() in the LabelTag class the
> way it should be.

Oh no - the idea of an internal representation is that you should be insulated from it. So, you should never have to write toHtml(). We ought to make that method final sometime. By conforming to the representation, you will continue to get the benefits of toHtml().

> Well so basically I will have to create StringNodes and try to set that as
> the label. Sounds interesting. Will definitely try it out.

Yup - you got it!

Cheers,
Somik
From: <dha...@or...> - 2003-04-28 15:55:46
Attachments:
BDY.RTF
>> Can you tell me what you mean by "internal tag representation". What you are
>> saying is that basically I need to write toHtml() in the LabelTag class the
>> way it should be.
> Oh no - the idea of an internal representation is that you should be
> insulated from it. So, you should never have to write toHtml(). We ought to
> make that method final sometime. By conforming to the representation, you
> will continue to get the benefits of toHtml().

[Dhaval] However, since LabelTag is a composite tag and can have children, doesn't that mean it will have to override the toHtml() method? I believe it can call super.toHtml() and add to the result.
From: Somik R. <so...@ya...> - 2003-04-28 16:08:09
> However since LabelTag is a composite tag and can have children, does it
> not mean that it will have to override the toHtml() method. I believe it
> can use the super() method and will need to add to it.

CompositeTag implements toHtml() uniformly to iterate over children. If you are inheriting from CompositeTag, there is no reason you need to change toHtml(). That is why you need to modify the contents of the children to affect the output of toHtml().

Regards,
Somik
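The point about uniform rendering can be illustrated with a minimal sketch. It assumes simplified stand-in classes (SketchNode, SketchText, SketchComposite and SketchLabel are hypothetical names invented here, not the htmlparser types), and only shows the idea that a subclass inherits toHtml() and changes its output by changing its children:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the htmlparser classes.
interface SketchNode {
    String toHtml();
}

// A leaf node that renders as its own text.
final class SketchText implements SketchNode {
    private final String text;

    SketchText(String text) { this.text = text; }

    @Override public String toHtml() { return text; }
}

// A composite node: one rendering routine shared by every subclass.
class SketchComposite implements SketchNode {
    private final String name;
    private final List<SketchNode> children = new ArrayList<>();

    SketchComposite(String name) { this.name = name; }

    // Returns the live child list, so callers can change what toHtml() emits.
    List<SketchNode> children() { return children; }

    // Start tag, then each child in order, then end tag. Declared final so
    // subclasses cannot override it (an assumption of this sketch).
    @Override public final String toHtml() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (SketchNode child : children) {
            sb.append(child.toHtml());
        }
        return sb.append("</").append(name).append(">").toString();
    }
}

// The label subclass adds no rendering code of its own.
final class SketchLabel extends SketchComposite {
    SketchLabel() { super("LABEL"); }
}
```

Replacing an entry in children() is enough to change what toHtml() returns for a SketchLabel; nothing in the subclass has to be rewritten.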
From: <dha...@or...> - 2003-04-29 08:44:28
Attachments:
BDY.RTF
Hi,

I am parsing a tag as below:

<label><span>Jane <b> Doe </b> Smith</span></label>

How can I obtain the text "<span>Jane <b> Doe </b> Smith</span>"?

Thanks in advance.

Regards,
Dhaval
From: Somik R. <so...@ya...> - 2003-04-29 15:47:09
> I am parsing a tag as below:
>
> <label><span>Jane <b> Doe </b> Smith</span></label>
>
> How can I obtain the text : "<span>Jane <b> Doe </b> Smith</span>"

Seems like CompositeTag.getChildHtml() is what you need.

Regards,
Somik
From: <dha...@or...> - 2003-04-30 14:43:58
Attachments:
BDY.RTF
Thanks a lot. Exactly what I needed. Pretty stupid of me to ask on the forum, actually. I should have searched the javadocs first. Sorry about that. Will try to make sure it does not happen again.

I was thinking of a setLabel() where the user would give a single string (it might contain many tags). Internally, setLabel() would parse it into its corresponding tags and assign them as the children of the Label tag. What do you guys think? Lemme know.
From: Somik R. <so...@ya...> - 2003-04-30 22:56:35
--- dha...@or... wrote:
> Thanx a lot. Exactly what I needed. Pretty stupid of me to ask on the
> forum actually. Should have searched the javadocs first. Sorry about it.
> Will try to make sure that it does not happen again.

No, sorry won't do - you have to copy 10 essays (or programs) in detention :). Chill out!

> I was thinking of a setLabel wherein the user would give a single string
> (it might have many tags). Internally, setLabel() would parse it into
> its corresponding tags and assign it as the children of the Label tag.
> What do you guys think? Lemme know.

That would work - but would be costly (in performance). You really don't want to rig up an internal parser just to change the label tag data.

A simpler way is to find out which string node you wish to change (digupStringNode(), searchFor(), or quite simply, the exact index which you might happen to know). Then,

    StringNode newNode = new StringNode(newDataBuffer, 0, 0);

To replace the child, you can either do something as simple as

    getChildrenAsNodeArray()[posOfOldNode] = newNode;

or

    NodeList nodeList = getChildren();

and make a similar modification in the nodelist.

Regards,
Somik
From: <dha...@or...> - 2003-05-01 10:27:44
Attachments:
BDY.RTF
Hi,

>> I was thinking of a setLabel wherein the user would give a single string
>> (it might have many tags). Internally, setLabel() would parse it into
>> its corresponding tags and assign it as the children of the Label tag.
>> What do you guys think? Lemme know.
> That would work - but would be costly (in performance). You really don't
> want to rig up an internal parser just to change the label tag data.
> A simpler way is to find out which string node you wish to change
> (digupStringNode(), searchFor(), or quite simply, the exact index which
> you might happen to know).
> Then,
>     StringNode newNode = new StringNode(newDataBuffer, 0, 0);
> To replace the child, you can either do something as simple as,
>     getChildrenAsNodeArray()[posOfOldNode] = newNode;
> or
>     NodeList nodeList = getChildren();
> and make a similar modification in the nodelist.

Understood. However, with this mechanism the number of new nodes I can set is limited by the number of nodes already present. Furthermore, I can't even reduce the number of nodes. For example, there may have been 5 child nodes earlier but now there are only 2. I believe explicitly setting each remaining array element to null may be the only option.

Another aspect is that the API user needs to be aware of all these other classes (StringNode, NodeList, Node, etc.). Can we have some mechanism by which an API user is isolated from all this and still gets the functionality I explained above?

Dhaval
From: <dha...@or...> - 2003-05-01 11:18:34
Attachments:
BDY.RTF
Hi Somik,

The mechanism you suggested to replace the node is not working. I am attaching a piece of code I wrote in LabelScannerTest to test it.

    public void testSettingLabels() throws ParserException {
        createParser("<label><span>Jane <b> Doe </b> Smith</span></label>");
        parser.registerScanners();
        LabelScanner labelScanner = new LabelScanner("-l");
        parser.addScanner(labelScanner);
        parseAndAssertNodeCount(1);
        assertTrue(node[0] instanceof LabelTag);
        LabelTag labelTag = (LabelTag) node[0];
        assertStringEquals("Label", "<LABEL><SPAN>Jane <B> Doe </B> Smith</SPAN></LABEL>", labelTag.toHtml());

        // Replace the first child with a new string node and null out the rest.
        Node[] nodeArray = labelTag.getChildrenAsNodeArray();
        StringNode newNode = new StringNode(new StringBuffer("New Label"), 0, 0);
        nodeArray[0] = newNode;
        for (int i = 1; i < nodeArray.length; i++) {
            nodeArray[i] = null;
        }

        // Expected results after the replacement.
        assertEquals("Label value", "New Label", labelTag.getChildrenHTML());
        assertEquals("Label value", "New Label", labelTag.getLabel());
        assertStringEquals("Label", "<LABEL>New Label</LABEL>", labelTag.toHtml());
    }

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457
From: Somik R. <so...@ya...> - 2003-05-01 14:08:09
> The mechanism suggested by you to replace the node is not working. I am
> attaching a piece of code I wrote in LabelScannerTest to test the same.

Doesn't sound good - maybe you can debug and tell me why.. (and hopefully provide a fix to Derrick :).

You can remove the nodes you wish - NodeList has remove() in it.

Feel free to add methods that you think make it more intuitive for your use - when we write a library, we really are doing a lot of guesswork about what will be useful. It's only when a real user like you comes along that we know for sure what's good and what's not.

Regards,
Somik
From: <dha...@or...> - 2003-05-02 08:09:59
Attachments:
BDY.RTF
> Doesn't sound good - maybe you can debug and tell me why.. (and hopefully
> provide a fix to Derrick :).

Well, the Node[] returned by getChildrenAsNodeArray() is a copy of the original children nodelist. I used the NodeList obtained from getChildren() and changed its contents to get my work done. It worked!!!

> You can remove the nodes you wish - NodeList has remove() in it.

How about a removeAll()? I felt the need for it since I was replacing the entire child list with a single node. It will also be useful for others who want to change the number of child nodes. At present I had to remove each one individually. The only advantage, of course, is cleaner (not to mention easier) developer code.

> Feel free to add methods that you think make it more intuitive for your
> use - when we write a library, we really are doing a lot of guesswork about
> what will be useful. It's only when a real user like you comes along that we
> know for sure what's good and what's not.

Well, I am totally ga-ga over the CompositeTag class. Its introduction in 1.3 has made the parser so much more resilient and complete, with parent... children... grandchildren... et al. Also, the ability to do things like getChildrenHTML() and the perfect toHtml() methods are absolutely amazing.

Now for the setLabel method of LabelTag. Considering all things (i.e. as much as my mind can think of), how about using a NodeList as a parameter? Developers can either use the original NodeList with some modifications or create an entirely new one and pass it to the method, which in turn will effectively replace the childTags variable in CompositeTag. An overloaded String-based version can also be provided, with a note about its possibly slow performance due to internal parsing. All this is only to shield users from inner-level code. Otherwise everything that needs to be done can get done, but only after picking up some knowledge from these mailing lists... thanks to Somik and Derrick.

Also, can we have a no-args constructor for LabelScanner? I think I had sent these files to Somik for updating into CVS (along with SelectTag changed to use NodeList instead of List).

Regards,
Dhaval
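The difference between editing a copied array and editing the live list, and the convenience of a removeAll(), can be sketched as follows. ChildHolder is a hypothetical stand-in written for this illustration (it is not the htmlparser NodeList or CompositeTag), and children are modelled as plain strings to keep the example short:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; NOT the htmlparser NodeList or CompositeTag.
final class ChildHolder {
    private final List<String> children = new ArrayList<>();

    ChildHolder(String... initial) {
        children.addAll(Arrays.asList(initial));
    }

    // Defensive copy: writes to the returned array never reach the holder.
    String[] childrenAsArray() {
        return children.toArray(new String[0]);
    }

    // Live view: the same list object the holder renders from.
    List<String> children() {
        return children;
    }

    // Convenience discussed in the thread: drop every child in one call.
    void removeAll() {
        children.clear();
    }

    // Concatenates the children in order, like a simplified child rendering.
    String render() {
        return String.join("", children);
    }
}
```

Writing into the array from childrenAsArray() leaves render() unchanged, while calling removeAll() and then adding one entry to children() replaces the whole content in two steps.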
From: Somik R. <so...@ya...> - 2003-05-02 15:55:06
> Well the Node [] returned by getChildrenAsNodeArray is a copy of the
> original children nodelist. I used the NodeList obtained from
> getChildren() and changed contents in that to get my work done. It
> worked!!!

Ah yes, I recall the copy now.

> How about a removeAll(). Felt the need for that since I was replacing
> the entire child list with a single node. Will be useful for others also
> who want to change number of child nodes. At present I had to remove
> each one individually. Only advantage of course is cleaner (not to
> mention easier) developer code.

Sounds good. I didn't need it, so didn't put it, but if you do - go ahead.

> Now for the setLabel method of LabelTag. Considering all things (i.e. as
> much as my mind can think of), how about using a NodeList as a
> parameter? Developers can either use the original NodeList with some
> modifications or create an entirely new one and pass it to the method
> which in turn will effectively replace the childTags variable in
> CompositeTag. An overloaded String parameter based one can also be given
> with information related to its possible slow performance due to
> internal parsing. All this is only to shield users from inner-level
> code. Otherwise everything that is required to be done can get done but
> only after getting some knowledge over these mailing lists......thanx to
> somik and Derrick.

It is doubtful that users of the parser will normally need this. Advanced users would - those who write their own scanners. We'd probably like them to know something about the internals - not too much of course. Striking the right balance is hard. It's always good to revisit a problem. Now that you know what you do, what do you think is the minimum and simplest, given that you don't want to lose performance? (You are in the advanced user category now.)

> Also can we have a no-args constructor for LabelScanner. I think I had
> sent these files to Somik for updation into CVS (along with SelectTag to
> use NodeList instead of List)

Hmm.. I seem to have misplaced them (apologies). Can you mail them to Derrick?

Derrick -> Dhaval is inside a firewall, and does not have access to our CVS repository.

Regards,
Somik
From: <dha...@or...> - 2003-05-05 07:21:11
Attachments:
BDY.RTF
>> Now for the setLabel method of LabelTag. Considering all things (i.e. as
>> much as my mind can think of), how about using a NodeList as a
>> parameter? Developers can either use the original NodeList with some
>> modifications or create an entirely new one and pass it to the method
>> which in turn will effectively replace the childTags variable in
>> CompositeTag. An overloaded String parameter based one can also be given
>> with information related to its possible slow performance due to
>> internal parsing. All this is only to shield users from inner-level
>> code. Otherwise everything that is required to be done can get done but
>> only after getting some knowledge over these mailing lists......thanx to
>> somik and Derrick.
> It is doubtful that users of the parser will normally need this. Advanced
> users would - who write their own scanners. We'd probably like them to know
> something about the internals - not too much of course. Striking the right
> balance is hard. It's always good to revisit a problem. Now that you know
> what you do, what do you think is the minimum and simplest, given that you
> don't want to lose performance ? (You are in the advanced user category now)

Let's go with the NodeList-based approach. (However, I do believe that the String-based approach would give a lot of flexibility even though it may impact performance... which may not be crucial to everyone.)

> Hmm.. I seem to have misplaced them (apologies). Can you mail them to
> Derrick ?
> Derrick -> Dhaval is inside a firewall, and does not have access to our CVS
> repository.

Will do. Derrick, I'll take the latest sources and make my changes on top of them. I will also add a removeAll() method to NodeList.

Regards,
Dhaval
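A rough sketch of the list-based setLabel() idea agreed on above might look like this. LabelSketch is a hypothetical stand-in (not the htmlparser LabelTag), children are modelled as strings of HTML, and storing a copy of the caller's list is a design choice made for this sketch rather than something the thread specifies:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a label tag; NOT the htmlparser LabelTag.
final class LabelSketch {
    private List<String> childHtml = new ArrayList<>();

    // Replaces the whole child list in one call. A copy is stored so that
    // later changes to the caller's list do not leak into the tag
    // (an assumption of this sketch).
    void setLabel(List<String> newChildren) {
        childHtml = new ArrayList<>(newChildren);
    }

    // The label is whatever the current children render to.
    String getLabel() {
        return String.join("", childHtml);
    }

    // Rendering never needs to change: it always reflects the children.
    String toHtml() {
        return "<LABEL>" + getLabel() + "</LABEL>";
    }
}
```

A String-based overload that parses markup into children is left out here, since it would need a parser; the list-based form keeps the caller in control of how the new children are built.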
From: Somik R. <so...@ya...> - 2003-04-27 22:54:13
Hi Dhaval,

> I want to parse a LABEL tag and replace the data between the start and end
> tags. I am able to obtain the data using the getLabel method. However to
> replace, firstly there is no synonymous setLabel() method. Hence I used the
> setText() method of Tag class. After that I printed my tag using the toHtml()
> method. However I received the previous text itself, not the one I replaced.

You cannot and should not rely on setText() to change tag contents. setText() is used by certain automata during the process of parsing itself, for special reasons (link tag modifying the url, image tag doing the same..). In order to change contents within the tag, use setAttribute(). But.. that doesn't help you, does it? :)

You need to evolve the LabelTag interface, to have setLabel - which maps on to the children of the label tag. Setting the text involves creation of string nodes and adding them as children to the label (don't forget to delete the previous children). I've added getChildren() to CompositeTag to help you out.

> Also the following is required:
> 1. Obviously toHtml() must consider the changed text.

Nope. toHtml() considers the internal tag representation alone. You should map to that in the specific tag code.

> 2. LabelTag should have setLabel() method synonymous to getLabel() which
> internally calls setText() of Tag class.

Yes - it will be great if you can write setLabel() - and no, it should not call setText().

Regards,
Somik