htmlparser-user Mailing List for HTML Parser (Page 92)
Brought to you by:
derrickoswald
From: Somik R. <so...@ya...> - 2002-08-16 03:16:13
Hi Raghav,

There was an aborted attempt at integrating it with swing - I gave up, lost patience :)

You are free to try if you are interested - if you manage to do it, it would be a big contribution to the community.

Cheers,
Somik
From: Raghavender S. <kin...@ho...> - 2002-08-15 16:39:05
Hi Somik,

This is Raghav. Is the htmlparser which we are using integrated with Java Swing? I believe Java Swing can also render the HTML tags.

Raghav
From: <dha...@or...> - 2002-08-13 11:16:11
Hi Somik,

> I am intrigued by this. Can you tell me the specific cases ?

[Udani, Dhaval H.] Assume a TEXTAREA tag as follows:

<TEXTAREA name="Comments">How are you?</TEXTAREA>

as compared to

<TEXTAREA name="Comments">
How are you?
</TEXTAREA>

If the form is submitted to a servlet the latter will be received with 2 newline characters whereas the former will have none. For this particular case even tabs make a difference.

Apart from that

<TD><IMG SRC="today.gif"></TD>

<TD>
<IMG SRC="today.gif">
</TD>

also can look different for certain gif sizes and cell sizes.

Dhaval
From: Somik R. <so...@ya...> - 2002-08-13 01:49:46
Hi Dhaval,

> The order may be the same but the absence or presence of newline in
> certain cases can lead to a subsequent difference in the outlook.

I am intrigued by this. Can you tell me the specific cases ?

> Moving to a different topic, the integration release has fixed the
> problem of "checked", I believe. Is that correct?

Yes - latest one has lots of fixes, and should be the most stable one yet - primarily bcos the shameful infinite loop problem has been fixed :)

Cheers,
Somik
From: <dha...@or...> - 2002-08-12 05:16:56
> I understand - but by saying *HTML page should not be altered*, I am
> assuming you mean that the elements should be the same functionally -
> which is the case. Why should end of line characters make a difference,
> because all the tags come in the same order as you'd expect.

[Udani, Dhaval H.] Initially even I thought the same, but when I actually got to work with some pages and saw the difference online I realized that the newline character plays a part in presentation as well.

The order may be the same but the absence or presence of newline in certain cases can lead to a subsequent difference in the outlook.

Moving to a different topic, the integration release has fixed the problem of "checked", I believe. Is that correct?

> Regards,
> Somik
From: Somik R. <so...@ya...> - 2002-08-12 04:47:24
Hi Dhaval,

> I convert it into a JSP by adding JSP code which does not alter the
> layout at all. The HTML code I am adding is more on the line of some
> script functions, some event-handler code etc. The presentation is not
> touched. In fact this is the primary requirement of this tool that the
> layout of the original HTML page should not be altered.
>
> For example, I decide that before <HTML> I want to put in some JSP code.
> Another instance is that after <HEAD> I want to add some <SCRIPT> code.
> Yet another instance would be that for <INPUT> tag I want to add the
> ONFOCUS event-handler but if it already exists I want to only append to
> it.

I understand - but by saying *HTML page should not be altered*, I am assuming you mean that the elements should be the same functionally - which is the case. Why should end of line characters make a difference, because all the tags come in the same order as you'd expect.

Regards,
Somik
From: <dha...@or...> - 2002-08-12 04:44:58
> I am not sure I fully understand. The other teams are creating HTML with
> their own look and feel. You are converting it to a JSP. Naturally the
> alignment would have changed by your additions itself. Now, if the
> original HTML is preserved in functionality but not in exact layout as
> it arrived, I did not understand how that causes a problem in your other
> teams. Are they reading your jsp file through some program ?

[Udani, Dhaval H.] I convert it into a JSP by adding JSP code which does not alter the layout at all. The HTML code I am adding is more on the line of some script functions, some event-handler code etc. The presentation is not touched. In fact this is the primary requirement of this tool that the layout of the original HTML page should not be altered.

For example, I decide that before <HTML> I want to put in some JSP code. Another instance is that after <HEAD> I want to add some <SCRIPT> code.

Yet another instance would be that for <INPUT> tag I want to add the ONFOCUS event-handler but if it already exists I want to only append to it.

In this manner my tool never changes the presentation of the HTML page, it just adds some scripting and JSP code.

I hope I have made my point clearer.

Regards,
Dhaval
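The "append to ONFOCUS if it already exists" rule described above can be sketched roughly as follows. This is only an illustration: HandlerAppender and appendHandler are hypothetical names, not part of HTMLParser, and the sketch assumes the tag's attributes are already available as a map keyed by upper-case attribute name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of HTMLParser. */
final class HandlerAppender {

    /**
     * Adds script to the named event-handler attribute. If the attribute
     * already carries a handler, the new script is appended after a ';'
     * so the existing behaviour is kept.
     */
    static void appendHandler(Map<String, String> attributes, String name, String script) {
        String existing = attributes.get(name);
        if (existing == null || existing.trim().isEmpty()) {
            // No handler yet: the new script becomes the whole value.
            attributes.put(name, script);
        } else {
            String base = existing.trim();
            String separator = base.endsWith(";") ? " " : "; ";
            attributes.put(name, base + separator + script);
        }
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("TYPE", "text");
        input.put("ONFOCUS", "highlight(this)");

        appendHandler(input, "ONFOCUS", "track(this)");
        appendHandler(input, "ONBLUR", "validate(this)");

        System.out.println(input.get("ONFOCUS"));
        System.out.println(input.get("ONBLUR"));
    }
}
```

Writing the modified attributes back out is a separate step; a tag's default output would still show the original text unless its rendering is changed as well.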
From: Somik R. <so...@ya...> - 2002-08-10 08:22:39
Dhaval Udani wrote:

> My team is building a framework which is used by many projects in my
> organization. All the other projects create HTML with their own
> look-and-feel. To use the framework, they need to convert these files
> into a JSP (using a tool developed by my team). The tool apart from just
> changing the extension ;) also adds lots of JSP code and makes certain
> modifications to the HTML tags (not the presentation tags though). After
> the JSP is created if the layout changes, they will have to again spend
> time correcting this anomaly and will need to keep doing it every time
> they change their HTML page or the tool is updated. Now I guess you can
> understand why I feel so strongly about maintaining layout.

I am not sure I fully understand. The other teams are creating HTML with their own look and feel. You are converting it to a JSP. Naturally the alignment would have changed by your additions itself. Now, if the original HTML is preserved in functionality but not in exact layout as it arrived, I did not understand how that causes a problem in your other teams. Are they reading your jsp file through some program ?

If you can give some more details, a clearer picture might emerge.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-08-10 08:15:00
Hi Folks,

Next release (v1.2-2002-08-11) is out. From the change log:

[1] Fixed bug 590703 - Empty values dont get parsed
[2] Fixed bug 591435 - Missing values cause keys to be missed
[3] Removed all infinite loops in scanners, replaced with throwing HTMLParserException
[4] Fixed bug in HTMLTitleScanner, allowing certain malformed title tags to be parsed
[5] Modified HTMLReader - now accepts Reader instead of BufferedReader
[6] HTMLParser constructor now throws HTMLParserException.
[7] Fixed bug 592355 - Empty tags throw exceptions from some scanners. Now, if the tag is empty, it is not passed down to scanners. Also, fixed the related issue in HTMLStringNode, causing empty tags to be treated as tags and not strings.

A very significant fix is #3 - I would highly recommend upgrading your copies asap. Also, following suggestions of Amit Rana, the constructor itself throws HTMLParserException.

You can expect some more API changes in the coming weeks, as we attempt to integrate Claude's other contributions (Parser Feedback).

We've got over 150 tests and all passing.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-08-09 08:48:22
Hi Dhaval,

> How can I parse this correctly? Will all the text after the <OPTION> tag
> considered to be a single node whether it is in the same line or
> different line.

Yes - it will be given to you in a HTMLStringNode. And of course </OPTION> is parsed into a HTMLEndTag. Take a look at the title scanner - you would probably need to do something similar, except the logic might be a bit more involved.

Regards
Somik
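The "more involved" logic comes from the optional end tag: the text of an OPTION runs until </OPTION>, the next <OPTION>, </SELECT>, or the end of the input, whichever comes first. A minimal standalone sketch of that rule follows. It does not use the HTMLParser scanner API at all; OptionTextSketch and optionTexts are hypothetical names, and the regular expression assumes that attribute values contain no '>' character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone illustration, not an HTMLParser scanner. */
final class OptionTextSketch {

    // An OPTION start tag, then as little text as possible up to the first
    // of: </OPTION>, another <OPTION, </SELECT>, or the end of the input.
    private static final Pattern OPTION = Pattern.compile(
            "<OPTION\\b[^>]*>(.*?)(?=</OPTION\\s*>|<OPTION\\b|</SELECT\\s*>|\\z)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of every OPTION, closed or not. */
    static List<String> optionTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = OPTION.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String closed = "<OPTION value=\"Java\">Java Language</OPTION>\n"
                + "<OPTION value=\"C\">C Language</OPTION>";
        String unclosed = "<OPTION value=\"Java\">Java Language\n"
                + "<OPTION value=\"C\">C Language";

        // Both spellings should yield the same two option texts.
        System.out.println(optionTexts(closed));
        System.out.println(optionTexts(unclosed));
    }
}
```

A real scanner would apply the same stop conditions while consuming nodes from the parser rather than matching a regular expression over the raw text.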
From: <dha...@or...> - 2002-08-09 08:41:45
Hi,

I am writing a tag-scanner pair for <OPTION> tag. I need some help.

An option tag can be written as

<OPTION value="Java">Java Language</OPTION>
<OPTION value="C">C Language</OPTION>

OR

<OPTION value="Java">Java Language
<OPTION value="C">C Language

How can I parse this correctly? Will all the text after the <OPTION> tag be considered a single node, whether it is on the same line or a different line?

Thanx in advance.

Regards,
Dhaval
From: <dha...@or...> - 2002-08-08 07:37:30
Hi,

I would definitely appreciate replacing the hard-coded end-of-line character with an end-of-line character detected from the system property. Currently I read the entire file and replace the hard-coded EOL with the system property EOL.

I think the last EOL for toHTML() should be removed and instead all "\n" should also be parsed and reproduced exactly in the same way. Preserving layout should be as important as performance. Also my feeling is that this tool will be used mostly by developers during development time and not at runtime (though it is always possible) and hence performance may not be an issue here.

Please feel free to criticize my opinion.

Typically my predicament is as follows:

My team is building a framework which is used by many projects in my organization. All the other projects create HTML with their own look-and-feel. To use the framework, they need to convert these files into a JSP (using a tool developed by my team). The tool apart from just changing the extension ;) also adds lots of JSP code and makes certain modifications to the HTML tags (not the presentation tags though). After the JSP is created if the layout changes, they will have to again spend time correcting this anomaly and will need to keep doing it every time they change their HTML page or the tool is updated. Now I guess you can understand why I feel so strongly about maintaining layout.

At the same time I am aware that the parser is here for everyone's need and will be driven accordingly. Hence I am just presenting my point of view.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
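The workaround mentioned above (reading the generated text and replacing the hard-coded EOL with the system one) might look roughly like this. It is a sketch only: EolSketch and toPlatformEol are hypothetical names, and it assumes the whole output fits in a String. The input is normalised to "\n" first so that text already containing "\r\n" does not end up with doubled carriage returns.

```java
/** Standalone illustration of end-of-line normalisation. */
final class EolSketch {

    /** Rewrites every line ending in text ("\r\n", "\r" or "\n") as eol. */
    static String toPlatformEol(String text, String eol) {
        // Collapse all variants to "\n" first, then expand to the target.
        return text.replace("\r\n", "\n")
                   .replace("\r", "\n")
                   .replace("\n", eol);
    }

    public static void main(String[] args) {
        // "line.separator" is the standard Java system property for the
        // platform's end-of-line sequence.
        String eol = System.getProperty("line.separator");
        String html = "<TD>\n<IMG SRC=\"today.gif\">\n</TD>\n";
        System.out.print(toPlatformEol(html, eol));
    }
}
```

Note that this only changes which end-of-line sequence is written. It does not remove line breaks that were added to the output, which is the layout concern raised in this thread.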
From: Somik R. <so...@ya...> - 2002-08-08 07:10:24
Hi Dhaval,

This is actually a feature. If we try to give the exact same output as originally parsed, the performance of the parser could be compromised. Hence, giving a corresponding output with slightly different formatting was chosen - in order to keep the design of the parser simple.

However, related to this is an interesting issue - for which community feedback would be valuable. Currently, the formatting of toHTML() is rather arbitrary (in my opinion). By this I am particularly referring to the usage of end of line characters. Considering that end of line characters differ for each operating system - would it be a good idea to replace the hard-coded end of line characters with the detected end of line char for a particular OS ?

Regards,
Somik
From: <dha...@or...> - 2002-08-08 06:54:38
Hi,

I have an HTML page which I am trying to modify. During this process, I have come across a quirk. I don't know whether the problem is browser related or parser related.

The following HTML code:

<TD align="left" valign="top" width="18"><img src="images/right_h1.gif" width="18" height="22"></TD>

gets converted to

<TD align="left" valign="top" width="18">
<img src="images/right_h1.gif" width="18" height="22">
</TD>

This happens whenever I print back the parsed data using tag.toHTML().

These 2 seem to be the same but presentation-wise I see different outputs. Is it right on the part of tag.toHTML() to print out the EOL character at the end of the tag?

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: Somik R. <so...@ya...> - 2002-08-07 05:02:31
Hi Claude,

This has been handled, related to the earlier fix. All potential infinite loops have been removed, and there will be no more hangings - only HTMLParserExceptions from now on.

There will be a release having all these fixes this weekend.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-08-07 04:13:37
Hi Dhaval,

> I would like to know the purpose of the toHTML() function in HTMLTag
> class. Is this function supposed to output the parsed tag as parsed by
> the HTMLParser or am I supposed to override this function in my custom
> tag class to depict the HTML formed if any of the attributes of the tag
> have been changed.

This method is supposed to give you a reconstruction of the HTML Tag parsed. Even if the tag parsed was malformed, the result of this method will be correct HTML.

> In short I would like to know whether I need to override this method
> since at present I can only see the original parsed data in it and not
> my modified data.

That's right - for your application - you will need to change the functionality of this method.

I've been thinking of introducing HTMLRenderers, for a tag - so you can dynamically change it for the tags that you want. That would be a cleaner solution - so you might have an interface

    public interface HTMLRenderer {
        public String toHTML();
    }

and for your input tag, you should have a static method

    public static void addHTMLRenderer(HTMLRenderer renderer) {
    }

Then, HTMLInputTag.toHTML() may have an implementation -

    public String toHTML() {
        if (renderer != null) return renderer.toHTML();
        else return super.toHTML();
    }

This way, we can have both the normal functionality, and the modified functionality.

Regards,
Somik
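The renderer idea proposed above can be tried out in isolation with stand-in classes. The sketch below is not HTMLParser code: Renderer, SketchTag and RendererDemo are hypothetical names, and it departs from the proposal in two labelled ways: the renderer is set per tag instance rather than through a static method, and it receives the tag as an argument so it can read the modified attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the proposed HTMLRenderer interface. */
interface Renderer {
    String toHTML(SketchTag tag);
}

/** Hypothetical stand-in for a parsed tag; not an HTMLParser class. */
class SketchTag {
    private final String name;
    private final String originalText;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private Renderer renderer; // null means "use the default output"

    SketchTag(String name, String originalText) {
        this.name = name;
        this.originalText = originalText;
    }

    String getName() { return name; }
    Map<String, String> getAttributes() { return attributes; }
    void setAttribute(String key, String value) { attributes.put(key, value); }
    void setRenderer(Renderer renderer) { this.renderer = renderer; }

    /** Default: the text as parsed. With a renderer: whatever it produces. */
    String toHTML() {
        return renderer != null ? renderer.toHTML(this) : originalText;
    }
}

final class RendererDemo {

    /** Rebuilds the tag from its current attributes. */
    static String renderFromAttributes(SketchTag tag) {
        StringBuilder html = new StringBuilder("<").append(tag.getName());
        for (Map.Entry<String, String> entry : tag.getAttributes().entrySet()) {
            html.append(' ').append(entry.getKey())
                .append("=\"").append(entry.getValue()).append('"');
        }
        return html.append('>').toString();
    }

    public static void main(String[] args) {
        SketchTag tag = new SketchTag("INPUT", "<INPUT TYPE=\"text\">");
        tag.setAttribute("TYPE", "text");
        tag.setAttribute("ONFOCUS", "track(this)");

        System.out.println(tag.toHTML()); // default: the original text
        tag.setRenderer(RendererDemo::renderFromAttributes);
        System.out.println(tag.toHTML()); // now reflects the added attribute
    }
}
```

Keeping the default path untouched means tags without a renderer behave exactly as before, which matches the "both the normal functionality, and the modified functionality" goal stated above.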
From: Claude D. <CD...@ar...> - 2002-08-06 18:35:04
Here's some markup we found in another document that causes the HTMLParser to hang.

"<TITLE>KRP VALIDATION<PROCESS/TITLE>"

So far, we've had 4 documents cause our process to come to a grinding halt. I would much prefer a policy of exception throwing to hangs asap, followed by consideration of whether unusual markup can be handled more elegantly in a subsequent phase. Thanks to everyone, as always.
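The "throw rather than hang" policy requested above can be illustrated with a small standalone sketch. This is not how HTMLTitleScanner is implemented; TitleScanSketch and MalformedMarkupException are hypothetical names. The point is only that every search is bounded by the input length, and a missing end tag is reported as an exception instead of being waited for.

```java
/** Hypothetical exception type, standing in for a parser exception. */
final class MalformedMarkupException extends Exception {
    MalformedMarkupException(String message) {
        super(message);
    }
}

/** Standalone illustration, not the HTMLParser title scanner. */
final class TitleScanSketch {

    /** Case-insensitive indexOf; the loop always ends at the input length. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the text between <TITLE> and </TITLE>, or null if there is
     * no title at all. A title that is opened but never closed is an error.
     */
    static String titleText(String html) throws MalformedMarkupException {
        int start = indexOfIgnoreCase(html, "<TITLE>", 0);
        if (start < 0) {
            return null;
        }
        int contentStart = start + "<TITLE>".length();
        int end = indexOfIgnoreCase(html, "</TITLE>", contentStart);
        if (end < 0) {
            throw new MalformedMarkupException(
                    "No </TITLE> found for <TITLE> at offset " + start);
        }
        return html.substring(contentStart, end);
    }

    public static void main(String[] args) {
        try {
            System.out.println(titleText("<title>Hello</title>"));
            System.out.println(titleText("<TITLE>KRP VALIDATION<PROCESS/TITLE>"));
        } catch (MalformedMarkupException e) {
            System.out.println("Malformed markup: " + e.getMessage());
        }
    }
}
```

Whether such markup should instead be repaired (for example by treating the stray "/TITLE>" as the end tag) is the "subsequent phase" question raised in the message; the sketch deliberately does not attempt it.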
From: <dha...@or...> - 2002-08-06 11:41:27
Hi,

I would like to know the purpose of the toHTML() function in the HTMLTag class. Is this function supposed to output the tag as parsed by the HTMLParser, or am I supposed to override this function in my custom tag class to depict the HTML formed if any of the attributes of the tag have been changed? In short, I would like to know whether I need to override this method, since at present I can only see the original parsed data in it and not my modified data.

If this is true, then we must ensure that all tag writers override this function as per their individual tags.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: Somik R. <so...@ya...> - 2002-08-06 09:17:04
Hi Dhaval

> About the checked issue, I believe the parser will not throw an error in
> such a situation if checked is given in an INPUT tag.

Right - but it does not give u the field as well.

> While registering scanners with the HTML parser I will need to create an
> instance of my scanner and supply a filter string. What are these filter
> strings and what should I use?

You can ignore the filter string, it's for the command line parser testing.

Cheers,
Somik
From: <dha...@or...> - 2002-08-06 09:13:23
Hi,

I am writing a scanner. The constructor for a scanner contains a parameter for a filter. While registering scanners with the HTML parser I will need to create an instance of my scanner and supply a filter string. What are these filter strings and what should I use?

Somik,

About the checked issue, I believe the parser will not throw an error in such a situation if checked is given in an INPUT tag.

Regards,
Dhaval
From: Somik R. <so...@ya...> - 2002-08-06 07:23:36
Hi Dhaval,

> I would like to know how "checked" would be reflected in the HTMLTag
> during the parsing procedure.

This is what "should" happen - the tag will be treated the same as

<INPUT type="checkbox" name="Authorize" value="Y" checked="">

However, this is not what actually happens - I've written a testcase to demonstrate this, and we should be fixing it soon.

Kaarle - I've opened a bug report, can you check this ? Thanks a lot.

Regards,
Somik
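The intended behaviour described above, where a bare `checked` is treated like `checked=""`, can be shown with a small standalone attribute parser. This is a sketch, not HTMLParser's parseParameters(): AttributeSketch and parseAttributes are hypothetical names, the input is assumed to be the text after the tag name, and keys are upper-cased for lookup.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone illustration, not HTMLParser's parameter parser. */
final class AttributeSketch {

    // A name, optionally followed by '=' and a double-quoted,
    // single-quoted, or bare (unquoted) value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([A-Za-z_][\\w:.-]*)(?:\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+)))?");

    /** Parses attribute text; a name without a value maps to "". */
    static Map<String, String> parseAttributes(String attributeText) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(attributeText);
        while (matcher.find()) {
            String value = ""; // default for valueless attributes such as "checked"
            for (int group = 2; group <= 4; group++) {
                if (matcher.group(group) != null) {
                    value = matcher.group(group);
                }
            }
            attributes.put(matcher.group(1).toUpperCase(Locale.ROOT), value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> bare = parseAttributes(
                "type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked");
        Map<String, String> explicit = parseAttributes(
                "type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked=\"\"");

        System.out.println(bare);
        System.out.println(bare.equals(explicit));
    }
}
```

With this rule a caller can test for the attribute with containsKey("CHECKED") regardless of how the page author wrote it.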
From: <dha...@or...> - 2002-08-06 07:04:43
Hi,

I have a small doubt. For a checkbox or a radio button the following kind of tag is very normal.

<INPUT type="checkbox" name="Authorize" value="Y" checked>

I would like to know how "checked" would be reflected in the HTMLTag during the parsing procedure.

Thanx in advance,
Dhaval
From: Somik R. <so...@ya...> - 2002-08-04 07:27:46
I forgot to mention - the most important bug fix in this release is in parseParameters() (588885), done by Kaarle Kaila, because of which we have been able to incorporate "intelligence" in the parsing, making Cedric Rosa a happy man. Thanks a ton, Kaarle.

Cheers,
Somik
From: Somik R. <so...@ya...> - 2002-08-04 07:22:36
Hi Folks,

It's time again, another integration release is out. Check http://htmlparser.sourceforge.net.

So what's new? Major API change - the parser now has chained exceptions. If some problem occurs, your application will have a chance to take care of it, instead of simply crashing. Also, the exception messages are more meaningful, giving a better picture of what went wrong.

Thanks to Claude Duguay for the ChainedException classes, and bug reports. And many thanks to the best tester of HTMLParser - Cedric Rosa - for countless bug reports - pls keep up the good work.

From the change log,

[1] Fixed bug 590250, problem in HTMLStringNode, by which a single character on the last line was causing a parser crash
[2] Optimized and refactored HTMLParameterParser.parseParameters()
[3] Modified PerformanceTest to exclude first reading in average computation
[4] Fixed bug in HTMLParameterParser.parseParameters(), due to which params with spaces before = were not being picked up
[5] Made massive API changes - throwing exceptions and using HTMLEnumeration
[6] Fixed HTMLRemarkNode bug - we can recognize stuff like now.
[7] Fixed HTMLImageScanner bug - we can now fix image tags like IMG SRC"somepic.jpg" - the missing equal to can be deduced
[8] Fixed HTMLLinkScanner bug - end tags within a link were not being included inside the link data.

Please give your feedback regarding the API changes.

NOTE

[1] This release would break your existing applications due to the API change. Simply wrap the parsing in a try-catch block to catch a HTMLParserException and your apps should work again.
[2] There is one known bug (590703) caught by two testcases in parseParameters(). This is a minor bug which shouldn't affect applications, and should be fixed in the next release.

Cheers,
Somik
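The try-catch advice and the chained-exception idea in this announcement can be illustrated without the library. In the sketch below, SketchParserException is a hypothetical stand-in for HTMLParserException, and countChars stands in for "the parsing"; neither is HTMLParser's actual API. It uses the cause mechanism built into java.lang.Throwable rather than the ChainedException classes mentioned above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in for HTMLParserException; keeps the cause. */
class SketchParserException extends Exception {
    SketchParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ChainedExceptionSketch {

    /** Stand-in for parsing: reads everything, wrapping low-level failures. */
    static int countChars(Reader reader) throws SketchParserException {
        try {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            // Chain the original exception so the caller can see the root cause.
            throw new SketchParserException("Could not read the document", e);
        }
    }

    /** A reader that always fails, to exercise the error path. */
    static Reader failingReader() {
        return new Reader() {
            @Override
            public int read(char[] buffer, int offset, int length) throws IOException {
                throw new IOException("connection reset");
            }

            @Override
            public void close() {
                // Nothing to release in this sketch.
            }
        };
    }

    public static void main(String[] args) {
        try {
            System.out.println(countChars(new StringReader("<HTML></HTML>")));
            System.out.println(countChars(failingReader()));
        } catch (SketchParserException e) {
            // The application decides what to do instead of crashing.
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

An existing application would need the same shape of change: one try-catch around the parsing code, with the handler deciding whether to log, skip the document, or stop.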
From: Somik R. <so...@ya...> - 2002-08-01 02:42:12
> I am trying to write my own scanner class. In the scan method what
> should be ideally returned? i.e. the Javadoc says that the return type
> is an HTMLNode however I am not clear as to exactly what should be
> returned. Can anyone help me out here?

HTMLNode is only an abstract class. You should not worry about implementing it completely, because it is already done in the HTMLTag class. So your new tag type object should derive from HTMLTag.

Take a look at the implementation of some of the existing scanners (particularly HTMLMetaTagScanner).

By the way, if you are writing a new type of scanner - from your earlier mail - the HTMLInputTagScanner, you might want to check it in to the main code base so we can put it in with the main distribution and you'd also get support for maintaining it. If you need CVS developer access, sign up at http://sourceforge.net/account/register.php and send me your sourceforge id. You can directly check in your code into CVS, and that way collaboration becomes easier and faster.

Cheers,
Somik