Thread: RE: [Htmlparser-developer] HTMLParser Candidate Release 1 is out
From: <dha...@or...> - 2002-11-05 13:31:34
Attachments:
BDY.RTF
|
Hi Somik,

My program compiles perfectly with the new version of HTMLParser, so it is backward compatible to the extent that I have used it. However, when I ran my program with it, I got an error. This happens because HTMLFormScanner was not registered by default earlier, whereas now it is. Once I remove this scanner, my program starts working again :)

As for my bugs, I reported last time that if there were any HTML special characters (like '<' etc.) within comments, there were errors in parsing. I think this applies specifically to HTML comments within <SCRIPT> tags that contain HTML special characters. I think this is happening because the <SCRIPT> tag is parsed using HTMLScriptScanner and somehow the comments are not being parsed correctly. (Frankly, I don't know why it is happening, but I do know that the code below is parsed primarily by HTMLScriptScanner and not by HTMLRemarkNode, which should probably happen within HTMLScriptScanner.)

For example,

    <SCRIPT Language="JavaScript">
    <!--
    function validateForm() {
     var i = 10;
     if(i < 5)
     i = i - 1 ;
     return true;
    }
    // -->
    </SCRIPT>

gets parsed as

    <SCRIPT Language="JavaScript">
     if(i < 5)
     i = i - 1 ;
     return true;
    }
    // -->
    </SCRIPT>

I think special care needs to be taken for HTML comments which enclose JavaScript, since the ending comment has to be of the form:

    // -->

(the first two characters specify that the characters on the line are a comment in JavaScript).

At present the ending JavaScript comment appears as

    //
    -->

which would be an incorrect way to end HTML comments enclosing JavaScript code.

I have added the same as a bug report as well. Urge you to check it out.

Some points I noted while scanning through the javadocs:
1. HTMLFormTag needs documentation, and so does HTMLFormScanner.
2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by default. (documentation needs to be updated)
3. HTMLFormScanner - Does not have a no-args constructor like the other scanners.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Thursday, October 31, 2002 5:57 PM
To: htmlparser-developer
Cc: somik; htmlparser-user
Subject: [Htmlparser-developer] HTMLParser Candidate Release 1 is out

Hi Folks,

HTMLParser 20021031 (C1) is out. This is candidate release 1. If there are no issues, then this will become a production release.

There are bug fixes in this release, and some improvements. Most important improvement - allowing renderers to be plugged in so as to allow customization of functionality of toHTML(). Check the javadoc of com.kizna.html.HTMLNode. This has been a repeating request, to be able to modify the output of toHTML, especially for designers of web crawlers who want to change the link before saving it.

Thanks to Kaarle Kaila for the bug fix in HTMLParameterParser. Thanks to Domenico Lordi for improvements to HTMLLinkScanner and HTMLLinkTag.

Here is the change log :

Integration Build 1.2 - 20021031
[1] Changed string creation to static strings in HTMLTagParser
[2] HTMLLinkProcessor can handle urls beginning with file:// (bug fix - 629601)
[3] All scanners get the feedback object initialized from HTMLParser
[4] Fixed bug 624045 (in HTMLParameterParser) - erroneous space key removed
[5] Added HTMLRenderer and external rendering support in HTMLNode.
[6] Line no and details incorporated for feedback and exceptions
[7] HTMLLinkProcessor: "javascript:" recognition
[8] HTMLLinkScanner: added flags for javascript, ftp, http, https
[9] HTMLLinkTag: constructor for new flags, methods isJavascriptLink, setJavascriptLink, etc...

Please visit http://htmlparser.sourceforge.net to download this release.

<<Next step>>

As far as architecture is concerned, I think this is it. The feedback mechanism has been more or less integrated, though we're not using the info method at all. Claude -- your help in doing a review on this issue would be highly appreciated.

Dhaval -- Have all the issues that you raised been fixed ?
Annette -- Can you give your feedback on the HTMLRenderer and if it is useful for your project ?

If anyone has any issues, please raise them now, or forever hold your peace..

<<Need Help>>

In order that this may be a truly professional product, it would be highly appreciated if the members of the user and developer list contribute a small portion of their time in finalizing this production release.

These are the areas where you can help :
[1] Test the release and please report bugs WITH your names (pls sign in at sourceforge before u file your bug reports)
[2] Check the javadocs - quality control - if anything is missing, please update, and check in.
[3] Write articles - based on applications you have written, which we can put up for others to read. Articles could cover design areas, performance, scalability, etc..
[4] Be active on the htmlparser-user mailing list to help others in the community
[5] Send a testimonial which we can put up to show that open source software really can achieve professional targets (send this to so...@ya...)

Of course, pls do any of the above only if you have benefitted from this project in any way.
Thank you very much, and awaiting your feedback.

Regards,
Somik
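The behaviour expected in the script-comment report above (the whole script body preserved, including the `<` in `if(i < 5)`) can be sketched without HTMLParser itself. What follows is only an illustration under stated assumptions: `ScriptBodySketch`, `rawScriptBody` and `indexOfIgnoreCase` are hypothetical names, not library classes or methods, and the real HTMLScriptScanner may work quite differently. The idea is simply to treat everything between the opening `<SCRIPT ...>` tag and the closing `</SCRIPT>` as opaque text.

```java
// Hypothetical sketch, NOT HTMLParser code: one way to keep a script body
// intact when it contains HTML special characters such as '<'.
final class ScriptBodySketch {

    /** Case-insensitive indexOf; indices refer to the original text. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the raw text between the first <SCRIPT ...> tag and the next
     * </SCRIPT>, or null if no complete script element is found.
     * Simplification: a '>' inside an attribute value is not handled.
     */
    static String rawScriptBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int tagEnd = html.indexOf('>', open);
        if (tagEnd < 0) {
            return null;
        }
        int bodyStart = tagEnd + 1;
        // Only the closing tag ends the body; '<' and '-->' inside it are plain text.
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        if (close < 0) {
            return null;
        }
        return html.substring(bodyStart, close);
    }

    public static void main(String[] args) {
        String page = "<SCRIPT Language=\"JavaScript\">\n"
                + "<!--\n"
                + "function validateForm() {\n"
                + " var i = 10;\n"
                + " if(i < 5)\n"
                + " i = i - 1 ;\n"
                + " return true;\n"
                + "}\n"
                + "// -->\n"
                + "</SCRIPT>";
        System.out.println(rawScriptBody(page));
    }
}
```

Because the body is never tokenised, the closing `// -->` stays on a single line exactly as it was written.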
From: <dha...@or...> - 2002-11-07 07:04:43
Attachments:
BDY.RTF
|
Hey Somik,

I think there was some mistake on my part (though I haven't been able to figure out what), but script and comments are now working fine. However, the other thing that I told you about, i.e. // and --> appearing on 2 different lines, needs to be fixed. Also, the </SCRIPT> should not be on the same line as --> but on a new line.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Wednesday, November 06, 2002 12:39 AM
To: htmlparser-developer
Cc: somik
Subject: Re: [Htmlparser-developer] HTMLParser Candidate Release 1 is out

Dear Dhaval,

More responses like yours, and we should be able to finish the production release in no time.

>However when I ran my program with it, I got an error. This has
>happened since earlier the HTMLFormScanner was not being registered by
>default whereas now it is happening. Once I remove this scanner, once
>again my program starts working :)

Previously this scanner was not registered, as erroneous form tags would crash the scanner. After that, the exception model was introduced, and I thought that it would be good to have a form scanner that throws an exception when it goes into a potential loop. Can you post the exact exception, and your opinion as to why it should not have been thrown ?

>I have added the same as a bug report as well. Urge you to check it out.

As always, there's only one way to prove a bug - with a testcase. I tried writing one - but mine passes! I have added it to CVS, and for your purpose, here is HTMLScriptScannerTest.java for you. Use it and build the parser. I might have gone wrong in building it, so you can modify this, make it fail, and send it back to me.

>Some points I noted while scanning through the javadocs:
>1. HTMLFormTag needs documentation so to does HTMLFormScanner.
>2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by
>default. (documentation needs to be updated)
>3. HTMLFormScanner - Does not have a no-args constructor like the other
>scanners.

I will work on these..

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-11-09 05:08:37
|
Hi Dhaval,

> I think some mistake on my part (though I haven't been able to figure out
> what) but script and comments are now working fine.

No problem. What about the Form scanner issue ?

> However the other
> thing that told you about, i.e. // and --> appearing on 2 different
> lines needs to be fixed. Also the </SCRIPT> should not be on the same
> line as --> but on a new line.

Hmm.. this is actually a reconstruction issue. Let's look into this together. Can you also peep into the code (HTMLScriptScanner.scan())?

Cheers,
Somik
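The reconstruction behaviour being asked for (the closing `// -->` kept on one line, and `</SCRIPT>` on a line of its own) can be sketched as follows. This is a minimal illustration only, assuming the script body is available as a list of source lines; `ScriptReconstructionSketch` and `toHtml` are hypothetical names and do not describe what `HTMLScriptScanner.scan()` actually does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT HTMLParser code: rebuild a script element so that
// every body line is emitted unchanged and the end tag starts a new line.
final class ScriptReconstructionSketch {

    /**
     * Joins the start tag, each body line exactly as read, and the end tag,
     * one per output line. Lines are never re-split, so "// -->" stays whole.
     */
    static String toHtml(String startTag, List<String> bodyLines, String endTag) {
        StringBuilder out = new StringBuilder(startTag).append('\n');
        for (String line : bodyLines) {
            out.append(line).append('\n');
        }
        return out.append(endTag).toString();
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList(
                "<!--",
                "function validateForm() {",
                " return true;",
                "}",
                "// -->");
        System.out.println(toHtml("<SCRIPT Language=\"JavaScript\">", body, "</SCRIPT>"));
    }
}
```

Keeping the body as whole lines, rather than as tokens, is the design choice that avoids separating `//` from `-->`.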
From: <dha...@or...> - 2002-11-26 04:03:03
Attachments:
BDY.RTF
|
Hi Somik,

> No problem. What about the Form scanner issue ?

The Form Scanner is registered by default now, while it was not the case earlier. Hence, if I try to scan tags like INPUT, SELECT, TEXTAREA, which have to be within FORM tags, they are not picked up. After I remove the form scanner from the registered list, it works fine.

> Hmm.. this is actually a reconstruction issue. Lets look into this together.
> Can you also peep into the code (HTMLScriptScanner.scan())?

Yeah. I'll do that.

Bye,
Dhaval

-------------------------------------------------------
This sf.net email is sponsored by: See the NEW Palm Tungsten T handheld. Power & Color in a compact size!
http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en
_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
From: Somik R. <so...@ya...> - 2002-11-26 04:07:33
|
Hi Dhaval,

> > No problem. What about the Form scanner issue ?
>
> The Form Scanner is registered by default now while it was not the case
> earlier. Hence if I try to scan tags like INPUT, SELECT, TEXTAREA which
> have to be within FORM tags then they are not picked up. After I remove
> the form scanner from the registered list then it works fine.

Oh, in that case, we should not be registering the input, select and textarea scanners directly. Instead, we could do it within form tags, like it's done in the frame scanner (enumerating through frameset elements from within). What do you think ?

Regards,
Somik
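The proposal above, collecting INPUT, SELECT and TEXTAREA from inside each form rather than registering them as top-level scanners, can be sketched roughly like this. It is a minimal illustration under stated assumptions: `FormControlsSketch` and `controlsPerForm` are hypothetical names, and regular expressions stand in for the parser's own scanner mechanism purely to keep the example self-contained. It is not how the frame scanner or HTMLFormScanner is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT HTMLParser code: form controls are discovered only
// from within an enclosing <FORM>, never as stand-alone top-level tags.
final class FormControlsSketch {

    // Matches one <form ...> ... </form> element; group(1) is the form body.
    private static final Pattern FORM = Pattern.compile(
            "<form\\b[^>]*>(.*?)</form\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches the opening tag of a form control.
    private static final Pattern CONTROL = Pattern.compile(
            "<(input|select|textarea)\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the control opening tags, one inner list per form in the page. */
    static List<List<String>> controlsPerForm(String html) {
        List<List<String>> forms = new ArrayList<>();
        Matcher form = FORM.matcher(html);
        while (form.find()) {
            List<String> controls = new ArrayList<>();
            Matcher control = CONTROL.matcher(form.group(1));
            while (control.find()) {
                controls.add(control.group());
            }
            forms.add(controls);
        }
        return forms;
    }

    public static void main(String[] args) {
        String page = "<FORM action=\"save\">"
                + "<INPUT name=\"user\">"
                + "<SELECT name=\"role\"><OPTION>admin</OPTION></SELECT>"
                + "<TEXTAREA name=\"notes\"></TEXTAREA>"
                + "</FORM>";
        System.out.println(controlsPerForm(page));
    }
}
```

With this arrangement a caller still reaches every control, but always through its owning form, which matches the observation that these tags only make sense inside a form.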
From: <dha...@or...> - 2002-11-26 04:23:28
Attachments:
BDY.RTF
|
Hi Somik,

> Oh, in that case, we should not be registering the input, select, textareas directly. Instead, we could do it within form
> tags, like its done in the frame scanner (enumerating thru frameset elements from within).

Yeah, that would make a lot of sense, since obviously these tags can only be within the form tag.
From: Somik R. <so...@ya...> - 2002-11-05 19:08:18
Attachments:
HTMLScriptScannerTest.java
|
Dear Dhaval,

More responses like yours, and we should be able to finish the production release in no time.

>However when I ran my program with it, I got an error. This has
>happened since earlier the HTMLFormScanner was not being registered by
>default whereas now it is happening. Once I remove this scanner, once
>again my program starts working :)

Previously this scanner was not registered, as erroneous form tags would crash the scanner. After that, the exception model was introduced, and I thought that it would be good to have a form scanner that throws an exception when it goes into a potential loop. Can you post the exact exception, and your opinion as to why it should not have been thrown ?

>I have added the same as a bug report as well. Urge you to check it out.

As always, there's only one way to prove a bug - with a testcase. I tried writing one - but mine passes! I have added it to CVS, and for your purpose, here is HTMLScriptScannerTest.java for you. Use it and build the parser. I might have gone wrong in building it, so you can modify this, make it fail, and send it back to me.

>Some points I noted while scanning through the javadocs:
>1. HTMLFormTag needs documentation so to does HTMLFormScanner.
>2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by
>default. (documentation needs to be updated)
>3. HTMLFormScanner - Does not have a no-args constructor like the other
>scanners.

I will work on these..

Regards,
Somik