RE: [Htmlparser-developer] HTMLParser Candidate Release 1 is out
From: <dha...@or...> - 2002-11-05 13:31:34
Hi Somik,

My program compiles cleanly with the new version of HTMLParser, so it is backward compatible to the extent that I have used it. However, when I ran my program with it, I got an error. This happens because HTMLFormScanner was previously not registered by default, whereas now it is. Once I remove this scanner, my program starts working again :)

As for my bugs, I reported last time that HTML special characters (like '<') within comments caused parsing errors. I think this applies specifically to HTML comments within <SCRIPT> tags that contain HTML special characters. I think it happens because the <SCRIPT> tag is parsed using HTMLScriptScanner and somehow the comments are not being parsed correctly. (Frankly, I don't know why it is happening, but I do know that the code below is being parsed primarily by HTMLScriptScanner and not by HTMLRemarkNode, which should probably happen within HTMLScriptScanner.)

For example,

<SCRIPT Language="JavaScript">
<!--
function validateForm()
{
 var i = 10;
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

gets parsed as

<SCRIPT Language="JavaScript">
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

I think special care needs to be taken for HTML comments which enclose JavaScript, since the ending comment has to be of the form:

// -->

(the first two characters mark the rest of the line as a comment in JavaScript)

At present the ending JavaScript comment appears as

//
-->

which would be an incorrect way to end HTML comments enclosing JavaScript code.

I have added the same as a bug report as well. Urge you to check it out.

Some points I noted while scanning through the javadocs:
1. HTMLFormTag needs documentation, and so too does HTMLFormScanner.
2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by default (documentation needs to be updated).
3. HTMLFormScanner - Does not have a no-args constructor like the other scanners.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Thursday, October 31, 2002 5:57 PM
To: htmlparser-developer
Cc: somik; htmlparser-user
Subject: [Htmlparser-developer] HTMLParser Candidate Release 1 is out

Hi Folks,

HTMLParser 20021031 (C1) is out. This is candidate release 1. If there are no issues, then this will become a production release.

There are bug fixes in this release, and some improvements. Most important improvement - allowing renderers to be plugged in so as to allow customization of the functionality of toHTML(). Check the javadoc of com.kizna.html.HTMLNode. This has been a repeated request, to be able to modify the output of toHTML, especially from designers of web crawlers who want to change the link before saving it.

Thanks to Kaarle Kaila for the bug fix in HTMLParameterParser. Thanks to Domenico Lordi for improvements to HTMLLinkScanner and HTMLLinkTag.

Here is the change log:

Integration Build 1.2 - 20021031
[1] Changed string creation to static strings in HTMLTagParser
[2] HTMLLinkProcessor can handle urls beginning with file:// (bug fix - 629601)
[3] All scanners get the feedback object initialized from HTMLParser
[4] Fixed bug 624045 (in HTMLParameterParser) - erroneous space key removed
[5] Added HTMLRenderer and external rendering support in HTMLNode.
[6] Line no and details incorporated for feedback and exceptions
[7] HTMLLinkProcessor: "javascript:" recognition
[8] HTMLLinkScanner: added flags for javascript, ftp, http, https
[9] HTMLLinkTag: constructor for new flags, methods isJavascriptLink, setJavascriptLink, etc...

Please visit http://htmlparser.sourceforge.net to download this release.

<<Next step>>

As far as architecture is concerned, I think this is it. The feedback mechanism has been more or less integrated, though we're not using the info method at all. Claude -- your help in doing a review on this issue would be highly appreciated.

Dhaval -- Have all the issues that you raised been fixed?
Annette -- Can you give your feedback on the HTMLRenderer and whether it is useful for your project?
If anyone has any issues, please raise them now, or forever hold your peace.

<<Need Help>>

For this to be a truly professional product, it would be highly appreciated if the members of the user and developer lists contributed a small portion of their time to finalizing this production release.

These are the areas where you can help:
[1] Test the release and please report bugs WITH your names (please sign in at sourceforge before you file your bug reports)
[2] Check the javadocs - quality control - if anything is missing, please update, and check in.
[3] Write articles - based on applications you have written, which we can put up for others to read. Articles could cover design areas, performance, scalability, etc.
[4] Be active on the htmlparser-user mailing list to help others in the community
[5] Send a testimonial which we can put up to show that open source software really can achieve professional targets (send this to so...@ya...)

Of course, please do any of the above only if you have benefitted from this project in any way.
Thank you very much, and awaiting your feedback.

Regards,
Somik
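
A note on the <SCRIPT> comment bug reported in the reply above: the behaviour being asked for is roughly that everything between the start and end SCRIPT tags is kept as opaque text, so that a '<' inside the JavaScript is not mistaken for the start of a tag and the closing "// -->" stays on one line. The following is only a minimal sketch of that idea and is not HTMLParser code. The class name ScriptBodySketch and the method extractScriptBody are hypothetical names made up for illustration, and the sketch uses only java.util.regex from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of HTMLParser.
class ScriptBodySketch {

    // Matches a start tag, then lazily captures everything up to the first
    // end tag. Assumptions of this sketch: no '>' inside attribute values,
    // and no literal "</script" inside the JavaScript itself.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw script body unchanged, or null if no script element is found. */
    public static String extractScriptBody(String html) {
        Matcher m = SCRIPT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The example from the bug report above.
        String html = "<SCRIPT Language=\"JavaScript\">\n"
                + "<!--\n"
                + "function validateForm()\n"
                + "{\n"
                + " var i = 10;\n"
                + " if(i < 5)\n"
                + " i = i - 1 ;\n"
                + " return true;\n"
                + "}\n"
                + "// -->\n"
                + "</SCRIPT>";
        System.out.print(extractScriptBody(html));
    }
}
```

Run against the example from the report, the extracted body still contains the opening "<!--", the "if(i < 5)" line and the closing "// -->" on a single line, because nothing inside the element is tokenized as HTML. A real scanner would need to handle the cases the sketch assumes away.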