RE: [Htmlparser-developer] HTMLParser Candidate Release 1 is out


Hi Somik,

My program compiles cleanly with the new version of HTMLParser, which
confirms that it is backward compatible to the extent that I have used
it. However, when I ran my program with it, I got an error. This happens
because HTMLFormScanner was previously not registered by default,
whereas now it is. Once I remove this scanner, my program works again :)

As for my bugs, I reported last time that HTML special characters
(like '<') within comments caused parsing errors. I think this applies
specifically to HTML comments within <SCRIPT> tags that contain HTML
special characters. I think it happens because the <SCRIPT> tag is
parsed by HTMLScriptScanner and somehow the comments are not parsed
correctly. (Frankly, I don't know why it happens, but I do know that the
code below is parsed primarily by HTMLScriptScanner and not by
HTMLRemarkNode, which in turn should probably happen within
HTMLScriptScanner.)

For example:

<SCRIPT Language="JavaScript">
<!--
function validateForm()
{
 var i = 10;
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

gets parsed as

<SCRIPT Language="JavaScript">
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

I think special care needs to be taken for HTML comments that enclose
JavaScript, since the closing comment has to be of the form:
// --> (the first two characters mark the rest of the line as a
JavaScript comment)

At present the closing comment appears as
//
-->

which is an incorrect way to end an HTML comment enclosing
JavaScript code.

I have filed this as a bug report as well. I urge you to check it out.

Some points I noted while scanning through the javadocs:
1. HTMLFormTag needs documentation, and so does HTMLFormScanner.
2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by
default (the documentation needs to be updated).
3. HTMLFormScanner does not have a no-args constructor like the other
scanners.
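To make the expected behaviour for the <SCRIPT> comment issue above
concrete, here is a minimal sketch in plain Java. It is not HTMLParser
code: the class ScriptBodySketch and the method extractScriptBody are
hypothetical names, and the tag matching is deliberately naive. It only
shows that if everything between the <SCRIPT> start tag and </SCRIPT> is
treated as opaque text, the '<' in the condition and the closing // -->
line come through unchanged.

```java
/** Hypothetical helper for illustration only; not part of HTMLParser. */
final class ScriptBodySketch {

    private ScriptBodySketch() {}

    /** Case-insensitive indexOf, so <SCRIPT>, <script> and <Script> all match. */
    private static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = from; i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the text between the first script start tag and the next
     * script end tag, unchanged, or null if no complete element is found.
     * Simplification: the start tag is assumed to end at the first '>'.
     */
    static String extractScriptBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int tagEnd = html.indexOf('>', open);
        if (tagEnd < 0) {
            return null;
        }
        int bodyStart = tagEnd + 1;
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        if (close < 0) {
            return null;
        }
        // No tag or comment parsing happens inside the script body.
        return html.substring(bodyStart, close);
    }

    public static void main(String[] args) {
        String html = String.join("\n",
            "<SCRIPT Language=\"JavaScript\">",
            "<!--",
            "function validateForm()",
            "{",
            " var i = 10;",
            " if(i < 5)",
            " i = i - 1 ;",
            " return true;",
            "}",
            "// -->",
            "</SCRIPT>");
        System.out.print(extractScriptBody(html));
    }
}
```

Run against the example above, the sketch returns the whole function,
including the lines before if(i < 5), with // --> still on one line.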

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457


   -----Original Message-----
   From: somik [mailto:so...@ya...]
   Sent: Thursday, October 31, 2002 5:57 PM
   To: htmlparser-developer
   Cc: somik; htmlparser-user
   Subject: [Htmlparser-developer] HTMLParser Candidate Release 1 is out

   Hi Folks,

       HTMLParser 20021031 (C1) is out. This is candidate release 1. If
   there are no issues, then this will become a production release.

       There are bug fixes in this release, and some improvements. Most
   important improvement - allowing renderers to be plugged in so as to
   allow customization of functionality of toHTML(). Check the javadoc
   of com.kizna.html.HTMLNode. This has been a repeating request, to be
   able to modify the output of toHTML, especially for designers of web
   crawlers who want to change the link before saving it.

       Thanks to Kaarle Kaila for the bug fix in HTMLParameterParser.
   Thanks to Domenico Lordi for improvements to HTMLLinkScanner and
   HTMLLinkTag.

       Here is the change log:
   Integration Build 1.2 - 20021031

   [1] Changed string creation to static strings in HTMLTagParser
   [2] HTMLLinkProcessor can handle urls beginning with file:// (bug fix
   - 629601)
   [3] All scanners get the feedback object initialized from HTMLParser
   [4] Fixed bug 624045 (in HTMLParameterParser) - erroneous space key
   removed
   [5] Added HTMLRenderer and external rendering support in HTMLNode.
   [6] Line no and details incorporated for feedback and exceptions
   [7] HTMLLinkProcessor: "javascript:" recognition
   [8] HTMLLinkScanner: added flags for javascript, ftp, http, https
   [9] HTMLLinkTag: constructor for new flags, methods isJavascriptLink,
   setJavascriptLink, etc...

       Please visit http://htmlparser.sourceforge.net to download this
   release.

       <<Next step>>

   As far as architecture is concerned, I think this is it. The feedback
   mechanism has been more or less integrated, though we're not using
   the info method at all. Claude -- your help in doing a review on this
   issue would be highly appreciated.

   Dhaval -- Have all the issues that you raised been fixed?
   Annette -- Can you give your feedback on the HTMLRenderer and whether
   it is useful for your project?
   If anyone has any issues, please raise them now, or forever hold your
   peace.

       <<Need Help>>

   In order that this may be a truly professional product, it would be
   highly appreciated, if the members of the user and developer list
   contribute a small portion of their time in finalizing this
   production release.

   These are the areas where you can help:
   [1] Test the release and please report bugs WITH your names (please
   sign in at SourceForge before you file your bug reports)
   [2] Check the javadocs - quality control - if anything is missing,
   please update, and check in.
   [3] Write articles - based on applications you have written, which we
   can put up for others to read. Articles could cover design areas,
   performance, scalability, etc..
   [4] Be active on the htmlparser-user mailing list to help others in
   the community
   [5] Send a testimonial which we can put up to show that open source
   software really can achieve professional targets (send this to
   so...@ya...)

   Of course, please do any of the above only if you have benefitted from
   this project in any way.
   Thank you very much, and awaiting your feedback.

   Regards,
   Somik
