RE: [Htmlparser-developer] HTMLParser Candidate Release 1 is out
From: <dha...@or...> - 2002-11-05 13:31:34
Hi Somik,

My program compiles cleanly against the new version of HTMLParser, which
confirms that it is backward compatible to the extent that I have used
it. However, when I ran my program with it, I got an error. This happens
because HTMLFormScanner was not registered by default earlier, whereas
now it is. Once I remove this scanner, my program works again :)

As for my bugs, I reported last time that HTML special characters (such
as '<') within comments caused parsing errors. I think this applies
specifically to HTML comments within <SCRIPT> tags that contain HTML
special characters. The <SCRIPT> tag is parsed by HTMLScriptScanner, and
somehow the comments inside it are not parsed correctly. (Frankly, I
don't know why this happens, but I do know that the code below is parsed
primarily by HTMLScriptScanner and not as an HTMLRemarkNode; remark
handling should probably happen within HTMLScriptScanner.)

For example,

<SCRIPT Language="JavaScript">
<!--
function validateForm()
{
 var i = 10;
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

gets parsed as

<SCRIPT Language="JavaScript">
 if(i < 5)
 i = i - 1 ;
 return true;
}
// -->
</SCRIPT>

I think special care needs to be taken with HTML comments that enclose
JavaScript, since the closing comment has to be of the form
// --> (the first two characters mark the rest of the line as a
JavaScript comment).

At present the closing comment is emitted as
//
-->

which is an incorrect way to end an HTML comment enclosing JavaScript
code.

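One way to avoid both symptoms would be to treat everything between the
opening <SCRIPT> tag and the closing </SCRIPT> as opaque text, so that
neither the '<' nor the "// -->" line is ever re-tokenized. The following
is only a minimal, self-contained sketch of that idea. It does not use
HTMLParser's classes, and ScriptBodySketch, extractScriptBody and
indexOfIgnoreCase are hypothetical names, not part of the library.

```java
public class ScriptBodySketch {

    /** Sample page taken from the bug report above. */
    static final String SAMPLE = String.join("\n",
        "<SCRIPT Language=\"JavaScript\">",
        "<!--",
        "function validateForm()",
        "{",
        " var i = 10;",
        " if(i < 5)",
        " i = i - 1 ;",
        " return true;",
        "}",
        "// -->",
        "</SCRIPT>");

    /** Case-insensitive indexOf; returns -1 when needle is not found. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the text between the first opening <SCRIPT ...> tag and the
     * next "</SCRIPT", copied verbatim, or null if no complete script
     * element exists.
     * Assumption: the opening tag has no '>' inside an attribute value.
     */
    static String extractScriptBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) return null;
        int tagEnd = html.indexOf('>', open);
        if (tagEnd < 0) return null;
        int close = indexOfIgnoreCase(html, "</script", tagEnd + 1);
        if (close < 0) return null;
        return html.substring(tagEnd + 1, close);
    }

    public static void main(String[] args) {
        // Prints the script body unchanged, including "if(i < 5)" and "// -->".
        System.out.print(extractScriptBody(SAMPLE));
    }
}
```

Because the body is copied as one substring, the closing "// -->" stays
on a single line and the lines before "if(i < 5)" are not dropped.
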
I have filed this as a bug report as well. Please check it out.

Some points I noted while scanning through the javadocs:
1. HTMLFormTag needs documentation, and so does HTMLFormScanner.
2. HTMLParser::registerScanners() - HTMLFormScanner is now registered by
default (the documentation needs to be updated).
3. HTMLFormScanner does not have a no-args constructor like the other
scanners.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Thursday, October 31, 2002 5:57 PM
To: htmlparser-developer
Cc: somik; htmlparser-user
Subject: [Htmlparser-developer] HTMLParser Candidate Release 1 is out

Hi Folks,

    HTMLParser 20021031 (C1) is out. This is candidate release 1. If
there are no issues, then this will become a production release.

    There are bug fixes in this release, and some improvements. The most
important improvement is that renderers can now be plugged in to
customize the behavior of toHTML(). Check the javadoc
of com.kizna.html.HTMLNode. This has been a recurring request: to be
able to modify the output of toHTML(), especially from designers of web
crawlers who want to change a link before saving it.
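
The general shape of such a plug-in renderer can be sketched as follows.
This is only an illustration of the pattern, not the actual HTMLRenderer
or HTMLNode API (check the javadoc for that). RendererSketch,
LinkRenderer, LinkNode and rewritingRenderer are hypothetical names
invented for this example.

```java
public class RendererSketch {

    /** Hypothetical plug-in point: turns a link node into its HTML text. */
    interface LinkRenderer {
        String render(LinkNode node);
    }

    /** Hypothetical stand-in for a parsed link; not the library's link tag. */
    static final class LinkNode {
        final String href;
        final String text;

        LinkNode(String href, String text) {
            this.href = href;
            this.text = text;
        }

        /** Default rendering. */
        String toHTML() {
            return "<A HREF=\"" + href + "\">" + text + "</A>";
        }

        /** Rendering delegated to a caller-supplied renderer. */
        String toHTML(LinkRenderer renderer) {
            return renderer.render(this);
        }
    }

    /** Renderer that rewrites links under remotePrefix to point at localPrefix. */
    static LinkRenderer rewritingRenderer(String remotePrefix, String localPrefix) {
        return node -> {
            String href = node.href.startsWith(remotePrefix)
                ? localPrefix + node.href.substring(remotePrefix.length())
                : node.href;
            return new LinkNode(href, node.text).toHTML();
        };
    }

    public static void main(String[] args) {
        LinkNode link = new LinkNode("http://example.com/docs/a.html", "Docs");
        System.out.println(link.toHTML());
        System.out.println(link.toHTML(rewritingRenderer("http://example.com/", "mirror/")));
    }
}
```

A crawler that saves pages locally could pass such a renderer so that
saved links point at its own mirror, while the node's default toHTML()
output stays untouched.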

    Thanks to Kaarle Kaila for the bug fix in HTMLParameterParser.
Thanks to Domenico Lordi for improvements to HTMLLinkScanner and
HTMLLinkTag.

    Here is the change log:
Integration Build 1.2 - 20021031
[1] Changed string creation to static strings in HTMLTagParser
[2] HTMLLinkProcessor can handle urls beginning with file:// (bug fix
- 629601)
[3] All scanners get the feedback object initialized from HTMLParser
[4] Fixed bug 624045 (in HTMLParameterParser) - erroneous space key
removed
[5] Added HTMLRenderer and external rendering support in HTMLNode.
[6] Line no and details incorporated for feedback and exceptions
[7] HTMLLinkProcessor: "javascript:" recognition
[8] HTMLLinkScanner: added flags for javascript, ftp, http, https
[9] HTMLLinkTag: constructor for new flags, methods isJavascriptLink,
setJavascriptLink, etc...
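
Items [2], [7], [8] and [9] all concern recognizing what kind of link a
URL is. A minimal sketch of that kind of classification, assuming a
plain prefix check, is shown here. LinkKindSketch, Kind and classify are
hypothetical names and do not reflect how HTMLLinkProcessor,
HTMLLinkScanner or HTMLLinkTag are actually implemented.

```java
import java.util.Locale;

public class LinkKindSketch {

    /** Link kinds named in the change log, plus a catch-all. */
    enum Kind { JAVASCRIPT, FTP, HTTP, HTTPS, FILE, OTHER }

    /** Classifies a link by its leading scheme, ignoring case and outer whitespace. */
    static Kind classify(String link) {
        String s = link.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("javascript:")) return Kind.JAVASCRIPT;
        if (s.startsWith("ftp://")) return Kind.FTP;
        if (s.startsWith("https://")) return Kind.HTTPS;
        if (s.startsWith("http://")) return Kind.HTTP;
        if (s.startsWith("file://")) return Kind.FILE;
        return Kind.OTHER; // relative links, mailto:, and anything else
    }

    public static void main(String[] args) {
        System.out.println(classify("JavaScript:validateForm()"));
        System.out.println(classify("file:///tmp/index.html"));
        System.out.println(classify("docs/index.html"));
    }
}
```
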
    Please visit http://htmlparser.sourceforge.net to download this
release.

    <<Next step>>

As far as architecture is concerned, I think this is it. The feedback
mechanism has been more or less integrated, though we're not using
the info method at all. Claude -- your help in reviewing this
issue would be highly appreciated.

Dhaval -- Have all the issues that you raised been fixed?
Annette -- Can you give your feedback on the HTMLRenderer and whether it
is useful for your project?
If anyone has any issues, please raise them now, or forever hold your
peace.

    <<Need Help>>

To make this a truly professional product, it would be highly
appreciated if the members of the user and developer lists contributed
a small portion of their time to finalizing this production release.

These are the areas where you can help :
[1] Test the release and please report bugs WITH your names (pls sign
in at sourceforge before u file your bug reports)
[2] Check the javadocs - quality control - if anything is missing,
please update, and check in.
[3] Write articles - based on applications you have written, which we
can put up for others to read. Articles could cover design areas,
performance, scalability, etc..
[4] Be active on the htmlparser-user mailing list to help others in
the community
[5] Send a testimonial which we can put up to show that open source
software really can achieve professional targets (send this to
so...@ya...)

Of course, pls do any of the above only if you have benefitted from
this project in any way.
Thank you very much, and awaiting your feedback.

Regards,
Somik