[Htmlparser-developer] Integration Release 1.3-20020112 is out
From: Somik R. <so...@ya...> - 2003-01-13 04:50:15
Hi Folks,
This week's integration release is out. This release has significant
contributions from Derrick Oswald and Josh Kerievsky. Derrick is building a
nice UI for the parser and making tons of improvements. Thanks to Josh's
insight, we have done some major refactoring of the scanners, resulting in
a large drop in code duplication. Some statistics: the scanners package had
1693 lines of code in the last release. In the current release, this has
dropped to 1300 lines, a reduction of about 23%.
We have a new class, HTMLCompositeTagScanner, which does the hard work of
picking up child tags. Most scanners use this code. HTMLTagScanner also does
some useful work, and from this release, new scanners don't need to override
evaluate() or scan(). Take a look at the refactored scanner code and you
might be surprised by its size and simplicity.
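
To give a feel for why the scanners shrink, here is a minimal sketch of the general idea: a base class owns the child-collecting loop, and each concrete scanner only names its tag. Every class and method name below is hypothetical and invented for illustration. This is not the actual HTMLCompositeTagScanner code, and it works on a pre-split token list rather than on the real reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is NOT the HTMLParser API.
public class CompositeScannerSketch {

    /** Base class: owns the shared loop that collects child tokens. */
    static abstract class CompositeScanner {
        /** The only thing a concrete scanner has to supply. */
        abstract String tagName();

        /**
         * Collects the tokens between <name> and </name>, starting at index start.
         * Returns an empty list if the opening tag is not found at start.
         */
        final List<String> scan(List<String> tokens, int start) {
            List<String> children = new ArrayList<>();
            String open = "<" + tagName() + ">";
            String close = "</" + tagName() + ">";
            if (start >= tokens.size() || !tokens.get(start).equalsIgnoreCase(open)) {
                return children;
            }
            for (int i = start + 1; i < tokens.size(); i++) {
                String token = tokens.get(i);
                if (token.equalsIgnoreCase(close)) {
                    break; // matching end tag reached, stop collecting
                }
                children.add(token);
            }
            return children;
        }
    }

    /** A concrete scanner is now a single line of real content. */
    static final class TitleScanner extends CompositeScanner {
        @Override String tagName() { return "title"; }
    }

    static final class TextAreaScanner extends CompositeScanner {
        @Override String tagName() { return "textarea"; }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("<title>", "Hello", "</title>", "<p>");
        System.out.println(new TitleScanner().scan(tokens, 0)); // prints [Hello]
    }
}
```

In this sketch, adding a scanner for another composite tag means writing one small subclass, which is the kind of duplication drop described above.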
Here's the change log:
Integration build 1.3 - 20030112
--------------------------------
[1] Assume charset is correct for JVMs without Charset class to check it
[2] Beanize the parser
[3] Switch to swingui junit runner by default
[4] Half baked beans
[5] Fix javadoc warnings in JDK 1.4
[6] Added StringFindingVisitor + test code + new visitors packages
[7] Fixed bug 659723, but HTMLStringNode is not thread-safe anymore.
[8] JDK 1.2 compilability
[9] Modified HTMLEnumeration interface (made less verbose)
[10] Added HTMLCompositeTagScanner
[11] Refactored following scanners to use HTMLCompositeTagScanner :
(i) HTMLStyleScanner
(ii) HTMLSelectScanner
(iii) HTMLFrameSetScanner
(iv) HTMLTitleScanner
(v) HTMLTextAreaScanner
(vi) HTMLScriptScanner
(vii) HTMLFrameSetScanner
[12] Made StringNode the last parse attempt, so now Reader tries in this
order:
remark
tag
endtag
string
(this will return more HTMLStringNode objects than it did before).
[13] Improve speed by performing tag/string triage based on '<' as next
character.
[14] Refactored HTMLTagScanner. The following scanners use refactored code:
(i) HTMLBaseHREFScanner
(ii) HTMLDoctypeScanner
(iii) HTMLFrameScanner
(iv) HTMLJspScanner
(v) HTMLMetaTagScanner
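
Items [12] and [13] can be illustrated with a small sketch. It assumes a simplified reader that only has to classify the node starting at a given position. The class, enum, and method names are hypothetical and are not taken from the HTMLParser source, and the matching rules (a letter must follow '<' for a tag, and "</" for an end tag) are simplifying assumptions.

```java
// Hypothetical sketch of the parse order and '<' triage; not the real reader code.
public class ParseOrderSketch {

    enum Kind { REMARK, TAG, END_TAG, STRING }

    /** True if position i exists in s and holds a letter. */
    private static boolean isLetterAt(String s, int i) {
        return i < s.length() && Character.isLetter(s.charAt(i));
    }

    /** Classifies the node that starts at pos in input. */
    static Kind classify(String input, int pos) {
        // Triage (item [13]): without a leading '<' the node can only be a
        // string, so the remark/tag/end-tag attempts are skipped entirely.
        if (pos >= input.length() || input.charAt(pos) != '<') {
            return Kind.STRING;
        }
        // Parse order (item [12]): remark, tag, end tag, then string.
        if (input.startsWith("<!--", pos)) {
            return Kind.REMARK;
        }
        if (isLetterAt(input, pos + 1)) {
            return Kind.TAG;
        }
        if (input.startsWith("</", pos) && isLetterAt(input, pos + 2)) {
            return Kind.END_TAG;
        }
        // Last attempt: a '<' that starts nothing recognizable is plain text.
        return Kind.STRING;
    }

    public static void main(String[] args) {
        String[] samples = { "<!-- note -->", "<b>", "</b>", "plain text", "< 3" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample, 0));
        }
    }
}
```

Because the string case is now the fallback, input such as "< 3" ends up as a string node in this sketch instead of being rejected, which is one way a reader could end up returning more string nodes than before.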
Regards,
Somik