[Htmlparser-developer] Integration Release 1.3-20030112 is out

Hi Folks,
    This week's integration release is out. It includes significant
contributions from Derrick Oswald and Josh Kerievsky. Derrick is building a
nice UI for the parser and making tons of improvements. Thanks to Josh's
insight, we have done some major refactorings of the scanners, resulting in
a big drop in code duplication. Some statistics: the scanners package had
1693 lines of code in the last release. In the current release it has 1300,
a reduction of 393 lines (about 23%).

We have a new class, HTMLCompositeTagScanner, which does the hard work of
picking up child tags. Most scanners now use this code. HTMLTagScanner also
does some useful work, and from this release on, new scanners don't need to
override evaluate() or scan(). Take a look at the refactored scanner code; you
might be surprised by its size and simplicity.
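For anyone new to this design, here is a minimal sketch of the idea in plain Java, assuming a simplified list of string tokens instead of the real parser input. The class names (CompositeScannerSketch, TitleScannerSketch, StyleScannerSketch) and the tagName() hook are hypothetical and are not the HTMLParser 1.3 API; the sketch also folds the HTMLTagScanner and HTMLCompositeTagScanner roles into one base class. The point is only the shape: the base class writes evaluate() and scan() once, and each concrete scanner just names its tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not the HTMLParser 1.3 API.
abstract class CompositeScannerSketch {
    // The only thing a concrete scanner has to supply.
    abstract String tagName();

    // Shared logic, written once: does this opening tag belong to this scanner?
    final boolean evaluate(String tag) {
        return tag.equalsIgnoreCase("<" + tagName() + ">");
    }

    // Shared logic, written once: collect tokens up to the matching end tag as children.
    final List<String> scan(List<String> tokens, int start) {
        List<String> children = new ArrayList<String>();
        String endTag = "</" + tagName() + ">";
        for (int i = start + 1; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.equalsIgnoreCase(endTag)) {
                return children;
            }
            children.add(token);
        }
        return children; // unterminated tag: keep whatever was collected
    }
}

// Concrete scanners no longer override evaluate() or scan().
class TitleScannerSketch extends CompositeScannerSketch {
    String tagName() { return "TITLE"; }
}

class StyleScannerSketch extends CompositeScannerSketch {
    String tagName() { return "STYLE"; }
}
```

With tokens `<TITLE>`, `Hello`, `</TITLE>`, a TitleScannerSketch accepts `<title>` and its scan() returns the single child `Hello`. A scanner for another container tag would only override tagName(), which is the kind of duplication the refactoring is meant to remove.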

    Here's the change log:

Integration build 1.3 - 20030112
--------------------------------
[1] Assume charset is correct for JVMs without a Charset class to check it
[2] Beanize the parser
[3] Switch to swingui junit runner by default
[4] Half-baked beans
[5] Fix javadoc warnings in JDK 1.4
[6] Added StringFindingVisitor, test code, and the new visitors package
[7] Fixed bug 659723; note that HTMLStringNode is no longer thread-safe.
[8] JDK 1.2 compilability
[9] Modified HTMLEnumeration interface (made less verbose)
[10] Added HTMLCompositeTagScanner
[11] Refactored the following scanners to use HTMLCompositeTagScanner:
    (i) HTMLStyleScanner
    (ii) HTMLSelectScanner
    (iii) HTMLFrameSetScanner
    (iv) HTMLTitleScanner
    (v) HTMLTextAreaScanner
    (vi) HTMLScriptScanner
[12] Made StringNode the last parse attempt, so the Reader now tries in this
order:
    (i) remark
    (ii) tag
    (iii) end tag
    (iv) string
(This will return more HTMLStringNode objects than it did before.)
[13] Improved speed by performing tag/string triage based on whether the next
character is '<'.
[14] Refactored HTMLTagScanner. The following scanners use the refactored code:
    (i) HTMLBaseHREFScanner
    (ii) HTMLDoctypeScanner
    (iii) HTMLFrameScanner
    (iv) HTMLJspScanner
    (v) HTMLMetaTagScanner
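To make items [12] and [13] concrete, here is a minimal sketch, assuming the input is a plain String and that each node kind can be recognized from its first few characters. ReaderTriageSketch and its Kind enum are hypothetical names, not the HTMLReader API, and the real parser does far more work per node. It shows the '<' fast path and the remark, tag, end tag, string order, with string as the fallback.

```java
// Hypothetical sketch of the triage in changes [12] and [13];
// names are illustrative, not HTMLParser's HTMLReader API.
class ReaderTriageSketch {
    enum Kind { REMARK, TAG, END_TAG, STRING }

    static Kind classify(String html, int pos) {
        // [13] Fast path: anything not starting with '<' can only be text.
        if (html.charAt(pos) != '<') {
            return Kind.STRING;
        }
        // [12] Otherwise try each parse in turn: remark, tag, end tag.
        if (html.startsWith("<!--", pos)) {
            return Kind.REMARK;
        }
        if (pos + 1 < html.length() && Character.isLetter(html.charAt(pos + 1))) {
            return Kind.TAG;
        }
        if (html.startsWith("</", pos) && pos + 2 < html.length()
                && Character.isLetter(html.charAt(pos + 2))) {
            return Kind.END_TAG;
        }
        // String is the last resort, e.g. a stray '<' in "a < b".
        return Kind.STRING;
    }
}
```

Because string is the fallback, a stray '<' such as the one in "a < b" ends up as text in this sketch; that is one plausible way a reader ordered like this can produce more string nodes than before.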

Regards,
Somik