HTML Parser Integration Release 1.5-20040613 available

This semi-regular integration build refactors classes to reduce component size and to move closer to 'code to the interface' programming. Filters for cascading style sheet selectors (CssSelectorNodeFilter) and regular expressions (RegExFilter) have been added. Besides the bug fix for SCRIPT tags with apostrophes in comments, three enhancement requests have been implemented; of note, the parser now accepts gzip/deflate content encodings. The logo has also been updated.

Changes since Version 1.4
Configuration Management
Removed the need for the Translate class to be packaged with htmllexer.jar.
This results in a lighter weight component. Updated the logo and included
the LGPL license.
Obviated LinkProcessor and moved its functionality to the Page class.
Added Tag, Text and Remark interfaces and moved concrete node
implementations to the nodes package, removing the lexer.nodes package.
Added CssSelectorNodeFilter and RegExFilter.

Enhancement Requests
943593 LinkProcessor.extract(link,base) weird behaviour?
943197 Accept gzip / deflate content encodings
874000 Remove specialized tag signatures from NodeVisitor
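
To illustrate what accepting gzip / deflate content encodings (943197) involves, here is a minimal sketch using only java.util.zip from the standard library. It is not the parser's own implementation: the class ContentEncodingSketch and its methods decode, readAll and compress are hypothetical names chosen for this example, and "deflate" is assumed to mean the zlib-wrapped form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper for illustration; not part of the HTML Parser API. */
public class ContentEncodingSketch {

    /** Wraps the raw body stream according to the Content-Encoding header value. */
    public static InputStream decode(InputStream raw, String contentEncoding) {
        if (contentEncoding == null) {
            return raw; // no header: the body was sent uncompressed
        }
        String encoding = contentEncoding.trim().toLowerCase(Locale.ROOT);
        try {
            if (encoding.equals("gzip") || encoding.equals("x-gzip")) {
                return new GZIPInputStream(raw);
            }
            if (encoding.equals("deflate")) {
                // Assumption: zlib-wrapped data; some servers send raw deflate instead.
                return new InflaterInputStream(raw);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw; // identity or unrecognized encoding: pass through unchanged
    }

    /** Reads the whole stream as UTF-8 text and closes it. */
    public static String readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: compresses text with "gzip", otherwise with zlib deflate. */
    public static byte[] compress(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = encoding.equals("gzip")
                ? new GZIPOutputStream(bytes)
                : new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after the stream is closed
    }

    public static void main(String[] args) {
        String html = "<html><body>hello</body></html>";
        for (String encoding : new String[] {"gzip", "deflate"}) {
            byte[] body = compress(html, encoding);
            System.out.println(readAll(decode(new ByteArrayInputStream(body), encoding)));
        }
    }
}
```

In a real client the encoding string would come from the response's Content-Encoding header, and the request would advertise support through an Accept-Encoding header.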

Bug Fixes
919738 Text has not been extracted correctly using StringBean
936392 ScriptTag visitor fails for comments with '

Posted by Derrick Oswald 2004-06-15
