HtmlCleaner / News: Recent posts

HtmlCleaner release 2.2

The new version brings most of the required features and a number of bug fixes. HtmlCleaner is now thread-safe, introduces HTML-based serializers, and extends the API to ease document manipulation. The parser is about 20% faster and now runs on Java 1.5+, benefiting from language improvements.

Posted by Vladimir Nikic 2010-12-28

HtmlCleaner release 2.1

- Parsing transformations added, making it easy to skip or change specified tags or attributes during the cleanup process.
- A few more constructors added to the HtmlCleaner class, making it possible to reuse the same cleaner properties across multiple cleaner instances.
- Code cleanup.

Posted by Vladimir Nikic 2008-09-02

Web site redesigned

Together with the new milestone version 2.0, the project web site has been completely redesigned, with a better look and better-organized information.
Go to the HtmlCleaner web site: http://htmlcleaner.sourceforge.net/

Posted by Vladimir Nikic 2008-07-15

HtmlCleaner release 2.0

The new version comes with a number of improvements and fixes. Some of them are:

- Complete code refactoring, making the Cleaner's API better and more flexible.
- Methods for DOM manipulation added.
- Basic XPath support added.
- New parameters introduced to control cleaner's behavior.

Posted by Vladimir Nikic 2008-07-15

HtmlCleaner release 1.6

- New flag parameter ignoreQuestAndExclam introduced, offering control over special tags of the form <?TAGNAME....> and <!TAGNAME....>.
- Bug fixes.

Posted by Vladimir Nikic 2007-12-26

HtmlCleaner release 1.55

- Added Reader-based HtmlCleaner constructors.
- New parameter pruneTags introduced, offering a way to remove undesired tags, together with all their children, from the XML tree after parsing and cleaning.
- Bug fixes.
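
To illustrate what pruning means, here is a minimal sketch that removes named elements and their whole subtrees from a DOM tree. It uses the JDK's own `org.w3c.dom` API on already well-formed XML, not HtmlCleaner itself; the class `PruneTagsDemo` and its methods are hypothetical names chosen for this example, and the library's internal implementation may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PruneTagsDemo {
    /** Removes every element with one of the given names, including its subtree. */
    public static void prune(Document doc, String... tagNames) {
        for (String name : tagNames) {
            NodeList matches = doc.getElementsByTagName(name);
            // Copy first: the NodeList is live and shrinks as nodes are removed.
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                toRemove.add(matches.item(i));
            }
            for (Node node : toRemove) {
                Node parent = node.getParentNode();
                if (parent != null) {
                    parent.removeChild(node);
                }
            }
        }
    }

    /** Parses well-formed XML into a DOM document (illustration helper). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><style>p{}</style></head>"
                + "<body><p>text</p><script>var x;</script></body></html>");
        prune(doc, "script", "style");
        System.out.println(doc.getElementsByTagName("script").getLength()); // 0
    }
}
```

Pruning differs from simply skipping a tag: the element's children go with it, so the content of a pruned script or style block does not leak into the output.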

Posted by Vladimir Nikic 2007-09-27

HtmlCleaner release 1.5

- Several bug fixes.
- Added option to escape XML content in DOM serializer - HtmlCleaner.createDOM(boolean escapeXml)

Posted by Vladimir Nikic 2007-09-08

HtmlCleaner release 1.4

- New flag allowHtmlInsideAttributes is introduced in order to give the parser flexibility in handling attribute values.
- Several bug fixes.

Posted by Vladimir Nikic 2007-08-24

HtmlCleaner release 1.3

- New browser-compact serializer added, which preserves a single whitespace character where multiple occur.
- New flag namespacesAware introduced to control namespace prefixes and namespace declarations. It should be used instead of omitXmlnsAttributes, which existed in previous versions and had limited functionality.
- New flag allowMultiWordAttributes introduced, giving HtmlCleaner's parser the flexibility to (dis)allow tag attributes consisting of multiple words.
- New flag useEmptyElementTags introduced to control the output of tags with an empty body (<xxx/> vs <xxx></xxx>).
- Several bug fixes.
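
Two of the output options above are easy to picture in code. The following is only a rough sketch in plain Java: `SerializerOptionsDemo`, `collapseWhitespace` and `emptyElement` are hypothetical names for this example and are not part of HtmlCleaner, whose serializers may handle more cases.

```java
public class SerializerOptionsDemo {
    /** Browser-compact style: each run of whitespace becomes a single space. */
    public static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    /** Renders an element with an empty body in one of the two forms. */
    public static String emptyElement(String name, boolean useEmptyElementTags) {
        return useEmptyElementTags
                ? "<" + name + "/>"
                : "<" + name + "></" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("one  \n\t two")); // one two
        System.out.println(emptyElement("xxx", true));            // <xxx/>
        System.out.println(emptyElement("xxx", false));           // <xxx></xxx>
    }
}
```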

Posted by Vladimir Nikic 2007-07-12

HtmlCleaner release 1.2

- Several bugs fixed.
- New flags added to control behaviour of unknown/deprecated tags.
- New flag added to optionally remove HTML envelope from resulting XML.
- JDOM serializer added.

Posted by Vladimir Nikic 2007-05-05

SVN support added

Posted by Vladimir Nikic 2007-04-16

HtmlCleaner release 1.13

Serialization of XML to Java DOM is supported through the createDOM() method of the HtmlCleaner class.

Posted by Vladimir Nikic 2007-04-13

HtmlCleaner release 1.12

Escaping of hexadecimal entities is supported (e.g. &#x09;).
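
The note does not say how the library handles these references internally. As a sketch, assuming "supported" means that hexadecimal references survive XML escaping instead of having their ampersand escaped, the idea could look like this in plain Java. `HexEntityDemo` and `escapeAmpersands` are hypothetical names, not HtmlCleaner API.

```java
import java.util.regex.Pattern;

public class HexEntityDemo {
    // An ampersand that does NOT start a hexadecimal reference such as &#x09;
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!#[xX][0-9a-fA-F]+;)");

    /** Escapes bare ampersands but leaves hexadecimal references intact. */
    public static String escapeAmpersands(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("a & b &#x09; c")); // a &amp; b &#x09; c
    }
}
```

Without the lookahead, `&#x09;` would come out as `&amp;#x09;` and would no longer denote a tab character in the resulting XML.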

Posted by Vladimir Nikic 2007-01-28

HtmlCleaner release 1.1

- Compact XML serializer improved.
- Minor XML escaping bug fixed.

Posted by Vladimir Nikic 2007-01-11

HtmlCleaner v1.0.5 released

- An HTML tokenizing bug fixed.
- Methods of the class TagNode made public in order to enable creating custom XML serializers.
- Method writeXml(XmlSerializer) added to HtmlCleaner class in order to support creating custom XML serializers.

Posted by Vladimir Nikic 2007-01-02

HtmlCleaner version 1.0 released

Minor bug in advanced XML escaping fixed.

Posted by Vladimir Nikic 2006-12-23

HtmlCleaner version 0.9 released

- HtmlCleaner Ant task added
- XML compact serializer added; it strips all unneeded whitespace from the result
- A few minor bugs fixed

Posted by Vladimir Nikic 2006-12-05

Initial version of HtmlCleaner released

HtmlCleaner is an open-source HTML parser written in Java. For a specified HTML document it produces well-formed XML.

Posted by Vladimir Nikic 2006-11-27