HtmlCleaner release 2.14 is here

This release contains the following bug fixes:

149 StackOverflowError
148 Giving mixed-case filenames doesn't work on case-sensitive filesystems
147 Correction of ul structure
146 2.13 does not correct table structure
144 elements such as meta and link are removed
140 CRITICAL: endless loop in some tags (ref #129, #126)
139 option tag displayed after optgroup
136 ClassCastException...

Posted by Scott Wilson 2015-08-24

HtmlCleaner 2.12 is out!

What, another release already?

Well, a big thanks to Wolfgang Koppenberger, who spotted a problem with OPTION tags in 2.11 that needed fixing and releasing right away.

Apologies to anyone using 2.11 who encountered that issue.

Posted by Scott Wilson 2015-05-15

HtmlCleaner 2.11 released

Adds much better HTML5 support, pipelining of HTML from stdin (and XML to stdout), and more!

Here's the changelog:

  • Feature 19: Support use of stdin and stdout for pipes on command line
  • Feature 10: Make OSGi-compatible bundle
  • Feature 15: Improved HTML5 support
  • Fixed issue 135: Some pages cause two different NullPointerExceptions
  • Fixed issue 134: Some pages cause IndexOutOfBoundsException
  • Fixed issue 133: Some pages cause NullPointerException
  • Fixed issue 132: ClassCastException: ArrayList cannot be cast to org.htmlcleaner.BaseToken...

Posted by Scott Wilson 2015-05-12

HtmlCleaner release 2.2

The new version brings most of the required features and a number of bug fixes. HtmlCleaner is now thread-safe, it introduces HTML-based serializers, and the API is extended to ease document manipulation. The parser is about 20% faster and now runs on Java 1.5+, benefiting from language improvements.

Posted by Vladimir Nikic 2010-12-28

HtmlCleaner release 2.1

- Parsing transformations have been added to make it easy to skip or change specified tags or attributes during the cleanup process.
- A few more constructors added to the HtmlCleaner class, making it possible to reuse the same cleaner properties across multiple cleaner instances.
- Code cleanup.

Posted by Vladimir Nikic 2008-09-02

Web site redesigned

Together with the new milestone version 2.0, the project web site has been completely redesigned, with a better look and better-organized information.
Go to the HtmlCleaner web site.

Posted by Vladimir Nikic 2008-07-15

HtmlCleaner release 2.0

The new version comes with a number of improvements and fixes. Some of them are:

- Complete code refactoring, making the Cleaner's API better and more flexible.
- Methods for DOM manipulation added.
- Basic XPath support added.
- New parameters introduced to control the cleaner's behavior.

Posted by Vladimir Nikic 2008-07-15

HtmlCleaner release 1.6

- New flag parameter ignoreQuestAndExclam is introduced, offering control over special tags: <?TAGNAME....> and <!TAGNAME....>.
- Bug fixes.

Posted by Vladimir Nikic 2007-12-26

HtmlCleaner release 1.55

- Added Reader-based HtmlCleaner constructors.
- New parameter pruneTags is introduced, offering a way to remove undesired tags together with all their children from the XML tree after parsing and cleaning.
- Bug fixes.

Posted by Vladimir Nikic 2007-09-27

HtmlCleaner release 1.5

- Several bug fixes.
- Added an option to escape XML content in the DOM serializer: HtmlCleaner.createDOM(boolean escapeXml).

Posted by Vladimir Nikic 2007-09-08

HtmlCleaner release 1.4

- New flag allowHtmlInsideAttributes is introduced in order to give the parser flexibility in handling attribute values.
- Several bug fixes.

Posted by Vladimir Nikic 2007-08-24

HtmlCleaner release 1.3

* New browser-compact serializer added, which preserves a single whitespace where multiple occur.
* New flag namespacesAware is introduced in order to control namespace prefixes and namespace declarations. It should be used instead of omitXmlnsAttributes, which existed in previous versions and had limited functionality.
* New flag allowMultiWordAttributes is introduced, giving HtmlCleaner's parser the flexibility to (dis)allow tag attributes consisting of multiple words.
* New flag useEmptyElementTags is introduced in order to control the output of tags with an empty body (<xxx/> vs <xxx></xxx>).
* Several bug fixes.
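
To illustrate what "preserves a single whitespace where multiple occur" means in practice, here is a minimal sketch in plain Java. It is not HtmlCleaner's serializer code; the class and method names are hypothetical and chosen only for this example, and the regex-based approach is an assumption about the general technique.

```java
public class WhitespaceCollapseDemo {

    // Hypothetical helper: replaces every run of whitespace characters
    // (spaces, tabs, newlines) with a single space, roughly the way a
    // browser renders ordinary text content.
    public static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // The run of spaces, newline and tab between the words becomes one space.
        System.out.println(collapseWhitespace("Hello   \n\t world"));
    }
}
```

A real serializer would also need to leave whitespace-sensitive content such as pre or script elements untouched, which this sketch does not attempt.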

Posted by Vladimir Nikic 2007-07-12

HtmlCleaner release 1.2

- Several bugs fixed.
- New flags added to control behaviour of unknown/deprecated tags.
- New flag added to optionally remove HTML envelope from resulting XML.
- JDOM serializer added.

Posted by Vladimir Nikic 2007-05-05

SVN support added

Posted by Vladimir Nikic 2007-04-16

HtmlCleaner release 1.13

Serialization of XML to Java DOM is now supported via the createDOM() method of the HtmlCleaner class.

Posted by Vladimir Nikic 2007-04-13

HtmlCleaner release 1.12

Escaping of hexadecimal entities is now supported (e.g. &#x09;).
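
For readers unfamiliar with the notation, a hexadecimal entity such as &#x09; names a character by its code point written in base 16 (here, the tab character). The sketch below shows one way to decode such references in plain Java. It is an illustration only, not HtmlCleaner's implementation; the class name, method name, and regular expression are all assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEntityDemo {

    // Matches hexadecimal numeric character references such as &#x09; or &#X41;
    private static final Pattern HEX_ENTITY =
            Pattern.compile("&#[xX]([0-9a-fA-F]+);");

    // Hypothetical helper: replaces each hexadecimal reference with the
    // character it denotes; references that are out of range are left as-is.
    public static String decodeHexEntities(String input) {
        Matcher m = HEX_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint;
            try {
                codePoint = Integer.parseInt(m.group(1), 16);
            } catch (NumberFormatException e) {
                codePoint = -1; // too many digits to be a valid code point
            }
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // &#x41; is the letter 'A', so this compares "AAB" with "AAB".
        System.out.println(decodeHexEntities("A&#x41;B").equals("AAB"));
    }
}
```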

Posted by Vladimir Nikic 2007-01-28

HtmlCleaner release 1.1

- Compact XML serializer improved.
- Minor XML escaping bug fixed.

Posted by Vladimir Nikic 2007-01-11

HtmlCleaner v1.0.5 released

- An HTML tokenizing bug fixed.
- Methods of the class TagNode made public in order to enable creating custom XML serializers.
- Method writeXml(XmlSerializer) added to HtmlCleaner class in order to support creating custom XML serializers.

Posted by Vladimir Nikic 2007-01-02

HtmlCleaner version 1.0 released

Minor bug in advanced XML escaping fixed.

Posted by Vladimir Nikic 2006-12-23

HtmlCleaner version 0.9 released

- HtmlCleaner Ant task added
- XML compact serializer added - strips all unneeded whitespace from the result
- A few minor bugs fixed

Posted by Vladimir Nikic 2006-12-05

Initial version of HtmlCleaner released

HtmlCleaner is an open-source HTML parser written in Java. Given HTML input, it produces well-formed XML.

Posted by Vladimir Nikic 2006-11-27