
HtmlCleaner 2.24 is here!

Bug fixes:

  • 220 Information is lost in case of double escaping in attributes
  • 219 H3 closing tag incorrectly placed
  • 217 StackOverflow in DomSerializer
  • 216 elementNames(org.htmlcleaner.HtmlCleanerTest) test failure

A new serializer, TraversalDomSerializer, has been added. It is experimental, and its output does not yet exactly match that of the regular DomSerializer, but it may be useful when you need to reduce HtmlCleaner's memory footprint while processing extremely large pages.

NOTE: As a side effect of implementing the new serializer, some lower-level interfaces (e.g. HtmlToken, BaseToken) have been refactored. If your integration works with token-level APIs, take care when upgrading to this version.

Note that the fix for issue 220 changes attribute processing: entities in attributes are now normalised and escaped on serialisation rather than in TagNode. For example, a double-escaped entity still appears double-escaped when you query the TagNode interface, but is normalised into standard HTML format when the node is written out by a serialiser.
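To make the idea behind the issue 220 change concrete, here is a minimal standalone sketch of "keep the raw value in the node, escape only when serialising". It is not HtmlCleaner's code and does not use its API: the class AttributeEscapingSketch, the RawAttribute holder, and the serialize method are all hypothetical names invented for illustration, and the escaping rules are an assumption about what "normalised" could look like.

```java
import java.util.regex.Pattern;

public class AttributeEscapingSketch {

    // Matches an ampersand that does NOT already begin an entity reference
    // such as &amp; &lt; &#60; or &#x3C;.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Hypothetical stand-in for a parsed attribute: stores the value exactly as written. */
    static final class RawAttribute {
        final String name;
        final String rawValue;

        RawAttribute(String name, String rawValue) {
            this.name = name;
            this.rawValue = rawValue;
        }
    }

    /**
     * Escaping happens only here, at output time. Existing entity references
     * are left untouched, so a double-escaped value is not collapsed.
     */
    static String serialize(RawAttribute attr) {
        String value = BARE_AMPERSAND.matcher(attr.rawValue).replaceAll("&amp;");
        value = value.replace("\"", "&quot;")
                     .replace("<", "&lt;")
                     .replace(">", "&gt;");
        return attr.name + "=\"" + value + "\"";
    }

    public static void main(String[] args) {
        // Source markup had title="&amp;lt;", i.e. the literal text "&lt;".
        RawAttribute doubleEscaped = new RawAttribute("title", "&amp;lt;");
        System.out.println(doubleEscaped.rawValue);   // querying the node: &amp;lt;
        System.out.println(serialize(doubleEscaped)); // title="&amp;lt;"

        // A value with unescaped special characters is fixed up on output only.
        RawAttribute unescaped = new RawAttribute("title", "Q&A <1>");
        System.out.println(serialize(unescaped));     // title="Q&amp;A &lt;1&gt;"
    }
}
```

The point of the design is that the node never rewrites what it was given, so nothing is lost before output; whether HtmlCleaner's serialisers apply exactly these rules should be checked against the library itself.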

Thanks to Simon Urli, Daniel Gonzalez, Sam Hutchins, and niol for their help with this release.

Posted by Scott Wilson 2020-04-29
