Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides useful HTML form utilities.
Version 2.3 includes important bug fixes as well as some minor improvements.
Changes since version 2.2:
- Bug Fixes:
  - NullPointerException in Source.indent.
  - Incorrect detection of a non-HTML element with a nested
    empty-element tag of the same name.
  - Fault in the caching mechanism.
  - Source.fullSequentialParse() sometimes caused unregistered
    tags to be returned in tag searches.
  - Invalid empty-element tags whose name is in either of the sets
    HTMLElements.getEndTagRequiredElementNames() were rejected by the
    parser if the slash immediately followed the tag name.
  - StartTag.tidy() only included a slash before the closing delimiter
    of the tag if the tag name was in the set
    HTMLElements.getEndTagForbiddenElementNames(). It now includes the
    slash for all tag names not in
    HTMLElements.getEndTagOptionalElementNames().
  - Source.fullSequentialParse() now clears the cache automatically
    instead of throwing an IllegalStateException if the cache is not
    empty.
- Changes to behaviour of Source.indent:
  - preserves indentation in SCRIPT elements, server elements,
    HTML comments and CDATA sections.
  - keeps SCRIPT elements, HTML comments, XML declarations,
    XML processing instructions and markup declarations inline.
- Minor documentation improvements.
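The change to the slash rule in StartTag.tidy() can be illustrated with a small, self-contained sketch. This is not Jericho's implementation: the two sets below are hypothetical stand-ins for HTMLElements.getEndTagForbiddenElementNames() and HTMLElements.getEndTagOptionalElementNames(), filled from the HTML 4.01 lists, and the class and method names (TidySlashRule, slashBefore23, slashFrom23) are invented for illustration. It only compares the old and new decision of whether a slash is written before the closing delimiter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the StartTag.tidy() slash rule, before and after 2.3.
 * Not Jericho code: the sets are stand-ins taken from the HTML 4.01 spec.
 */
public class TidySlashRule {
    // Stand-in for HTMLElements.getEndTagForbiddenElementNames() (HTML 4.01 list).
    static final Set<String> END_TAG_FORBIDDEN = new HashSet<>(Arrays.asList(
        "area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"));

    // Stand-in for HTMLElements.getEndTagOptionalElementNames() (HTML 4.01 list).
    static final Set<String> END_TAG_OPTIONAL = new HashSet<>(Arrays.asList(
        "body", "colgroup", "dd", "dt", "head", "html", "li", "option",
        "p", "tbody", "td", "tfoot", "th", "thead", "tr"));

    // Old rule: slash only if the name is in the end-tag-forbidden set.
    static boolean slashBefore23(String name) {
        return END_TAG_FORBIDDEN.contains(name.toLowerCase(Locale.ROOT));
    }

    // New rule: slash for every name that is not in the end-tag-optional set.
    static boolean slashFrom23(String name) {
        return !END_TAG_OPTIONAL.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Compare both rules for one name from each category.
        for (String name : new String[] {"br", "div", "p"}) {
            System.out.println(name + ": before=" + slashBefore23(name)
                + " after=" + slashFrom23(name));
        }
    }
}
```

Under these stand-in sets, "br" gets a slash under both rules, "div" (end tag required) gains a slash under the new rule, and "p" (end tag optional) gets none under either.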