Name                  Modified    Size    Downloads / Week
jericho-html-3.2.zip  2011-04-20  2.4 MB
readme.txt            2011-03-05  2.1 kB
Totals: 2 items                   2.4 MB  4
Release Notes:
Version 3.2 includes important bug fixes and various enhancements including HTML5 support.

Change Log:
- Bug Fixes:
  - [2826979] IllegalCharsetNameException thrown when an illegal encoding
    is specified in the document.
  - [2837434] Potential multithreading bug in Source.getNewLine()
  - [3036182] NullPointerException when run with stringent java.policy
  - TextExtractor did not include any attribute values.
  - All unterminated character references were decoded regardless of the
    configuration settings (bug introduced in 3.1).
  - Renderer class - <div> under <li> resulted in new line.
  - SourceFormatter did not handle TEXTAREA elements correctly.
  - No exceptions thrown if invalid charset is specified by server or in
    source document.
  - Byte order mark character was included in the source document.
- HTML5 elements added to HTMLElementName and HTMLElements classes.
- Detects HTML5 character encoding declaration.
- Uses Windows-1252 as the default 8-bit encoding when available instead
  of the subset encoding ISO-8859-1.
- Added Renderer.setIncludeAlternateText(boolean) method.
- Added Renderer.renderAlternateText(StartTag) method.
- Added Renderer.setIncludeFirstElementTopMargin(boolean) method.
- Added Renderer.setDefaultTopMargin(String,int) static method.
- Added Renderer.setDefaultBottomMargin(String,int) static method.
- Added Renderer.setDefaultIndent(String,boolean) static method.
- Renderer now evaluates inline styles for top, bottom and left margins.
- Added Attribute.getStartTag() method.
- Added Segment.getURIAttributes() method.
- Added Segment.getStyleURISegments() method.
- Added deregister() methods to the extended tag type classes.
- Added MicrosoftConditionalCommentTagTypes class.
- Added StartTagType.SERVER_COMMON_COMMENT tag type.
- SourceFormatter now inlines DOCTYPE tags.
- Added Segment.getMaxDepthIndicator() method.
- Added static Config.IsHTMLEmptyElementTagRecognised parameter.
- Deprecated MicrosoftTagTypes class.
- Upgraded to the following logger APIs:
  slf4j-api-1.6.1, log4j-1.2.16
Source: readme.txt, updated 2011-03-05
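
Several of the changes above concern encoding detection: dropping the byte order mark, detecting the HTML5 character encoding declaration, tolerating illegal charset names, and preferring Windows-1252 over ISO-8859-1 as the default 8-bit encoding. The following is a minimal sketch of that behaviour using only the Java standard library. It is not Jericho's implementation: the class EncodingSketch, its method names, and the simplified regular expression are all hypothetical and exist only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of the Jericho API.
final class EncodingSketch {
    // Simplified pattern: finds charset=... inside a <meta> tag, which covers
    // the HTML5 form <meta charset="utf-8">. A real parser is far stricter.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\b[^>]*?\\bcharset\\s*=\\s*[\"']?([^\"'\\s/>;]+)",
        Pattern.CASE_INSENSITIVE);

    // Drops a leading byte order mark character (U+FEFF) if one is present.
    static String stripByteOrderMark(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Returns the declared charset name, or null if no declaration is found.
    static String declaredCharsetName(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Prefers Windows-1252 when the JVM provides it, otherwise ISO-8859-1.
    static Charset default8Bit() {
        try {
            return Charset.forName("windows-1252");
        } catch (UnsupportedCharsetException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Resolves a declared name, falling back to the default instead of
    // letting an illegal or unsupported name propagate as an exception.
    static Charset resolve(String declaredName) {
        if (declaredName != null) {
            try {
                return Charset.forName(declaredName.trim());
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Fall through to the default encoding.
            }
        }
        return default8Bit();
    }

    public static void main(String[] args) {
        String html = stripByteOrderMark("\uFEFF<meta charset=\"utf-8\"><p>Hi</p>");
        System.out.println(resolve(declaredCharsetName(html)));
        System.out.println(resolve("not a charset!"));
    }
}
```

Windows-1252 is a reasonable default because it assigns printable characters (such as the euro sign at byte 0x80) to the range 0x80-0x9F, where ISO-8859-1 has only control codes, so text decoded with it is never worse and often better.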