Name                  Modified    Size
readme.txt            2012-10-30  2.6 kB
jericho-html-3.3.zip  2012-10-30  2.9 MB
Totals: 2 items, 2.9 MB, 0 downloads / week

Release Notes:
Version 3.3 includes important bug fixes and various enhancements.

Change Log:
- Bug Fixes:
  - [3581664] CharacterReference.decode() does not decode entities
    containing digits - &frac12; &frac14; &frac34; &sup1; &sup2; &sup3;
    &there4;
  - [3311286] SourceCompactor does not respect TEXTAREA
  - [3519131] Renderer output incorrect when constructed with an
    Element object.
  - [3538829] Renderer output of font decoration on block boundaries
    incorrect.
  - Segment.getAllStartTags(name) and Segment.getFirstElement(name)
    do not work if the argument contains upper case characters.
  - The end delimiter of a common server tag inside an escaped server
    tag is falsely recognised as the end delimiter of the escaped tag.
- CHANGES THAT COULD AFFECT THE BEHAVIOUR OF EXISTING PROGRAMS:
  - [3427073] Segment.getStyleURISegments() now includes style element
    content as well as style attribute values.
  - [3427927] Segment.getURIAttributes() now includes the archive
    attributes of object and applet elements.
  - Comments no longer recognised inside script elements during full
    sequential parse. Previously they were recognised for compatibility
    with major browsers but modern browser behaviour has changed.
  - Changed the log level of all parsing errors from INFO to ERROR, and
    the log level of the Source.fullSequentialParse() advisory message
    from WARN to INFO. The previous levels gave the advisory message a
    higher severity than the parsing errors, preventing logging systems
    from hiding the advisory message while showing parsing errors.
    Character encoding warnings remain unchanged at WARN level.
  - Changed the behaviour of the Renderer.renderHyperlinkURL(StartTag)
    method so that relative URLs are not rendered.
  - Changed the behaviour of the Renderer so that hyperlink element
    content is not rendered if it is the same as the hyperlink URL,
    ignoring any http:// prefix or / suffix.
  - EndTag.tidy() now removes whitespace before the closing bracket.
- Added Source(File) constructor.
- Added OutputDocument.getSegment() method.
- Added OutputDocument.remove(int begin, int end) method.
- Added Renderer.setHRLineLength() method.
- Added RenderToText.jsp webapp sample.
- Added Segment.getRowColumnVector() method.
- Encoding detection now ignores common encodings specified in meta tags
  that have a code unit size incompatible with the preliminary encoding.
- Upgraded to the following logger APIs:
  slf4j-api-1.7.2, log4j-1.2.17
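
The reasoning behind the log level change can be illustrated with a small
standalone sketch. It is not Jericho or slf4j code: the Level enum, the
shown() threshold filter and the usefulThresholds() helper are hypothetical
names, and the only assumption is that a logging system shows every message
at or above a configured threshold.

```java
import java.util.ArrayList;
import java.util.List;

final class LogLevelDemo {
    // Hypothetical levels, declared in order of increasing severity.
    enum Level { INFO, WARN, ERROR }

    // Assumed threshold filter: a message is shown when its level is at or
    // above the configured threshold.
    public static boolean shown(Level message, Level threshold) {
        return message.compareTo(threshold) >= 0;
    }

    // Returns the thresholds that show parsing errors while hiding the
    // fullSequentialParse() advisory message.
    public static List<Level> usefulThresholds(Level parsingError, Level advisory) {
        List<Level> result = new ArrayList<>();
        for (Level threshold : Level.values()) {
            if (shown(parsingError, threshold) && !shown(advisory, threshold)) {
                result.add(threshold);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Previous levels: parsing errors at INFO, advisory at WARN.
        System.out.println(usefulThresholds(Level.INFO, Level.WARN));  // []
        // New levels: parsing errors at ERROR, advisory at INFO.
        System.out.println(usefulThresholds(Level.ERROR, Level.INFO)); // [WARN, ERROR]
    }
}
```

Under the previous levels no threshold separates the two kinds of message;
under the new levels a threshold of WARN or ERROR does.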
Source: readme.txt, updated 2012-10-30
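
The Renderer rule about hyperlink content that repeats the hyperlink URL can
be sketched as a plain string comparison. This is only an illustration of the
rule as worded in the change log, not the Renderer's actual code: the class
and method names are hypothetical, and applying the normalisation to both
strings and comparing case-sensitively are assumptions.

```java
final class HyperlinkTextCheck {
    // Hypothetical helper: drops a leading "http://" and one trailing "/".
    public static String normalise(String s) {
        String t = s;
        if (t.startsWith("http://")) {
            t = t.substring("http://".length());
        }
        if (t.endsWith("/")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // True when the element content merely repeats the URL, in which case
    // the content would not be rendered in addition to the URL.
    public static boolean contentRepeatsUrl(String content, String url) {
        return normalise(content).equals(normalise(url));
    }

    public static void main(String[] args) {
        System.out.println(contentRepeatsUrl("example.com", "http://example.com/"));  // true
        System.out.println(contentRepeatsUrl("Example site", "http://example.com/")); // false
    }
}
```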