Jericho HTML Parser 1.2 released

Jericho HTML Parser is a simple but powerful Java library for analysing and manipulating HTML documents. Version 1.2 adds recognition of common server-side tags such as ASP, JSP, PSP, PHP and Mason, along with various performance and usability improvements.

Change Log:

- Deprecated public fields in Attribute class in favour of accessor methods
- The following methods now return an empty list instead of null if there are no results:
(WARNING - This may break existing programs that test the result for null)
Segment.findAllStartTags(String name)
Segment.findAllElements(String name)
- Added hashCode() method to Segment class
- Server tags such as ASP, JSP, PSP, PHP and Mason are now recognised
- Basic parser logging introduced (see Source.setLogWriter() method)
- Start tags with too many badly formed attributes are now rejected
(reduces the number of false positives when searching for start tags)
- Added public IOutputSegment.COMPARATOR field
- Improved caching
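
The change from null to an empty list is the entry most likely to affect existing callers, so here is a minimal, self-contained sketch of the difference. It does not use the library itself: the findAllStartTags method below is a hypothetical, deliberately naive stand-in that only mimics the "empty list when nothing is found" contract, and the class and helper names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyListDemo {

    // Hypothetical stand-in for Segment.findAllStartTags(String name).
    // It is NOT the library's implementation; it only reproduces the 1.2
    // contract of returning an empty list (never null) when nothing matches.
    static List<String> findAllStartTags(String html, String name) {
        List<String> result = new ArrayList<>();
        String lower = html.toLowerCase();
        String open = "<" + name.toLowerCase();
        int pos = lower.indexOf(open);
        while (pos != -1) {
            int after = pos + open.length();
            int end = lower.indexOf('>', after);
            if (end == -1) {
                break; // unterminated tag, stop scanning
            }
            // Accept only an exact tag name, so "<a" does not match "<abbr".
            char next = lower.charAt(after);
            if (next == '>' || next == '/' || Character.isWhitespace(next)) {
                result.add(html.substring(pos, end + 1));
            }
            pos = lower.indexOf(open, after);
        }
        return result;
    }

    // Pre-1.2 style check: assumes "no result" is signalled by null.
    // With an empty list this reports true even though nothing was found.
    static boolean naiveNullCheck(List<String> tags) {
        return tags != null;
    }

    // Check that behaves the same whether the method returns null or an empty list.
    static boolean hasMatches(List<String> tags) {
        return tags != null && !tags.isEmpty();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <a href=\"x\">link</a></p>";
        List<String> tables = findAllStartTags(html, "table");
        System.out.println("naive null check: " + naiveNullCheck(tables));
        System.out.println("safe check:       " + hasMatches(tables));
        // Iterating needs no guard at all once the result is never null.
        for (String tag : findAllStartTags(html, "a")) {
            System.out.println("found start tag: " + tag);
        }
    }
}
```

Code that previously tested the result against null to detect "nothing found" would need a check along the lines of hasMatches, while code that simply iterates over the result can drop its null guard.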

A demonstration source file making use of the new features can be found at the following URL:

For more details see the javadocs:

Posted by Martin Jericho 2004-06-16