The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
This project defines the Simple API for Binary REpresentations (SABRE) for processing hierarchically structured, binary-oriented documents, comparable to the Simple API for XML (SAX). The library is used, for example, in the Java ISO Image Creator (JIIC).