This project is funded by the Australian National Data Service (ANDS) and aims to provide a mechanism for researchers at the University of Western Australia to upload research datasets to a central petastore. It also aims to publish metadata about the datasets to Research Data Australia (RDA) so that they can be discovered (and potentially shared) by searching or browsing the ANDS website.
To provide this functionality, the project used the following open...
The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
A configurable knowledge management framework. It works out of the box, but it is meant mainly as a foundation for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer, and Indexer) can also be used separately.
A pure-Java compression library suitable as a drop-in replacement for current native implementations of java.util.zip. It is typically useful in applets, where access to native code is not allowed. It may also be useful on platforms where it