OpenIMAJ: The Open toolkit for Intelligent Multimedia Analysis in Java.
OpenIMAJ contains a large collection of pure-Java classes for analysing multimedia documents, from tools for extracting image features, to tools for analysing webpages.
Webpages do not offer reliable metadata about their creation date and time. However, knowing a document's creation time is a prerequisite for applying temporal normalization systems to webpages. DCTFinder is a system that parses a webpage and extracts its title and creation date from its content.
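To make the idea of content-based extraction concrete, here is a minimal Python sketch. It is not DCTFinder's actual method, which is not described here. The function name `extract_title_and_date` is hypothetical, and the sketch assumes the page contains an ISO-style date (YYYY-MM-DD); a real system would need to handle many more date formats and choose among several candidates.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import Optional, Tuple


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


# Assumption for illustration: only ISO-style dates (YYYY-MM-DD) are recognised.
_ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_title_and_date(html: str) -> Tuple[Optional[str], Optional[date]]:
    """Return (title, first valid ISO date found in the raw HTML)."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip() or None

    found = None
    for match in _ISO_DATE.finditer(html):
        try:
            # Take the first candidate that is a real calendar date.
            found = date(int(match[1]), int(match[2]), int(match[3]))
            break
        except ValueError:
            continue  # e.g. month 13; try the next candidate
    return title, found
```

Taking the first valid date is a deliberately naive choice; pages often contain several dates (comments, copyright notices), so ranking candidates by position or context would be the natural next step.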
Data mining tool for sequences (e.g. trajectories on a map, visited webpages, etc.) that creates a succinct description of the sequences, given a taxonomy (e.g. regions and sub-regions in the map, categories and sub-categories of pages, etc.).
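As a rough illustration of how a taxonomy can shorten a sequence description, the Python sketch below replaces each item with its parent category and merges consecutive repeats. This is a simplified stand-in, not the tool's algorithm; the function `summarize` and the `parent` mapping are hypothetical names introduced for the example.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def summarize(sequence: List[str], parent: Dict[str, str]) -> List[Tuple[str, int]]:
    """Generalise each item to its parent category and merge consecutive runs.

    Items missing from the taxonomy are kept unchanged.
    Returns (category, run_length) pairs in sequence order.
    """
    generalized = [parent.get(item, item) for item in sequence]
    return [(label, sum(1 for _ in run)) for label, run in groupby(generalized)]
```

For instance, with a mapping that places `/sports/football` and `/sports/tennis` under `sports`, two consecutive visits to those pages collapse into the single entry `("sports", 2)`.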
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
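One simple way to rank pages against operator-supplied examples is bag-of-words cosine similarity, sketched below in Python. This is an assumed technique chosen for illustration, not the plugin's documented approach, and it does not use the Heritrix API; `bag_of_words`, `cosine`, and `rank_pages` are hypothetical helpers.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_pages(examples: List[str], pages: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each page against the combined example texts, best first."""
    profile = Counter()
    for text in examples:
        profile.update(bag_of_words(text))
    scored = [(url, cosine(profile, bag_of_words(text))) for url, text in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Pages sharing no terms with the examples score 0.0 and sink to the bottom of the ranking; weighting terms (for example with TF-IDF) would reduce the influence of common words.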