Aperture

File Release Notes and Changelog

Release Name: 2006.1-alpha-3

Notes:
We are pleased to announce the third alpha release of the Aperture framework.
The most notable feature in this release is a new IcalCrawler. It works with 
iCal files generated by many calendaring applications (Apple iCal, Korganizer, 
Lotus Notes ...). It uses an ical-rdf mapping developed by the W3C RDF 
Calendaring group. Apart from that there are numerous small improvements and 
bugfixes. The tutorial has been expanded with more code examples and UML 
diagrams to facilitate learning for new users.

This is the last release before the switch to the RDF2Go framework. 
(The curious can already examine the RDF2Go branch in CVS.)

Aperture 2006.1-alpha-3 can be downloaded from here:
http://sourceforge.net/project/showfiles.php?group_id=150969&package_id=166878&release_id=460471

What's new in alpha-3?

- new IcalCrawler

- added MIME type detection for many formats:

- improved MIME type detection of MHTML files (web archives)

- introduced HtmlParserUtil, containing large parts of the HtmlExtractor
  implementation, as HTML (fragments) may occur in other document types
  as well (e.g. saved mails, see MimeExtractor)

- added ThreadedExtractorWrapper class, for catching and interrupting
  hanging Extractors

- added RepositoryAccessData, an AccessData implementation storing its
  information in a Repository

- added ability to specify a port number for an IMAP source

- set target platform to Java 5
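
To illustrate the idea behind catching and interrupting a hanging Extractor,
here is a minimal sketch of a timeout wrapper built on java.util.concurrent
(available since Java 5). It is not the ThreadedExtractorWrapper source: the
names TimeoutRunner and runWithTimeout are hypothetical, and the task is a
plain Callable standing in for an extraction call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Aperture API.
final class TimeoutRunner {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * On timeout the worker is interrupted and a TimeoutException is thrown.
     * An Exception thrown by the task itself is rethrown to the caller.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker thread to stop by interrupting it.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the task's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Interruption is cooperative in Java: a task that never checks its interrupt
status and never blocks in an interruptible call will keep running even after
the caller has given up waiting.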

Leo Sauermann
Christiaan Fluit
Gunnar Grimnes
Antoni Mylka 

Changes:

----------------------------------
version 2005.1-alpha-1, 2005-11-10
----------------------------------

- First public release of the source code.

----------------------------------
version 2006.1-alpha-2, 2006-03-06
----------------------------------

This release adds many crucial implementations and fixes to the previous
release, which was mainly focused on establishing a number of core APIs.
This release can be seen as the first one suitable for practical, real-world
use.

Due to the large number of changes and additions, we only list the most
important functional changes:

- Revised the RDFContainer interface and implementation.

- Extended the number of Extractor implementations from 4 to 17, which adds
  support for the various MS Office file formats as well as a number of
  other formats.

- Added the DataSource API and utility classes for configuring them.

- Added the DataAccessor API and associated DataObject interfaces for
  accessing individual resources, with implementations for the "file",
  "http", "imap" and "outlook" schemes.

- Added the Crawler API with implementations for file system crawling, web
  crawling, IMAP crawling (all stable) and Outlook crawling (alpha).

- Added the LinkExtractor API with an implementation for HTML documents,
  primarily meant to facilitate web crawling.

- Added classes for dealing with non-validatable certificates when using an
  SSL connection.

- Considerably extended documentation and example code.

----------------------------------
version 2006.1-alpha-3, 2006-11-02
----------------------------------

The most notable feature of this release is the IcalCrawler. It can crawl
files in the popular iCalendar format. Many calendaring applications use this
format either natively or via some export/import functionality. It uses the
RDF mapping developed by the W3C RDF Calendaring group (with some
improvements, see the Javadoc).

Additional improvements include:

- added MIME type detection for many formats:

- improved MIME type detection of MHTML files (web archives)

- introduced HtmlParserUtil, containing large parts of the HtmlExtractor
  implementation, as HTML (fragments) may occur in other document types
  as well (e.g. saved mails, see MimeExtractor)

- the maximumSize property is now a long instead of an int

- added ThreadedExtractorWrapper class, for catching and interrupting
  hanging Extractors

- added RepositoryAccessData, an AccessData implementation storing its
  information in a Repository

- added ability to specify a port number for an IMAP source

- set target platform to Java 5

updated dependencies:

- HTMLParser 1.6
- Ical (from trunk, not the official release)
- JavaMail 1.4
- POI 3.0 alpha 2
- PDFBox 0.7.3 (+ added FontBox, bcmail and bcprov, now required by PDFBox)

bug fixes:

- [ 1444917 ] PlainTextExtractor ignores ByteOrderMarks
- [ 1444926 ] MagicMimeTypeIdentifier cannot handle text files with BOMs
- [ 1445519 ] mails in non-western languages
- [ 1445641 ] MimeExtractors cannot process MHTML files
- [ 1445658 ] MimeExtractor should process HTML body parts
- [ 1476150 ] ImapCrawler.getDataObject attempts to access a closed folder
- [ 1480416 ] IOExceptions using ImapCrawler.getDataObject output
- [ 1481111 ] XmlExtractor needs improved DTD handling
- [ 1481132 ] OpenDocumentExtractor unable to load DTDs from jar files
- [ 1481759 ] Unable to use IMAP over SSL with JavaMail 1.4
- [ 1558484 ] HttpAccessor should use timeouts on connection
- [ 1567288 ] Incorrect link data by WebCrawler
- many bugfixes in ImapCrawler's incremental crawling
- same for WebCrawler and HttpAccessor
- prevent NullPointerException in FileDataObjectBase.dispose
- made sure ThreadedExtractorWrapper redirects all Exceptions of the wrapped
  Extractor
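
Two of the bug fixes above concern byte order marks (BOMs) at the start of
text files. The sketch below shows one common way to skip a UTF-8 BOM before
handing a stream to a text reader. It is an illustration only, not the fix
that went into PlainTextExtractor or MagicMimeTypeIdentifier: the class
BomSkipper and the method skipUtf8Bom are hypothetical names, and UTF-16 and
UTF-32 BOMs are deliberately not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of the Aperture API.
final class BomSkipper {

    /**
     * Returns a stream positioned after a leading UTF-8 BOM (EF BB BF).
     * If the input does not start with that BOM, no bytes are lost.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until we
        // have three bytes or reach the end of the stream.
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean isBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && read > 0) {
            // Not a BOM: push the bytes back so the caller sees them.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```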