

Goals of the Project

Aperture is an open source library for crawling and indexing information sources such as file systems, websites and mailboxes. Aperture supports a number of common source types and document formats out-of-the-box and provides easy ways to extend it with custom implementations.

The Aperture code consists of a number of related but independently usable parts:

  • Crawling of information sources: file systems, websites, mailboxes
  • MIME type identification
  • Full-text and metadata extraction of various file formats
  • Opening of crawled resources

For each of these parts, a set of APIs has been developed and a number of implementations are provided.
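
To illustrate how such parts can be used on their own and then chained, here is a minimal sketch in Python. It does not use Aperture's Java API: the function names (`crawl`, `identify_mime_type`, `extract_text`, `process`) are hypothetical, and the standard-library `mimetypes` module guesses from the file name only, whereas a real identifier would typically also inspect the file content.

```python
import mimetypes
import os
import tempfile


def crawl(root):
    """Yield the path of every regular file below root (the crawling step)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def identify_mime_type(path):
    """Guess a MIME type from the file name alone (the identification step)."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or "application/octet-stream"


def extract_text(path, mime_type):
    """Return the full text of plain-text files, or None for other types."""
    if mime_type != "text/plain":
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def process(root):
    """Chain the three independent steps into one pipeline."""
    results = []
    for path in crawl(root):
        mime_type = identify_mime_type(path)
        text = extract_text(path, mime_type)
        results.append((os.path.basename(path), mime_type, text))
    return results


if __name__ == "__main__":
    # Build a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "note.txt"), "w", encoding="utf-8") as handle:
            handle.write("hello aperture")
        for entry in process(root):
            print(entry)
```

Each function can be called without the others, which mirrors the idea that crawling, type identification and extraction are separate concerns that only meet in `process`.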

Aperture has a strong focus on semantics. For example, considerable effort goes into having the Extractors extract as much of the metadata contained in the file formats as possible (e.g., titles, authors, comments) and combine it with source-specific metadata (e.g., location, last modification date). All metadata is mapped to properties from the NIE namespace to allow uniform processing of the crawled and extracted information. NIE is an ontology developed within the Nepomuk Project and now maintained by an open source community. See the website of the development project on SourceForge, the website of the organization, and the website of the ontology.
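
The idea of mapping heterogeneous metadata onto one vocabulary can be sketched as follows. This is an illustration in Python, not Aperture code: the raw key names (`title`, `comments`, `modified`, `mime_type`) and the helper `to_triples` are hypothetical, and the property names used here should be checked against the published NIE ontology before relying on them.

```python
# Namespace of the NIE ontology; the property names below are assumed
# for illustration and should be verified against the ontology itself.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"

# Hypothetical mapping from raw metadata keys to NIE property URIs.
KEY_TO_PROPERTY = {
    "title": NIE + "title",
    "comments": NIE + "comment",
    "modified": NIE + "lastModified",
    "mime_type": NIE + "mimeType",
}


def to_triples(subject, format_metadata, source_metadata):
    """Merge both metadata dicts and emit (subject, property, value) triples.

    Keys without a mapping are dropped; format metadata wins on conflicts.
    """
    merged = {**source_metadata, **format_metadata}
    return sorted(
        (subject, KEY_TO_PROPERTY[key], value)
        for key, value in merged.items()
        if key in KEY_TO_PROPERTY
    )


if __name__ == "__main__":
    triples = to_triples(
        "file:///tmp/report.doc",
        {"title": "Report", "pages": 3},   # found inside the file
        {"modified": "2008-01-01"},        # supplied by the source
    )
    for triple in triples:
        print(triple)
```

Because every consumer sees the same property URIs regardless of whether a value came from the file or from its source, downstream code does not need to know which extractor produced it.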

Aperture is responsible for extraction; it does not try to tell you what to do with the data. You may want to store it, index it, query it, or simply grab the full text and print it. Code snippets within this wiki show some examples of what can be done with the data and how to make it available to your application through RDF APIs and query languages such as SPARQL or SeRQL.
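
As a rough picture of what querying extracted data means, the sketch below matches a single triple pattern over an in-memory list. It is a stand-in written in plain Python, not an RDF store or a SPARQL engine; the sample data, the `match` and `titles` helpers, and the property names are assumptions made for illustration.

```python
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"

# Hypothetical extraction results as (subject, predicate, object) triples.
TRIPLES = [
    ("file:///docs/a.txt", NIE + "title", "Alpha"),
    ("file:///docs/a.txt", NIE + "plainTextContent", "first document"),
    ("file:///docs/b.txt", NIE + "title", "Beta"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def titles(triples):
    """Map each document to its title.

    Roughly what this SPARQL pattern would select in a real store:
        SELECT ?doc ?title WHERE { ?doc nie:title ?title }
    """
    return {s: o for (s, _p, o) in match(triples, predicate=NIE + "title")}


if __name__ == "__main__":
    print(titles(TRIPLES))
    # Grab the full text of one document and print it.
    for _s, _p, text in match(
        TRIPLES, subject="file:///docs/a.txt", predicate=NIE + "plainTextContent"
    ):
        print(text)
```

A real application would load the triples into an RDF store and let the query language do the matching; the wildcard pattern here only shows the shape of the question being asked.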

Aperture Web Demo

Try out Aperture live:
http://www.dfki.uni-kl.de/ApertureWebProject/ is a website where you can try out the extractors. Note that it runs a somewhat outdated version of Aperture; to give the newest Aperture a spin, take a look at the [ApertureExampleApplications].

Extensibility and Plugins

Aperture consists of many parts that can be used as plugins, so it can easily be extended with new components or cut down to only the ones you need.
The core plugins are contained in the distributions; in addition, there are contributions and the web project.

Aperture Team

Who is contributing to Aperture? Who are the people behind it?

Historical Background

Aperture started as a cooperation between the German Research Center for Artificial Intelligence (http://www.dfki.de) and the Dutch software company Aduna (http://www.aduna-software.com/).

Both organizations had already produced software with shared characteristics, such as targeting desktop search using Semantic Web technology. As a result, they had to solve the same technical problems: incremental crawling of a file system, text and metadata extraction, and indexing and querying of metadata.

This made them realize that by cooperating on these issues they could get better code with less individual effort. Furthermore, it would enable other people to contribute as well.

In the summer of 2005 they decided to start a joint open source project that would be bootstrapped with the crawling and indexing code already developed in-house and that would serve as a basis for future development.


Licensing

The Aperture code is published under a permissive BSD-like open source license that allows the use of Aperture in proprietary applications.

See the License distributed with the library for more details.

Old licensing policy (releases up to 1.2.0)

For releases up to 1.2.0, the Aperture code was published under an open source license (AFL 3.0 for APIs and example code, OSL 3.0 for API implementations) that also allows the use of Aperture in proprietary applications.
In short, to use these releases your own code does not have to be open source, with one exception: if you create a modified version of an Aperture API implementation (i.e., you base it on a file published under the OSL license), then you are required to publish that part under the same license.
For example, if you create an improved version of the Excel text and metadata extractor using the Aperture implementation as a starting point, this derivative code must also be distributed under the OSL. Your own implementations of Aperture APIs are not required to be open sourced, nor is any code using Aperture components.

See the License distributed with the library for more details.