We are building a similar system, using Aperture (among other tools) to crawl data, then creating Solr documents and injecting them.
Our suggestions for an Aperture + Solr design (sorry if this is stating the obvious):
Don’t make Aperture depend directly on Solr. Instead, build a ‘pipe’ similar to the UIMA framework. Typically you want to extract the data using Aperture, then transform / consolidate it in your own processes, buffer it temporarily (to balance load), and then submit it. Define a standard interface to the pipe and allow flexible configuration of the chain. Apache Camel could possibly be used for this (even though we implemented our own pipe).
This also supports integration not only with Solr but also with semantic stores such as Virtuoso or Sesame. At the end of the pipe, a serializer can turn the generic document into either a Solr document or an RDF/OWL document. It may seem odd to take an RDF graph created by Aperture, turn it into a flat, abstract model, and then convert it back to RDF/OWL, but consider that not everybody thinks the NEPOMUK model is ideal.
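A minimal Java sketch of such a pipe, for illustration only: GenericDocument, PipeStage, Sink and Pipe are hypothetical names (not Aperture, UIMA or Camel types), and a bounded ArrayBlockingQueue stands in for the temporary load-balancing buffer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical generic document: a URI plus a flat field -> values map (not an Aperture type).
class GenericDocument {
    final String uri;
    final Map<String, List<String>> fields = new HashMap<>();

    GenericDocument(String uri) { this.uri = uri; }

    void add(String field, String value) {
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical standard interface that every stage of the pipe implements.
interface PipeStage {
    GenericDocument process(GenericDocument doc);
}

// Hypothetical final consumer, e.g. a Solr or RDF-store submitter.
interface Sink {
    void submit(GenericDocument doc);
}

class Pipe {
    private final List<PipeStage> stages;
    // Bounded buffer between transformation and submission; put() blocks
    // while the buffer is full, which throttles the producing side.
    private final BlockingQueue<GenericDocument> buffer;

    Pipe(List<PipeStage> stages, int bufferSize) {
        this.stages = stages;
        this.buffer = new ArrayBlockingQueue<>(bufferSize);
    }

    // Run a document through every configured stage, in order, then buffer it.
    void accept(GenericDocument doc) {
        GenericDocument current = doc;
        for (PipeStage stage : stages) {
            current = stage.process(current);
        }
        try {
            buffer.put(current);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while buffering " + doc.uri, e);
        }
    }

    // Drain whatever is currently buffered into the sink; returns how many were submitted.
    int flushTo(Sink sink) {
        List<GenericDocument> batch = new ArrayList<>();
        buffer.drainTo(batch);
        for (GenericDocument d : batch) {
            sink.submit(d);
        }
        return batch.size();
    }
}
```

Because the stages are a plain list, the chain can be assembled from configuration. A framework such as Apache Camel could take over the role of Pipe, but this sketch does not use Camel's API.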
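To illustrate the serializer at the end of the pipe, here is a sketch assuming the flat model is just a URI plus a field-to-values map. DocumentSerializer and both implementations are hypothetical; the Solr variant emits Solr's XML update format and assumes the schema's unique key field is called id, and the RDF variant emits N-Triples under an assumed predicate namespace.

```java
import java.util.List;
import java.util.Map;

// Hypothetical serializer interface sitting at the end of the pipe.
interface DocumentSerializer {
    String serialize(String uri, Map<String, List<String>> fields);
}

// Emits a Solr XML <add> message; assumes the unique key field is "id".
class SolrXmlSerializer implements DocumentSerializer {
    public String serialize(String uri, Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        field(sb, "id", uri);
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            for (String v : e.getValue()) {
                field(sb, e.getKey(), v);
            }
        }
        return sb.append("</doc></add>").toString();
    }

    private static void field(StringBuilder sb, String name, String value) {
        sb.append("<field name=\"").append(xml(name)).append("\">")
          .append(xml(value)).append("</field>");
    }

    // Escape the characters that are unsafe in XML text and attribute values.
    private static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

// Emits N-Triples; field names become predicates under an assumed namespace.
// Assumes the subject URI and field names are already valid IRI characters.
class NTriplesSerializer implements DocumentSerializer {
    private final String namespace;

    NTriplesSerializer(String namespace) { this.namespace = namespace; }

    public String serialize(String uri, Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            for (String v : e.getValue()) {
                sb.append('<').append(uri).append("> <")
                  .append(namespace).append(e.getKey()).append("> \"")
                  .append(literal(v)).append("\" .\n");
            }
        }
        return sb.toString();
    }

    // Escape backslash, quote and line breaks inside an N-Triples string literal.
    private static String literal(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }
}
```

The same flat document can thus be routed to either sink by choosing the serializer in configuration, without Aperture knowing about either target.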
From: Leo Sauermann [mailto:firstname.lastname@example.org]
Sent: Thursday, 20 August 2009 19:36
To: Darren Govoni
Cc: Aperture Devel
Subject: Re: [Aperture-devel] SOLR integration - excellent! we need a name...
I would reckon you write a SOLRCrawlerhandler,
and not a repository.
Using the AccessData implementation of Aperture, it is possible to track changes to resources;
given the Aperture crawlers, you have a crawling framework.
What is missing is the GUI for configuring the data sources and the webapp (or other container) to run Aperture.
Then you would have an independent crawler based on Aperture that feeds into a Solr server.
Alternatively, you could also plug the Aperture extractors into an existing crawler.
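A rough sketch of the handler idea, assuming a very simplified model: ChangeTracker and SolrFeedingHandler are hypothetical stand-ins, not Aperture's actual AccessData or CrawlerHandler interfaces (those have their own method names and signatures). New or changed resources become Solr adds, and resources that disappeared between crawls become deletes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the AccessData idea: remembers a modification
// stamp per resource URI between crawls.
class ChangeTracker {
    enum Status { NEW, CHANGED, UNCHANGED }

    private final Map<String, Long> lastSeen = new HashMap<>();

    Status check(String uri, long modified) {
        Long previous = lastSeen.put(uri, modified);
        if (previous == null) return Status.NEW;
        return previous == modified ? Status.UNCHANGED : Status.CHANGED;
    }

    // URIs known from earlier crawls that were not visited in this one;
    // they are also forgotten, so call this once per crawl.
    Set<String> removedSince(Set<String> visited) {
        Set<String> gone = new HashSet<>(lastSeen.keySet());
        gone.removeAll(visited);
        lastSeen.keySet().removeAll(gone);
        return gone;
    }
}

// Hypothetical handler: turns crawl events into Solr update commands,
// collected as plain strings here instead of being sent to a server.
class SolrFeedingHandler {
    final List<String> commands = new ArrayList<>();
    private final ChangeTracker tracker = new ChangeTracker();
    private final Set<String> visited = new HashSet<>();

    void crawlStarted() { visited.clear(); }

    void objectFound(String uri, long modified) {
        visited.add(uri);
        if (tracker.check(uri, modified) != ChangeTracker.Status.UNCHANGED) {
            // Re-adding a document with an existing unique key replaces it in Solr.
            commands.add("add " + uri);
        }
    }

    void crawlStopped() {
        for (String uri : tracker.removedSince(visited)) {
            commands.add("delete " + uri);
        }
    }
}
```

Unchanged resources produce no command at all, which is the main benefit of tracking access data between crawls.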
It was Darren Govoni who said at the right time 19.08.2009 17:22 the following words:
Essentially, I use Aperture to retrieve the text+metadata and create a
Solr document with fields for the metadata and a 'text' field.
Given this, I'm not sure whether Solr can be a 'Repository' implementation
in Aperture lingo or needs some new construct. I will look into it some more.
If Solr can be a Repository implementation, then I suppose Aperture can
talk to it during crawls. I'm essentially doing this now, but outside the
Aperture framework classes at the moment.
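Darren's mapping could look roughly like this, assuming the metadata arrives as a map from property URI to values. SolrFieldMapper is a hypothetical helper, and using the local name of each property URI as the Solr field name is an assumption; a real schema may need explicit mappings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SolrFieldMapper {
    // Turn a property URI into a Solr field name by keeping its local name,
    // i.e. whatever follows the last '#' or '/'.
    static String fieldName(String propertyUri) {
        int cut = Math.max(propertyUri.lastIndexOf('#'), propertyUri.lastIndexOf('/'));
        return propertyUri.substring(cut + 1);
    }

    // Build the flat document: one field per metadata property plus a "text" field.
    // Properties whose local names collide are merged into one multi-valued field.
    static Map<String, List<String>> toSolrFields(Map<String, List<String>> metadata, String fullText) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            doc.computeIfAbsent(fieldName(e.getKey()), k -> new ArrayList<>()).addAll(e.getValue());
        }
        doc.computeIfAbsent("text", k -> new ArrayList<>()).add(fullText);
        return doc;
    }
}
```

Merging on local names keeps the Solr schema small, at the cost of losing which vocabulary a value came from.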
On Tue, 2009-08-18 at 16:04 +0200, Antoni Mylka wrote:
Leo Sauermann wrote:
Duncan, Darren,
Rock, this really brings together useful tools!
We now have two projects working on similar things. Thanks for answering; I think we have an opportunity here to share some code!
I would propose:
* You write to the mailing list once the code is clean.
* If you have questions about upgrading to the new Aperture, you ask on the mailing list how to solve them.
* Propose a nice name for that baby! Looking at Solr through an aperture, will that be "hubbble telescope"? :-) "hubbble" also brings in the word "hub" .... more ideas welcome.
We must document all these steps; I created a ticket here:
I think this task is much more important than anything else in our community, therefore I rated it with a high priority. Antoni: correct me if I am wrong. It is for the community to decide :)
In general I wanted to fix bugs (there are already three new ones since last week) and then proceed with an mbox accessor (I got one contributed by an Aduna partner and just have to tweak it a little) and add support for additional compression formats (thanks to the commons-compress library). I'm not a Lucene/Solr expert, but I would be very happy to help with the Aperture side of things.
Once this is done, you get your own folder "aperture-solr" to host the code. As you have a long-term interest in this, I would hope that one of you joins Aperture as a committer and stays with us a bit longer, but I guess that maintenance will be easy, given that many people are probably going to contribute here.
Aperture-devel mailing list
Dr. Leo Sauermann http://www.dfki.de/~sauermann
Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080
Kaiserslautern, Germany
Phone: +43 6991 gnowsis
Fax: +49 631 20575-102
Mail: email@example.com
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff
Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313