Hi,

Wouldn't it be better to just wait for the existing code from Darren and Duncan and see how that works?

I would prefer to see something that works and then make decisions (not that queuing would not work, but why invest time and brainpower if the problem is already solved?).

best
Leo

It was Villemos, Gert who said at the right time 27.08.2009 13:42 the following words:

We are building a similar system, using Aperture (among other tools) to crawl data, then creating Solr documents and injecting them.

 

Our suggestions for an Aperture + Solr design (sorry if this is stating the obvious):

 

Don’t make Aperture depend directly on Solr. Instead, make a ‘pipe’ similar to the UIMA framework. Typically you want to extract the data using Aperture, then transform / consolidate it in your own processes, buffer it temporarily (to balance the load), and then submit it. Define a standard interface to the pipe and allow flexible configuration of the chain. Apache Camel could possibly be used for this (even though we implemented our own pipe).
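
A minimal sketch of such a pipe in Java, assuming the flat document is just an id plus named fields. `GenericDocument`, `PipeStage` and `Pipe` are hypothetical names invented for illustration; they are not Aperture, UIMA or Camel APIs. A bounded `BlockingQueue` plays the role of the temporary buffer between extraction and submission.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical flat document passed along the pipe: an id plus named, multi-valued fields.
final class GenericDocument {
    final String id;
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    GenericDocument(String id) { this.id = id; }

    GenericDocument add(String field, String value) {
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        return this;
    }
}

// Hypothetical stage interface: each stage transforms or consolidates a document.
// Returning null drops the document from the pipe.
interface PipeStage {
    GenericDocument process(GenericDocument doc);
}

// Hypothetical pipe: a configurable chain of stages, a bounded buffer,
// and a sink (e.g. a Solr or RDF submitter) that drains the buffer.
final class Pipe {
    private final List<PipeStage> stages = new ArrayList<>();
    private final BlockingQueue<GenericDocument> buffer;
    private final Consumer<GenericDocument> sink;

    Pipe(int bufferCapacity, Consumer<GenericDocument> sink) {
        this.buffer = new ArrayBlockingQueue<>(bufferCapacity);
        this.sink = sink;
    }

    // Appends a stage to the chain; returns this pipe so chains can be configured fluently.
    Pipe then(PipeStage stage) {
        stages.add(stage);
        return this;
    }

    // Called by the extraction side, e.g. once per object Aperture crawls.
    // Blocks while the buffer is full, which throttles extraction.
    void accept(GenericDocument doc) throws InterruptedException {
        for (PipeStage stage : stages) {
            doc = stage.process(doc);
            if (doc == null) return; // filtered out by a stage
        }
        buffer.put(doc);
    }

    // Called by the submission side (intended to run on its own thread):
    // hands everything buffered so far to the sink and returns how many documents were sent.
    int flush() {
        List<GenericDocument> batch = new ArrayList<>();
        buffer.drainTo(batch);
        batch.forEach(sink);
        return batch.size();
    }
}
```

The bounded queue is what provides the load balancing: if the submitter falls behind, `accept` blocks and extraction slows down instead of piling documents up in memory. This assumes `flush` runs on a separate submitter thread; with a single thread, `accept` would block forever once the buffer is full.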

 

This also supports integration not only with Solr, but also with semantic stores such as Virtuoso or Sesame. At the end of the pipe, a serializer can turn the generic document into either a Solr or an RDF/OWL document. It may seem weird to take an RDF graph created by Aperture, turn it into a flat, abstract model, and then convert it back to RDF/OWL again, but consider that not everybody thinks the NEPOMUK model is ideal.
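
To make the serializer idea concrete, a minimal sketch under stated assumptions: the flat document is just an id plus a map of field names to values, the Solr schema's unique key is assumed to be called `id`, and field names become predicates under a made-up `http://example.org/field/` namespace (not NEPOMUK or any other real ontology). `DocumentSerializer` and its two implementations are hypothetical names, not part of Aperture or Solr.

```java
import java.util.List;
import java.util.Map;

// Hypothetical serializer at the end of the pipe: one flat document in, one wire format out.
interface DocumentSerializer {
    String serialize(String id, Map<String, List<String>> fields);
}

// Produces a Solr XML update message: <add><doc><field name="...">...</field></doc></add>.
// Assumes the schema's unique key field is named "id".
final class SolrXmlSerializer implements DocumentSerializer {
    public String serialize(String id, Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append(field("id", id));
        fields.forEach((name, values) -> values.forEach(v -> sb.append(field(name, v))));
        return sb.append("</doc></add>").toString();
    }

    private static String field(String name, String value) {
        return "<field name=\"" + xml(name) + "\">" + xml(value) + "</field>";
    }

    // Escapes the characters that are special in XML text and attribute values.
    private static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

// Produces N-Triples: one triple per field value, with the document id as subject.
// Assumes the id and field names are already valid IRI characters (they are not escaped).
final class NTriplesSerializer implements DocumentSerializer {
    private static final String NS = "http://example.org/field/"; // assumed example namespace

    public String serialize(String id, Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder();
        fields.forEach((name, values) -> values.forEach(v ->
            sb.append('<').append(id).append("> <").append(NS).append(name)
              .append("> \"").append(literal(v)).append("\" .\n")));
        return sb.toString();
    }

    // Escapes backslashes, quotes and line breaks inside an N-Triples string literal.
    private static String literal(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }
}
```

Because both serializers share one interface, the choice between a Solr index and a semantic store becomes a configuration decision at the end of the pipe rather than a dependency inside the extraction code.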

 

Cheers,

Gert.


From: Leo Sauermann [mailto:leo.sauermann@dfki.de]
Sent: Thursday, 20 August 2009 19:36
To: Darren Govoni
Cc: Aperture Devel
Subject: Re: [Aperture-devel] SOLR integration - excellent! we need a name...

 

I reckon you should write a SOLR CrawlerHandler,
not a Repository.

Using Aperture's AccessData implementation, it is possible to track changes to resources;
together with the Aperture crawlers, you have a crawling framework.
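
A simplified illustration of what change tracking between crawls involves. This is a hypothetical stand-in, not Aperture's `AccessData` interface: it assumes one last-modified timestamp per URL is enough to tell new, changed and unchanged resources apart, and it treats URLs that were stored earlier but not seen in the current crawl as removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for an access-data store (not Aperture's AccessData API).
final class ChangeTracker {
    enum Status { NEW, CHANGED, UNCHANGED }

    private final Map<String, Long> lastModified = new HashMap<>(); // persists across crawls
    private final Set<String> seenThisCrawl = new HashSet<>();       // reset per crawl

    // Call at the start of every crawl.
    void startCrawl() {
        seenThisCrawl.clear();
    }

    // Records a crawled resource and classifies it against what was stored last time.
    Status visit(String url, long modified) {
        seenThisCrawl.add(url);
        Long previous = lastModified.put(url, modified);
        if (previous == null) return Status.NEW;
        return previous == modified ? Status.UNCHANGED : Status.CHANGED;
    }

    // Call at the end of a crawl: URLs stored earlier but not visited now are
    // reported as removed and forgotten.
    Set<String> finishCrawl() {
        Set<String> removed = new HashSet<>(lastModified.keySet());
        removed.removeAll(seenThisCrawl);
        removed.forEach(lastModified::remove);
        return removed;
    }
}
```

A real store would also need to persist this map between runs; the in-memory version only shows the classification logic.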

What is missing is a GUI for configuring the data sources, and the webapp (or other container) to run Aperture in.

Then you would have an independent, Aperture-based crawler that feeds into a SOLR server.
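
A rough sketch of the shape such a handler might take. Aperture's real `CrawlerHandler` interface has more callbacks and works with Aperture's own types (crawlers, data objects, RDF containers); to keep the example self-contained, a hypothetical, trimmed-down `SimpleCrawlerHandler` stands in for it, and a hypothetical `SolrSink` stands in for whatever client actually talks to the Solr server. New and changed objects become Solr adds, removed objects become deletes by id, and the end of a crawl triggers a commit.

```java
import java.util.Map;

// Hypothetical, trimmed-down callback interface standing in for Aperture's CrawlerHandler.
interface SimpleCrawlerHandler {
    void objectNew(String url, Map<String, String> metadata);
    void objectChanged(String url, Map<String, String> metadata);
    void objectRemoved(String url);
    void crawlStopped();
}

// Hypothetical client for the Solr server (for example, backed by HTTP POSTs to its update handler).
interface SolrSink {
    void add(String id, Map<String, String> fields);
    void deleteById(String id);
    void commit();
}

// Translates crawl events into Solr operations, using the object's URL as the document id.
final class SolrFeedingHandler implements SimpleCrawlerHandler {
    private final SolrSink solr;

    SolrFeedingHandler(SolrSink solr) { this.solr = solr; }

    // A new object becomes a new Solr document.
    public void objectNew(String url, Map<String, String> metadata) { solr.add(url, metadata); }

    // Re-adding with the same unique key replaces the previous Solr document.
    public void objectChanged(String url, Map<String, String> metadata) { solr.add(url, metadata); }

    // Objects that disappeared from the data source are deleted from the index.
    public void objectRemoved(String url) { solr.deleteById(url); }

    // Commit once per crawl so searchers see the changes.
    public void crawlStopped() { solr.commit(); }
}
```

Handling `objectChanged` as a plain add relies on Solr replacing an existing document that has the same unique key; committing once per crawl rather than per object keeps the number of commits low.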

Alternatively, you could plug Aperture's extractors into an existing crawler.

best
Leo


It was Darren Govoni who said at the right time 19.08.2009 17:22 the following words:

Essentially, I use Aperture to retrieve the text + metadata and create a
Solr document with the metadata as fields, plus a 'text' field.
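
One way to do that mapping, as a sketch under assumptions: metadata arrives as property URIs with string values, each Solr field is named after the local part of its property URI, and the full text goes into a single `text` field. `SolrFieldMapper` and its naming rule are hypothetical, not Darren's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that flattens extracted metadata into Solr fields.
final class SolrFieldMapper {
    // Maps each property URI to a Solr field named after its local part and
    // puts the extracted full text into a single 'text' field.
    static Map<String, List<String>> toSolrFields(Map<String, List<String>> metadata, String fullText) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        metadata.forEach((property, values) ->
            fields.computeIfAbsent(localName(property), k -> new ArrayList<>()).addAll(values));
        if (fullText != null && !fullText.isEmpty()) {
            fields.computeIfAbsent("text", k -> new ArrayList<>()).add(fullText);
        }
        return fields;
    }

    // Assumed naming rule: the part after the last '#' or '/' of the property URI.
    static String localName(String uri) {
        int cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        return uri.substring(cut + 1);
    }
}
```

Properties that share a local name (say, a `title` from two vocabularies) end up in the same Solr field; whether that merging is desirable depends on the Solr schema.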
 
That said, I'm not sure whether Solr can be a 'Repository' implementation
in Aperture lingo or needs some new construct. I will look into it some more.
 
If Solr can be a Repository implementation, then I suppose Aperture can
talk to it during crawls. I'm essentially doing this now, but outside the
Aperture framework classes at the moment.
 
 
On Tue, 2009-08-18 at 16:04 +0200, Antoni Mylka wrote:
  
Leo Sauermann writes:
    
Duncan, Darren,
 
Rock, this really brings together useful tools!
 
We now have two projects working on similar things.
Thanks for answering; I think we have an opportunity here
to share some code!
 
I would propose:
* You write to the mailing list once the code is clean.
* If you have questions about upgrading to the new Aperture, you ask on the
mailing list how to solve them.
* Propose a nice name for that baby! Looking at Solr through an
aperture, will that be the "hubbble telescope"? :-) "hubbble" also brings in
the word "hub" .... more ideas welcome
 
We must document all these steps; I created a ticket here:
https://sourceforge.net/tracker/?func=detail&aid=2839174&group_id=150969&atid=779503
 
I think this task is much more important than anything else in our
community, therefore I rated it with a high priority.
Antoni: correct me if I am wrong.
      
It is for the community to decide :)
 
In general I wanted to fix bugs (there are already three new ones since
last week) and then proceed with an mbox accessor (I got one contributed
by an Aduna partner, just have to tweak it a little) and add support for
additional compression formats (thanks to the commons-compress library).
I'm not a Lucene/Solr expert but would be very happy to help with the
Aperture side of things.
 
    
Once this is done, you get your own folder "aperture-solr" to host the code.
As you have a long-term interest in this, I would hope that one of you
joins Aperture as a committer and stays with us a bit longer, but I guess
that maintenance will be easy, given that many people are probably going
to contribute here.
 
best
Leo
      
+1 :)
 
Antoni Mylka
antoni.mylka@gmail.com
 
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Aperture-devel mailing list
Aperture-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aperture-devel
    
 
 
  




-- 
_____________________________________________________
Dr. Leo Sauermann       http://www.dfki.de/~sauermann 
 
Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +43 6991 gnowsis
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de
 
Management:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff
Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes
District Court (Amtsgericht) Kaiserslautern, HRB 2313
_____________________________________________________

Please help Logica to respect the environment by not printing this email. This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
