From: Villemos, G. <ger...@lo...> - 2009-08-27 11:43:40
We are building a similar system that uses, among other tools, Aperture to crawl data, then creates Solr documents and injects them.

Our suggestions for an Aperture + Solr design (sorry if stating the obvious):

Don't make Aperture depend directly on Solr. Instead make a 'pipe' similar to the UIMA framework. Typically you want to extract the data using Aperture, then transform / consolidate it in your own processes, temporarily buffer it (to load balance), and then submit it. Define a standard interface to the pipe and allow flexible configuration of the chain. Possibly Apache Camel could be used for this (even though we implemented our own pipe).

This also supports integration not only with Solr but also with semantic stores such as Virtuoso or Sesame. At the end of the pipe a serializer can turn the generic document into either a Solr or an RDF/OWL document. It may seem weird to take an RDF graph created by Aperture, turn it into a flat, abstract model, and then convert it back to RDF/OWL again, but consider that not everybody thinks the NEPOMUK model is ideal.

Cheers,
Gert.

________________________________
From: Leo Sauermann [mailto:leo...@df...]
Sent: Thursday, 20 August 2009 19:36
To: Darren Govoni
Cc: Aperture Devel
Subject: Re: [Aperture-devel] SOLR integration - excellent! we need a name...

I would reckon you write a SOLRCrawlerhandler, and not a repository. Using the AccessData implementation of Aperture, it is possible to track changes of resources. Given the Aperture crawlers, you have a crawling framework. What is missing is the GUI for configuring the data sources and the webapp (or other container) to run Aperture. Then you would have an independent crawler based on Aperture that feeds into a Solr server.

Alternatively, you could also plug the Aperture extractors into an existing crawler.
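The handler idea described above (track which resources changed, and feed only those into the index) can be sketched roughly as follows. This is only an illustration in Python, not Aperture code: Aperture itself is Java, and the names `AccessLog`, `IndexingHandler` and `object_found` are hypothetical stand-ins rather than Aperture or Solr API. A plain list stands in for the Solr server, and change detection is assumed to be a simple content hash.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AccessLog:
    """Hypothetical change tracker: remembers one content hash per URI."""
    seen: dict = field(default_factory=dict)

    def classify(self, uri: str, content: str) -> str:
        """Return 'new', 'changed' or 'unchanged' and record the new hash."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self.seen.get(uri)
        self.seen[uri] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"


class IndexingHandler:
    """Hypothetical crawl callback that forwards new or changed resources."""

    def __init__(self, access_log: AccessLog, index: list):
        self.access_log = access_log
        self.index = index  # stand-in for a Solr client (assumption)

    def object_found(self, uri: str, text: str, metadata: dict) -> str:
        status = self.access_log.classify(uri, text)
        if status != "unchanged":
            # Flat document: metadata fields plus 'id' and 'text'.
            self.index.append({**metadata, "id": uri, "text": text})
        return status
```

Crawling the same unchanged resource twice sends it to the index only once; a resource whose text changed is sent again.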
best
Leo

It was Darren Govoni who said at the right time 19.08.2009 17:22 the following words:

Essentially, I use Aperture to retrieve the text + metadata and create a Solr document with fields for the metadata and a 'text' field. Given that, I'm not sure whether Solr can be a 'Repository' implementation in Aperture lingo or needs some new construct. I will look into it some more. If Solr can be a Repository implementation, then I suppose Aperture can talk to it during crawls. I'm essentially doing this now, but outside the Aperture framework classes at the moment.

On Tue, 2009-08-18 at 16:04 +0200, Antoni Mylka wrote:

Leo Sauermann wrote:

Duncan, Darren, Rock,

this really brings together useful tools! We now have two projects working on similar things. Thanks for answering; I think we have an opportunity here to share some code!

I would propose:
* You write to the mailing list once the code is clean.
* If you have questions about upgrading to the new Aperture, you ask on the mailing list how to solve them.
* Propose a nice name for that baby! Looking at the Solr through an aperture, will that be "hubbble telescope"? :-) "hubbble" also brings in the word "hub" ... more ideas welcome.

We must document all these steps; I created a ticket here:
https://sourceforge.net/tracker/?func=detail&aid=2839174&group_id=150969&atid=779503

I think this task is much more important than anything else in our community, therefore I rated it with a high priority. Antoni: correct me if I am wrong.

It is for the community to decide :) In general I wanted to fix bugs (there are already three new ones since last week) and then proceed with an mbox accessor (I got one contributed by an Aduna partner, I just have to tweak it a little) and add support for additional compression formats (thanks to the commons-compress library). I'm not a Lucene/Solr expert but would be very happy to help with the Aperture side of things.

Once this is done, you get your own folder "aperture-solr" to host the code.
As you have a long-term interest in this, I would hope that one of you joins Aperture as a committer and stays with us a bit longer, but I guess that maintenance will be easy, given that probably many people are going to contribute here.

best
Leo

+1 :)

Antoni Mylka
ant...@gm...

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Aperture-devel mailing list
Ape...@li...
https://lists.sourceforge.net/lists/listinfo/aperture-devel

--
_____________________________________________________
Dr. Leo Sauermann http://www.dfki.de/~sauermann
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080
Phone: +43 6991 gnowsis
D-67663 Kaiserslautern
Fax: +49 631 20575-102
Germany
Mail: leo...@df...
Management: Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Chairman) Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A.
Aukes Amtsgericht Kaiserslautern, HRB 2313
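
The pipe proposed in Gert's message at the top of this thread (extract, transform, buffer, then serialize for the target store) could look roughly like the sketch below. It is a minimal Python illustration under stated assumptions, not the pipe Gert's team implemented and not Aperture, Camel or UIMA API: `Pipe`, `normalize_title`, `to_solr_xml` and `to_ntriples` are hypothetical names, the generic document is assumed to be a flat string-to-string mapping with an `id` field, and the `http://example.org/prop/` namespace is a placeholder. The Solr serializer emits the `<add><doc><field name="...">` XML update format; the RDF serializer emits simple N-Triples with literal values only.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, List

Document = Dict[str, str]                # flat, store-neutral document
Stage = Callable[[Document], Document]   # one transform step in the chain


class Pipe:
    """Hypothetical pipe: configurable stages, a buffer, one serializer."""

    def __init__(self, stages: List[Stage], serializer: Callable[[Document], str]):
        self.stages = stages
        self.serializer = serializer
        self.buffer = deque()  # temporary buffer before submission

    def feed(self, doc: Document) -> None:
        """Run the document through every stage, then buffer it."""
        for stage in self.stages:
            doc = stage(doc)
        self.buffer.append(doc)

    def flush(self) -> List[str]:
        """Serialize and drain everything currently buffered."""
        out = []
        while self.buffer:
            out.append(self.serializer(self.buffer.popleft()))
        return out


def normalize_title(doc: Document) -> Document:
    """Example transform stage: trim whitespace around the title."""
    cleaned = dict(doc)
    cleaned["title"] = cleaned.get("title", "").strip()
    return cleaned


def to_solr_xml(doc: Document) -> str:
    """Serialize a flat document as a Solr XML <add> message."""
    add = ET.Element("add")
    node = ET.SubElement(add, "doc")
    for name, value in doc.items():
        ET.SubElement(node, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


def to_ntriples(doc: Document, namespace: str = "http://example.org/prop/") -> str:
    """Serialize a flat document as N-Triples; 'id' becomes the subject."""
    subject = doc["id"]
    lines = []
    for name, value in doc.items():
        if name == "id":
            continue
        literal = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{subject}> <{namespace}{name}> "{literal}" .')
    return "\n".join(lines)
```

Swapping `to_solr_xml` for `to_ntriples` when constructing the `Pipe` changes the target store without touching the extraction or transform stages, which is the decoupling the message argues for.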