Of course, that would also be part of SMILA, but they have their own
crawling architecture, and the plans to use the Aperture crawling
architecture in SMILA are only slowly moving forward. They have only
used the Aperture extractors, which is only half of the story.
Also, SMILA is a massively scalable enterprise architecture, distributed
and with other niceties for enterprise computing.
I need something that works "in a month", and without adding any more
btw: the list is called aperture-devel, check your mail client, you
wrote something else which I personally do not like very much.
It was Berwanger, Christian who said at the right time 21.12.2009 16:51
the following words:
Wasn't this the idea behind SMILA, to provide a generic architecture and
those commonly used features?
How would this server be integrated with SMILA?
From: Christian Reuschling [mailto:firstname.lastname@example.org]
Sent: 21 December 2009 16:26
Subject: Re: [Aperture-devel] aperture crawling server with configurable
datasources - and windows share crawling - SOLR guys?
oh yes - it would be a very interesting and useful scenario to have this
possibility. Entry points for using it could be, e.g.:
- controlling the crawling process with RPCs (running the existing
interface as a service)
- configuring the data sources that should be crawled periodically
- a status interface for things like 'what is the server currently doing', or
statistics on what it has done (which is independent from the used
- configuring the persistence layer(s) that should be used (e.g.
Lucene, Database, RDFStore, etc.)
I imagine that currently more or less every project that uses Aperture
has to do similar stuff on its own (this is true, e.g., for DynaQ).
Supporting people to minimize their entry level is always a good idea
for an open source project.
But we have to be careful not to lose flexibility. Big monolithic blocks
are a real pain.
My 2 cts
On Mon, 21 Dec 2009 15:27:36 +0100
Leo Sauermann <email@example.com> wrote:
Ok, so crawling many datasources is exactly "spot on" for aperture,
how about an aperture server software that makes the whole thing a
"product" and not just a "library"?
It was Antoni Mylka who said at the right time 21.12.2009 12:10 the
following words:
The idea is great; if there is more interest in using Aperture in
SOLR, then we could expand in this direction. What we need is
feedback on which data sources you would like to use with SOLR (or
already use). If there is a need, we could think about expanding in that
Some ideas off the top of my head:
- turn the MimeSubCrawler into an mbox subcrawler that would
crawl plain-text mailing list archives properly, or make proper mbox
crawling results appear in the FileSystemCrawler output
- extend the output of flickr/delicious/bibsonomy subcrawlers, let
them extract photo comments, or publication abstracts
- think about some SambaCrawler or FTPCrawler, or pick up the
abandoned WebdavCrawler, that would crawl remote folders without
having to mount them locally first
- LdapCrawler? XMLDbCrawler?
sky is the limit
Leo Sauermann writes:
In the organik-project.eu, and for other things, it would be
awesome to have an "aperture crawling server" that is an
installable WAR file that just crawls some configured data sources
and then does something with the crawled RDF.
For example, we have this company that has a file share on a Windows
server. Now I would like to install "aperture crawling server" on
some machine as a WAR file, instruct it using a web interface to
crawl a data source "windows share" (and maybe some internal
websites using the web crawler and maybe some newsletter using IMAP)
and off it goes to do this.
Then I would like to configure the "SOLR crawling handler" or the
"Drupal crawling handler" to tell the aperture crawling server what
to do with the RDF.
I know that the people who did the SOLR integration of Aperture
probably did something exactly like this - is there some open
source code available for it now?
have you written an aperture-based crawling server that you can
share with me?
(it doesn't have to be completely open source, but just looking at the
code and the architecture could teach me a lot)
do you think this is a rocking idea, and would you join a new subproject in
Aperture for this?
(just say yes and then we go on from this, forming a group with
has anyone written a Windows file share crawler? What Java libraries
are there to crawl Windows/Samba shares?
(that would be awesome)
---------- This SF.Net email is sponsored by the Verizon Developer
Community Take advantage of Verizon's best-in-class app development
support A streamlined, 14 day to market process makes app
distribution fast and easy Join now and get one step closer to
millions of Verizon customers http://p.sf.net/sfu/verizon-dev2dev
Aperture-devel mailing list
Please help Logica to respect the environment by not printing this email.
This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
Dr. Leo Sauermann http://www.dfki.de/~sauermann
Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080 Fon: +43 6991 gnowsis
D-67663 Kaiserslautern Fax: +49 631 20575-102
Germany Mail: firstname.lastname@example.org
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff
Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313