Hi

Of course, that would also be part of SMILA, but they have their own crawling architecture, and the plans to use the Aperture crawling architecture in SMILA are moving forward only slowly. They have only used the Aperture extractors, which is only half the thing.
Also, SMILA is a massively scaling, distributed enterprise architecture with other niceties for enterprise computing.

I need something that works "in a month", and without adding any more fuss.

btw: the list is called aperture-devel - check your mail client, you wrote something else, which I personally do not like very much.

best
Leo

It was Berwanger, Christian who said at the right time 21.12.2009 16:51 the following words:
Hi

Wasn't this the idea behind SMILA: to provide a generic architecture and
commonly used features?

How would this server integrate with SMILA?

Christian

-----Original Message-----
From: Christian Reuschling [mailto:christian.reuschling@gmail.com] 
Sent: 21 December 2009 16:26
To: aperture-devel@lists.sourceforge.net
Subject: Re: [Aperture-devel] aperture crawling server with configurable
datasources - and windows share crawling - SOLR guys?

oh yes - it would be a very interesting and useful scenario to have this
possibility. Entry points for use could be e.g.

- controlling the crawling process with RPCs (running the existing
  interface as a service)
- configuring the data sources that should be crawled periodically
- a status interface for things like 'what is the server doing
  currently', or some statistics on what it has done (independent from
  the used index/persistence layer)
- configuring the persistence layer(s) that should be used (e.g. SOLR,
  Lucene, database, RDF store, etc.)
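The status-interface bullet could be sketched, purely as an illustration, in plain Java. None of these class or method names exist in Aperture or SMILA; they are invented for the sketch:

```java
// Hypothetical sketch of the "status interface" idea above: a small,
// thread-safe status object that a crawling server could expose through
// an RPC or web endpoint, independent of the index/persistence layer.
// All names here are invented for illustration.
import java.util.concurrent.atomic.AtomicLong;

public class CrawlStatus {
    private final AtomicLong objectsCrawled = new AtomicLong();
    private volatile String currentActivity = "idle";

    // Called by the crawler thread whenever it switches activity.
    public void setActivity(String activity) {
        currentActivity = activity;
    }

    // Called once per crawled object; atomic, so safe from many threads.
    public void objectCrawled() {
        objectsCrawled.incrementAndGet();
    }

    // What a status RPC might return to the web interface.
    public String report() {
        return currentActivity + " (" + objectsCrawled.get() + " objects crawled)";
    }

    public static void main(String[] args) {
        CrawlStatus status = new CrawlStatus();
        status.setActivity("crawling smb share");
        status.objectCrawled();
        status.objectCrawled();
        System.out.println(status.report());
        // prints: crawling smb share (2 objects crawled)
    }
}
```

The point of keeping it this dumb is exactly Chris's flexibility concern: the status object knows nothing about SOLR, Lucene, or any other persistence layer.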


I imagine that currently more or less every project that uses Aperture
has to do similar stuff on its own (this is the case for DynaQ, for
example). Helping people to lower their entry barrier is always a good
idea for an open source project.
But we have to be careful not to lose flexibility. Big monolithic
blocks are a real pain.


My 2 cts

Chris



On Mon, 21 Dec 2009 15:27:36 +0100
Leo Sauermann <leo.sauermann@dfki.de> wrote:

Hi

Ok, so crawling many datasources is exactly "spot on" for aperture -
how about aperture server software that makes the whole thing a
"product" and not just a "library"?

best
Leo

It was Antoni Mylka who said at the right time 21.12.2009 12:10 the 
following words:
Leo, Aperturians

The idea is great. If there is more interest in using Aperture with
SOLR, then we could expand in this direction. What is needed is
feedback on which data sources you would like to use with SOLR (or
already use). If there is a need, we could think about expanding in
that direction.

Some ideas off the top of my head:

- elevate the MimeSubCrawler into an mbox subcrawler that would
  crawl plain text mailing list archives properly, or make proper mbox
  crawling results appear in the FileSystemCrawler output
- extend the output of the flickr/delicious/bibsonomy subcrawlers; let
  them extract photo comments or publication abstracts
- think about some SambaCrawler or FTPCrawler, or pick up the
  abandoned WebdavCrawler that would crawl remote folders without
  having to mount them locally first
- LdapCrawler? XMLDbCrawler?

sky is the limit

Antoni

Leo Sauermann wrote:
Hi Aperturians,

in the organik-project.eu, and for other things, it would be 
awesome to have an "aperture crawling server" that is an 
installable war file that just crawls some configured datasources 
and then does something with the crawled rdf.
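Just to make that concrete: a purely hypothetical sketch of what a datasource configuration for such a server could look like. Every element and attribute name here is invented for illustration; Aperture defines no such format:

```xml
<!-- hypothetical config for an "aperture crawling server";
     all element/attribute names are invented for illustration -->
<crawlingServer>
  <!-- the datasources to crawl periodically -->
  <datasource type="windows-share" url="smb://fileserver/public/" interval="24h"/>
  <datasource type="web" url="http://intranet.example.org/" interval="24h"/>
  <datasource type="imap" url="imap://mail.example.org/newsletters" interval="1h"/>
  <!-- the handler that decides what happens with the crawled RDF -->
  <handler type="solr" url="http://localhost:8983/solr/update"/>
</crawlingServer>
```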

for example, we have this company that has a fileshare on a windows
server. Now I would like to install "aperture crawling server" on
some machine as a WAR file, instruct it using a web interface to
crawl a "windows share" datasource (and maybe some internal websites
using the webcrawler, and maybe some newsletters using imap), and off
it goes to do this.
Then I would like to configure the "SOLR crawling handler" or the
"drupal" crawling handler to tell the aperture crawling server what
to do with the RDF.

I know that the people who did the SOLR integration of Aperture
probably built something exactly like this - do we have some open
source code available for it now?

have you written an aperture-based crawling server that you can 
share with me?
(it doesn't have to be completely open source; even just looking at
the code and the architecture could teach me a lot)

do you think this is a rocking idea, and would you join a new
subproject in Aperture for it?
(just say yes and then we go on from this, forming a group with 
requirements, etc)

has anyone written a windows fileshare-crawler? What java libraries
are there to crawl windows/samba shares?
(that would be awesome)

best
Leo

------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community.
Take advantage of Verizon's best-in-class app development support.
A streamlined, 14-day-to-market process makes app distribution fast
and easy. Join now and get one step closer to millions of Verizon
customers: http://p.sf.net/sfu/verizon-dev2dev
_______________________________________________
Aperture-devel mailing list
Aperture-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aperture-devel


Please help Logica to respect the environment by not printing this email.









-- 
_____________________________________________________
Dr. Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +43 6991 gnowsis
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
_____________________________________________________