Two more bits:
have a look at the example code; there is a fileinspector which shows a lot.

If you want to do hardcore multi-CPU clustered indexing of more than 1 million files,
you should peek into what the SMILA folks do with Aperture:


It was Antoni Myłka who said at the right time 28.01.2009 17:29 the following words:
Darren Govoni wrote:
  I'm new to Aperture and think it's great. I wanted to know: if I scan
individual files with Aperture and get their individual RDF metadata, and
then combine those triples in a triple store, would the result be equal
to the single RDF model generated if Aperture had crawled all the files?

Probably. The FilesystemCrawler will also get you the folders, thus
replicating the entire folder structure, but if you only need the files
then you can call the FileAccessor directly on each file you need, and
then process the DataObject with an appropriate extractor.

That's what the FilesystemCrawler does anyway. It's really simple, only
300 lines of code (mostly commented-out), most of the actual meat
happens inside the FileAccessor and the Extractors, and you can use
those without the crawler.
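
To illustrate the point about combining per-file results: an RDF model is a set of triples, so merging per-file models amounts to a set union, where duplicates collapse and the order of merging does not matter. The sketch below is plain Java under that assumption. `TripleMerge`, `Triple` and `merge` are hypothetical names, not part of Aperture or of any triple store, and the sketch ignores blank nodes as well as the folder triples that the FilesystemCrawler would add.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class TripleMerge {

    // Minimal stand-in for an RDF statement (hypothetical, not an Aperture type).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) other;
            return subject.equals(t.subject)
                    && predicate.equals(t.predicate)
                    && object.equals(t.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    // Union of all per-file models; a Set drops duplicate triples.
    static Set<Triple> merge(List<List<Triple>> perFileModels) {
        Set<Triple> combined = new HashSet<>();
        for (List<Triple> model : perFileModels) {
            combined.addAll(model);
        }
        return combined;
    }
}
```

Because the result is a set, it does not matter which worker produced which per-file model or in what order the models arrive at the store.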

The reason I ask is that I'm looking to do distributed processing of
files and can't crawl them with the same thread.

That's true. The crawlers are essentially single-threaded. You should
process all returned DataObjects on the same thread. Otherwise things
get fiendishly complicated (connection management in IMAP, the entire
subcrawler stack with files within files within files etc). If you must
process files on different threads, the crawler won't work. What you can
do though, is to have many crawlers on many threads crawling different
portions of the folder tree. The crawler has some features (like
handling folders with 100K files, or discarding symbolic links) you
might not want to have to reinvent.
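
The "many crawlers on many threads" idea could be sketched roughly as follows. This is plain Java using only the JDK, not Aperture code: `crawlSubtree` is a hypothetical stand-in for running one single-threaded crawler over one subtree, `PartitionedCrawl` and `crawlInParallel` are made-up names, and splitting on the top-level folders is just one assumed way to partition the tree.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PartitionedCrawl {

    // Hypothetical stand-in for one single-threaded crawler working on one subtree.
    // Here it only lists regular files; symbolic links are skipped, not followed.
    static List<Path> crawlSubtree(Path subtreeRoot) {
        try (Stream<Path> paths = Files.walk(subtreeRoot)) {
            return paths
                    .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One task per top-level folder; each subtree is handled start to finish
    // by a single pool thread, so no crawl state is shared between threads.
    static List<Path> crawlInParallel(Path root, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> partitions = new ArrayList<>();
        List<Path> found = new ArrayList<>();
        try (Stream<Path> children = Files.list(root)) {
            for (Path child : children.collect(Collectors.toList())) {
                if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    partitions.add(child);
                } else if (Files.isRegularFile(child, LinkOption.NOFOLLOW_LINKS)) {
                    found.add(child); // files directly under root stay on the calling thread
                }
            }
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Path>>> futures = new ArrayList<>();
            for (Path partition : partitions) {
                futures.add(pool.submit(() -> crawlSubtree(partition)));
            }
            for (Future<List<Path>> future : futures) {
                found.addAll(future.get()); // waits for that subtree to finish
            }
            return found;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `PartitionedCrawl.crawlInParallel(Paths.get("/some/root"), 4)` would then walk each top-level folder on its own pool thread. Splitting only at the top level balances poorly when one folder holds most of the files, so a real setup might partition deeper.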

Hope this helps.

Antoni Myłka

Aperture-devel mailing list


DI Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de