Where are RepositoryImpl and RepositoryAccessData located (in which jar and which version)? I think I need to download a new sesame-2.0, but all I could find was the SDK. When I looked in the aperture CVS tree, the lib directory is much newer than, and different from, the version I am using.
This is what I have in the current version of aperture I am using:
When is a new release coming out? Should I move to the CVS version? I checked out the HEAD of CVS, which looks like it has all the newest stuff, but how stable is it?
It seems to me that a simple solution for you would be to use an AccessData instance. The AccessData interface was created specifically to support incremental crawling. There is an implementation called RepositoryAccessData that uses a repository to store the necessary information.

Attached below is a snippet of code you could put into the ExampleFileCrawler.crawl() method. This will make the crawler store the most basic information (in the case of the FileSystemCrawler, the path and the last modified date) in the accessDataRepository. The crawler will use this information on any subsequent crawl to detect whether a file has already been crawled. If it has, and it has not changed since then, the URI of that file will be reported as objectNotModified and no extraction will be necessary.

Note that I wrote this code from memory. It is meant to convey an idea and I didn't test it. Please post any further questions you might have.

Antoni Mylka


// setup a crawler that can handle this type of DataSource
FileSystemCrawler crawler = new FileSystemCrawler();
crawler.setDataAccessorRegistry(new DefaultDataAccessorRegistry());
crawler.setCrawlerHandler(new SimpleCrawlerHandler());

///////////////////// ADD THESE LINES /////////////////////////////
// set up an AccessData instance backed by a native store on disk
File accessDataDirectory = new File("Path/to/the/accessdata/store");
Repository accessDataRepository = new RepositoryImpl(new NativeStore(accessDataDirectory));
RepositoryAccessData accessData = new RepositoryAccessData(accessDataRepository);

// hand the AccessData to the crawler, otherwise it is never consulted
crawler.setAccessData(accessData);

// start crawling
crawler.crawl();
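
The bookkeeping described above (remember the path and last modified date, then compare on the next crawl) can be sketched without Aperture at all. This is only an illustration of the idea, not how RepositoryAccessData is implemented: the class ModificationTracker and its Status enum are made-up names, and everything is kept in memory rather than in a repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an access data store; not part of Aperture.
class ModificationTracker {

    enum Status { NEW, MODIFIED, NOT_MODIFIED }

    // path -> last modified timestamp recorded the last time the path was seen
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Records the timestamp for this path and reports how it compares
    // with the one recorded previously (if any).
    Status check(String path, long lastModified) {
        Long previous = lastSeen.put(path, lastModified);
        if (previous == null) {
            return Status.NEW;
        }
        return previous == lastModified ? Status.NOT_MODIFIED : Status.MODIFIED;
    }

    public static void main(String[] args) {
        ModificationTracker tracker = new ModificationTracker();
        System.out.println(tracker.check("/docs/a.txt", 1000L)); // first crawl
        System.out.println(tracker.check("/docs/a.txt", 1000L)); // unchanged
        System.out.println(tracker.check("/docs/a.txt", 2000L)); // touched since
    }
}
```

Running it prints NEW, NOT_MODIFIED and MODIFIED in that order. Only the NOT_MODIFIED case lets a crawler skip extraction, which is the saving a persistent AccessData is meant to provide across separate runs.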