#98 Implement support for moving files

1.6.0 - features

When a crawler crawls a data source, it yields data objects with absolute URIs. If we later want to access those objects with a DataAccessor, or with a DataAccessor combined with a SubCrawler, the objects still need to be at the same physical location.

We need a mechanism that allows us to crawl a folder, move that folder to a different location, and still access the files using the URIs obtained by the crawler.

Use cases:
- crawling folders on removable media
- crawling folders on mounted network shares
- changing the domain of a website between crawls
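
One possible way to picture such a mechanism is to treat every crawled URI as a path relative to the root of the data source, and to re-resolve it against the current root when the object is accessed. The sketch below is only an illustration built on java.net.URI; the class name UriRebaser and its methods are hypothetical and are not part of the Aperture API. It assumes both roots end with a slash.

```java
import java.net.URI;

/** Hypothetical helper: maps URIs recorded under an old root onto a new root. */
final class UriRebaser {
    private final URI oldRoot;
    private final URI newRoot;

    UriRebaser(URI oldRoot, URI newRoot) {
        this.oldRoot = oldRoot;
        this.newRoot = newRoot;
    }

    /** Returns the URI moved to the new root, or the input unchanged if it was not under the old root. */
    URI rebase(URI crawled) {
        // relativize() returns its argument unchanged when it is not below oldRoot
        URI relative = oldRoot.relativize(crawled);
        if (relative.isAbsolute()) {
            return crawled;
        }
        return newRoot.resolve(relative);
    }

    public static void main(String[] args) {
        // A folder on removable media that is now mounted elsewhere
        UriRebaser folder = new UriRebaser(
                URI.create("file:/media/usb/docs/"),
                URI.create("file:/mnt/backup/docs/"));
        System.out.println(folder.rebase(URI.create("file:/media/usb/docs/a/b.txt")));

        // A website whose domain changed between crawls
        UriRebaser site = new UriRebaser(
                URI.create("http://old.example.org/site/"),
                URI.create("http://new.example.org/site/"));
        System.out.println(site.rebase(URI.create("http://old.example.org/site/page.html")));
    }
}
```

The same helper covers the folder and the website use cases because both reduce to swapping a URI prefix; URIs outside the old root are deliberately left untouched.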

This obviously includes incremental crawling. When we crawl a data source and the source then moves, the mechanism should ensure that:

- a subsequent incremental crawl works correctly (e.g. reports all objects as Unmodified if nothing has changed apart from the location)
- subsequent calls to DataAccessor.getDataObject and SubCrawler.getDataObject yield proper objects (or null in the case of getDataObjectIfModified with a non-null AccessData)
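
To illustrate the incremental-crawl requirement, here is a minimal sketch, assuming change detection is keyed on the path relative to the data source root rather than on the absolute URI. The class RelativeChangeTracker, its Status enum and the record method are hypothetical names used only for this example; this is not how Aperture's AccessData is actually implemented.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical change tracker that remembers objects by their root-relative path. */
final class RelativeChangeTracker {
    enum Status { NEW, MODIFIED, UNMODIFIED }

    // Key: path relative to the data source root; value: last-modified timestamp
    private final Map<String, Long> lastModifiedByPath = new HashMap<>();

    /** Records an object seen during a crawl and reports how it compares with the previous crawl. */
    Status record(URI root, URI object, long lastModified) {
        // The key does not depend on where the root currently lives
        String key = root.relativize(object).toString();
        Long previous = lastModifiedByPath.put(key, lastModified);
        if (previous == null) {
            return Status.NEW;
        }
        return previous.longValue() == lastModified ? Status.UNMODIFIED : Status.MODIFIED;
    }
}
```

With this kind of keying, a first crawl under file:/media/usb/docs/ followed by a second crawl under file:/mnt/backup/docs/ would report an untouched file as UNMODIFIED, since only the root changed.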


  • Antoni Mylka

    Antoni Mylka - 2010-08-10
    • status: open --> closed
  • Antoni Mylka

    Antoni Mylka - 2010-08-10

    A lot of work has gone into this already. Movable data sources now seem to work for two types: Filesystem and Mbox. Additional ones can be enabled on demand when the need arises. I am closing this issue.

