When a crawler crawls a data source, it yields data objects with absolute URIs. If we later want to access those objects with a DataAccessor, or with a DataAccessor combined with a SubCrawler, the objects need to be at the same physical location.
We need a mechanism that allows us to crawl a folder, move that folder to a different location, and still access the files using the URIs obtained by the crawler. Typical scenarios include:
- crawling folders on removable media
- crawling folders on mounted network shares
- changing the domain of a website between crawls
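One way to support such moves is to rebase the URIs recorded during the original crawl onto the source's new location. The following is a minimal sketch of that idea; the class and method names (UriRebaser, rebase) are illustrative and not part of any existing crawler API:

```java
import java.net.URI;

// Hypothetical helper: maps a URI recorded during an earlier crawl onto the
// data source's new physical location by swapping the base prefix.
class UriRebaser {

    /** Replaces the old base prefix of a crawled URI with the new base. */
    static String rebase(String crawledUri, String oldBase, String newBase) {
        // Normalize both bases to end with a slash so prefix matching and
        // relative resolution behave consistently.
        if (!oldBase.endsWith("/")) oldBase += "/";
        if (!newBase.endsWith("/")) newBase += "/";
        if (!crawledUri.startsWith(oldBase)) {
            return crawledUri; // not under the old base, leave untouched
        }
        // relativize/resolve preserves path structure and percent-escaping
        URI relative = URI.create(oldBase).relativize(URI.create(crawledUri));
        return URI.create(newBase).resolve(relative).toString();
    }
}
```

For example, rebasing `file:///media/usb/docs/a/b.txt` from base `file:///media/usb/docs` to `file:///mnt/backup/docs` yields `file:///mnt/backup/docs/a/b.txt`, so a crawler's stored URIs can be translated to the new mount point without re-crawling.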
This also covers incremental crawling: if we crawl a data source and the source subsequently moves, the mechanism should ensure that
- a subsequent incremental crawl works correctly (e.g. reports all objects as Unmodified if nothing has changed apart from the location)
- subsequent calls to DataAccessor.getDataObject and SubCrawler.getDataObject yield proper objects (and getDataObjectIfModified, when given a non-null AccessData, returns null for unmodified objects)
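The incremental requirement essentially means the crawl bookkeeping must be keyed by a stable, location-independent URI rather than the current physical path. The sketch below illustrates this with a stand-in class; LogicalAccessData and its methods are hypothetical and only approximate what a real AccessData store would do:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an AccessData store, keyed by a stable logical
// URI so that entries survive a move of the underlying data source.
class LogicalAccessData {

    private final Map<String, String> lastModified = new HashMap<>();

    /** Records the modification stamp seen for an object during a crawl. */
    void recordVisit(String logicalUri, String modStamp) {
        lastModified.put(logicalUri, modStamp);
    }

    /** True when the object is unchanged since the last recorded crawl. */
    boolean isUnmodified(String logicalUri, String currentStamp) {
        return currentStamp.equals(lastModified.get(logicalUri));
    }
}
```

Because lookups use the logical URI, a later crawl of the moved folder can rebase each physical path back to its logical URI, find the stored modification stamp, and correctly report the object as Unmodified when only the location changed.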