From: Grant I. <gsi...@ap...> - 2008-08-22 21:15:34
On Aug 22, 2008, at 4:23 PM, Antoni Myłka wrote:

> Grant Ingersoll wrote:
>> Hi,
>>
>> I was testing my new persistence stuff and noticed that I was getting
>> callbacks to objectRemoved() when I didn't expect them. Upon
>> debugging, I see that they are coming from reportUntouched():
>>
>> protected void reportUntouched() {
>>     ClosableIterator iter = accessData.getUntouchedIDsIterator();
>>     while (iter.hasNext()) {
>>         handler.objectRemoved(this, iter.next().toString()); //// HERE
>>         crawlReport.increaseRemovedCount();
>>     }
>>     accessData.removeUntouchedIDs();
>> }
>>
>> Shouldn't this be calling objectNotModified() or some new method
>> named objectNotTouched()? If not, what is the use case where
>> untouched means removed, and how would I distinguish untouched from
>> removed in the callback so that I can deal w/ it the way I need to?
>>
>> I suppose one workaround is to extend the crawlers I use and
>> override reportUntouched(), but I suspect I am missing something in
>> terms of how this is intended to be used.
>>
>> Thanks,
>> Grant
>>
>
> A 'touched' id means 'the crawler tried to get some information from
> the AccessData about that id', so 'touched' ids are those the crawler
> 'touched' during a crawl. All resources that have not been touched by
> the crawler are those that were present in the data source on the
> previous crawl but that the crawler hasn't encountered on the next
> crawl. That's why all 'untouched' ids are reported as removed: they
> were there on the previous crawl but aren't there on the current
> crawl. That's how we define 'removed'. It doesn't mean 'the user has
> taken some action to actually remove the object'; it just means
> 'disappeared from the data source'.
>
> Does it make a difference though?

Maybe, in one possible use case (which I admittedly don't have at the
moment): your application keeps historical information about crawls.
That is, your app gives the user access to every artifact it has
crawled over the life of the application, regardless of whether that
artifact still exists. Kind of like a version control system. In this
case, you would want to be able to mark the item as deleted, but you
wouldn't necessarily want to remove it.

Just a thought. I think if my patch implementing this callback were
accepted, it would make sense for the default implementation to call
objectRemoved().

-Grant
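To make the distinction discussed above concrete, here is a minimal, self-contained Java sketch. It models 'untouched' as "known from an earlier crawl but not looked up during this one" and adds the proposed objectNotTouched() callback with a default that simply delegates to objectRemoved(), so existing handlers keep today's behaviour. Everything here is hypothetical and simplified: Handler, AccessDataModel, HistoryHandler and objectNotTouched() are illustrative names, not Aperture's actual API, and the default method is a Java 8 feature used only to express "default implementation calls objectRemoved()". Only the shape of reportUntouched() follows the method quoted in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UntouchedSketch {

    // Hypothetical stand-in for a crawler handler (NOT Aperture's interface).
    interface Handler {
        void objectRemoved(String id);

        // Hypothetical callback proposed in the thread; by default it keeps
        // the current behaviour by delegating to objectRemoved().
        default void objectNotTouched(String id) {
            objectRemoved(id);
        }
    }

    // Hypothetical, simplified model of the AccessData bookkeeping.
    static class AccessDataModel {
        private final Set<String> known = new LinkedHashSet<>(); // ids seen on earlier crawls
        private final Set<String> touched = new HashSet<>();     // ids looked up in this crawl

        // Called whenever the crawler asks about an id during a crawl.
        void touch(String id) {
            touched.add(id);
            known.add(id);
        }

        // Ids known from a previous crawl that were not touched in this one.
        List<String> untouchedIds() {
            List<String> result = new ArrayList<>();
            for (String id : known) {
                if (!touched.contains(id)) {
                    result.add(id);
                }
            }
            return result;
        }

        // Forget the untouched ids and reset the touched set for the next crawl.
        void removeUntouchedIds() {
            known.retainAll(touched);
            touched.clear();
        }
    }

    // Same shape as the quoted reportUntouched(), but calling the hypothetical
    // objectNotTouched() instead of objectRemoved(); returns the removed count.
    static int reportUntouched(AccessDataModel data, Handler handler) {
        int removed = 0;
        for (String id : data.untouchedIds()) {
            handler.objectNotTouched(id);
            removed++;
        }
        data.removeUntouchedIds();
        return removed;
    }

    // History-keeping handler: marks vanished items as deleted but keeps them.
    static class HistoryHandler implements Handler {
        final Set<String> archive = new LinkedHashSet<>();
        final Set<String> markedDeleted = new LinkedHashSet<>();

        @Override
        public void objectRemoved(String id) {
            archive.remove(id);
        }

        @Override
        public void objectNotTouched(String id) {
            markedDeleted.add(id);
        }
    }

    public static void main(String[] args) {
        AccessDataModel data = new AccessDataModel();
        HistoryHandler history = new HistoryHandler();

        // First crawl sees a, b and c.
        for (String id : Arrays.asList("a", "b", "c")) {
            data.touch(id);
            history.archive.add(id);
        }
        reportUntouched(data, history);

        // Second crawl: "b" has disappeared from the data source.
        data.touch("a");
        data.touch("c");
        int n = reportUntouched(data, history);

        System.out.println(n + " untouched: " + history.markedDeleted
                + ", archive: " + history.archive);
    }
}
```

Running main() should print "1 untouched: [b], archive: [a, b, c]": the vanished item is flagged rather than dropped, while a handler that implements only objectRemoved() would still see it reported as removed, which is the backward-compatible default suggested above.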