From: Antoni M. <ant...@gm...> - 2008-05-02 16:42:20
Hello Aperturians,

the new AccessData is ready.

1. After some thought I switched to explicit touching in AccessData (the touch() and isTouched() methods).
2. Completed three implementations: AccessDataImpl, FileAccessData and ModelAccessData.
3. All three pass a refactored unit test suite.
4. Updated the CrawlerBase: there are no deprecatedUrls any more, the handler is private, and I added delegate methods that take care of everything.
5. Updated all crawlers to conform to those changes. Previous references to deprecatedUrls and to the handler are commented out, but still visible for the interested.

Benefits:
1. Crawling with subcrawlers should work correctly.
2. Deprecated URLs are no more.
3. The crawl report is reliable again; some crawlers didn't maintain it correctly.
4. isTouched allowed me to get rid of the crawledUrls set in the WebCrawler, which should lower the memory consumption.
5. Due to explicit touching everything should go relatively quickly.

Drawback:
1. The AccessData has become quite a complicated component with many methods and a sophisticated contract. It will require considerably more work for a "normal user" to write his/her own AccessData implementation, though the abstract AccessDataTest class should make it easier to write appropriate unit tests.

It's possible to move touchRecursively, getAggregatedIDsClosure and recursive removal to the CrawlerBase, but that will require adding getAggregatedIDsIterator. Returning sets is bad if we want to conserve memory (which we do, don't we?). It would take a day.

Please voice your opinions as soon as possible. We are already behind the schedule agreed on at the beginning of May, and the website crawlers are still due for a refactoring. If anyone wishes to contribute to the noble cause of Aperture development, now is the best moment.

The unit tests pass, but I haven't done any functional testing yet. I will get back to it no sooner than Tuesday evening.

All kinds of comments welcome.

Antoni Mylka
ant...@gm...
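
For readers unfamiliar with explicit touching, a minimal Java sketch of the idea follows. It is not the Aperture code: the class TouchSketch and its put() and removeUntouched() methods are hypothetical, and only the names touch, isTouched and touchRecursively are taken from the message above. The sketch assumes that a crawler touches every ID it still sees, and that whatever is left untouched at the end of the crawl no longer exists in the source and can be dropped.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration; NOT the real Aperture AccessData interface.
final class TouchSketch {
    // All IDs known from earlier crawls.
    private final Set<String> known = new HashSet<>();
    // IDs seen during the current crawl.
    private final Set<String> touched = new HashSet<>();
    // Parent ID -> IDs aggregated under it (for example a folder and its files).
    private final Map<String, Set<String>> aggregated = new HashMap<>();

    // Registers an ID; parentId may be null for a top-level entry.
    void put(String id, String parentId) {
        known.add(id);
        if (parentId != null) {
            aggregated.computeIfAbsent(parentId, k -> new HashSet<>()).add(id);
        }
    }

    // Marks a known ID as seen in the current crawl.
    void touch(String id) {
        if (known.contains(id)) {
            touched.add(id);
        }
    }

    boolean isTouched(String id) {
        return touched.contains(id);
    }

    // Touches an ID and everything aggregated under it, guarding against cycles.
    void touchRecursively(String id) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.push(id);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue;
            }
            touch(current);
            Set<String> children = aggregated.get(current);
            for (String child : children == null ? Collections.<String>emptySet() : children) {
                pending.push(child);
            }
        }
    }

    // Drops every untouched ID, resets the touched state for the next crawl,
    // and returns how many IDs were removed.
    int removeUntouched() {
        int removed = 0;
        for (Iterator<String> it = known.iterator(); it.hasNext();) {
            String id = it.next();
            if (!touched.contains(id)) {
                it.remove();
                aggregated.remove(id);
                removed++;
            }
        }
        for (Set<String> children : aggregated.values()) {
            children.retainAll(known);
        }
        touched.clear();
        return removed;
    }
}
```

In this sketch an unchanged folder can be handled with a single touchRecursively call instead of visiting each file, and removeUntouched returns a count rather than a set, in line with the memory concern raised above.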