Also, I want to add:

Chris, your argument that "implementors will surely provide their own implementation and will then work with the simple Crawler and AccessData interfaces" falls apart because:
* people are lazy and expect things to work out of the box
* people will only get good information about how to optimize it if that information is encoded in the AccessData interface.

=> the new interface methods also sum up the discussion we are having here at the moment; this is needed documentation, otherwise our decisions will be left to rot in a mailing list... :-)

About changing the interfaces:
According to our numbering policy [1], we may have to increase the major/minor version number now:
"Minor versions should always have complete API back-compatibility. That is to say, any code developed against X.0 should continue to run without alteration against all X.N releases.
A major release may introduce incompatible API changes. The transition strategy is to introduce new APIs in release X.N, deprecating old APIs, then remove all deprecated APIs in release X+1.0."

Hence, it's an Aperture 1.1.0 release.



On 24.04.2008 02:07, Antoni Myłka wrote:
In short: I committed a breaking change; only AccessDataImpl works, all other
AccessData implementations are broken. Please have a go at it. I will work on it on Saturday. Please comment.

Christiaan Fluit wrote:
Antoni Myłka wrote:
I'd rather include those methods in the AccessData interface, but OK, it 
can also be done in the Crawler. If you have doubts about this, we may 
postpone it until Aperture 2.0.
For now I prefer to do it in CrawlerBase as it does not break existing 
code of Aperture users (as far as I can tell).

I have committed a solution with new methods in AccessData. It 
necessitated a considerable extension of AccessDataImpl. Right now 
FileAccessData is broken (it works, but storing it in and retrieving it 
from the file loses the aggregation relations). The model/repository 
AccessData implementations are also broken.

For Aperture 2 there is another thing to consider. I expect that it is 
more likely that an integrator will have to create his or her own AccessData 
implementation (due to the various ways in which persistence can be 
realized) than that he or she will write a custom Crawler, one that 
doesn't even extend CrawlerBase. This might be a reason for keeping 
the AccessData API as simple as possible and doing all the recursion 
magic in CrawlerBase.

My reasoning was:
1. we can't store the aggregation relations with the normal put/get methods
   because they only accept single values
2. we can't use referredIDs because they have different semantics and
   do not imply cascading touch/delete (the way they're used in WebCrawler)
3. conclusion from 1 and 2: we need to extend the AccessData interface
   with aggregation-relation methods put/removeAggregatedID(s) and getAggregatedIDs

This covers three of the seven methods I've added.
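The three methods from point 3 could look roughly like this. A minimal, map-backed sketch: the method names come from this thread, but the signatures and the in-memory storage are my assumptions, not the committed AccessDataImpl code.

```java
import java.util.*;

// Hypothetical in-memory sketch of the aggregation-relation methods
// discussed above; NOT the actual AccessDataImpl implementation.
class AggregationSketch {
    // parent id -> ids of the resources aggregated within it
    private final Map<String, Set<String>> aggregated = new HashMap<>();

    public void putAggregatedID(String parentId, String childId) {
        aggregated.computeIfAbsent(parentId, k -> new HashSet<>()).add(childId);
    }

    public Set<String> getAggregatedIDs(String parentId) {
        return aggregated.getOrDefault(parentId, Collections.emptySet());
    }

    public void removeAggregatedID(String parentId, String childId) {
        Set<String> children = aggregated.get(parentId);
        if (children != null) {
            children.remove(childId);
        }
    }

    public static void main(String[] args) {
        AggregationSketch ad = new AggregationSketch();
        // e.g. an archive aggregating one of its entries
        ad.putAggregatedID("zip:archive.zip", "zip:archive.zip!/a.txt");
        System.out.println(ad.getAggregatedIDs("zip:archive.zip").size());
    }
}
```

The point of a multi-valued structure is exactly argument 1: a plain single-valued put/get cannot hold a one-to-many aggregation relation.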

4. conclusion from 3: change the semantics of remove() so that it
   deletes ALL info about an ID (the actual id, referred ids, and aggregated ids).
   Without it, the user needs to call three methods to delete the info about
   an id - an additional constraint on the crawlers.
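The proposed remove() semantics from point 4 can be sketched as a single call that wipes all three kinds of information at once. The field names here are hypothetical illustrations, not the committed code:

```java
import java.util.*;

// Sketch of the proposed cascading remove() semantics: one call wipes
// the stored value, the referred ids and the aggregated ids of a
// resource. The in-memory maps are my assumption for illustration.
class RemoveSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Set<String>> referredIDs = new HashMap<>();
    private final Map<String, Set<String>> aggregatedIDs = new HashMap<>();

    void remove(String id) {
        // without the changed semantics, the crawler would have to
        // make three separate calls to achieve this
        values.remove(id);
        referredIDs.remove(id);
        aggregatedIDs.remove(id);
    }

    public static void main(String[] args) {
        RemoveSketch ad = new RemoveSketch();
        ad.values.put("file:a", "date=2008-04-24");
        ad.referredIDs.put("file:a", new HashSet<>(Arrays.asList("file:b")));
        ad.remove("file:a");
        System.out.println(ad.values.containsKey("file:a"));
        System.out.println(ad.referredIDs.containsKey("file:a"));
    }
}
```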

5. we need a fourth method:
    a. isTouched(id), if the crawler is to use getStoredIDs and filter
       out the untouched ones (con: this set may be HUGE)
    b. getUntouchedIDs (better)
   Without it, we would need to specify that there is a public static final
   property that must be equal to some value. This is hardly possible
   without exposing that value (another method) and traversing the
   getStoredIDs set (which may get HUGE); setting some magical TOUCHED
   property of all resources to false in initialize() is also
   unacceptable because it incurs a HUGE initialization overhead.

6. removeUntouched is necessary.
   Without it: the iterator returned by getUntouchedIDs doesn't support
   remove(), the iterator of the getStoredIDs set (which is HUGE) doesn't
   do it either, and CrawlerBase would have to construct a set of all
   ids to remove and iterate over it.

7. and 8.
   getAggregatedIDsClosure and touchRecursively may indeed be moved to
   CrawlerBase with little effort.
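Points 5 and 6 together describe the end-of-crawl cycle: touch ids as they are seen, then ask for the untouched ones and bulk-delete them. A rough sketch with hypothetical names and an in-memory map, assuming getUntouchedIDs returns an iterator and removeUntouched does the bulk delete internally:

```java
import java.util.*;

// Hypothetical sketch of the touch / getUntouchedIDs / removeUntouched
// cycle discussed above; the names follow this thread, the
// implementation is an assumption for illustration only.
class CrawlCycleSketch {
    private final Map<String, Boolean> touched = new LinkedHashMap<>();

    void put(String id)   { touched.put(id, Boolean.FALSE); } // known from a previous crawl
    void touch(String id) { touched.put(id, Boolean.TRUE);  } // seen during the current crawl

    // 5b: iterate only the untouched ids instead of filtering getStoredIDs
    Iterator<String> getUntouchedIDs() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : touched.entrySet())
            if (!e.getValue()) result.add(e.getKey());
        return result.iterator();
    }

    // 6: bulk removal, so the crawler never builds its own HUGE set
    void removeUntouched() {
        touched.values().removeIf(t -> !t);
    }

    public static void main(String[] args) {
        CrawlCycleSketch ad = new CrawlCycleSketch();
        ad.put("file:a");
        ad.put("file:b");
        ad.touch("file:a"); // file:b disappeared from the data source
        System.out.println(ad.getUntouchedIDs().next()); // the deleted resource
        ad.removeUntouched();
        System.out.println(ad.getUntouchedIDs().hasNext());
    }
}
```

A real implementation would keep this state in the persistent store rather than in memory, which is exactly why the argument above insists on avoiding HUGE in-memory sets.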

I wanted to satisfy the following constraints:
1. implement the functionality described in my previous mails
2. don't use referredIDs
3. don't create any big sets (we want to save memory)
4. don't impose any additional constraints on the crawler
    implementations (e.g. having to call touch() explicitly)
5. (optional) make the AccessData self-contained, independent of
    crawlers and easily testable with little setup needed

The solution I've committed may be complicated, but it seems to satisfy 
those requirements. There are basic tests in the new TestAccessDataImpl 
class (I know it's ugly, some refactoring is due). Tests of other 
AccessData implementations will probably fail. I can remove 
touchRecursively and getAggregatedIDsClosure from the interface, but 
the other methods are IMHO necessary. I'll be able to resume this work on 
Saturday.
Please comment

Antoni Mylka



DI Leo Sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:

Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313