Also, I want to add:
Chris, your argument that "implementors will surely provide their
own implementation and then will work with simple interfaces of crawler
and accessdata" falls apart insofar as
* people are lazy and expect it to work out of the box
* people will get good information about how to optimize it if the
information is encoded within the AccessData interface.
=> the new interface methods also sum up the discussion we are having
here at the moment. This is needed documentation; otherwise our
decisions will be left to rot in a mailing list... :-)
About changing the interfaces:
according to our numbering policy, we may have to increase the
major/minor version number now:
"Minor versions should always have complete API back-compatibility.
That is to say, any code developed against X.0 should
continue to run without alteration against all X.N.
A major release may introduce incompatible API changes. The transition
strategy is to introduce new APIs in release X.N,
deprecating old APIs, then remove all deprecated APIs in release X+1.0."
Hence, it's an Aperture 1.1.0 release.
On 24.04.2008 02:07, Antoni Myłka wrote, in short:
Committed a breaking change; only AccessDataImpl works, all other
AccessData implementations are broken. Please have a go at
TestAccessDataImpl.java. Will work on it on Saturday. Please comment.
Christiaan Fluit wrote:
Antoni Myłka wrote:
I'd rather include those methods in the AccessData interface, but OK, it
can also be done in the Crawler. If you have doubts about this, we may
postpone it until Aperture 2.0.
For now I prefer to do it in CrawlerBase as it does not break existing
code of Aperture users (as far as I can tell).
I have committed a solution with new methods in AccessData. It
necessitated a considerable extension of AccessDataImpl. Right now
FileAccessData is broken (it works, but storing it in the file and
retrieving it again loses the aggregation relations). The
model/repository access data implementations are also broken.
For Aperture 2 there is another thing to consider. I expect that it is
more likely that an integrator will have to create their own AccessData
implementation (due to the various ways in which persistence can be
realized) than that they will write a custom Crawler, one that also
doesn't extend CrawlerBase. This might be a reason for keeping
the AccessData API as simple as possible and doing all the recursion
magic in CrawlerBase.
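Christiaan's alternative, keeping AccessData a simple flat store and doing the recursion in CrawlerBase, could be sketched roughly like this. This is a hypothetical helper, not actual Aperture code: the closure over the aggregation relation is computed on the crawler's side from a plain parent-to-children lookup.

```java
import java.util.*;

// Illustrative only: computing the aggregation closure crawler-side,
// so the AccessData interface itself can stay as simple as possible.
public class ClosureSketch {

    // Breadth-first walk over a parent -> children map; returns the id
    // itself plus everything transitively aggregated below it.
    public static Set<String> aggregatedClosure(
            Map<String, Set<String>> children, String root) {
        Set<String> closure = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String id = queue.remove();
            if (closure.add(id)) {
                queue.addAll(children.getOrDefault(id, Collections.emptySet()));
            }
        }
        return closure;
    }
}
```

The trade-off is visible even in this sketch: the crawler has to issue one lookup per visited id, whereas an AccessData-side implementation could use whatever traversal its storage backend makes cheap.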
My reasoning was:
1. we can't store the aggregation relations with the normal put/get
methods because they only accept single values
2. we can't use referredIDs because they have different semantics and
do not imply cascading touch/delete (the way they're used in the web
crawler)
3. conclusion from 1 and 2: we need to extend the AccessData interface
with aggregation-relation methods put/removeAggregatedID(s) and
getAggregatedIDs; this covers three of the seven methods I've added
4. conclusion from 3: change the semantics of remove(). Right now it
deletes ALL info about an ID (actual ids, referred ids, and aggregated
ids). Without it, the user would need to call three methods to delete
info about an id - an additional constraint on the crawlers
5. we need a fourth method, either:
a. isTouched(id), if the crawler is to use getStoredIDs and filter
out the untouched ones (con: this set may be HUGE), or
b. getUntouchedIDs (better)
Without it, we would need to specify a public static final
property that must be equal to some value. This is hardly possible
without exposing that value (yet another method) and traversing the
getStoredIDs set (which may get HUGE); setting some magical property
TOUCHED of all resources to false in initialize() is also
unacceptable because it incurs HUGE initialization overhead.
6. removeUntouched is necessary
Without it: the iterator returned by getUntouchedIDs doesn't support
remove(), the iterator over the getStoredIDs set (which is HUGE)
doesn't do it either, so CrawlerBase would have to construct a set of
all ids to remove and iterate over it.
7. and 8. getAggregatedIDsClosure and touchRecursively may indeed be
moved to CrawlerBase with little effort.
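As a reference point for the discussion, here is a minimal in-memory sketch of the aggregation methods from points 3 and 4. The method names follow the mail; the implementation is invented and is not Aperture's AccessDataImpl.

```java
import java.util.*;

// Hypothetical in-memory sketch of the aggregation-aware AccessData
// methods discussed above; not the actual Aperture implementation.
public class AggregationSketch {
    private final Map<String, Map<String, String>> values = new HashMap<>();
    private final Map<String, Set<String>> aggregated = new HashMap<>();

    public void put(String id, String key, String value) {
        values.computeIfAbsent(id, k -> new HashMap<>()).put(key, value);
    }

    public void putAggregatedID(String parent, String child) {
        aggregated.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    public Set<String> getAggregatedIDs(String parent) {
        return aggregated.getOrDefault(parent, Collections.emptySet());
    }

    // Point 4: remove() deletes ALL info about an id and cascades
    // over the aggregation relation, so callers need only one call.
    public void remove(String id) {
        values.remove(id);
        Set<String> children = aggregated.remove(id);
        if (children != null) {
            for (String child : children) {
                remove(child);
            }
        }
    }

    public boolean isKnownId(String id) {
        return values.containsKey(id) || aggregated.containsKey(id);
    }
}
```

The cascade in remove() is exactly what referredIDs do not give you (point 2): removing a folder id here also drops every id aggregated below it, with no extra bookkeeping in the crawler.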
I wanted to satisfy the following constraints:
1. implement the functionality described in my previous mails
2. don't use referredIDs
3. don't create any big sets (we want to save memory)
4. don't impose any additional constraints on the crawler
implementations (e.g. having to call touch() explicitly)
5. (optional) make the AccessData self-contained, independent of
crawlers and easily testable with little setup needed
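The touch/untouched bookkeeping of points 5 and 6 might look roughly like this from the store's side. Again, this is a hypothetical sketch under the assumptions of the mail, not the committed code; a real implementation would avoid materializing the untouched set at all (constraint 3).

```java
import java.util.*;

// Hypothetical sketch of the touch/untouched bookkeeping described in
// points 5 and 6; not the real CrawlerBase or AccessDataImpl code.
public class TouchSketch {
    private final Set<String> stored = new HashSet<>();
    private final Set<String> touched = new HashSet<>();

    // A crawler touches every id it encounters during the current crawl.
    public void touch(String id) {
        stored.add(id);
        touched.add(id);
    }

    // Point 5b: return only the ids NOT seen in the current crawl,
    // instead of forcing the crawler to filter the whole stored set.
    public Set<String> getUntouchedIDs() {
        Set<String> untouched = new HashSet<>(stored);
        untouched.removeAll(touched);
        return untouched;
    }

    // Point 6: drop all untouched ids in one call, so the crawler
    // never has to build its own removal set.
    public void removeUntouched() {
        stored.retainAll(touched);
    }

    // Reset the touched flags at the start of the next crawl.
    public void initialize() {
        touched.clear();
    }
}
```

A typical cycle: initialize(), crawl and touch() every id found, then getUntouchedIDs() to report deleted resources and removeUntouched() to clean up, with no per-resource TOUCHED property ever written to storage.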
The solution I've committed may be complicated, but it seems to satisfy
those requirements. There are basic tests in the new TestAccessDataImpl
class (I know it's ugly; some refactoring is due). Tests of other
AccessData implementations will probably fail. I can remove
touchRecursively and getAggregatedIDsClosure from the interface, but
the other methods are IMHO necessary. I'll be able to resume this work
on Saturday.
Aperture-devel mailing list
DI Leo Sauermann http://www.dfki.de/~sauermann
Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080 Fon: +49 631 20575-116
D-67663 Kaiserslautern Fax: +49 631 20575-102
Germany Mail: email@example.com
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313