From: <fre...@aj...> - 2005-12-22 13:10:13
|
In Exist, a Resource can report a lastModified date:

org.exist.xmldb.EXistResource.getLastModificationTime()

In a Cocoon context, I have a modified XMLDBSource implementation that makes Resource generators cacheable for the rest of the pipeline.

Is it expensive to maintain a lastModified() date at collection level?

If each parent collection of a modified resource were given a new date, that date could serve as the validity of a query run in the context of that collection. As long as the collection is not modified, the query should return the same results?

Real-life example

A welcome page may aggregate various Exist queries, such as
* most recent titles (news)
* categories (list of indexed terms on a tag)
* the welcome text
* number of resources to query
* ...

Generating this kind of page completely on each request is not very efficient. If an Exist Collection could answer a lastModified(), it would be easy to inform the Validity of each XQuery generator in the Cocoon context, and the aggregated welcome page would be served from cache as long as the collections are not modified in Exist. A 1-minute query is not a problem if we can know when the results change?

--
Frédéric Glorieux (AJLSM, http://ajlsm.com)
|
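The caching idea in this message can be sketched as follows. This is a minimal sketch, not Cocoon or Exist code: TimestampValidatedCache is a hypothetical class, the LongSupplier stands in for a collection-level lastModified() that does not exist yet, and the Supplier stands in for the expensive query.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: caches one query result and re-runs the query only when the timestamp changes. */
final class TimestampValidatedCache<T> {
    private final LongSupplier lastModified; // assumed: a collection-level lastModified()
    private final Supplier<T> query;         // the expensive query to protect
    private boolean filled = false;
    private long cachedStamp;
    private T cachedValue;
    private int executions = 0;

    TimestampValidatedCache(LongSupplier lastModified, Supplier<T> query) {
        this.lastModified = lastModified;
        this.query = query;
    }

    /** Returns the cached result while the timestamp is unchanged, otherwise re-executes the query. */
    synchronized T get() {
        long stamp = lastModified.getAsLong();
        if (!filled || stamp != cachedStamp) {
            cachedValue = query.get();
            cachedStamp = stamp;
            filled = true;
            executions++;
        }
        return cachedValue;
    }

    /** Number of times the underlying query was actually executed. */
    synchronized int executions() {
        return executions;
    }
}
```

Each part of the welcome page would own one such cache, so a slow query costs its full time only on the first request after a change.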
From: <fre...@aj...> - 2005-12-23 09:10:25
|
In short, a public validity date at Collection level would let external applications know whether they can cache results instead of re-executing the query.

> Is it expensive to maintain a lastModified() date at collection level ?

Following information from Pierrick Brihaye, maybe something already exists in Exist:

http://exist.sourceforge.net/api/org/exist/collections/Collection.html#getTimestamp()

But, with my poor understanding of Exist internals, this int doesn't seem to be the long System.currentTimeMillis() I'm hoping for.

For a resource, this date is set by
  doc.setLastModified(System.currentTimeMillis());
for append, insert, remove, rename, replace, update
in org\exist\xupdate\ and org\exist\xquery\update\

Just after that, the broker is called to store the doc with this date:
  context.getBroker().storeDocument(transaction, doc);

I would be glad to have some recursive setter on parent collections at this point, but my investigation had to stop quickly, because I don't know whether a date can be stored in some way for a Collection.

The final goal is to get this Date in the xmldb implementation of Collection.

> In cocoon context, I have a modified XMLDBSource implementation to make
> Resource generators cacheable for other pipeline process.

About these small changes to a Cocoon class, Sylvain Wallez explained to us that the Cocoon project (Apache license) does not have the right to reference an Exist class (GPL) by name, even if the feature is interesting. But it seems to him that this could be a valuable resource to expose, for example in the Exist project (like the XQueryGenerator with an expired validity).

--
Frédéric Glorieux (AJLSM, http://ajlsm.com)
|
From: Wolfgang M. <wol...@gm...> - 2005-12-23 10:57:31
|
> > Is it expensive to maintain a lastModified() date at collection level ?

Maintaining lastModified() for collections implies a performance penalty and that's why I didn't implement it: if you store thousands of small documents into a collection, you would have to update Collection.lastModified after every new document and save the collection. So far, this used to be *very* expensive, because the collection contained quite a lot of data.

However, I decoupled collection storage from document storage when introducing the new recovery code, so Collection has to store much less data than before. The performance impact of maintaining lastModified for collections might thus not be as big as it used to be. You could at least give it a try and make some tests.

> After infos from Pierrick Brihaye, maybe there's something already in Exist.
>
> http://exist.sourceforge.net/api/org/exist/collections/Collection.html#getTimestamp()

This timestamp is only used for caching. It's just an incremental counter.

> For resource, this date is set by a
> doc.setLastModified(System.currentTimeMillis());
> for append, insert, remove, rename, replace, update
> in org\exist\xupdate\ and org\exist\xquery\update\
>
> Just after, the broker is called to store the doc with this date
> context.getBroker().storeDocument(transaction, doc);
>
> I would be glad of some recursive setter on parent collections at this
> place, but, my investigations have to stop very fast, because I don't
> know if a date may be stored in some way for a Collection.

You could implement the same setLastModified() method for Collection. Change Collection.write() to write out the field.

But: setting lastModified recursively on all ancestor collections would again have a huge performance impact (apart from the fact that changes across collections are deadlock prone). I thus think, lastModified should only be set on the current collection, not its ancestors (though I must confess that's a bit problematic).

Wolfgang
|
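The suggestion above (a setLastModified() on the collection, written out by write()) might look roughly like this. It is a sketch under stated assumptions: CollectionStub is a hypothetical stand-in for org.exist.collections.Collection, and the java.io.DataOutput/DataInput signatures are chosen for illustration only, since the thread does not show the real signatures of Collection.write() and Collection.read().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a collection that persists its own lastModified field. */
final class CollectionStub {
    private final String name;
    private long lastModified = 0L;

    CollectionStub(String name) { this.name = name; }

    String getName() { return name; }
    long getLastModified() { return lastModified; }
    void setLastModified(long millis) { this.lastModified = millis; }

    /** Writes the collection metadata, including the new field. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(lastModified);
    }

    /** Reads back what write() produced, in the same order. */
    static CollectionStub read(DataInput in) throws IOException {
        CollectionStub c = new CollectionStub(in.readUTF());
        c.lastModified = in.readLong();
        return c;
    }

    /** Convenience wrapper for in-memory round trips. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static CollectionStub fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the current collection is stamped here, in line with the recommendation not to touch ancestors.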
From: <fre...@aj...> - 2005-12-23 15:23:22
|
Wolfgang,

Many, many thanks for all the work already done, and for the interest you always show in our needs.

This post is an attempt to understand more of Exist's internals. It can stay unanswered if you don't have enough time, but for me it is a necessary exercise before adding a line to the Exist source.

> You could implement the same setLastModified() method for Collection.
> Change Collection.write() to write out the field.

What I understand:
# add a field private long modified = 0; to Collection
# ensure persistence in Collection.write() and Collection.read() (I'm a bit afraid of that: for example, what will happen with older versions of a db...)
# update the field on each operation at collection level; the trigger operations seem to show the right places
# pray that it will work nicely with the cache
# open access to this info in different places, especially to show it in the client

> However, I decoupled collection storage from document storage when
> introducing the new recovery code, so Collection has to store much
> less data than before. The performance impact of maintaining
> lastModified for collections might thus not be as big as it used to
> be. You could at least give it a try and make some tests.

I will begin next week.

> setting
> lastModified recursively on all ancestor collections would again have
> a huge performance impact

Is that only if it has to be written to disk each time? What if these collections are already cached? Then it's only a Class.field to set? The problem for me will be to be sure I get the cached collection and don't multiply objects.

> (apart from the fact that changes across
> collections are deadlock prone).

Mmmh, I can't picture the effect of that. Maybe a lock is not necessary here, because the only thing done is to set a field on the object, for information that is not really critical (maybe a document will be modified at the same time, but the only problem then is that the date will be earlier?)

> I thus think, lastModified should
> only be set on the current collection, not its ancestors (though I
> must confess that's a bit problematic).

For my needs, this means I will have to rebuild this date with more expensive accesses to collections. I'm afraid that without recursion to the parents, the usefulness of this info will be vastly reduced.

>> http://exist.sourceforge.net/api/org/exist/collections/Collection.html#getTimestamp()
>
> This timestamp is only used for caching. It's just an incremental counter.

That's what I understood.

--
Frédéric Glorieux (AJLSM, http://ajlsm.com)
|
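The two options discussed in this message can be compared on a purely in-memory model. This is only a sketch: CollectionNode and its methods are hypothetical names, and it ignores locking, caching and persistence, which are exactly the costs raised in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a collection tree, for comparing the two stamping strategies. */
final class CollectionNode {
    final String name;
    final CollectionNode parent;
    final List<CollectionNode> children = new ArrayList<>();
    long lastModified = 0L;

    CollectionNode(String name, CollectionNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    /** Option A: stamp only the collection that was written to. */
    void touchLocal(long now) {
        lastModified = Math.max(lastModified, now);
    }

    /** Option B: stamp the collection and every ancestor up to the root. */
    void touchRecursive(long now) {
        for (CollectionNode c = this; c != null; c = c.parent) {
            c.touchLocal(now);
        }
    }

    /** With option A, a reader has to rebuild the date by visiting the whole subtree. */
    long newestInSubtree() {
        long newest = lastModified;
        for (CollectionNode child : children) {
            newest = Math.max(newest, child.newestInSubtree());
        }
        return newest;
    }
}
```

Option A keeps writes cheap and moves the cost to readers through newestInSubtree(); option B does the opposite, one field assignment per ancestor on every write.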
From: Pierrick B. <pie...@fr...> - 2005-12-24 16:34:50
|
Hi,

Frédéric Glorieux wrote:

> > setting
> > lastModified recursively on all ancestor collections would again have
> > a huge performance impact
>
> If it has to be written on disk each time ?

Well... we are talking about an internal in-memory cache (which is eventually written to disk from time to time). But this is not where the big performance issue is...

> > (apart from the fact that changes across
> > collections are deadlock prone).
>
> mmmh, I can't figure effect of that. May be a lock is not necessary
> here,

It *is*. You have to understand that the collection carries the application's whole load, i.e. every insertion, deletion and get (i.e. query) operation. That's why it is cached in memory. Assigning a single field already has a cost. Cascading this assignment throughout the collection hierarchy multiplies this cost by a terrible factor (especially given what I've understood of your application's hierarchy ;-).

Furthermore, in order to ensure consistency, you also have to understand that the collection *has* to be locked, i.e., from a "user" point of view, you need to request a lock... and wait until it is granted to you... or not (lock time-out).

Now, imagine cascading a timestamp from a deep collection when the root collection is requested by X processes, read by X - N - 1 of them, and written by just one. Brrrr....

> > I thus think, lastModified should
> > only be set on the current collection, not its ancestors (though I
> > must confess that's a bit problematic).
> For my need, this will mean I will have to rebuild this date whith more
> expensive access to collections. I'm affraid that without recursive
> parent, the usage of this info will be vastly reduced.

A decent solution would be to update the timestamp at each synch, possibly from a global "pending timestamps updates" collection. Not such a bad granularity IMHO.

> > This timestamp is only used for caching. It's just an incremental
> counter.
>
> That's what I understood.

So, my explanations weren't so bad :-)

Cheers,

p.b.
|
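One possible reading of the "pending timestamps updates" idea is sketched below. Everything here is an assumption made for illustration: PendingTimestamps, record() and flushInto() are hypothetical names, collections are identified by absolute paths such as /db/news/2005, and a plain Map stands in for wherever the dates would really be stored at sync time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical queue of timestamp updates, applied to collections only at sync time. */
final class PendingTimestamps {
    private final Map<String, Long> pending = new ConcurrentHashMap<>();

    /** Called on each write: cheap, and touches no collection object. */
    void record(String collectionPath, long millis) {
        pending.merge(collectionPath, millis, Long::max);
    }

    int pendingCount() {
        return pending.size();
    }

    /** Called at sync: stamps each recorded collection and all its ancestors in one pass. */
    void flushInto(Map<String, Long> lastModifiedByPath) {
        for (Map.Entry<String, Long> entry : pending.entrySet()) {
            String key = entry.getKey();
            Long stamp = entry.getValue();
            String path = key;
            while (!path.isEmpty()) {
                lastModifiedByPath.merge(path, stamp, Long::max);
                int cut = path.lastIndexOf('/');
                path = cut <= 0 ? "" : path.substring(0, cut);
            }
            // Remove only if no newer stamp arrived meanwhile; otherwise keep it for the next sync.
            pending.remove(key, stamp);
        }
    }
}
```

The ancestors are then updated once per sync rather than once per stored document, at the price of dates that lag behind the last write until the next sync.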
From: <fre...@aj...> - 2005-12-24 23:19:33
|
> Well... we are speaking about an internal in-memory cache (which is
> eventually written to disk from time to time).

So, we are only talking about adding a long field to a Java class.

From what I learned about Exist's internal cache for queries, working with more than 128 collections begins to be expensive. A 3- or 4-level tree is really a maximum. Of course, it means that the root collection should always be in memory, updated by each write op.

If you store a big set of docs in a collection, it means adding 3 collections to the cache, and 3 setLong() calls for each doc stored. It's not a lot, but these repetitive operations are not very intelligent until the end of the batch.

> A decent solution would be to update the timestamp at each synch,
> possibly from a global "pending timestamps updates" collection.

In my opinion, if this date is the only reason to have this kind of TODO list somewhere, it is too heavy a global design for such a minor feature.

If this kind of pending-ops list already exists, the question will be: can we know when it will be processed?

From an external point of view, a date that arrives more than 1 sec after the last write op is not really useful. It means: I can't show you the result of your modification to this doc immediately in a query (e.g. I correct the subject of a bibliographical record; the first thing I will want to check is the effect of my change on the subject list).

In such a case, caching is better handled by the external app, which knows what has been changed, who needs fast access to modifications, and who can wait (e.g. the public).

>> mmmh, I can't figure effect of that. May be a lock is not necessary here,
>
> It *is*.

I understand the need for a lock for a write, and also for a query until we are sure there's no update in it. For a read, is a "lock" more a way to say "wait for all write operations ahead of you in the queue, to get correct results"?

> Now, imagine cascading a timestamp from a deep collection when the root
> collection is requested by X process, read by X - N - 1 ones, and
> written by just one. Brrrr....

Of course yes, locking the root collection for this is completely pointless. But, in my opinion, changing a timestamp doesn't need a lock, because it has no effect on other read/write operations. What are the risks? That someone else sets a newer date? What's the problem, isn't the newest the best?

--
Frédéric Glorieux (AJLSM, http://ajlsm.com)
|
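The "newest wins" argument at the end of this message can be illustrated with an atomic field. This is a sketch with a hypothetical class name, MonotonicTimestamp. It covers only the in-memory value: it does not persist anything and does not answer the consistency concern about collection locks raised earlier in the thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical timestamp holder: concurrent writers never block, and the newest date always wins. */
final class MonotonicTimestamp {
    private final AtomicLong value = new AtomicLong(0L);

    /** Keeps the larger of the current and the candidate date, and returns the result. */
    long touch(long candidateMillis) {
        return value.accumulateAndGet(candidateMillis, Math::max);
    }

    long get() {
        return value.get();
    }
}
```

A writer arriving late with an older date leaves the stored value unchanged, so the field can only move forward.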