From: Gavin_King/Cirrus%<CI...@ci...> - 2002-09-04 15:21:14
Hi Christoph, sorry about the slow ping time. It takes a while to collect thoughts and write a response to some of these things...

>>> Yesterday I was thinking about implementing a distributed cache for Hibernate. I want each node to have its own cache, but if one node writes to its db the data should be invalidated in all caches. You once mentioned that Hibernate would need a transaction-aware distributed cache to support distributed caching. I don't get why this is necessary. Can you tell me what kind of problems you think I will run into when trying to implement such a beast, and where you think I could start? I was thinking about using JCS as cache, and when a session is committed, just invalidate all written objects in the cache. <<<

If you have a look over cirrus.hibernate.ReadWriteCache you'll see that there's some interesting logic that ensures transaction isolation is preserved. A cache entry carries around with it:

(0) the cached data item, if the item is fresh
(1) the time it was cached
(2) a lock count, if any transactions are currently attempting to update the item
(3) the time at which all locks had been released, for a stale item

All transactions lock an item before attempting to update it and unlock it after transaction completion, i.e. the item has a lifecycle like this:

                                lock
                            <---------
   ------> fresh ------> locked ---------> stale
     put          lock             release

(Actually the item may be locked and released multiple times while in the "locked" state, until the lock count hits zero, but the difficulty of representing that surpasses my minimal ascii-art skills.)

A transaction may read an item of data from the cache if the transaction start time is AFTER the time at which the item was cached. (If not, the transaction must go to the database to see what state the database thinks that transaction should see.)

A transaction may put an item into the cache if:

(a) there is no item in the cache for that id, OR
(b) the item is not fresh, AND
(c) the item in the cache with that id is unlocked, AND
(d) the time it was unlocked is BEFORE the transaction start time

So what all this means is that when doing a put, when locking, and when releasing, the transaction has to grab the current cache entry, modify it, and put it back in the cache _as_an_atomic_operation_. If you look at ReadWriteCache, atomicity is enforced by making each of these methods synchronized (a rare use of synchronized blocks in Hibernate). However, in a distributed environment you would need some other method of synchronizing access from multiple servers.

I imagine you would implement this using something like the following (there are rough sketches of both pieces below):

* Create a new implementation of CacheConcurrencyStrategy - DistributedCacheConcurrencyStrategy.
* DistributedCacheConcurrencyStrategy would delegate its functionality to ReadWriteCache, which in turn delegates to JCSCache (which must be a distributed JCS cache, so all servers see the same lock counts + timestamps).
* Implement a LockServer process that would sit somewhere on the network and hand out very-short-duration locks on a particular id.
* DistributedCacheConcurrencyStrategy would use the LockServer to synchronize access to the JCS cache between multiple servers. Locks would be expired on the same timescale as the cache timeout (which is assumed in all this to be much greater than the transaction timeout) to allow for misbehaving processes, server failures, etc.

Of course, any kind of distributed synchronization has a *very* major impact upon system scalability.
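To make the lifecycle concrete, here is a minimal sketch of the entry state and the read/lock/release/put rules described above. It is illustrative only: the names (TxCache, Item) are mine, not Hibernate's, and the real logic lives in cirrus.hibernate.ReadWriteCache.

    import java.util.HashMap;
    import java.util.Map;

    public class TxCache {

        // One cache entry, carrying the four pieces of state listed above.
        private static class Item {
            Object value;         // (0) the cached data, if the item is fresh
            long freshTimestamp;  // (1) the time it was cached
            int lockCount;        // (2) transactions currently updating it
            long unlockTimestamp; // (3) the time all locks were released
            boolean fresh;        // false from the first lock until the next put
        }

        private final Map<Object, Item> items = new HashMap<>();

        // A transaction may read only if it started AFTER the item was
        // cached; otherwise it must go to the database.
        public synchronized Object get(Object id, long txStartTime) {
            Item item = items.get(id);
            if (item != null && item.fresh && txStartTime > item.freshTimestamp) {
                return item.value;
            }
            return null; // cache miss: caller reads from the database
        }

        // Lock before updating: the item leaves the "fresh" state.
        public synchronized void lock(Object id) {
            Item item = items.computeIfAbsent(id, k -> new Item());
            item.fresh = false;
            item.value = null;
            item.lockCount++;
        }

        // Unlock after transaction completion; when the lock count hits
        // zero, remember when that happened (the item is now stale).
        public synchronized void release(Object id, long now) {
            Item item = items.get(id);
            if (item != null && --item.lockCount == 0) {
                item.unlockTimestamp = now;
            }
        }

        // Put is allowed only under conditions (a)-(d) above.
        public synchronized boolean put(Object id, Object value,
                                        long txStartTime, long now) {
            Item item = items.get(id);
            if (item == null                                      // (a)
                    || (!item.fresh                               // (b)
                        && item.lockCount == 0                    // (c)
                        && item.unlockTimestamp < txStartTime)) { // (d)
                Item replacement = new Item();
                replacement.value = value;
                replacement.fresh = true;
                replacement.freshTimestamp = now;
                items.put(id, replacement);
                return true;
            }
            return false; // a concurrent writer got there first; don't cache
        }
    }

A transaction would call lock(id) before writing, release(id) at transaction completion, and put(id, ...) only after a successful database read.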
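And here is a sketch of how DistributedCacheConcurrencyStrategy might bracket those operations with a network lock. The LockServer interface is hypothetical (nothing like it ships with Hibernate or JCS), and it assumes the map above is replaced by a distributed JCS region so that every server sees the same lock counts and timestamps.

    public class DistributedTxCache {

        // Hypothetical client for the LockServer process described above;
        // a real implementation would expire locks on the cache-timeout
        // timescale to survive crashed nodes and misbehaving processes.
        public interface LockServer {
            void acquire(Object id); // blocks until the cluster-wide lock is held
            void release(Object id);
        }

        private final TxCache cache;    // the local strategy sketched above
        private final LockServer locks; // very-short-duration network locks

        public DistributedTxCache(TxCache cache, LockServer locks) {
            this.cache = cache;
            this.locks = locks;
        }

        // Every grab-modify-put of an entry happens under the network
        // lock; this replaces "synchronized" across multiple servers.
        public boolean put(Object id, Object value, long txStartTime, long now) {
            locks.acquire(id);
            try {
                return cache.put(id, value, txStartTime, now);
            }
            finally {
                locks.release(id); // never leave the cluster lock held
            }
        }

        // lock(), release() and get() would be bracketed the same way.
    }

Note that the network lock only restores atomicity; the entries themselves still have to live in the shared distributed cache so that every node observes the same state.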
I think this would be a very *fun* kind of thing to implement and would be practical for some systems. It would also be a great demonstration of the flexibility of our approach, because clearly this is exactly the kind of thing that Hibernate was never meant to be good for! :)

Gavin