From: Harry E. <ha...@tr...> - 2004-06-16 04:03:38
I have a simple design for distributed cache management. It has a couple of holes, but it doesn't seem like a bad first pass. I would like to get feedback from the group to see what others think.

CacheManager methods (don't worry about local vs. remote right now):

-lock(Object o, transactionId) : Lock
  Checks the version of the object and attempts to create a lock object for it. The version is either some kind of change counter or some kind of time stamp of the last update time in the datastore.
  --o The XORM-managed object to lock
  --transactionId A relatively unique id for the transaction obtaining the lock. Used to allow the same transaction to acquire a lock on the same object more than once.
  --returns If the object is already locked, returns the special LOCK_FAILED Lock object. (An overloaded version of the method might allow the caller to specify waiting until the lock is obtained.) If a lock is granted, returns a Lock object with status LOCK_SUCCESS and a valid lock expiration time. If a lock is granted but the object is stale, returns a Lock object with status LOCK_REFRESH and a valid lock expiration time; this means the caller has obtained a lock on the object, but needs to refresh the object from the datastore.

-commit(Object o, transactionId) : Lock
  Verifies that this object has a lock held by the transactionId. If so, it updates the version in the CacheManager to match the version held by the object. This method should be called AFTER applying a change to the datastore, but before committing the changes to the datastore. It also depends on the object's version having been updated to reflect the (not yet committed) datastore changes. Regardless of outcome, this method results in releasing the lock on this object in the CacheManager.
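The five statuses named above could be modeled as a small enum. This is only a sketch with names of my own choosing; LockStatus is not an existing XORM type:

```java
// Hypothetical status codes for the CacheManager's Lock results.
// The constant names mirror the ones described in the text above.
public enum LockStatus {
    LOCK_FAILED,    // object already locked by another transaction
    LOCK_SUCCESS,   // lock granted; cached copy is current
    LOCK_REFRESH,   // lock granted, but caller must refresh from the datastore
    COMMIT_SUCCESS, // version recorded and lock released
    COMMIT_FAILURE  // lock missing, expired, or held by another transaction
}
```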
  --o The XORM-managed object that a lock was previously obtained for
  --transactionId The transactionId that was used to obtain the lock for o
  --returns If there is a lock for this object, the transactionId matches, and the lock has not expired, returns COMMIT_SUCCESS. Otherwise (the lock does not exist, is not for this transactionId, or has expired), returns COMMIT_FAILURE. The proper action on a COMMIT_FAILURE might be a retry, or a rollback.

-check(Object o) : boolean
  Checks whether the object is the most current version of the object seen by the CacheManager.
  --o The object to check
  --returns True if this version of the object is the most recent seen by the CacheManager, otherwise false.

-refreshLock(Object o, transactionId) : Lock
  Checks for a valid lock on the object and, if all is good, extends the expiration of the Lock. Might not be needed (could just be rolled into the lock method).
  --returns Values as per the lock() method.

The concept behind all this is that the CacheManager object does not talk to the db, but simply deals with objects as it sees them. Given a compliant local implementation of XORM, all objects would be seen whenever a transaction was involved, so there shouldn't be a race condition issue.
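A minimal single-JVM sketch of the contract above might look like this. Everything here is an assumption of mine (the class and field names, the 30-second lease, identifying objects by a String key and a numeric change counter rather than a real XORM-managed object); a real distributed version would sit behind a remote facade:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the CacheManager contract described
// above; not real XORM code.
public class CacheManagerSketch {
    static final long LEASE_MILLIS = 30_000; // assumed lock lifetime

    public enum Status { LOCK_FAILED, LOCK_SUCCESS, LOCK_REFRESH, COMMIT_SUCCESS, COMMIT_FAILURE }

    // Per-object bookkeeping: last committed version plus current lock holder.
    static class Entry {
        long version;   // change counter, bumped on commit
        Long txnId;     // null when the object is seen but not locked
        long expires;   // lock expiration (ms since epoch)
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Grant a lock, allowing the same transaction to re-acquire it.
    public synchronized Status lock(String key, long callerVersion, long txnId) {
        Entry e = entries.computeIfAbsent(key, k -> new Entry());
        long now = System.currentTimeMillis();
        if (e.txnId != null && e.expires > now && e.txnId != txnId) {
            return Status.LOCK_FAILED;          // someone else holds a live lock
        }
        e.txnId = txnId;                        // grant (or re-grant) the lock
        e.expires = now + LEASE_MILLIS;
        // A stale caller gets the lock but must refresh from the datastore.
        return callerVersion < e.version ? Status.LOCK_REFRESH : Status.LOCK_SUCCESS;
    }

    // Record the new version and release the lock.
    public synchronized Status commit(String key, long newVersion, long txnId) {
        Entry e = entries.get(key);
        long now = System.currentTimeMillis();
        if (e == null || e.txnId == null || e.txnId != txnId || e.expires <= now) {
            if (e != null && e.txnId != null && e.expires <= now) {
                e.txnId = null;                 // lazy removal of an expired lock
            }
            return Status.COMMIT_FAILURE;
        }
        e.version = newVersion;                 // match the version the datastore will hold
        e.txnId = null;                         // lock is released regardless of outcome
        return Status.COMMIT_SUCCESS;
    }

    // Is the caller's copy the most recent version this manager has seen?
    public synchronized boolean check(String key, long callerVersion) {
        Entry e = entries.get(key);
        return e == null || callerVersion >= e.version;
    }
}
```

Note the lock() path never blocks: a contended lock fails immediately, matching the non-waiting variant described above.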
A simple sequence diagram follows:

 XORM                    CacheManager                 DataStore
 ----                    ------------                 ---------
  |                           |                           |
  |--startTrans()--+          |                           |
  |<---------------+          |                           |
  |                           |                           |
  |--lock(o, tId)------------>|                           |
  |<-------------Lock object--|                           |
  |                           |                           |
  |--refreshObject(o) (if stale)------------------------->|
  |<--------------------------------------refreshedObject-|
  |                           |                           |
  |--make changes--+          |                           |
  |<---------------+          |                           |
  |                           |                           |
  |--commitTrans() (begin)-+  |                           |
  |<-----------------------+  |                           |
  |                           |                           |
  |--makeUpdates (don't commit)-------------------------->|
  |<-----------------------------------------------success|
  |                           |                           |
  |--commit(o, tId)---------->|                           |
  |<-----------------success--|                           |
  |                           |                           |
  |--commitTransaction----------------------------------->|
  |<-----------------------------------------------success|
  |                           |                           |
  |--commitTrans() (end)-+    |                           |
  |<---------------------+    |                           |

I think the whole scheme might work, if you assume that no one operates outside of it. I don't know if that is a reasonable caveat or not. Obviously, it is possible for the distributed cache manager to provide methods for locking or committing more than one object at the same time. Also, it assumes that the distributed cache manager would most likely be "remote" from the XORM instance using it, so the "local" methods I showed should be assumed to be local facades to a remote distributed cache manager. It is probably obvious (but worth stating) that all XORM instances dealing with a particular datastore would use the same CacheManager, possibly with CacheManager replication for failover. Persistent connections and good communications design should keep the data overhead relatively low.
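Seen from the XORM side, the sequence above boils down to a fixed call ordering. The sketch below is hypothetical: the collaborators are trivial fakes that only record which calls were made, since the ordering is the point:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical walk-through of the sequence diagram's happy path.
// The CacheManager and DataStore here are in-memory fakes that log calls.
public class TransactionFlow {
    static final List<String> calls = new ArrayList<>();

    // fake CacheManager: always grants the lock but reports a stale copy
    static String lock(String o, long tId)   { calls.add("lock");   return "LOCK_REFRESH"; }
    static String commit(String o, long tId) { calls.add("commit"); return "COMMIT_SUCCESS"; }

    // fake DataStore
    static void refreshObject(String o)      { calls.add("refreshObject"); }
    static void makeUpdates(String o)        { calls.add("makeUpdates"); }
    static void commitTransaction()          { calls.add("commitTransaction"); }

    // The ordering from the diagram: lock, refresh if stale, change the
    // object, write uncommitted updates, commit in the CacheManager,
    // then commit in the datastore.
    static void run(String o, long tId) {
        if (lock(o, tId).equals("LOCK_REFRESH")) {
            refreshObject(o);            // refreshObject(o) (if stale)
        }
        // ... make changes to o here ...
        makeUpdates(o);                  // makeUpdates (don't commit)
        if (commit(o, tId).equals("COMMIT_SUCCESS")) {
            commitTransaction();         // only then commit the datastore txn
        }
    }
}
```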
The Lock object as it is internally held by the CacheManager might look something like this:

Lock
----
class   : Class
id      : objectPrimaryKey
version : Version (int, timestamp, etc.)
txnId   : TransactionId (int, long, etc.)
expires : Time (long, Date, etc.)

For objects that have been seen but are not currently locked, txnId and expires would be null. For objects that are locked, all values would be non-null. Locks can be removed lazily, by checking a lock when needed and, if it has expired, nulling txnId and expires.

This might be way off base, but it seemed like a good first pass. I am not trying to reinvent the wheel here, but it seems like something like this might solve most of the "coordinated" or "distributed" cache management issues we see with XORM right now.

Thoughts / comments / questions?

Harry
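One possible Java shape for that record, with the lazy expiry baked in. The field names follow the layout above; the class name and type choices are my own:

```java
import java.util.Date;

// Hypothetical holder for the Lock record sketched above.
public class LockEntry {
    Class<?> clazz;   // class of the cached object
    Object   id;      // object primary key
    long     version; // change counter or last-update timestamp
    Long     txnId;   // null when the object is seen but not locked
    Date     expires; // null when not locked

    // Lazy removal: whenever a lock is consulted, an expired one
    // clears itself by nulling txnId and expires.
    boolean isLocked(long now) {
        if (txnId != null && expires != null && expires.getTime() <= now) {
            txnId = null;
            expires = null;
        }
        return txnId != null;
    }
}
```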