From: Wes B. <we...@ca...> - 2005-10-05 15:15:28
Yes, collection relationships are always thrown away currently. That's
because XORM is not smart about bidirectional relationships. So to add
distributed caching on top of current functionality, they don't need
to be propagated.

Wes

Harry Evans wrote:

>NO! to caching persistence-capable objects directly. That is too big
>of a change for me to even contemplate right now.
>
>I will take a look at JCache (it seems like an awful lot of folks are
>doing cool things with JavaGroups. Where have I been?). I think a
>map-based API would most likely be pretty easy to integrate into LRU
>or some derivative, given that it is implemented using a map.
>
>On the Collections front, when I scanned through the code, I think I
>found all the places in InterfaceInvocationHandler, InterfaceManager,
>and TransactionImpl where xorm was interacting with the DataCache. I
>saw where row structures were being put in, updated, and taken out of
>the cache that backed the InterfaceInvocationHandlers (TransactionImpl
>lines 247-282), but the same areas of code also update Collection
>Relationship proxies, and don't seem to make any cache calls
>(TransactionImpl lines 225-240).
>
>Even if they did, the proxies probably wouldn't be in cache, given
>that even the DataCache interface specifies that anything without a
>primary key that you call add() with is, by contract, thrown away.
>However, it might be worth making the calls for Collection mods, even
>if they don't end up getting stored, to give the cache layer a chance
>to deal with propagating the changes. I just need to get a little
>better understanding of where this info is being held onto (if at
>all), if it isn't the DataCache layer. (Maybe collections are always
>thrown away with the InterfaceInvocationHandler, and then requeried?)
>
>Harry
>
>On 10/5/05, Wes Biggs <we...@ca...> wrote:
>
>>Changes to collections are always accomplished by changes to the data
>>model. That being said, I forget if XORM is keeping many-to-many
>>table Rows in cache currently -- it would need to for this scheme to
>>work, unless you want to change the whole thing and cache the
>>persistence-capable objects directly (I don't think this is a good
>>idea).
>>
>>You might look at JCache.sourceforge.net, which uses JavaGroups. This
>>is the approach that some of the commercial JDO vendors have taken --
>>integrate a third-party distributed cache management solution. The
>>JCache API (it's a JSR) is basically a Map, and the implementation
>>supposedly takes care of all the notification and sync behind the
>>scenes. But that might be harder to integrate with the existing
>>LRUCache, etc.
>>
>>Wes
>>
>>Harry Evans wrote:
>>
>>>Having spoken with Doug, I am looking at using JGroups (or
>>>JavaGroups, gotta love that Sun trademark enforcement) as the
>>>mechanism for transporting the notifications.
>>>
>>>Thoughts:
>>>1. Modify LRUCache to optionally notify a listener when a Row is
>>>added, updated, or removed.
>>>2. Subclass LRUCache to use (most likely) a JGroups NotificationBus
>>>for cache events (see the sketch after this message).
>>>3. Set up JGroups to manage this notification, and provide
>>>properties the subclass can read to allow for enhanced settings (UDP
>>>versus TCP notifications, etc.).
>>>4. I walked through some xorm code, and it doesn't appear that
>>>relationship proxy information is ever directly conveyed to the
>>>cache level, which means that there is no way, at that level, to
>>>invalidate the Collection case I listed previously. I could really
>>>use some insight into an approach for this area (*cough* Wes
>>>*cough*), as this seems like a kinda important part. I might have
>>>this wrong, but it appears xorm manages all collection-level
>>>information (1:M and M2M) at the ObjectProxy level, and not at all
>>>in the Row DataCache level. Is that correct?
>>>5. I am unclear on whether InterfaceInvocationHandlers need to have
>>>their Row cleared in this scheme, or if they generally pass out of
>>>scope frequently enough that the Row is the main thing to focus on.
>>>This has to do with the concern that they hold a direct hard
>>>reference to a Row object, so throwing that out of cache isn't going
>>>to accomplish much if they are still pointing to it.
>>>
>>>I am really interested in the JGroups DistributedLockManager class,
>>>but it would have to be combined with or enhanced by cache
>>>management to be usable here, and it seems like there should be an
>>>option, regardless, to only do cache management without the overhead
>>>of distributed locks. The JBossCache stuff also might be more
>>>appropriate in the long term, as it builds on JGroups but seems able
>>>to handle both locks and caching. I will have to do more research.
>>>
>>>Input very much appreciated.
>>>
>>>Harry
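
A minimal sketch of items 1 and 2 above, assuming the JGroups 2.x
NotificationBus API; CacheEventListener and the Serializable key are
hypothetical stand-ins for whatever hook XORM's LRUCache would
actually expose, not XORM's real types:

    import java.io.Serializable;
    import org.jgroups.Address;
    import org.jgroups.blocks.NotificationBus;

    // Hypothetical hook: item 1 would have LRUCache call these methods.
    interface CacheEventListener {
        void rowAdded(Serializable key);
        void rowUpdated(Serializable key);
        void rowRemoved(Serializable key);
    }

    // Item 2: forward local cache events over a NotificationBus and
    // evict on notifications from other members.
    public class JGroupsCacheNotifier
            implements CacheEventListener, NotificationBus.Consumer {

        private final NotificationBus bus;

        public JGroupsCacheNotifier(String busName) throws Exception {
            bus = new NotificationBus(busName);
            bus.setConsumer(this);
            bus.start();
        }

        // Local events go out on the bus.
        public void rowAdded(Serializable key)   { bus.sendNotification(key); }
        public void rowUpdated(Serializable key) { bus.sendNotification(key); }
        public void rowRemoved(Serializable key) { bus.sendNotification(key); }

        // Remote events come back here: hollow or remove the Row locally.
        public void handleNotification(Serializable key) {
            // e.g. cache.remove(key); a node id in the message would let
            // a sender skip its own notifications.
        }

        public Serializable getCache()        { return null; } // no state transfer
        public void memberJoined(Address mbr) {}
        public void memberLeft(Address mbr)   {}
    }

If memory serves, NotificationBus also has a constructor that takes a
protocol-stack properties string, which would cover the UDP-versus-TCP
settings in item 3.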
>>>On 10/4/05, Harry Evans <ha...@gm...> wrote:
>>>
>>>  Optimistic Distributed Cache Management
>>>
>>>  This is a proposal for a simple distributed cache invalidation
>>>  strategy for xorm. It might be implementable at only the cache
>>>  manager layer, or might require integration at a higher layer. It
>>>  does not do coordinated lock management, but would do reasonable
>>>  cache invalidation in the cases where the same entity is not
>>>  modified in multiple instances in rapid succession. This is seen
>>>  as the first phase of a more pessimistic strategy that would do
>>>  actual lock management.
>>>
>>>  It uses the multicast transmission facilities in Java to set up a
>>>  peer-to-peer local network on which cache invalidation messages
>>>  are sent to other servers using the same datastore. Detected
>>>  events are broadcast to the other servers, allowing them to
>>>  either hollow the referenced objects or remove them completely
>>>  from cache. Below are the major pieces of the implementation as I
>>>  see them.
>>>
>>>  I am looking at using the JRMS (Java Reliable Multicast Service)
>>>  jar (http://www.experimentalstuff.com/Technologies/JRMS/index.html)
>>>  to reasonably guarantee the receipt of packets by each server
>>>  (specifically the LRMP variety, with late-join catchup disabled).
>>>  A basic description of the major areas follows, then questions.
>>>
>>>  I would really, really like to find a way to do this at the cache
>>>  manager level, as it would then be a simple properties change to
>>>  enable it, by using a version of the LRU cache that implemented
>>>  the additional functionality, but I am not sure this is possible.
>>>  Feedback appreciated.
>>>
>>>  Harry
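
For the transport itself, the plain java.net multicast facilities
(without the JRMS reliability layer mentioned above) would look
roughly like this; the group address, port, and buffer size are
made-up values, and the class is a sketch rather than XORM code:

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    public class InvalidationPeer {
        private static final String GROUP = "230.0.0.1"; // hypothetical
        private static final int PORT = 7400;            // hypothetical

        private final InetAddress group;
        private final MulticastSocket socket;

        public InvalidationPeer() throws Exception {
            group = InetAddress.getByName(GROUP);
            socket = new MulticastSocket(PORT); // every peer binds the same port
            socket.joinGroup(group);            // ...and joins the same group
            socket.setTimeToLive(1);            // keep packets on the local net
        }

        // Sender side: every cache instance broadcasts its own events.
        public void broadcast(byte[] packet) throws Exception {
            socket.send(new DatagramPacket(packet, packet.length, group, PORT));
        }

        // Receiver side: block for packets, then hollow/remove the Row.
        public void receiveLoop() throws Exception {
            byte[] buf = new byte[64]; // comfortably above the fixed packet size
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                socket.receive(p);
                // decode p.getData() and evict from the local cache here
            }
        }
    }

One data point for the questions below: MulticastSocket binds with
SO_REUSEADDR enabled, which is what normally lets multiple VMs on one
host join the same group and port.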
>>>  Detection:
>>>  "Broadcast" means a notification sent out over multicast.
>>>  Objects that are deleted must be broadcast to be removed.
>>>  Objects that have simple attributes or one-way references updated
>>>  must be broadcast to be hollowed or removed.
>>>  Changes to collections must be broadcast:
>>>    ObjParent has a collection that contains objects, including
>>>    ObjChild.
>>>    ObjChild has a direct reference to ObjParent.
>>>    ObjChild has its reference to ObjParent removed (or reassigned
>>>    to ObjNewParent).
>>>    ObjChild must be broadcast to be hollowed or removed.
>>>    ObjParent must either be broadcast to be hollowed or removed,
>>>    or must be broadcast to have its collection set to unresolved.
>>>    ObjNewParent must either be broadcast to be hollowed or
>>>    removed, or must be broadcast to have its collection set to
>>>    unresolved.
>>>
>>>  Transmission:
>>>  Initial transmission is seen as non-locking multicast packets.
>>>  All cache instances are both transmitters and receivers of
>>>  packets (there is no central server).
>>>  When a cache instance detects a broadcast event:
>>>    It formulates a broadcast packet containing the table name and
>>>    primary key of the object.
>>>    (Should this be class instead of table?)
>>>    It transmits the packet on the multicast address for the group.
>>>  Each cache instance (except the sender) receives the packet.
>>>  Each cache instance purges (or makes hollow?) the referenced
>>>  object.
>>>  (Does this need to distinguish between hollow vs. removal for
>>>  change vs. delete, respectively?)
>>>  (How are collections handled?)
>>>
>>>  Packet format:
>>>  Packets are datagrams, possibly of fixed length.
>>>  Format: (action (remove vs. hollow?) +) primary key + hash of
>>>  (class | table name?) -- see the sketch at the end of this
>>>  message.
>>>  (Can we use a fixed length for the packet? I.e., can we reliably
>>>  hash the class or table name?)
>>>  The packet is fixed length to minimize message size. Variable
>>>  length is possible, but seems bad.
>>>  If using a hash, each cache manager calculates the hash value
>>>  when a (class | table) enters cache space. Unknown hashes are
>>>  ignored (as they aren't in cache).
>>>
>>>  Questions:
>>>  How do we detect the appropriate broadcast events at the cache
>>>  manager level?
>>>  Can multiple VMs on the same host bind to the same multicast
>>>  group address? If not, do we need some form of local relay?
>>>  (Tested to be okay on Windows.)
>>>  Do we use the table name, class name, or something else for
>>>  entity representation?
>>>  Can we use a hash of the entity representation (class or table
>>>  name) to allow for fixed-length packets?
>>>  Which hash formula do we use?
>>>  Is there too much multicast traffic on a network of 60 machines?
>>>  (We can use separate addresses for separate clusters.)
>>>  Are networks (or switches) commonly configured to support local
>>>  multicast packets?
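
On the fixed-length question: a small fixed layout along the lines of
the Packet format section might look like the following. The 13-byte
layout and the numeric primary key are assumptions; String.hashCode is
only a stand-in hash, but its formula is specified by the JDK, so it
is stable across JVMs:

    import java.nio.ByteBuffer;

    // Hypothetical fixed layout: action (1 byte) + primary key (8) +
    // table-name hash (4).
    public final class InvalidationPacket {
        public static final byte HOLLOW = 0;
        public static final byte REMOVE = 1;
        public static final int LENGTH = 1 + 8 + 4;

        public static byte[] encode(byte action, long primaryKey, String tableName) {
            return ByteBuffer.allocate(LENGTH)
                    .put(action)
                    .putLong(primaryKey)
                    .putInt(tableName.hashCode()) // stable across JVMs per the spec
                    .array();
        }

        public static void decode(byte[] packet) {
            ByteBuffer buf = ByteBuffer.wrap(packet);
            byte action = buf.get();
            long primaryKey = buf.getLong();
            int tableHash = buf.getInt();
            // Look tableHash up among the hashes computed when each table
            // entered the cache; unknown hashes are simply ignored.
        }
    }

A hash collision under this layout errs in the safe direction: the
worst case is a spurious eviction of a row that did not actually
change.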