From: Gavin_King/Cirrus%<CI...@ci...> - 2002-02-27 03:34:24
This would be brilliant, Doug. I'd love to see it. I bet in real systems
it would be a *much* bigger performance boost than having lazily
initialized objects.

> I have been thinking about "deep lazy collections." For example, a map
> where only individual entries are read from the database as needed,
> and only added or modified entries are written. [This is probably only
> really useful for maps, and maybe sets.]

I would have thought maps and *lists*??

> Of course you could create such a map by using a DATE field as the
> primary key (id) for some class and using custom queries. But there is
> a certain beauty in using Hibernate collections and Java accesses
> directly.

I agree. You don't want to write queries where you would naturally write
Java. And there are other advantages to collections, like garbage
collection.

> I believe that deep lazy collections (lazy="deep") could be
> implemented without *too* much effort. I would appreciate critiques on
> my proposal...

I believe it's not "*too*" difficult, and half of the problem (i.e. only
writing changed entries) has been on my secret todo list for some time
(I hadn't thought of the other half). This is not the same as saying it
will be easy. Without doubt, the most difficult code in Hibernate is the
stuff in RelationalDatabaseSession that handles collections. I got it
wrong quite a few times. The complexity is a result of:

1. arbitrarily nested subcollections
2. composite collection elements
3. persistent garbage collection
4. lazy initialization
(5. collections not independently versioned)

Those four (5) things interact in weird and wonderful ways. For example,
if you want to garbage collect a collection that's not been (lazily)
initialized, you have to load up not only that collection, but also all
its subcollections so you can delete _them_ from the database.

> 1. Deep Lazy collection classes would extend the Lazy classes in
> cirrus.hibernate.collections.*.
> As a fallback, if the user does
> something inconsistent with deep laziness, e.g., asking for a
> collection of elements via Map.values() for example, the Deep Lazy
> class would simply revert to Lazy behavior.
> 3. Deep Lazy collection classes would maintain a number of local maps
> to record additions, removals, and modifications. These maps, along
> with a cache of read objects, are consulted before going to the
> database. See logic and invariants below.
> 4. CollectionPersister would be extended to support sql for select by
> map-key, delete by map-key, and update by map-key, as well as count(*).
> 5. To maintain some backward compatibility with the current
> RelationalDatabaseSession and its use of dorecreate and doremove,
> PersistentCollection could add methods for
>    isDeepLazy() => false (if true, don't delete all, instead ask for:)
>    deleteEntries()
>    insertEntries() => entries() for old collection classes
>    updateEntries() => null for old collection classes

Here are some extra issues for you:

1. You can't keep any state, apart from actual elements and a pointer to
the session, in the collection wrapper. This is to support rollback of
the session state when an exception occurs. So lists of additions,
removals, etc. have to be kept on the CollectionEntry in
RelationalDatabaseSession. It might be best to have two subclasses of
CollectionEntry and try to do some things polymorphically.

2. It *might* be a good idea to move some functionality currently
implemented on RelationalDatabaseSession to CollectionPersister. This
would let you subclass CollectionPersister to change behaviour for deep
lazy collections. On the other hand, persisters are not allowed to keep
any mutable state (obviously), so that might result in lots of nasty
calls to/from the session.

3. What happens when a Map starts out in a role where it's doing "deep
lazy" initialization and then moves to a role where it's doing lazy or
eager initialization? You will have the wrong wrapper class.
This is a somewhat hard problem....

In light of (1) and especially (2), I would consider a different design.
Continue using the current collection wrappers and make a single change:

* write() and read() take an argument: the index being accessed

SessionImplementor.initialize() and SessionImplementor.dirty() also
receive that argument. The wrapper itself never knows how fully
initialized it is. This is consistent with the current design. The
CollectionEntry keeps track of which elements changed and which are
initialized. Subclass CollectionEntry and/or CollectionPersister to
implement the functionality that varies between lazy and deep lazy.

> There are surely some things I haven't considered or don't fully
> appreciate. Some areas that concern me:
> 1. The notion of ce.initialized must be refined. There will be
> collections that have a persister, but have not been read entirely.
> I.e., ce.initialized, but elements() not valid; see: flush,
> updateReachableCollection, searchForDirtyCollections,
> prepareCollectionForUpdate.
> 2. prepareCollectionForUpdate must be smarter, e.g., deciding when to
> totally recreate a collection versus when to update only; this may be
> a separate dirty case versus a copied or moved case, I'm not sure.
> 3. How does this affect copying and reusing (sub-)collections?

Doug, why don't you create a new CVS branch for playing with this stuff
and have a look at implementing the logic you described on
CollectionEntry or a subclass. (We might need to make it a non-inner
class.)
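
To make the alternative design concrete, here is a rough sketch of the
read(index)/write(index) idea. It is not Hibernate code. SketchSession,
SketchMap and DeepLazyEntry are hypothetical names invented for
illustration, and a plain in-memory map stands in for the database table
and for the select/update/delete-by-key SQL. The point it tries to show
is only that the wrapper holds nothing but its elements and a pointer to
the session, while the per-key "initialized" and "dirty" bookkeeping
lives on the session's entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical per-collection bookkeeping, kept by the session (not the wrapper).
class DeepLazyEntry {
    final Set<String> initializedKeys = new HashSet<>(); // keys whose state is known locally
    final Set<String> dirtyKeys = new HashSet<>();       // keys to write at flush time
}

// Hypothetical slice of a session; "table" stands in for the database rows.
class SketchSession {
    private final Map<String, String> table;
    private final Map<SketchMap, DeepLazyEntry> entries = new IdentityHashMap<>();
    int selects = 0; // counts simulated "select by map-key" round trips

    SketchSession(Map<String, String> table) { this.table = table; }

    SketchMap wrap() {
        SketchMap map = new SketchMap(this);
        entries.put(map, new DeepLazyEntry());
        return map;
    }

    // Called by the wrapper before it reads one key.
    void initialize(SketchMap map, String key) {
        DeepLazyEntry ce = entries.get(map);
        if (ce.initializedKeys.add(key)) {   // first access to this key only
            selects++;
            String value = table.get(key);
            if (value != null) map.elements.put(key, value);
        }
    }

    // Called by the wrapper before it changes one key.
    void dirty(SketchMap map, String key) {
        DeepLazyEntry ce = entries.get(map);
        ce.initializedKeys.add(key); // the local value now wins, so no read is needed
        ce.dirtyKeys.add(key);
    }

    // Writes only the changed keys; returns how many rows were touched.
    int flush() {
        int written = 0;
        for (Map.Entry<SketchMap, DeepLazyEntry> e : entries.entrySet()) {
            SketchMap map = e.getKey();
            DeepLazyEntry ce = e.getValue();
            for (String key : ce.dirtyKeys) {
                String value = map.elements.get(key);
                if (value == null) table.remove(key); // simulated delete by map-key
                else table.put(key, value);           // simulated insert/update by map-key
                written++;
            }
            ce.dirtyKeys.clear();
        }
        return written;
    }
}

// The wrapper: only elements plus a pointer to the session.
class SketchMap {
    final Map<String, String> elements = new HashMap<>();
    private final SketchSession session;

    SketchMap(SketchSession session) { this.session = session; }

    private void read(String index) { session.initialize(this, index); }
    private void write(String index) { session.dirty(this, index); }

    String get(String key) { read(key); return elements.get(key); }
    void put(String key, String value) { write(key); elements.put(key, value); }
    void remove(String key) { write(key); elements.remove(key); }
}

class DeepLazySketch {
    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("a", "1");
        table.put("b", "2");
        SketchSession session = new SketchSession(table);
        SketchMap map = session.wrap();
        map.get("b");      // loads a single row
        map.put("c", "3"); // marks one key dirty without reading anything
        int written = session.flush();
        System.out.println("selects=" + session.selects + " written=" + written);
    }
}
```

Because the dirty and initialized sets sit on the entry, a session could
throw them away on rollback without touching the wrapper. The sketch
leaves out subcollections, composite elements, lists, and Map.values(),
which are exactly the places where the real interactions get hard.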