Re: [Hibernate-devel] deep lazy collections

This would be brilliant, Doug. I'd love to see it. I bet in real systems it
would be a *much* bigger performance boost than having lazily initialized
objects.

> I have been thinking about "deep lazy collections." For example, a map
> where only individual entries are read from the database as needed,
> and only added or modified entries are written. [This is probably only
> really useful for maps, and maybe sets.]

I would have thought maps and *lists*??

> Of course you could create such a map by using a DATE field as the
> primary key (id) for some class and using custom queries. But there is
> a certain beauty in using Hibernate collections and Java accesses
> directly.

I agree. You don't want to write queries where you would naturally write
Java. And there are other advantages to collections, like garbage
collection.

> I believe that deep lazy collections (lazy="deep") could be
> implemented without *too* much effort. I would appreciate critiques on
> my proposal...

I believe it's not "*too*" difficult, and half of the problem (i.e. only
writing changed entries) has been on my secret todo list for some time.
(I hadn't thought of the other half.) This is not the same as saying
it will be easy. Without doubt, the most difficult code in Hibernate
is the stuff in RelationalDatabaseSession that handles collections. I
got it wrong quite a few times. The complexity is a result of:

1. arbitrarily nested subcollections
2. composite collection elements
3. persistent garbage collection
4. lazy initialization
(5. collections not independently versioned)

Those four (5) things interact in weird and wonderful ways. For example,
if you want to garbage collect a collection that's not been (lazily)
initialized, you have to load up not only that collection, but also all
its subcollections so you can delete _them_ from the database.

> 1. Deep Lazy collection classes would extend the Lazy classes in
> cirrus.hibernate.collections.*. As a fallback, if the user does
> something inconsistent with deep laziness, e.g., asking for a
> collection of elements via Map.values() for example, the Deep Lazy
> class would simply revert to Lazy behavior.

> 3. Deep Lazy collection classes would maintain a number of local maps
> to record additions, removals, and modifications. These maps, along
> with a cache of read objects, are consulted before going to the
> database. See logic and invariants below.

> 4. CollectionPersister would be extended to support sql for select by
> map-key, delete by map-key, and update by map-key, as well as count(*).

> 5. To maintain some backward compatibility with the current
> RelationalDatabaseSession and its use of dorecreate and doremove,
> PersistentCollection could add methods for
>     isDeepLazy() => false (if true, don't delete all, instead ask for:)
>     deleteEntries()
>     insertEntries() => entries() for old collection classes
>     updateEntries() => null for old collection classes
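
To make the bookkeeping in point 3 concrete, here is a minimal sketch of
the lookup order it describes: pending removals, then pending writes,
then the read cache, and only then the database. Everything in it is
hypothetical and written for illustration (EntryStore, DeepLazyMapState,
selectByKey, pendingWrites, pendingDeletes); none of these names exist
in Hibernate, and it uses modern Java generics for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the "select by map-key" SQL that point 4
// proposes adding to CollectionPersister.
interface EntryStore<K, V> {
    V selectByKey(K key);
}

// Hypothetical bookkeeping for one deep lazy map. Not Hibernate code.
// Invariant: a key is never in both 'written' and 'removed'.
class DeepLazyMapState<K, V> {
    private final EntryStore<K, V> store;
    private final Map<K, V> written = new HashMap<>();   // added or modified, not yet flushed
    private final Set<K> removed = new HashSet<>();      // removed, not yet flushed
    private final Map<K, V> readCache = new HashMap<>(); // entries already read
    int databaseReads = 0;                               // counts round trips, for illustration

    DeepLazyMapState(EntryStore<K, V> store) {
        this.store = store;
    }

    V get(K key) {
        if (removed.contains(key)) return null;               // removed in memory
        if (written.containsKey(key)) return written.get(key);
        if (readCache.containsKey(key)) return readCache.get(key);
        databaseReads++;                                      // only now hit the database
        V value = store.selectByKey(key);
        readCache.put(key, value);                            // caches misses (null) as well
        return value;
    }

    void put(K key, V value) {
        removed.remove(key);
        written.put(key, value);
    }

    void remove(K key) {
        written.remove(key);
        removed.add(key);
    }

    // What a flush would turn into insert/update and delete statements.
    Map<K, V> pendingWrites() { return written; }
    Set<K> pendingDeletes() { return removed; }
}
```

The sketch does not distinguish an insert from an update at flush time,
and it says nothing about where this state should live.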

Here are some extra issues for you:

1. You can't keep any state, apart from actual elements and a pointer to
the session, in the collection wrapper. This is to support rollback of
the session state when an exception occurs. So lists of additions,
removals, etc. have to be kept on the CollectionEntry in
RelationalDatabaseSession. It might be best to have two subclasses
of CollectionEntry and try to do some things polymorphically.

2. It *might* be a good idea to move some functionality currently
implemented on RelationalDatabaseSession to CollectionPersister.
This would let you subclass CollectionPersister to change behaviour
for deep lazy collections. On the other hand, persisters are not
allowed to keep any mutable state (obviously), so that might result
in lots of nasty calls to/from the session.

3. What happens when a Map starts out in a role where it's doing
"deep lazy" initialization and then moves to a role where it's doing
lazy or eager initialization? You will have the wrong wrapper class.
This is a somewhat hard problem....

In light of (1) and especially (2), I would consider a different
design. Continue using the current collection wrappers and make a
single change:

  * write() and read() take an argument: the index being accessed

SessionImplementor.initialize() and SessionImplementor.dirty()
also receive that argument. The wrapper itself never knows how
fully initialized it is. This is consistent with the current
design. The CollectionEntry keeps track of which elements changed
and which are initialized.

Subclass CollectionEntry and/or CollectionPersister to implement
the functionality that varies between lazy and deep lazy.
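
A rough sketch of that alternative, to show the shape of it: the wrapper
passes the accessed index to the session on every read and write, and
all knowledge of what is initialized or dirty sits in a per-collection
entry owned by the session. The names below (SessionSketch,
MapWrapperSketch, EntrySketch, DeepLazySessionSketch) are invented for
illustration and the signatures are assumptions; the real
SessionImplementor and CollectionEntry look different, and a plain Map
stands in for the database table.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical slice of the session API, with the extra index argument.
interface SessionSketch {
    void initialize(MapWrapperSketch map, Object index);
    void dirty(MapWrapperSketch map, Object index);
}

// The wrapper holds only elements and a pointer to the session.
// It never knows how fully initialized it is.
class MapWrapperSketch {
    final Map<Object, Object> elements = new HashMap<>();
    private final SessionSketch session;

    MapWrapperSketch(SessionSketch session) { this.session = session; }

    private void read(Object index) { session.initialize(this, index); }
    private void write(Object index) { session.dirty(this, index); }

    public Object get(Object key) {
        read(key);
        return elements.get(key);
    }

    public void put(Object key, Object value) {
        write(key);
        elements.put(key, value);
    }
}

// Per-collection state kept by the session, not by the wrapper.
class EntrySketch {
    final Set<Object> initialized = new HashSet<>(); // indexes already loaded
    final Set<Object> dirty = new HashSet<>();       // indexes to write at flush
}

// Deep lazy behaviour: load one element per first access.
class DeepLazySessionSketch implements SessionSketch {
    private final Map<Object, Object> table;         // stands in for database rows
    private final Map<MapWrapperSketch, EntrySketch> entries = new IdentityHashMap<>();
    int selects = 0;                                 // counts simulated selects

    DeepLazySessionSketch(Map<Object, Object> table) { this.table = table; }

    EntrySketch entryFor(MapWrapperSketch map) {
        return entries.computeIfAbsent(map, m -> new EntrySketch());
    }

    public void initialize(MapWrapperSketch map, Object index) {
        if (entryFor(map).initialized.add(index)) {  // first touch of this index
            selects++;
            if (table.containsKey(index)) {
                map.elements.put(index, table.get(index));
            }
        }
    }

    public void dirty(MapWrapperSketch map, Object index) {
        EntrySketch entry = entryFor(map);
        entry.initialized.add(index);                // in-memory value is now authoritative
        entry.dirty.add(index);
    }
}
```

An ordinary lazy session could implement the same two methods, ignore
the index and load the whole collection on first access, which is why
the wrapper class would not need to change between roles.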

> There are surely some things I haven't considered or don't fully
> appreciate. Some areas that concern me:

> 1. The notion of ce.initialized must be refined. There will be
> collections that have a persister, but have not been read entirely.
> I.e., ce.initialized, but elements() not valid; see: flush,
> updateReachableCollection, searchForDirtyCollections,
> prepareCollectionForUpdate.

> 2. prepareCollectionForUpdate must be smarter, e.g., deciding when to
> totally recreate a collection versus when to update only; this may be
> a separate dirty case versus a copied or moved case, I'm not sure.

> 3. How does this affect copying and reusing (sub-)collections?

Doug, why don't you create a new CVS branch for playing with this
stuff and have a look at implementing the logic you described
on CollectionEntry or a subclass. (We might need to make it a
non-inner class.)
