Re: [Sparrow-devel] JDO Implementation Comments


> Code Modification
> 
> We did not do code modification, but I had a play with the BCEL library 
> and I think that that would be a candidate for use. There is (was?) a 
> better library from IBM alphaWorks that presents an API similar to 
> reflection but allows modification of byte-code; however, I don't think 
> it was open source. The enhancement would need to be pluggable, to 
> allow us to try out some different implementations.

I've looked at various open source bytecode manipulation APIs and
they're much the same.

I think BCEL will probably have the largest community and the best support
(because it's an Apache project), so I think it would be the best choice.
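Because the enhancer should be pluggable, one option is to keep BCEL (or any other bytecode library) behind a small interface and select the implementation by name at runtime. The sketch below is a minimal version of that idea. `ClassEnhancer`, `NoOpEnhancer`, `EnhancerRegistry` and the `"noop"` key are all hypothetical names. No BCEL calls are shown: a BCEL-backed enhancer would just be one more class registered the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical plug-in point: takes raw class-file bytes, returns enhanced bytes. */
interface ClassEnhancer {
    byte[] enhance(String className, byte[] classFile);
}

/** Pass-through enhancer, useful as a default while real implementations are tried out. */
class NoOpEnhancer implements ClassEnhancer {
    @Override
    public byte[] enhance(String className, byte[] classFile) {
        return classFile.clone(); // unmodified copy of the input
    }
}

/** Maps a configuration key (e.g. read from a properties file) to an enhancer factory. */
class EnhancerRegistry {
    private final Map<String, Supplier<ClassEnhancer>> factories = new HashMap<>();

    void register(String name, Supplier<ClassEnhancer> factory) {
        factories.put(name, factory);
    }

    ClassEnhancer create(String name) {
        Supplier<ClassEnhancer> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No enhancer registered as: " + name);
        }
        return factory.get();
    }
}
```

With this arrangement, switching between enhancers becomes a configuration change, for example `registry.register("noop", NoOpEnhancer::new)`, and nothing else in the build has to know which bytecode library is in use.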

> We used source-code modification and reflection, i.e. the jdoXXX calls 
> were in a PersistenceCapableImpl superclass from which all our db 
> classes descended,

This isn't possible in general: if the PC class already extends some other
class, it would need multiple inheritance, which Java doesn't allow.

> and the ClassManagers used reflection to read the 
> actual values. Classes encapsulated all fields with getXXX and setXXX 
> methods and explicitly called jdoMarkDirty() methods on modification. 

The way the spec is set up, the state manager will know everything
(it knows more than the PC does itself!), so we won't need reflection
anywhere (that I know of).
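Here is the contrast in concrete form. In the spec's model, the persistent class reports each field change to its state manager, so nothing has to discover changes by reflection. The sketch below writes that delegation out by hand, which is roughly what an enhancer would generate. `StateManager`, `fieldChanged` and the field-number constants are simplified, hypothetical stand-ins, not the `javax.jdo.spi` signatures. Leaving the `fieldChanged` call out of a new setter is exactly the bug described above.

```java
import java.util.BitSet;

/** Hypothetical, simplified state manager: records which fields have changed. */
class StateManager {
    private final BitSet dirtyFields = new BitSet();

    void fieldChanged(Object pc, int fieldNumber) {
        dirtyFields.set(fieldNumber);
    }

    boolean isDirty() {
        return !dirtyFields.isEmpty();
    }

    boolean isFieldDirty(int fieldNumber) {
        return dirtyFields.get(fieldNumber);
    }
}

/** Persistent class with the delegation written by hand instead of by an enhancer. */
class Customer {
    static final int NAME = 0;   // hypothetical field numbering
    static final int EMAIL = 1;

    private final StateManager sm;
    private String name;
    private String email;

    Customer(StateManager sm) {
        this.sm = sm;
    }

    String getName() { return name; }

    void setName(String name) {
        this.name = name;
        sm.fieldChanged(this, NAME); // must be repeated in every setter
    }

    String getEmail() { return email; }

    void setEmail(String email) {
        this.email = email;
        sm.fieldChanged(this, EMAIL);
    }
}
```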

> Having done this I can see why code modification is necessary - we had a 
> few bugs when a method was added but the jdoMarkDirty() was not :-) I 
> would suggest that an implementation should aim at hand-modified source 
> and worry about code-modification in parallel.

Perhaps, though we already have some work done towards it, and it
appears there is a separate effort Andy knows of whose code (including
an enhancer) we may be able to use.

Either way, code enhancement is not a big project, because it's pretty
much spelled out in the spec, and given the compatibility requirements
I'm not sure we want to deviate from that.

> HOLLOW state
> 
> We did not implement the HOLLOW state, which made the code for 1-1 
> relationships nasty. Every relationship getter required a 
> jdoLoad("relationshipName") method to trigger loading of the appropriate 
> object. I would say HOLLOW state is a must.
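The value of HOLLOW is that a relationship getter can return a placeholder that knows only its identity, and the fields are fetched the first time something reads them. The sketch below is minimal. It assumes a hypothetical `Loader` callback in place of real database access. The state names follow the spec, but `Address` and `Person` are illustrative only.

```java
import java.util.Map;

/** Two of the spec's lifecycle states, enough for this example. */
enum LifecycleState { HOLLOW, PERSISTENT_CLEAN }

/** Hypothetical callback that fetches an object's field values by id. */
interface Loader {
    Map<String, Object> load(long id);
}

class Address {
    private final long id;
    private final Loader loader;
    private LifecycleState state = LifecycleState.HOLLOW; // only the id is known
    private String city;

    Address(long id, Loader loader) {
        this.id = id;
        this.loader = loader;
    }

    LifecycleState getState() { return state; }

    String getCity() {
        ensureLoaded();
        return city;
    }

    /** On first field access, fetch all fields and move HOLLOW -> PERSISTENT_CLEAN. */
    private void ensureLoaded() {
        if (state == LifecycleState.HOLLOW) {
            Map<String, Object> row = loader.load(id);
            city = (String) row.get("city");
            state = LifecycleState.PERSISTENT_CLEAN;
        }
    }
}

class Person {
    private final Address address;

    Person(Address address) { this.address = address; }

    /** Returns the (possibly hollow) related object; no explicit load call is needed. */
    Address getAddress() { return address; }
}
```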
> 
> Caching
> 
> Our PersistenceManagerImpl maintained 2 caches - a memory sensitive 
> cache like objectbridge, and a dirtyCache which maintained only objects 
> marked as PERSISTENT_DIRTY, PERSISTENT_NEW or PERSISTENT_DELETED in a 
> list. This meant that only the small number of changed objects had to 
> be checked against the db. This is a great boon for the JDO.
> 
> savepoint()
> 
> As hinted above, we extended the pm to have a checkpoint() method, which 
> proved extremely useful. This is less so now, as the spec has been 
> modified to allow the user to call setRetainValues(true) so that objects 
> are not made HOLLOW on commit. However, there is still a potential 
> problem with objects that outlive a pm. For example, if you create 
> an object and then want to cache it in a singleton, then changes to the 
> object are not automatically persisted, unless the singleton makes a 
> commit(). It then has to re-associate all objects with a new PM in order 
> to record new changes to the objects. It was much simpler to have a PM 
> associated with the singleton and call checkpoint() periodically. Note 
> that this is on the TO-DO list of the 1.0 spec as savepoint().

I think I understand you. So you're talking about working outside of transactions?
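The two caches and checkpoint() described above could fit together roughly like this. Clean objects live in a memory-sensitive cache that the garbage collector may reclaim. Changed objects are held strongly in a separate dirty list, and checkpoint() writes out only that list while the PM stays open. This is a sketch under assumptions: `Flusher` is a hypothetical stand-in for whatever actually issues the SQL, and `SoftReference` is one way to make the cache memory-sensitive, not necessarily the way ObjectBridge does it.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

enum DirtyState { PERSISTENT_NEW, PERSISTENT_DIRTY, PERSISTENT_DELETED }

/** Hypothetical back end that writes one changed object to the database. */
interface Flusher {
    void write(Object id, Object pc, DirtyState state);
}

class CachingPersistenceManager {
    private static final class Pending {
        final Object pc;
        final DirtyState state;
        Pending(Object pc, DirtyState state) { this.pc = pc; this.state = state; }
    }

    // Clean objects are softly referenced, so the GC may evict them under memory pressure.
    private final Map<Object, SoftReference<Object>> cache = new HashMap<>();
    // Changed objects are strongly referenced until flushed, kept in the order first marked.
    private final Map<Object, Pending> dirty = new LinkedHashMap<>();
    private final Flusher flusher;

    CachingPersistenceManager(Flusher flusher) { this.flusher = flusher; }

    void cacheClean(Object id, Object pc) {
        cache.put(id, new SoftReference<>(pc));
    }

    void markDirty(Object id, Object pc, DirtyState state) {
        dirty.put(id, new Pending(pc, state));
    }

    /** Returns the cached object, or null if unknown, deleted, or reclaimed by the GC. */
    Object lookup(Object id) {
        Pending p = dirty.get(id);
        if (p != null) {
            return p.state == DirtyState.PERSISTENT_DELETED ? null : p.pc;
        }
        SoftReference<Object> ref = cache.get(id);
        return ref == null ? null : ref.get();
    }

    int dirtyCount() { return dirty.size(); }

    /** Writes only the changed objects; the PM stays open and keeps managing them. */
    void checkpoint() {
        for (Map.Entry<Object, Pending> e : dirty.entrySet()) {
            Pending p = e.getValue();
            flusher.write(e.getKey(), p.pc, p.state);
            if (p.state == DirtyState.PERSISTENT_DELETED) {
                cache.remove(e.getKey());
            } else {
                cacheClean(e.getKey(), p.pc); // flushed objects are clean again
            }
        }
        dirty.clear();
    }
}
```

A singleton holding one of these can call `checkpoint()` periodically, and it never has to re-associate its objects with a new PM.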

> The new optional 'optimistic transactions' part of the spec is a good 
> fit for this functionality too, and better in that it doesn't maintain a 
> connection to the database for a long lived pm. This allows usage 
> patterns like creating a PM and storing it in the session of a web 
> transaction, and managing all interactions through this PM. We had to 
> abandon this idea since our implementation maintained an open connection 
> for each open PM, and we rapidly ran out of connections.

Does the spec require one connection per PM? That seems like a bad idea
to me. I would expect the db connection pool to be separate from the PM,
so the PM could just grab a connection when it needs one.
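The separation suggested here might look like the following. The PM holds a reference to a shared pool rather than to a connection, and it borrows a connection only for the duration of each database operation. The sketch is minimal and `SimplePool` and `PooledPM` are hypothetical names. The pool is generic so the example runs without a database. In practice the resource would be a `java.sql.Connection` obtained from a pooled `javax.sql.DataSource`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Fixed-size pool shared by all PMs; borrow() blocks when every resource is in use. */
class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    C borrow() throws InterruptedException {
        return idle.take();
    }

    void giveBack(C resource) {
        idle.add(resource);
    }

    int available() { return idle.size(); }
}

/** A long-lived PM (e.g. stored in a web session) that owns no connection between calls. */
class PooledPM<C> {
    private final SimplePool<C> pool;

    PooledPM(SimplePool<C> pool) { this.pool = pool; }

    /** Borrows a connection for this unit of work only, and always returns it. */
    <R> R withConnection(Function<C, R> work) throws InterruptedException {
        C conn = pool.borrow();
        try {
            return work.apply(conn);
        } finally {
            pool.giveBack(conn);
        }
    }
}
```

With this design, ten session-scoped PMs can share a pool of two connections. Only PMs that are actively talking to the database hold one.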

> Change logging
> 
> One of the interesting things about our implementation was the 
> changelog. Any change to the database was recorded by a Change object, 
> that held the className, fieldNames and the old and new values of 
> changed fields. The fieldName was used so that the Change object could 
> be used to apply changes to a database with a different schema, or where 
> the mapping had changed. This was used to synchronize our web database 
> (MySQL) with our internal database (MS SQL Server). It worked extremely 
> well, and I would heartily recommend this approach. It was implemented 
> by having the PersistenceManagerFactory (nb, not the individual PMs) 
> fire transactionCommitted events with a list of changes. This allows the 
> maximum amount of flexibility in monitoring changes on the database. The 
> changes were logged in an XML format for transfer.

Interesting.
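Here is a sketch of the change-log design described above. It assumes hypothetical `Change`, `TransactionListener` and `ChangeLoggingFactory` classes, none of which are part of the JDO API. Each `Change` records values by class and field name rather than by table and column, so it can be replayed against a differently mapped schema. Listeners are registered on the factory, so they see the commits of every PM it creates. The XML layout is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One changed field, identified by class and field name rather than table/column. */
class Change {
    final String className;
    final String fieldName;
    final Object oldValue;
    final Object newValue;

    Change(String className, String fieldName, Object oldValue, Object newValue) {
        this.className = className;
        this.fieldName = fieldName;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }

    /** Serializes this change as one XML element for transfer. */
    String toXml() {
        return "<change class=\"" + esc(className) + "\" field=\"" + esc(fieldName) + "\">"
                + "<old>" + esc(String.valueOf(oldValue)) + "</old>"
                + "<new>" + esc(String.valueOf(newValue)) + "</new></change>";
    }

    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

interface TransactionListener {
    void transactionCommitted(List<Change> changes);
}

/** Listeners live on the factory, so they observe commits from every PM it creates. */
class ChangeLoggingFactory {
    private final List<TransactionListener> listeners = new ArrayList<>();

    void addTransactionListener(TransactionListener l) { listeners.add(l); }

    /** Called by a PM after a successful commit; listeners get a read-only copy. */
    void fireCommitted(List<Change> changes) {
        List<Change> snapshot = Collections.unmodifiableList(new ArrayList<>(changes));
        for (TransactionListener l : listeners) {
            l.transactionCommitted(snapshot);
        }
    }
}
```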

> Queries
> 
> The most we ever did with queries was to filter for objects. It was 
> never necessary to use complex queries, since the relationships 
> maintained by JDO allow you to navigate to the necessary object. I'm 
> sure anyone who has had to embed queries in code and debug them would 
> agree that it's easier to avoid it than work with it. So I don't think 
> complex query support is essential for early versions.
> 
> Having said that, over time it would be nice to support variants like 
> OQL and EJBQL for the implementation, since this would allow an 
> isql-like app that manipulates live object data. Now that would be a 
> killer app.
> 
> IMHO the implementation of querying should be aimed at two uses:
> 
> 1. I have a collection of instances and I want to use a query construct 
> to sort/filter them. So the filter must be applicable to a set of 
> objects without hitting the database.
> 
> 2. I want to fetch data from the database according to a filter. This is 
> easily implemented by navigating an extent and using the same filter as 
> in (1) above, but this is extremely inefficient to the point of being 
> useless in large systems. Thus the system must be able to pass all or 
> part of the query 'down the chain' to a lower level component that can 
> use all or part of it to generate an SQL query. The full filter can then 
> be applied to the smaller set of objects returned. This will be crucial 
> for real world use.

Okay. As you suggested, I envision the query engine being one of the
later modules developed, since it is rather complicated and you can
have a very useful piece of software without it (see Ozone, for example).
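The two query uses from Andy's list can share a single filter object. It works in memory for use 1. For use 2, it hands only the parts it can translate to the database and then re-applies the full filter to the smaller result set. The sketch below is minimal and assumes a filter made of ANDed conditions, each carrying an optional SQL fragment. `Condition` and `Filter` are hypothetical names. A real engine would derive the SQL from the parsed query and use bound parameters rather than hand-written strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** One condition of a conjunctive filter; sql is null if it cannot be pushed down. */
class Condition<T> {
    final Predicate<T> test;
    final String sql;

    Condition(Predicate<T> test, String sql) {
        this.test = test;
        this.sql = sql;
    }
}

/** ANDed conditions, usable both in memory and as a partial SQL WHERE clause. */
class Filter<T> {
    private final List<Condition<T>> conditions = new ArrayList<>();

    Filter<T> and(Predicate<T> test, String sql) {
        conditions.add(new Condition<>(test, sql));
        return this;
    }

    /** Use 1: filter objects already in memory, without touching the database. */
    List<T> apply(List<T> objects) {
        return objects.stream()
                .filter(o -> conditions.stream().allMatch(c -> c.test.test(o)))
                .collect(Collectors.toList());
    }

    /** Use 2: the pushable part as SQL; the full filter is then re-applied via apply(). */
    String whereClause() {
        List<String> parts = conditions.stream()
                .filter(c -> c.sql != null)
                .map(c -> c.sql)
                .collect(Collectors.toList());
        return parts.isEmpty() ? "" : "WHERE " + String.join(" AND ", parts);
    }
}
```

Splitting on AND is the simplest safe case. Any conjunct the database can evaluate can only shrink the candidate set, so pushing it down never loses results that the full in-memory filter would have kept.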

Thanks for your comments!
-- 
Joel Shellman
Comprehensive Internet Solutions -- Building business dreams.
[ web design | database | e-commerce | hosting | marketing ]
iKestrel, Inc.  http://www.ikestrel.com/