[Sparrow-devel] JDO Implementation Comments


Hi guys, I was pointed to this group by Thomas Mahler of the 
objectbridge group.

I am interested in helping with the sparrow/objectbridge efforts to 
implement JDO in open source. What follows is a description of what we 
did, with some thoughts about the JDO spec as it currently stands, and 
what I think is important to target.

I am moving between countries at the moment and don't have a permanent 
Internet connection, so this is off the top of my head.

At my former employer we implemented a persistence mechanism based on 
the JDO early drafts. It has been in use for over a year now on our 
website, and coped well with a large amount of traffic (30,000 distinct 
page views in a day, with heavy database access, running at a peak of 
25+ pages per second). The library interfaces with a MySQL server on the 
web site, and MS SQL Server on the internal side. Data is synchronized 
in real time through the persistence libraries. I can vouch for the 
suitability of the spec for real world use :-)

Code Modification

We did not do code modification, but I had a play with the BCEL library 
and I think that that would be a candidate for use. There is (was?) a 
better library from IBM alphaworks that presents an API similar to 
reflection, but allowing modification of byte-code, but I think that was 
not under open source. The enhancement would need to be pluggable, to 
allow us to try out some different implementations.

We used source-code modification and reflection, i.e. the jdoXXX calls 
were in a PersistenceCapableImpl superclass from which all our db 
classes descended, and the ClassManagers used reflection to read the 
actual values. Classes encapsulated all fields with getXXX and setXXX 
methods and explicitly called jdoMarkDirty() methods on modification. 
Having done this I can see why code modification is necessary - we had a 
few bugs when a method was added but the jdoMarkDirty() was not :-) I 
would suggest that an implementation should aim at hand-modified source 
and worry about code-modification in parallel.
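
To make the hand-modified approach concrete, here is a minimal sketch of the pattern described above. Only PersistenceCapableImpl and jdoMarkDirty() come from our code as I remember it; Customer, ReflectiveClassManager, readField and isDirty are made-up names for illustration, and none of this is the JDO API itself.

```java
import java.lang.reflect.Field;

// All names here are illustrative; this is not the JDO API.
abstract class PersistenceCapableImpl {
    private boolean dirty;

    /** Called by hand from every mutating method. */
    protected void jdoMarkDirty() { dirty = true; }

    boolean isDirty() { return dirty; }
}

class Customer extends PersistenceCapableImpl {
    private String name;

    public String getName() { return name; }

    public void setName(String name) {
        this.name = name;
        jdoMarkDirty(); // omit this and the change is silently never stored
    }
}

class ReflectiveClassManager {
    /** Reads a private field's current value, as a reflection-based manager might. */
    static Object readField(Object instance, String fieldName) {
        try {
            Field f = instance.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + fieldName, e);
        }
    }
}
```

The jdoMarkDirty() call in the setter is exactly the line that is easy to forget, which is the argument for letting an enhancer insert it.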

HOLLOW state

We did not implement the HOLLOW state, which made the code for 1-1 
relationships nasty. Every relationship getter required a 
jdoLoad("relationshipName") method to trigger loading of the appropriate 
object. I would say HOLLOW state is a must.
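
As a rough illustration of what HOLLOW buys you, the sketch below keeps the identity of a related object available without loading it, and loads the rest on first access. HollowRef and its loader hook are hypothetical names; a real implementation would do this inside the enhanced class rather than through a wrapper like this.

```java
import java.util.function.Function;

/** Stand-in for a HOLLOW instance: identity is known, state loads on first access. */
class HollowRef<T> {
    private final Object id;
    private final Function<Object, T> loader; // hypothetical hook into the datastore
    private T value;
    private boolean loaded;

    HollowRef(Object id, Function<Object, T> loader) {
        this.id = id;
        this.loader = loader;
    }

    /** Returning the identity never triggers a load. */
    Object getId() { return id; }

    boolean isHollow() { return !loaded; }

    /** Loads the state once, on first use, and reuses it afterwards. */
    T get() {
        if (!loaded) {
            value = loader.apply(id);
            loaded = true;
        }
        return value;
    }
}
```

With something like this behind the getter, the caller no longer needs an explicit jdoLoad("relationshipName") before following a relationship.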

Caching

Our PersistenceManagerImpl maintained 2 caches - a memory sensitive 
cache like objectbridge, and a dirtyCache which maintained only objects 
marked as PERSISTENT_DIRTY, PERSISTENT_NEW or PERSISTENT_DELETED in a 
list. This meant that commit-time checking touched only the small number of 
changed objects rather than the whole cache, which is a great boon for a JDO 
implementation.
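
A minimal sketch of the two-cache idea follows, assuming SoftReference is an acceptable stand-in for "memory sensitive". The class and method names (TwoLevelCache, CachedObject, transition, drainDirty) are invented for this example and the state enum is cut down to the states mentioned above plus a clean state.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum State { PERSISTENT_CLEAN, PERSISTENT_DIRTY, PERSISTENT_NEW, PERSISTENT_DELETED }

class CachedObject {
    final Object id;
    State state = State.PERSISTENT_CLEAN;
    CachedObject(Object id) { this.id = id; }
}

class TwoLevelCache {
    // Memory-sensitive cache: entries may be reclaimed by the garbage collector.
    private final Map<Object, SoftReference<CachedObject>> memoryCache = new HashMap<>();
    // Dirty cache: strong references, so changed objects cannot be collected before commit.
    private final Set<CachedObject> dirtyCache = new LinkedHashSet<>();

    void put(CachedObject o) { memoryCache.put(o.id, new SoftReference<>(o)); }

    CachedObject get(Object id) {
        SoftReference<CachedObject> ref = memoryCache.get(id);
        return ref == null ? null : ref.get();
    }

    /** Records a state change and keeps the dirty cache in step with it. */
    void transition(CachedObject o, State newState) {
        o.state = newState;
        if (newState == State.PERSISTENT_CLEAN) {
            dirtyCache.remove(o);
        } else {
            dirtyCache.add(o);
        }
    }

    /** Returns the objects a commit has to look at, then resets them to clean. */
    List<CachedObject> drainDirty() {
        List<CachedObject> work = new ArrayList<>(dirtyCache);
        for (CachedObject o : work) {
            o.state = State.PERSISTENT_CLEAN;
        }
        dirtyCache.clear();
        return work;
    }
}
```

The point is that commit walks drainDirty() only, so its cost follows the number of changes and not the size of the cache.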

savepoint()

As hinted above, we extended the pm to have a checkpoint() method, which 
proved extremely useful. This is less so now, as the spec has been 
modified to allow the user to call setRetainValues(true) so that objects 
are not made HOLLOW on commit. However there is still a problem 
potentially with objects that outlive a pm. For example, if you create 
an object and then want to cache it in a singleton, then changes to the 
object are not automatically persisted, unless the singleton makes a 
commit(). It then has to re-associate all objects with a new PM in order 
to record new changes to the objects. It was much simpler to have a PM 
associated with the singleton and call checkpoint() periodically. Note 
that this is on the TO-DO list of the 1.0 spec as savepoint().

The new optional 'optimistic transactions' part of the spec is a good 
fit for this functionality too, and better in that it doesn't maintain a 
connection to the database for a long lived pm. This allows usage 
patterns like creating a PM and storing it in the session of a web 
transaction, and managing all interactions through this PM. We had to 
abandon this idea since our implementation maintained an open connection 
for each open PM, and we rapidly ran out of connections.

Change logging

One of the interesting things about our implementation was the 
changelog. Any change to the database was recorded by a Change object, 
that held the className, fieldNames and the old and new values of 
changed fields. The fieldName was used so that the Change object could 
be used to apply changes to a database with a different schema, or where 
the mapping had changed. This was used to synchronize our web database 
(MySQL) with our internal database (MS SQL Server). It worked extremely 
well, and I would heartily recommend this approach. It was implemented 
by having the PersistenceManagerFactory (nb, not the individual PMs) 
fire transactionCommitted events with a list of changes. This allows the 
maximum amount of flexibility in monitoring changes on the database. The 
changes were logged in an XML format for transfer.
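
Roughly, the event side of this could look like the sketch below. It is a simplification: here a Change carries a single field, whereas ours held several fields per object, and ChangeBroadcaster merely stands in for the PersistenceManagerFactory. TransactionListener, addListener and fireCommitted are assumed names, not part of any spec.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One logical change, keyed by class and field name rather than table and column. */
final class Change {
    final String className;
    final String fieldName;
    final Object oldValue;
    final Object newValue;

    Change(String className, String fieldName, Object oldValue, Object newValue) {
        this.className = className;
        this.fieldName = fieldName;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

interface TransactionListener {
    void transactionCommitted(List<Change> changes);
}

/** Stands in for the factory, which sees the commits of every PM it created. */
class ChangeBroadcaster {
    private final List<TransactionListener> listeners = new ArrayList<>();

    void addListener(TransactionListener listener) { listeners.add(listener); }

    /** Called once per committed transaction with that transaction's changes. */
    void fireCommitted(List<Change> changes) {
        // Listeners get a read-only copy so one listener cannot affect another.
        List<Change> snapshot = Collections.unmodifiableList(new ArrayList<>(changes));
        for (TransactionListener listener : listeners) {
            listener.transactionCommitted(snapshot);
        }
    }
}
```

A listener that serializes each list of changes is all that is needed to feed a second database, and because the changes name fields rather than columns the receiving side is free to use its own mapping.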

Queries

The most we ever did with queries was to filter for objects. It was 
never necessary to use complex queries, since the relationships 
maintained by JDO allow you to navigate to the necessary object. I'm 
sure anyone who has had to embed queries in code and debug them would 
agree that it's easier to avoid it than work with it. So I don't think 
complex query support is essential for early versions.

Having said that, over time it would be nice to support variants like 
OQL and EJBQL for the implementation, since this would allow an 
isql-like app that manipulates live object data. Now that would be a 
killer app.

IMHO the implementation of querying should be aimed at two uses:

1. I have a collection of instances and I want to use a query construct 
to sort/filter them. So the filter must be applicable to a set of 
objects without hitting the database.

2. I want to fetch data from the database according to a filter. This is 
easily implemented by navigating an extent and using the same filter as 
in (1) above, but this is extremely inefficient to the point of being 
useless in large systems. Thus the system must be able to pass all or 
part of the query 'down the chain' to a lower level component that can 
use all or part of it to generate an SQL query. The full filter can then 
be applied to the smaller set of objects returned. This will be crucial 
for real world use.
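
The two uses can share one filter abstraction. The sketch below is only one way to arrange it, under some loud assumptions: instances are modelled as plain maps to keep it short, field names are assumed to map directly to column names, and every name (ObjectFilter, FieldEquals, InMemoryOnly, AllOf, Filters) is invented for the example. It is not the JDO Query API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** A filter that always works in memory and may also expose a part SQL can evaluate. */
interface ObjectFilter {
    boolean matches(Map<String, Object> instance);

    /** Appends any condition a lower layer can run; by default nothing is pushable. */
    default void pushDown(List<String> whereParts, List<Object> params) { }
}

class FieldEquals implements ObjectFilter {
    private final String field;
    private final Object value;

    FieldEquals(String field, Object value) { this.field = field; this.value = value; }

    public boolean matches(Map<String, Object> instance) {
        return value.equals(instance.get(field));
    }

    @Override
    public void pushDown(List<String> whereParts, List<Object> params) {
        whereParts.add(field + " = ?"); // assumes the field name is also the column name
        params.add(value);
    }
}

/** Arbitrary Java logic: usable in memory, never translated to SQL. */
class InMemoryOnly implements ObjectFilter {
    private final Predicate<Map<String, Object>> test;

    InMemoryOnly(Predicate<Map<String, Object>> test) { this.test = test; }

    public boolean matches(Map<String, Object> instance) { return test.test(instance); }
}

class AllOf implements ObjectFilter {
    private final List<ObjectFilter> parts;

    AllOf(List<ObjectFilter> parts) { this.parts = parts; }

    public boolean matches(Map<String, Object> instance) {
        for (ObjectFilter part : parts) {
            if (!part.matches(instance)) return false;
        }
        return true;
    }

    @Override
    public void pushDown(List<String> whereParts, List<Object> params) {
        // Dropping a conjunct only widens the SQL result; the full filter is re-applied later.
        for (ObjectFilter part : parts) {
            part.pushDown(whereParts, params);
        }
    }
}

class Filters {
    /** Use (1): filter a collection without touching the database. */
    static List<Map<String, Object>> apply(Collection<Map<String, Object>> instances, ObjectFilter filter) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> instance : instances) {
            if (filter.matches(instance)) result.add(instance);
        }
        return result;
    }

    /** Use (2): build the WHERE fragment for whatever part could be pushed down. */
    static String whereClause(ObjectFilter filter, List<Object> paramsOut) {
        List<String> parts = new ArrayList<>();
        filter.pushDown(parts, paramsOut);
        return String.join(" AND ", parts);
    }
}
```

For use (2) the database runs the WHERE fragment, and apply() is then run with the full filter over the smaller set that comes back. Partial pushdown is only safe for conjunctions like this; an OR with a non-pushable branch would have to push nothing.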