[Sparrow-devel] JDO Implementation Comments
From: Chris S. <sk...@us...> - 2002-04-17 13:04:18
Hi guys,

I was pointed to this group by Thomas Mahler of the objectbridge group. I am interested in helping with the sparrow/objectbridge efforts to implement JDO in open source. What follows is a description of what we did, with some thoughts about the JDO spec as it currently stands, and what I think is important to target. I am moving between countries at the moment and don't have a permanent Internet connection, so this is off the top of my head.

At my former employer we implemented a persistence mechanism based on the JDO early drafts. It has been in use for over a year now on our website, and has coped well with a large amount of traffic (30,000 distinct page views in a day, with heavy database access, running at a peak of 25+ pages per second). The library interfaces with a MySQL server on the web site, and MS SQL Server on the internal side. Data is synchronized in real time through the persistence libraries. I can vouch for the suitability of the spec for real world use :-)

Code Modification

We did not do code modification, but I had a play with the BCEL library and I think it would be a candidate for use. There is (was?) a better library from IBM alphaworks that presents an API similar to reflection but allows modification of byte-code; I think that one was not under open source, though. The enhancement would need to be pluggable, to allow us to try out some different implementations.

We used source-code modification and reflection, i.e. the jdoXXX calls were in a PersistenceCapableImpl superclass from which all our db classes descended, and the ClassManagers used reflection to read the actual values. Classes encapsulated all fields with getXXX and setXXX methods and explicitly called jdoMarkDirty() methods on modification.
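
A minimal sketch of the hand-modified approach described above, for illustration only. PersistenceCapableImpl and jdoMarkDirty() are the names used in this post; the Customer class, the field-name argument to jdoMarkDirty(), and the isDirty()/dirtyFieldNames() helpers are assumptions made up for the example and are not taken from the original code or from the JDO spec.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical base class, modelled on the PersistenceCapableImpl superclass
// described in the post. Not the JDO PersistenceCapable interface.
abstract class PersistenceCapableImpl {
    // Names of fields changed since the object was last stored.
    private final Set<String> dirtyFields = new LinkedHashSet<>();

    // Must be called by hand from every mutating method. The field-name
    // argument is an assumption; the post only mentions jdoMarkDirty().
    protected void jdoMarkDirty(String fieldName) {
        dirtyFields.add(fieldName);
    }

    // Hypothetical helpers a persistence manager could use at commit time.
    boolean isDirty() {
        return !dirtyFields.isEmpty();
    }

    Set<String> dirtyFieldNames() {
        return dirtyFields;
    }
}

// Hypothetical persistent class: the field is private and only reachable
// through its getter and setter, and the setter marks the object dirty.
class Customer extends PersistenceCapableImpl {
    private String name;

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
        jdoMarkDirty("name"); // easy to forget when a new setter is added
    }
}
```

Nothing in this design forces a new setter to call jdoMarkDirty(), which is the weakness that byte-code or source enhancement is meant to remove.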
Having done this I can see why code modification is necessary - we had a few bugs when a method was added but the jdoMarkDirty() call was not :-) I would suggest that an implementation should aim at hand-modified source and worry about code modification in parallel.

HOLLOW state

We did not implement the HOLLOW state, which made the code for 1-1 relationships nasty. Every relationship getter required a jdoLoad("relationshipName") call to trigger loading of the appropriate object. I would say HOLLOW state is a must.

Caching

Our PersistenceManagerImpl maintained 2 caches - a memory sensitive cache like objectbridge, and a dirtyCache which held only the objects marked as PERSISTENT_DIRTY, PERSISTENT_NEW or PERSISTENT_DELETED in a list. This meant that db checking was limited to the small number of changed objects. This is a great boon for the JDO.

savepoint()

As hinted above, we extended the PM to have a checkpoint() method, which proved extremely useful. This is less so now, as the spec has been modified to allow the user to call setRetainValues(true) so that objects are not made HOLLOW on commit. However, there is still a potential problem with objects that outlive a PM. For example, if you create an object and then want to cache it in a singleton, then changes to the object are not automatically persisted unless the singleton makes a commit(). It then has to re-associate all objects with a new PM in order to record new changes to the objects. It was much simpler to have a PM associated with the singleton and call checkpoint() periodically. Note that this is on the TO-DO list of the 1.0 spec as savepoint().

The new optional 'optimistic transactions' part of the spec is a good fit for this functionality too, and better in that it doesn't maintain a connection to the database for a long lived PM. This allows usage patterns like creating a PM and storing it in the session of a web transaction, and managing all interactions through this PM.
We had to abandon this idea since our implementation maintained an open connection for each open PM, and we rapidly ran out of connections.

Change logging

One of the interesting things about our implementation was the changelog. Any change to the database was recorded by a Change object that held the className, the fieldNames, and the old and new values of the changed fields. The fieldName was used so that the Change object could be used to apply changes to a database with a different schema, or where the mapping had changed. This was used to synchronize our web database (MySQL) with our internal database (MS SQL Server). It worked extremely well, and I would heartily recommend this approach.

It was implemented by having the PersistenceManagerFactory (nb, not the individual PMs) fire transactionCommitted events with a list of changes. This allows the maximum amount of flexibility in monitoring changes on the database. The changes were logged in an XML format for transfer.

Queries

The most we ever did with queries was to filter for objects. It was never necessary to use complex queries, since the relationships maintained by JDO allow you to navigate to the necessary object. I'm sure anyone who has had to embed queries in code and debug them would agree that it's easier to avoid them than to work with them. So I don't think complex query support is essential for early versions. Having said that, over time it would be nice to support variants like OQL and EJBQL for the implementation, since this would allow an isql-like app that manipulates live object data. Now that would be a killer app.

IMHO the implementation of querying should be aimed at two uses:

1. I have a collection of instances and I want to use a query construct to sort/filter them. So the filter must be applicable to a set of objects without hitting the database.
2. I want to fetch data from the database according to a filter.
This is easily implemented by navigating an extent and using the same filter as in (1) above, but this is extremely inefficient to the point of being useless in large systems. Thus the system must be able to pass all or part of the query 'down the chain' to a lower level component that can use all or part of it to generate an SQL query. The full filter can then be applied to the smaller set of objects returned. This will be crucial for real world use.
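
The two uses of a filter described above could be sketched roughly as follows. This is only an illustration under stated assumptions: Filter, Store and Queries are hypothetical names, the pushed-down part is represented as a plain SQL WHERE fragment, and nothing here is taken from the JDO query API or from the original implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical filter: the full predicate, plus the part (if any) that a
// lower-level component can evaluate itself, written as a WHERE fragment.
final class Filter<T> {
    final Predicate<T> predicate;
    final String sqlWhere; // null when nothing can be pushed down

    Filter(Predicate<T> predicate, String sqlWhere) {
        this.predicate = predicate;
        this.sqlWhere = sqlWhere;
    }

    // Use 1: filter instances already in memory, without hitting the database.
    List<T> apply(Iterable<T> candidates) {
        List<T> result = new ArrayList<>();
        for (T candidate : candidates) {
            if (predicate.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}

// Hypothetical lower-level component that turns a WHERE fragment into
// objects. A null fragment means "return the whole extent".
interface Store<T> {
    List<T> fetch(String sqlWhere);
}

final class Queries {
    // Use 2: pass what we can 'down the chain', then apply the full filter
    // to the smaller set of objects that comes back.
    static <T> List<T> fetch(Store<T> store, Filter<T> filter) {
        List<T> narrowed = store.fetch(filter.sqlWhere);
        return filter.apply(narrowed);
    }
}
```

Because the full predicate is always re-applied in memory, the store only needs to honour as much of the WHERE fragment as it can; a store that ignores it entirely still gives correct results, just as slowly as walking the extent.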