Thread: [OJB-developers] OJB relationships & more
From: Christian S. <chr...@ne...> - 2001-10-28 18:34:43
Hello,

a few questions:

Q1: ======================================
Even though database connection configuration is done in the JdbcConnectionDescriptor element, the JNDI lookup for datasources must be configured in OJB.properties with the ConnectionFactoryClass property (and the JdbcConnectionDescriptor/url.dbalias element is then "misused" as the JNDI lookup name). I find that rather confusing. Why not have a "jndi-lookup" element inside the JdbcConnectionDescriptor instead?

Q2: ======================================
According to the docs, OJB supports "lazy loading" of relationships through the proxy class mechanism. This means that when objects are loaded, a separate query is issued for each loaded object and every to-N attribute to retrieve the foreign key attributes of the referenced objects. Later, when the referenced objects are accessed, each referenced object is loaded individually with a primary-key query (I call this "singular loading"). This leads to the following scenario:

Say we have classes A, B, C. Class A has 2 collection attributes which point to B and C, respectively. B and C both have dynamic proxies configured. Loading all instances of A would lead to the following processing sequence:

1) perform "select * from TABLE_A".
2) Iterate the result set. For each row:
2.1) instantiate the A object
2.2) perform "select b.foreign_key from TABLE_A a, TABLE_B b where b.foreign_key = a.primary_key"
2.3) Iterate the result set. For each row:
2.3.1) instantiate the proxy and add to collection a_bs
2.4) perform "select c.foreign_key from TABLE_A a, TABLE_C c where c.foreign_key = a.primary_key"
2.5) Iterate the result set. For each row:
2.5.1) instantiate the proxy and add to collection a_cs

Then, for each access to an object referenced through the collections a_bs or a_cs, another SQL statement would be issued. Now say we have 2 instances of A in the database, which both hold 20 references in each collection attribute. If I were to access all objects in the above described data set, over time the number of SQL statements issued would be

1 + number_of_collection_attributes + (number_of_collection_attributes * number_of_referenced_objects),
i.e. 1 + 2 + 2 * 20 = 43.

This loading scheme is extremely inefficient, especially since the number of statements issued increases linearly with the number of objects referenced (and accessed) in the collections. A smart JDBC programmer (or O/R tool) could handle all of it with 1 (!) outer join statement, or at most 3 statements. If I am right, I don't think the proxy mechanism increases efficiency a lot, since it only saves the object instantiation but incurs a possibly 1000-fold increase in DBMS accesses.

I think a lazy relationship loading mechanism must be implemented such that it performs the foreign-key join query only when the relationship itself is accessed ("resolved"). It should load the full object data (vs. only the foreign keys). For proxy classes, it should keep the raw data in a cache attached to the relationship, and instantiate from that cache. Everything else (especially singular loading) is unacceptable in a production environment.

Did I miss something?

Q3: ==================================
I am wondering about the field/class ids in the repository.xml file. Why assign separate ids when the names would do just as well?

Q4: ==================================
Are there any plans to remove the need for separately mapping foreign key attributes? I find this a rather annoying requirement, which clearly defeats the notion of "transparence".

Of course I am willing to lend a hand in any of these matters.

regards,
Christian
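As a rough check of the arithmetic in Q2, the statement count can be written out in Java. This is only a sketch: the class and method names are made up for illustration and are not part of OJB. The formula as posted yields 43, which corresponds to a single instance of A; the second call assumes the per-row sequence in step 2 is repeated for each of the 2 instances of A, and the last method assumes "at most 3 statements" means one query per table.

```java
// Hypothetical helper, not part of OJB: counts SQL statements for "singular loading".
final class StatementCount {

    // 1 query for TABLE_A, then one foreign-key query per A instance and collection
    // attribute, then one primary-key query per referenced object that gets accessed.
    static long singularLoading(long aInstances, long collectionAttributes,
                                long referencesPerCollection) {
        long keyQueries = aInstances * collectionAttributes;
        long objectQueries = keyQueries * referencesPerCollection;
        return 1 + keyQueries + objectQueries;
    }

    // Assumed reading of "at most 3 statements": one query for TABLE_A plus
    // one query per collection attribute, independent of the number of rows.
    static long oneQueryPerTable(long collectionAttributes) {
        return 1 + collectionAttributes;
    }

    public static void main(String[] args) {
        System.out.println(singularLoading(1, 2, 20)); // 43, the figure in the post
        System.out.println(singularLoading(2, 2, 20)); // 85, if both A instances are counted
        System.out.println(oneQueryPerTable(2));       // 3
    }
}
```

Either way, the count grows with the number of referenced objects, while the one-query-per-table variant stays constant.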
From: Thomas M. <tho...@ho...> - 2001-11-02 11:16:57
Hi Christian,

I was on a short vacation, so my answer is a bit late...

Christian Sell wrote:
>
> Hello,
>
> a few questions:
>
> Q1: ======================================
> Even though database connection configuration is done in the JdbcConnectionDescriptor element, configuring of JNDI lookup for datasources must be done in OJB.properties with the ConnectionFactoryClass property (and the JdbcConnectionDescriptor/url.dbalias element is then "misused" as the jndi lookup name). I find that rather confusing. Why not have a "jndi-lookup" element inside the JdbcConnectionDescriptor instead?
>

That's a good idea! The JNDI-based ConnectionFactory was a quick hack to get someone started using OJB within a J2EE environment, so it needs some polishing... I will put your request on my todo list.

> Q2: ======================================
> according to the docs, OJB supports "lazy loading" of relationships through the proxy class mechanism. This means that when objects are loaded, for each loaded object and every to-N attribute a separate query is issued to retrieve the foreign key attributes for the referenced objects. Later, when the referenced objects are accessed, each referenced object is loaded individually with a primary-key query (I call this "singular loading"). This leads to the following scenario:
>
> say, we have classes A, B, C. Class A has 2 collection attributes which point to B and C, respectively. B and C both have dynamic proxies configured. Loading all instances of A would lead to the following processing sequence:
>
> 1) perform "select * from TABLE_A".
> 2) Iterate the result set. For each row:
> 2.1) instantiate the A object
> 2.2) perform "select b.foreign_key from TABLE_A a, TABLE_B b where b.foreign_key = a.primary_key"
> 2.3) Iterate the result set. For each row:
> 2.3.1) instantiate the proxy and add to collection a_bs
> 2.4) perform "select c.foreign_key from TABLE_A a, TABLE_C c where c.foreign_key = a.primary_key"
> 2.5) Iterate the result set. For each row:
> 2.5.1) instantiate the proxy and add to collection a_cs
>
> then, for each access to an object referenced through the collections a_bs or a_cs, another SQL statement would be issued. Now say we have 2 instances of A in the database, which both hold 20 references in each collection attribute. If I were to access all objects in the above described data set, over time the number of SQL statements issued would be
>
> 1 + number_of_collection_attributes + (number_of_collection_attributes * number_of_referenced_objects),
> i.e. 1 + 2 + 2 * 20 = 43.
>
> This loading scheme is extremely inefficient, especially since the number of statements issued increases linearly with the number of objects referenced (and accessed) in the collections. A smart JDBC programmer (or O/R tool) could handle all with 1 (!) outer join statement, or at most 3 statements. If I am right, I don't think the proxy mechanism increases efficiency a lot, since it only saves the object instantiation, but incurs a possibly 1000-fold increase of DBMS accesses.
>

Of course, if you iterate over all elements of the B- and C-collections, proxies are inefficient. It would make no sense to use proxies here, as you know in advance that all objects in the collection attributes have to be accessed. Proxies can improve performance if only "some" of the Bs and Cs have to be accessed.

> I think a lazy relationship loading mechanism must be implemented such that it performs the foreign-key join query only when the relationship itself is accessed ("resolved").

In my proxy design I don't differentiate between 1:1 references and 1:n references. I will think about your idea of having special proxies for 1:n attributes.

> It should load the full object data (vs. only the foreign keys).

I see one possible disadvantage of your solution here: Say we have your proxies. Object A will then be created with two proxy attributes for the B- and C-collections. Now we have to access only the fourth element of the B-collection. According to your proxy mechanism, all 20 B objects contained in the B-collection are instantiated on this access to the collection. You will end up with 19 superfluous full instantiations versus 19 superfluous instantiations of proxy objects with my existing concept.

> For proxy classes, it should keep the raw data in a cache attached to the relationship, and instantiate from that cache.

That may be a tough problem, as OJB proxies must be serializable due to the client/server architecture.

> Everything else (especially singular loading) is unacceptable in a production environment.
>

I will have a look at TopLink to see what they are doing here.

> Did I miss something?
>
> Q3: ==================================
> I am wondering about the field/class ids in the repository.xml file. Why assign separate ids when the names would do just as well?
>

Class IDs are superfluous. Field IDs are necessary to provide an order relation for the Comparator used for sorting.
There will be a complete review and cleanup of the existing DTD once we have included all 1.0 features in it (n:m relationships, automatic foreign key assignment, JNDI naming).

> Q4: ==================================
> Are there any plans to remove the need for separately mapping foreign key attributes? I find this a rather annoying requirement, which clearly defeats the notion of "transparence".
>

I'm not sure I understand you correctly.
Currently OJB requires foreign key attributes to be set manually. (E.g.: we create a new A with 20 B objects in a collection attribute. a.primary_key is computed automatically with the <autoincrement> feature, but we have to set the b.foreign_key attributes referencing a.primary_key manually.)
This is of course annoying and has to be replaced by some automatic solution. This feature is already on the todo list, but I have not started working on it.

> Of course I am willing to lend a hand in any of these matters
> regards,
> Christian
>

Thanks,
any help is appreciated a lot!

Thomas

> _______________________________________________
> Objectbridge-developers mailing list
> Obj...@li...
> https://lists.sourceforge.net/lists/listinfo/objectbridge-developers
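To make the "fourth element" trade-off concrete, here is a minimal sketch of per-object lazy loading using the JDK's java.lang.reflect.Proxy. It is not OJB's proxy implementation: the interface B, the class BImpl, the handler and the load counter are all hypothetical, and the "query" is simulated by a Supplier. It is also not thread-safe.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical persistent type; JDK dynamic proxies require an interface.
interface B {
    String getName();
}

final class BImpl implements B {
    private final String name;
    BImpl(String name) { this.name = name; }
    @Override public String getName() { return name; }
}

// Materializes the real object on the first method call, then delegates to it.
final class LazyHandler implements InvocationHandler {
    private final Supplier<?> loader;
    private Object target;

    LazyHandler(Supplier<?> loader) { this.loader = loader; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (target == null) {
            target = loader.get(); // stands in for the primary-key query
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real object threw
        }
    }
}

final class LazyProxyDemo {
    static final AtomicInteger LOADS = new AtomicInteger(); // counts simulated queries

    static B lazyB(int primaryKey) {
        Supplier<B> loader = () -> {
            LOADS.incrementAndGet();
            return new BImpl("b-" + primaryKey);
        };
        return (B) Proxy.newProxyInstance(
                B.class.getClassLoader(), new Class<?>[] {B.class}, new LazyHandler(loader));
    }

    public static void main(String[] args) {
        List<B> bs = new ArrayList<>();
        for (int pk = 1; pk <= 20; pk++) {
            bs.add(lazyB(pk)); // 20 proxies, no B loaded yet
        }
        System.out.println(bs.get(3).getName()); // b-4: loads only the fourth element
        System.out.println(LOADS.get());         // 1
    }
}
```

With this scheme, touching one element costs one simulated query, but touching all 20 costs 20, which is the "singular loading" cost discussed above.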
From: Thomas M. <tho...@ho...> - 2001-11-03 19:24:48
Hi Christian,

Christian Sell wrote:
>
> I think, when a relationship is resolved, you will almost always end up accessing all the objects therein. Your argument about the "fourth element" is not convincing to me, as there is no notion of ordering in relationships. You would always have to iterate the relationship and decide based on the object state which one you need. And, if you only need a few particular objects from a relationship, you should use an explicit query in the first place. My point: when a relationship is resolved, the underlying semantics are that ALL objects are requested.
>

Good argument. I added this as a feature request to the todo list.

> What I do see is that an iteration over a relationship could be dropped before the end is reached because some condition is met, e.g. because the user stops paging in the list she is presented with. This should be handled by a cursor-based relationship iterator, i.e. the iterator is internally based on a JDBC ResultSet (an SQL cursor), and objects are only instantiated when iterated over.
>

That's a good idea. OJB already supports Iterators (PersistenceBroker.getIteratorByQuery(...)). It's not difficult to write lazy collections (based on such iterators) instead of just exhausting the Iterator to fill a Vector, as is done now in PersistenceBroker.getCollectionByQuery(...). I'll add this to the todo list.

> > Class ID's are superfluous. Field ID's are necessary to provide a order relation for the Comparator used for sorting.
> > There will be a complete review and cleanup of the existing DTD once we have included all 1.0 features to the existing DTD (n:m relationships, automatic foreignkey assignment, JNDI naming)
>
> Looking at the repository, I also see some need for clean up. The things that immediately come to my mind were:
>
> - make the structure XML-like. Currently, the repository syntax is a hybrid between XML and java properties files. As I see it, this is due to your parser, which only works on the first level (it does not use a stack internally). Therefore, you have introduced stuff like "<url.protocol>", where it should really be "<url><protocol></protocol></url>". I really think this should be changed asap. I could donate a SAX ContentHandler base class which makes this rather easy.

Ivan provided a clean design for a DTD. (I guess you'll find it in the mailing list archive. If not, I can post it to you.) When all features required for release 1.0 are there, I'm planning a rewrite of the DTD and the parser. Your help is of course welcome.

> - introduce field-level ConversionStrategies. This is a must IMO.
> - introduce the "jndi-lookup" subelement for the JdbcConnectionDescriptor
>

Already on the todo list.

> > > Q4: ==================================
> > > Are there any plans to remove the need for separately mapping foreign key attributes? I find this a rather annoying requirement, which clearly defeats the notion of "transparence".
> > >
> >
> > I'm not sure if I understand you right?
> > Currently OJB requires to set foreign key attributes manually.
> > (E.G: we create a new A with 20 B objects in a collection attribute. a.primary_key is computed automatically with the <autoincrement> features. But we have to set the b.foreign_key attributes referencing to a.primary_key manually.)
> > This is of course annoying and has to be replaced by some automatic solution. This feature is already on the todo list. But I have not started working on it.
>
> Well, my point is going a bit further. I would even want to avoid the b.foreign_key attribute altogether, as it clearly is a database artifact that has no significance to the object model. After all, B already has a my_a attribute. If you want to persist a plain Java class which was not designed with OJB in mind (which is what transparence is about), it will most certainly not have the b.foreign_key attribute. I am sure this can be done rather easily, as the value is there in the target object's PK, and can be retrieved through metadata during storage operations (done this before).
>

OJB started as a MAPPING tool. That is, we assumed that there is a full database model. The developers I talked to had no objections to having primary key attributes and foreign key attributes in the persistent classes.
Obviously those things don't belong to the domain model, but they can be clearly marked as not being part of it (e.g. by declaring them as "private transient"). As most OO designs don't allow direct access to attributes but only access through getters and setters, these "technical" attributes can be completely hidden from the application developer.
Thus I decided to consider this kind of transparency as "nice to have" but not essential.
Regarding your idea to eliminate the need for foreign key attributes, I agree that this should be possible without too much effort.

> > Any help is appreciated a lot!
>
> I am currently making up my mind with respect to an object persistence solution for my projects. TopLink is not an option for obvious reasons, and

TopLink is really crowded with an immense mass of really good features. Also, the Mapping Workbench is really great. But it's so expensive.
In my company we have developed an abstraction layer that encapsulates persistence layers to let projects decide whether they want to use TopLink or OJB. It's only a configuration entry to be changed...

> EJB 2.0 is giving me a headache with all those interfaces and helper classes you need to make it work, and the lack of O/R features like inheritance & polymorphism.

I agree! THAT is poor design (not my harmless foreign key attributes ;-)

> I also owe a lot to the OSS community, so it would be about time to make a contribution myself.

That's great.

> As I would plan to use OJB in production, one of my main requirements would be to remove all the performance-critical issues like singular loading, and all other excess DBMS calls (like those someone else mentioned about DLists, which allocate sequence values for themselves and all elements - yuck).

OK, DLists must work this way. With a better SequenceManager (as it comes with the next release) they won't be a big problem.
What may become a performance bottleneck is the LockManager. Currently it's based on a DB table. That is, all locking, unlocking, checking for locks etc. produces database traffic. For heavy-duty apps a separate LockManager server is a must. For the singlevm mode an in-memory LockManager would be sufficient.

> Second would be the removal of non-transparency,
> and a JDO interface is also quite important.

The next release will be called 0.7. It will provide a completed client/server architecture (now the MetaData layer will also be accessed remotely). With this release there will be a solid, working persistence kernel that could be used for a JDO implementation.
I see 2 possible approaches for the JDO implementation:
1. Start coding and make as much "copy paste" reuse of the ODMG implementation as possible.
2. Start with a refactoring:
- Extract everything from the ODMG implementation that might be useful for JDO too. Build a generic Object Transaction Kernel based on this refactoring.
- Reimplement ODMG based on this new kernel.
- Implement JDO based on the new kernel.

This would be a layered architecture with
- OJB PersistenceBroker as base
- OJB Object Transaction Kernel (OTK) (based on the Broker)
- ODMG and JDO implementations built on top of the OTK.

This is obviously the best way to go, but it will be a lot of work too...

> Some of this would possibly require a redesign/reworking of the kernel. If we could come to an agreement (possibly after more discussion), I may be willing to make a significant time investment in the near future (towards end of the year). What do you think?
>

Sounds great. Of course we will need some more clarification on the details. But as mentioned before: I'm really glad to get support!

cheers,
Thomas

> regards,
> Christian
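The cursor-based relationship iterator discussed in this message could look roughly like the following. This is a sketch under stated assumptions, not OJB code: a plain java.util.Iterator over raw rows stands in for the JDBC ResultSet (or for the Iterator returned by the broker), and LazyMappingIterator and Item are hypothetical names. The point it illustrates is that a row only becomes an object when next() is called.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical persistent class built from one raw row.
final class Item {
    final String name;
    Item(String name) { this.name = name; }
}

// Wraps a cursor-like iterator over raw rows; objects are instantiated only in next().
final class LazyMappingIterator<R, T> implements Iterator<T> {
    private final Iterator<R> rows;
    private final Function<R, T> instantiate;
    private int instantiated;

    LazyMappingIterator(Iterator<R> rows, Function<R, T> instantiate) {
        this.rows = rows;
        this.instantiate = instantiate;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public T next() {
        R row = rows.next(); // throws NoSuchElementException once the rows are exhausted
        instantiated++;
        return instantiate.apply(row);
    }

    int instantiatedCount() {
        return instantiated;
    }
}

final class LazyIterationDemo {
    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("b1", "b2", "b3", "b4", "b5").iterator();
        LazyMappingIterator<String, Item> it = new LazyMappingIterator<>(rows, Item::new);
        while (it.hasNext()) {
            if (it.next().name.equals("b2")) {
                break; // caller stops early, e.g. the user stops paging
            }
        }
        System.out.println(it.instantiatedCount()); // 2 of the 5 rows became objects
    }
}
```

A real implementation on top of a ResultSet would additionally have to close the cursor when iteration is abandoned, which this sketch leaves out.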