From: Ray Z. <rz...@co...> - 2004-04-26 15:46:20
Hi Simon,

This is going to be a quick, first-pass response to the issues you raise ...

For number 1, let me respond with a general comment about caching and keeping multiple Perl objects in sync, etc., since I think that is the core issue. First, let me say that I have not used caching at the SPOPS level, but I have used caching of SPOPS objects at the application level in my work.

In my opinion, there are two ways of handling this issue.

1. SPOPS assumes that the application is keeping track of how many copies of an object are in memory and which ones have unsaved changes, etc. The only MASTER copy of the object is the saved one. In this case SPOPS should not do anything special to try to keep things in sync; that's the job of the application level.

2. SPOPS always assumes a 1 to 1 correspondence between the Perl object and the object in the database. Do caching at the SPOPS level, with SPOPS making sure there is never more than one copy of the object in memory. Copies of the object are simply multiple references to a single cached object. This is the approach used by Tangram if I'm not mistaken.

Unless I'm missing something, this seems pretty clean and straightforward. However, it doesn't address all of the consistency issues in contexts where you have multiple processes running simultaneously (e.g. multiple Apache children), where there is one copy of an object in the database, but multiple copies in the memories of various processes, in their individual caches. To address this at the SPOPS caching level you have to use some sort of a shared memory cache with synchronization/locking mechanisms, which in my opinion takes you back to handling the issue at the application level again.

So the bottom line, for me, is: unless you are in the context where you only have one process running at a time (not the case for my apps), you ALWAYS have to handle the issue at the application level anyway.
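To make option 2 concrete, here is a minimal sketch of a per-process cache that hands out references to one object rather than copies. The package name My::IdentityMap, the fetch() signature and the loader code ref are all made up for illustration; none of this is SPOPS or Tangram API.

```perl
use strict;
use warnings;

# Hypothetical per-process identity map; not part of SPOPS.
package My::IdentityMap;

my %CACHE;    # "class:id" => the single in-memory object

# $loader is a code ref that reads the object from the datastore.
sub fetch {
    my ( $class, $obj_class, $id, $loader ) = @_;
    my $key = join ':', $obj_class, $id;
    $CACHE{$key} = $loader->($id) unless exists $CACHE{$key};
    return $CACHE{$key};    # the same reference every time, never a copy
}

package main;

my $loads  = 0;
my $load_a = sub { $loads++; return { id => $_[0], name => 'original' } };

my $first  = My::IdentityMap->fetch( 'A', 1, $load_a );
my $second = My::IdentityMap->fetch( 'A', 1, $load_a );

# A change made through one reference is visible through the other.
$second->{name} = 'changed';
```

Each process still has its own %CACHE, so keeping objects consistent across processes remains an application-level problem.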
Having SPOPS do the caching as in (2) can help you with that, but assumptions can never be made at the SPOPS level that even a single cached object is necessarily in sync with the database, since some other process may have changed it behind your back.

On Apr 25, 2004, at 6:29 PM, Vsevolod (Simon) Ilyushchenko wrote:

> 1a. The whole thing won't work properly without caching turned on.
> Assume that an A has many X'es, and the table X has a column 'a_id'.
> When I pass a list of ids to the function A->list_of_x_add (similar to
> linksto_add), the application may have various copies of X'es with
> those ids floating around, some of whose field values may have been
> changed. Since (at least in the auto-save case) it does not make sense
> to just update the 'a_id' field in the database and not the rest of
> the fields, we need to save the relevant X objects. But we don't know
> about them without caching!

My proposal here (found in last section under "Fetch" in my 7/3/01 post) was to pass in the X objects, not the ids ...

For auto_by and lazy_by, two additional methods are created in A, one for adding objects to its list of X's and one for removing objects from it. These can only be used after the A object has been saved. Their primary purpose is to keep the list in memory in sync with what's in the database, so when using auto_by or lazy_by it's a good idea to use only these methods to add or remove corresponding X's. If the 'name' parameter is present, the methods are named add_<name> and remove_<name>. If the 'name' parameter is not present, they are named add_to_<list_field> and remove_from_<list_field>. The method to add X's takes an X object or an arrayref of X objects as input and returns the same object or arrayref of objects after saving them. The method to remove X's takes an id or arrayref of ids and returns the number of X's successfully removed.

> 1b. However, even the current cache is inadequate for the task.
> Right now, the first time an object is retrieved, it's saved in the
> cache. If it's retrieved again, a copy of the object is returned. Thus,
> whoever asked for the object first, has the "master" copy, meaning
> that everybody else will see his changes. But if other requestors make
> changes and the first requestor's copy is saved, their changes will be
> lost.

I'm not really familiar with SPOPS caching and admit I haven't paid attention to your previous posts on caching, so educate me here. My understanding is that SPOPS doesn't implement caching, it just provides hooks to do it. So is this returning of a new copy of a cached object, instead of a reference to the existing cached object, a feature (bug) of the hooks in SPOPS or of a particular implementation of caching?

As I mentioned above, I think any caching at the SPOPS level should make sure there is only ONE copy of the object in memory. I don't see the purpose of having a "master" copy in memory with other copies of it. The only "master" copy of the object is the one in the datastore.

> Here is a sample code that illustrates the problem. Assume that A
> still has many X'es.
>
> my $a = A->fetch(1);
> my $x = A->list_of_x->[0];
> my $a1 = $x->myA;
>
> Here, logically, $a and $a1 refer to the same object with the same ID.
> But they are different Perl objects. If I change $a1 and save $a, my
> changes to $a1 will be lost.
> Is there a reason the cache does not simply return the stored object?

I agree. An SPOPS level cache should always return the same object, not a copy.

> 1c. Normally, calling $a->list_of_x_add($x) will make sure that the
> changes to the 'a_id' field in the X table are saved. There is a fun
> special case, though - what if $a has been just created and not saved
> yet? There are two possible behaviors: a) save $a behind the scenes to
> obtain a_id, or b) throw an error requiring the user to call save()
> explicitly.
> Variant a) makes list_of_x_add() behave similarly to the
> normal case, but does something that the user may not want. Variant
> b), conversely, exposes some inner workings of SPOPS to the user, but
> does not do a potentially undesirable save. What is preferable here?

I say (b). Quoting from the same paragraph of my proposal again: "These can only be used after the A object has been saved." ... implying that it throws an exception otherwise. I don't think this necessarily exposes inner workings of SPOPS to the user. I think it just needs to be documented that these methods throw exceptions if called for objects that are not saved.

> 2. You may have noticed that I used 'has_many', not 'has_a' as Ray
> originally suggested. I do think it's cleaner to separate them, but if
> you insist, I will eventually roll them back into one - I just
> separate them now for the ease of coding.

I'm not sure I've seen how you're using 'has_many' in the configuration. It sounds to me, though, that it's putting the definition of the relationship at the other end, that's all. Does this then replace the manual_by|auto_by|lazy_by configuration syntax? I guess I would vote for sticking with only the 'has_a' unless and until I see the full detail of the syntax spelled out and can see that it doesn't bring up new issues. I spent a lot of time on the syntax I proposed and am fairly comfortable that it is general and consistent.

> 3. For the many-to-many 'links_to' case (where A has-many Bs via the
> linking table X), Ray suggested having the configuration hash in the X
> class, not in the A class where 'links_to' lives now. This has the
> added benefit of adding more fields to X if necessary, but IMO also a
> major drawback of changing the API. Why don't we try to keep the API
> as constant as possible and leave the 'links_to' stanza in A? We can
> add new hash keys to specify extra X fields and to create a Perl class
> corresponding to X if necessary.

On this point (and the previous one now that I think about it), my approach regarding where to put the configuration hash was to put it in the class which has the fields. The configuration hash for a class defines the meaning of each of its fields. It can also add behavior related to those fields to other classes. I think it's essential that we are consistent about where we put configuration. You propose putting the configuration in A ... but why A and not B?

> 4. Ray also suggested two different APIs for the simple has_a case (an
> X has one A). If a dependent object is autofetched, $x->myA returns an
> instance of A. However, if the fetch is manual, $x->myA returns a_id,
> and only $x->fetch_myA returns an actual object. Is there a reason to
> do it differently?

My thought here was that if myA is an auto or lazy-fetched field, then you always assume that $x->{myA} is an object. Otherwise, you always assume that $x->{myA} is an id. You still have a convenience method to fetch the corresponding object if you need it, but even after fetching the object, $x->{myA} is still just the id. It just seemed the most consistent to me. Otherwise, for manual fetches you end up with the case where you don't know, when you access $x->{myA}, whether to expect an id or an object, since it depends on whether or not you've done the manual fetch.

> 5. The issue of avoiding circular saves can be addressed simply by
> setting a certain flag after an object is saved and checking for this
> flag each time an object is reached in the relationship graph during
> the save. (Obviously, this will require full caching as described
> above.) Let me know if this for some reason won't work.

Why does this require full caching? Maybe an example would help. I don't think any of what I proposed requires caching, just the assumption that consistency is being maintained, with or without a cache, by the application level logic.

Thanks again, Simon, for all your work on this area ...
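
As a starting point for the example asked for under point 5, here is a minimal sketch of a saved-flag check that keeps the flags in a table local to one top-level save, keyed by class and id, so it does not rely on a cache. The save_graph() routine and the 'class', 'id', 'related' and 'saved_count' fields are hypothetical and stand in for whatever SPOPS would actually use.

```perl
use strict;
use warnings;

# Hypothetical sketch; 'class', 'id', 'related' and 'saved_count' are
# made-up fields, not SPOPS ones.
sub save_graph {
    my ( $obj, $seen ) = @_;
    $seen ||= {};                      # fresh flag table per top-level save
    my $key = join ':', $obj->{class}, $obj->{id};
    return if $seen->{$key}++;         # already reached during this save
    $obj->{saved_count}++;             # stand-in for the real save
    save_graph( $_, $seen ) for @{ $obj->{related} || [] };
    return;
}

my $parent = { class => 'A', id => 1, related => [] };
my $child  = { class => 'X', id => 7, related => [$parent] };
push @{ $parent->{related} }, $child;  # A <-> X cycle

save_graph($parent);                   # terminates despite the cycle
```

Whether two in-memory copies of the same row should both be saved is a separate question that this sketch leaves to the application.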

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645 fax: (815) 377-3932