From: Ray Z. <rz...@co...> - 2004-11-08 23:13:58
Simon,

>> And you say this full cache is necessary for the consistency of circular
>> references. You mean to avoid infinite loops when you have A set to
>> auto-fetch B which is set to auto-fetch A? Seems to me that you
>> should be able to detect this type of thing when the classes are
>> configured. While I think a cache is a nice option that I may very
>> well use, I don't think it should be mandatory unless it's absolutely
>> necessary. Can you give me an example where consistency makes it
>> absolutely necessary?
>
> For example, you have a Book object with many Authors. If the
> application loads a book with the list of authors, adds another author
> to this book and asks the new author about its parent book, the
> current SPOPS implementation will re-fetch the book object,
> potentially ignoring the changes that were made to the original book
> object.
>
> However, I only now realized that you suggest saving both the
> parent-to-child reference and the reverse reference in the object
> fields. (I distinguish between $author->{book_id}, a number, and
> $author->{book}, an object. Let me know if I understood this
> correctly.)

I think you've got it right, except I'm not sure what you mean about
the reverse reference case. Maybe the myA vs fetch_myA stuff below
clarifies what I'm suggesting.

> Thus, once $book->{list_of_authors} is populated, adding a new author
> to the book should add the new object to this list, plus it should set
> the field $author->{book} to the original book object. This will make
> the above situation impossible.

Right. Except that 'list_of_authors' in $book implies that you've
specified a reverse fetch of the book field in author, so
$author->{book} is just an id, not an object. More on object vs id
below.

> However, inconsistencies still may occur if:
>
> 1) I create a new author-book relationship by setting the field
> $author->{book_id} instead of saying $book->add_author($author).
> This can be discouraged as an incorrect way of altering data, of course,
> but logically both make sense, and I'd like to be able to use them
> both.
>
> Or, 2) if I am working with a second relationship, say
> books-to-artists (illustrators). In this case, in one place in my
> code, I could retrieve a book object by saying $artist->book, and then
> in another place I'll call $author->book, and even though they may
> refer to the same book, they will always be two different objects.

Right. Without a cache these inconsistencies are always possible. But
this is a "global" issue with OOP frameworks like SPOPS. I think Chris
has taken the right approach in letting the developer decide at the
application level whether s/he needs to always maintain that
consistency and, if so, whether to use SPOPS level caching or
application level caching/logic to ensure consistency. (And I
understand there was a bug in SPOPS caching which prevented this from
working correctly.)

But I don't think there is anything in the new has_a design that
REQUIRES one to maintain consistency through an SPOPS level cache,
right? I think the only thing you need to do is ensure that you don't
have any auto-fetching loops (might even want to include lazy-fetching)
when you do the configuration. In other words, you don't want to have a
book auto-fetch its list of authors AND have the author set to
auto-fetch its book, creating an infinite loop. Even if you use a cache,
this would create circular references, which are a problem for garbage
collection unless you use weak references. I think this checking is
important, but I honestly haven't given any thought to how to implement
it.

> So looks like cache is still necessary.

Only if you need to guarantee a single in-memory copy for a process. I
argue that this is an arbitrary requirement. In a read-only environment,
it really doesn't matter (except for resource usage) if you have
multiple copies of the same object in memory.
And in a web environment, even using a simple cache doesn't guarantee
consistency across multiple processes (apache children), so you still
need a higher level synchronization mechanism to ensure consistency.

>> It was for completeness and to offer a mode that is equivalent to
>> current has_a behavior, that is, the field normally just gives you an
>> id, but you also have a convenience method for fetching the object as
>> well. My idea was that any 'has_a' spec, including 'manual', would
>> create convenience methods for fetching the related objects. The
>> 'auto' and 'lazy' options would simply call these methods
>> automatically at the appropriate time and stash the return values in
>> the object. So in the way I was picturing things, implementing
>> 'manual' would simply be the first step in implementing 'auto' and
>> 'lazy'.
>
> I feel dumb - I still don't quite get it. However, in your original
> examples the method X->myA returns the id of A in the case of manual
> fetch and A itself in the case of lazy/auto fetch, right? In my view,
> X->myA always returns the id and X->fetch_myA always returns the object
> (I tend to use them like $author->book_id and $author->book in my
> applications). So there is no need for manual fetching.
>
> I think that having X->myA return inconsistent values may be confusing.
> Let me know what you think. Perhaps I am still missing the utility of
> manual fetching.

My thought was that specifying 'auto' or 'lazy' is equivalent to saying
"this field is an object". Specifying 'manual' is equivalent to saying
"this field is an object id". So X->myA always returns the value stored
in the field and X->fetch_myA always returns the object.

>> Without the manual option, you can't specify a relationship at all
>> without having it define auto-fetching behavior.
>> You can't, for example, auto-remove an object without having it also
>> auto-fetched (which I can imagine you might want if you typically
>> only need to deal with the ID of the secondary object).
>
> But in this case you still have to fetch the dependent object, because
> it may define its own rules of auto-removal of even more objects.

But the fetch only happens for the purpose of correctly doing the
remove ... the 'manual' specifier still means that the field holds an
object id, not an object.

>> Just curious, does your implementation of 'auto' generate a public
>> 'fetch_myA' method, for example?
>
> See above - even if the fetch method name ('alias' in the current
> terminology) is not specified in the configuration, it'll be
> auto-created by using the name given to the target class. (I mean the
> name of config hash key for the target class, not its Perl name. In
> the example I sent you, X_alias is such a name.)

The problem I see with this is that it generates clashes when you have
multiple fields with the same class. We need a method name that is
unique for the field we want to fetch, not just for the class we use to
fetch it. I think using the class alias is left over from the old has_a
config, which used the class as the hash key (which you agreed is
detestable :-). I vote once again that we stick with my proposal to use
'fetch_' prepended to the name of the field, by default, and allow an
option to explicitly specify a method name.

>>> OTOH, there are three types of removes - 'auto', 'manual' and
>>> 'forget'. 'Auto' means complete removal of dependent objects,
>>> 'forget' - nullifying id fields pointing to the removed objects, and
>>> 'manual' - no action. The default should logically be 'forget', but
>>> it may conflict with no autosaving, so I'll have to set it to
>>> 'manual'.
>> OK, but what is the 'reverse_remove'?
>> Is specifying 'reverse_remove' => 'forget' in a 'has_a' the same as
>> specifying 'remove' => 'forget' in the corresponding 'has_many'? If
>> so, which one takes precedence if they are inconsistent? It looks
>> like 'reverse_remove' => 'forget' is equivalent to what I called
>> 'null_by', right? I personally think that having multiple (and
>> possibly conflicting) ways/places of defining the behavior for a
>> single relationship is asking for trouble. I think it will make it
>> difficult to write correct and clear documentation and it will
>> create some debugging nightmares. (More on this below)
>
> This should not be a problem, because in my current proposal the
> programmer specifies either has_a or has_many (which implies the
> reverse has_a), so no conflicts should be possible. However, if we
> change the syntax, this issue will go away.

But since you can put a 'has_many' in Book and a 'has_a' in Author, for
example, where Author has a 'book' field, I think they can be
inconsistent. In my proposal, for the 'has_a' in Author, you specify
either a forward or a reverse direction, with no way to specify
something conflicting in the Book class.

>> Why do you include both 'link_class' and 'link_class_alias'? Aren't
>> they redundant? (see [1] below).
>
> 'Link_class' refers to the Perl class name, 'link_class_alias' - to
> the method name used to retrieve its instances (this is your
> 'list_field' in the 'link' hash).

But can't you always get the one, given the other?

>> And I suppose the 'table' is only necessary if you don't specify the
>> 'link_class' and vice versa, right?
>
> Yup. I am a little unhappy that in your proposal one has to have a
> Perl class for the linking table even if one is never going to use it,
> but I guess this is necessary for the sake of the uniform syntax.

Which is why I wouldn't protest too much if we decided to leave the old
'links_to' syntax in, untouched, at least for the time being.
You would only need to define the linking class if you needed the
auto-fetching/removing behavior.

>> [1] I confess I never really did understand the purpose of the
>> alias. What is the difference between the alias and the class? Isn't
>> one of them redundant?
>
> The alias is used to generate access methods in other classes
> referring to this one. In your configuration examples you always give
> a value to the 'name' key, but if it's omitted, methods are given
> names like 'fetch_X_alias'.

Ah ... right ... detestable :-) Let's use something tied to the field
name, not the class, as I mentioned above.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
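
P.S. To make the naming point concrete, here is a minimal sketch of
what I have in mind. It is not SPOPS code: the classes My::Person and
My::Book, the helper 'make_fetchers', and the field names are all
hypothetical stand-ins, and the config keys 'class' and 'name' are just
my assumption about how the spec might look. Each field gets an accessor
that returns the stored value (an id, as with 'manual') and a fetcher
named 'fetch_<field>' unless a method name is given explicitly.

```perl
use strict;
use warnings;

# Stand-in for a persistent class; this is NOT SPOPS, just enough to run.
package My::Person;
my %ROWS = ( 1 => 'Ann', 2 => 'Bob' );
sub fetch {
    my ( $class, $id ) = @_;
    return unless defined $id && exists $ROWS{$id};
    return bless { id => $id, name => $ROWS{$id} }, $class;
}

package My::Book;
sub new { my ( $class, %fields ) = @_; return bless {%fields}, $class }

# Hypothetical helper: for each has_a field, install an accessor named
# after the field (returns the stored value, here an id) and a fetcher
# named 'fetch_<field>' unless the spec supplies an explicit 'name'.
sub make_fetchers {
    my ( $class, %has_a ) = @_;
    no strict 'refs';    # needed to install methods by name
    for my $field ( keys %has_a ) {
        my $spec   = $has_a{$field};
        my $target = $spec->{class};
        my $method = $spec->{name} || "fetch_$field";
        *{"${class}::$field"}  = sub { $_[0]{$field} };
        *{"${class}::$method"} = sub { $target->fetch( $_[0]{$field} ) };
    }
}

# Two fields pointing at the same class: naming by field keeps them apart.
My::Book->make_fetchers(
    author => { class => 'My::Person' },
    editor => { class => 'My::Person', name => 'get_editor' },
);

package main;
my $book        = My::Book->new( author => 1, editor => 2 );
my $author_id   = $book->author;                # value stored in the field
my $author_name = $book->fetch_author->{name};  # default 'fetch_' . field
my $editor_name = $book->get_editor->{name};    # explicitly named method
print "$author_id $author_name $editor_name\n";
```

Naming the fetcher after the class alias instead would give both
'author' and 'editor' the same method name, which is exactly the clash
I'm worried about.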