Re: [Openinteract-dev] has_many progress

Simon,

>> And you say this full cache is necessary for the consistency of
>> circular references. You mean to avoid infinite loops when you have A
>> set to auto-fetch B which is set to auto-fetch A? Seems to me that you
>> should be able to detect this type of thing when the classes are
>> configured. While I think a cache is a nice option that I may very 
>> well use, I don't think it should be mandatory unless it's absolutely 
>> necessary. Can you give me an example where consistency makes it 
>> absolutely necessary?
>
> For example, you have a Book object with many Authors. If the 
> application loads a book with the list of authors, adds another author 
> to this book and asks the new author about its parent book, the 
> current SPOPS implementation will re-fetch the book object, 
> potentially ignoring the changes that were made to original book 
> object.
>
> However, I only now realized that you suggest saving both the 
> parent-to-child reference and the reverse reference in the object 
> fields. (I distinguish between $author->{book_id}, a number, and 
> $author->{book}, an object. Let me know if I understood this 
> correctly.)

I think you've got it right, except I'm not sure what you mean about 
the reverse reference case. Maybe the myA vs fetch_myA stuff below 
clarifies what I'm suggesting.

> Thus, once $book->{list_of_authors} is populated, adding a new author 
> to the book should add the new object to this list, plus it should set 
> the field $author->{book} to the original book object. This will make 
> the above situation impossible.

Right. Except that 'list_of_authors' in $book implies that you've 
specified a reverse fetch of the book field in author, so 
$author->{book} is just an id, not an object. More on object vs id 
below.

> However, inconsistencies still may occur if:
>
> 1) I create a new author-book relationship by setting the field 
> $author->{book_id} instead of saying $book->add_author($author). This 
> can be discouraged as an incorrect way of altering data, of course, 
> but logically both make sense, and I'd like to be able to use them 
> both.
>
> Or, 2) if I am working with a second relationship, say 
> books-to-artists (illustrators). In this case, in one place in my 
> code, I could retrieve a book object by saying $artist->book, and then 
> in another place I'll call $author->book, and even though they may 
> refer to the same book, they will always be two different objects.

Right. Without a cache these inconsistencies are always possible. But 
this is a "global" issue with OOP frameworks like SPOPS. I think Chris 
has taken the right approach in letting the developer decide at the 
application level whether s/he needs to always maintain that 
consistency and if so whether to use SPOPS level caching or application 
level caching/logic to ensure consistency. (And I understand there was 
a bug in SPOPS caching which prevented this from working correctly).
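
To make the application-level caching option concrete, here is a minimal sketch of an identity map that hands back the same in-memory object no matter which relationship asked for it. The IdentityMap package, its fetch method and the $load_book callback are hypothetical names made up for this illustration; none of them is part of SPOPS.

```perl
use strict;
use warnings;

# Hypothetical application-level identity map; not an SPOPS API.
package IdentityMap;

sub new { return bless { objects => {} }, shift }

# Return the cached object for ($class, $id); call $loader only on a miss.
sub fetch {
    my ( $self, $class, $id, $loader ) = @_;
    my $key = "$class\0$id";
    $self->{objects}{$key} //= $loader->($id);
    return $self->{objects}{$key};
}

package main;

# Stand-in for a real database fetch; counts how often it is called.
my $loads     = 0;
my $load_book = sub {
    my ($id) = @_;
    $loads++;
    return { book_id => $id, title => 'Untitled' };
};

my $map        = IdentityMap->new;
my $via_author = $map->fetch( 'Book', 7, $load_book );    # e.g. $author->book
my $via_artist = $map->fetch( 'Book', 7, $load_book );    # e.g. $artist->book

# A change made through one handle is visible through the other.
$via_author->{title} = 'Revised';
print "same object: ", ( $via_author == $via_artist ? "yes" : "no" ), "\n";
print "title via artist: $via_artist->{title}\n";
print "loads: $loads\n";
```

This only gives one copy per process, so it does nothing for consistency across several Apache children.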

But I don't think there is anything in the new has_a design that 
REQUIRES one to maintain consistency through an SPOPS level cache, 
right? I think the only thing you need to do is ensure that you don't 
have any auto-fetching loops (might even want to include lazy-fetching) 
when you do the configuration. In other words, you don't want to have a 
book auto-fetch its list of authors AND have the author set to 
auto-fetch its book, creating an infinite loop. Even if you use a cache 
this would cause circular references which cause a problem for garbage 
collection unless you use weak references.
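
The weak-reference point can be shown with the core Scalar::Util module. This is a minimal sketch using plain hashes in place of SPOPS objects; the field names list_of_authors and book follow the examples in this thread, and the sample data is invented.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Plain hashes standing in for SPOPS objects.
my $book   = { title => 'Example', list_of_authors => [] };
my $author = { name  => 'A. Writer', book => undef };

# Forward (owning) reference: book -> author.
push @{ $book->{list_of_authors} }, $author;

# Back reference: author -> book, weakened so it does not keep the book alive.
$author->{book} = $book;
weaken( $author->{book} );

print "back reference is weak: ",
    ( isweak( $author->{book} ) ? "yes" : "no" ), "\n";

# Dropping the last strong reference frees the book;
# the weakened slot is then set to undef automatically.
undef $book;
print "book still reachable from author: ",
    ( defined $author->{book} ? "yes" : "no" ), "\n";
```

Without the weaken call the two hashes would keep each other alive after both variables went out of scope, since Perl's reference counting cannot reclaim cycles.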

I think this checking is important, but I haven't honestly given any 
thought to how to implement it.
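
One possible way to do this checking is a depth-first search over the configured relationships at configuration time. The sketch below is only an illustration: the %config layout (class, then field, then a spec with 'class' and 'fetch' keys), the %FOLLOW table and both function names are assumptions for this example, not the actual SPOPS configuration format.

```perl
use strict;
use warnings;

# Hypothetical configuration: class => { field => { class, fetch } }.
my %config = (
    Book   => { list_of_authors => { class => 'Author', fetch => 'auto' } },
    Author => {
        book  => { class => 'Book',  fetch => 'auto' },
        agent => { class => 'Agent', fetch => 'manual' },
    },
    Agent => {},
);

# Fetch modes that count as edges; add lazy => 1 to check lazy fetches too.
my %FOLLOW = ( auto => 1 );

# Depth-first visit. Returns the cycle as a list of class names,
# or an empty list if no cycle is reachable from $class.
sub _visit {
    my ( $config, $class, $state, $path ) = @_;
    $state->{$class} = 'active';
    push @$path, $class;
    for my $field ( sort keys %{ $config->{$class} || {} } ) {
        my $spec = $config->{$class}{$field};
        next unless $FOLLOW{ $spec->{fetch} // 'manual' };
        my $target = $spec->{class};
        my $seen   = $state->{$target} // '';
        if ( $seen eq 'active' ) {
            # Back edge: the cycle is the path from $target to here.
            my ($start) = grep { $path->[$_] eq $target } 0 .. $#$path;
            return ( @{$path}[ $start .. $#$path ], $target );
        }
        if ( $seen eq '' ) {
            my @cycle = _visit( $config, $target, $state, $path );
            return @cycle if @cycle;
        }
    }
    pop @$path;
    $state->{$class} = 'done';
    return;
}

# Check every configured class; return the first cycle found, if any.
sub find_auto_fetch_cycle {
    my ($config) = @_;
    my ( %state, @path );
    for my $class ( sort keys %$config ) {
        next if $state{$class};
        my @cycle = _visit( $config, $class, \%state, \@path );
        return @cycle if @cycle;
    }
    return;
}

my @cycle = find_auto_fetch_cycle( \%config );
print @cycle
    ? "auto-fetch loop: " . join( ' -> ', @cycle ) . "\n"
    : "no auto-fetch loops\n";
```

With the sample configuration the Book/Author pair is reported as a loop, while the 'manual' agent field is ignored.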

> So looks like cache is still necessary.

Only if you need to guarantee a single in-memory copy for a process. I 
argue that this is an arbitrary requirement. In a read-only 
environment, it really doesn't matter (except for resource usage) if 
you have multiple copies of the same object in memory. And in a web 
environment, even using a simple cache doesn't guarantee consistency 
across multiple processes (apache children), so you still need a higher 
level synchronization mechanism to ensure consistency.

>> It was for completeness and to offer a mode that is equivalent to 
>> current has_a behavior, that is, the field normally just gives you an 
>> id, but you also have a convenience method for fetching the object as 
>> well. My idea was that any 'has_a' spec, including 'manual', would 
>> create convenience methods for fetching the related objects. The 
>> 'auto' and 'lazy' options would simply call these methods 
>> automatically at the appropriate time and stash the return values in 
>> the object. So in the way I was picturing things, implementing 
>> 'manual' would simply be the first step in implementing 'auto' and 
>> 'lazy'.
>
> I feel dumb - I still don't quite get it. However, in your original 
> examples the method X->myA returns the id of A in the case of manual 
> fetch and A itself in the case of lazy/auto fetch, right? In my view, 
> X->myA always returns the id and X->fetch_myA always returns the object 
> (I tend to use them like $author->book_id and $author->book in my 
> applications). So there is no need for manual fetching.
>
> I think that having X->myA return inconsistent values may be confusing.
> Let me know what you think. Perhaps I am still missing the utility of 
> manual fetching.

My thought was that specifying 'auto' or 'lazy' is equivalent to 
saying "this field is an object". Specifying 'manual' is equivalent to 
saying "this field is an object id". So X->myA always returns the value 
stored in the field and X->fetch_myA always returns the object.
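
The intended semantics can be sketched with a toy class. Everything here is hypothetical: the Author package, its 'fetch' constructor argument, the %BOOKS stand-in for the data store and the _load_book helper are made up for illustration and are not generated by SPOPS.

```perl
use strict;
use warnings;

package Author;

# Stand-in for the data store.
my %BOOKS = ( 7 => { book_id => 7, title => 'Example' } );
sub _load_book { my ($id) = @_; return $BOOKS{$id} }

sub new {
    my ( $class, %args ) = @_;
    my $mode = $args{fetch} || 'manual';
    my $self = bless { mode => $mode, book => $args{book} }, $class;

    # 'auto': the field holds the object from the start.
    $self->{book} = _load_book( $self->{book} ) if $mode eq 'auto';
    return $self;
}

# Returns the field value: an id for 'manual', an object for 'auto'/'lazy'.
sub book {
    my ($self) = @_;

    # 'lazy': swap the id for the object on first access.
    if ( $self->{mode} eq 'lazy' && !ref $self->{book} ) {
        $self->{book} = _load_book( $self->{book} );
    }
    return $self->{book};
}

# Always returns the object, whatever the mode.
sub fetch_book {
    my ($self) = @_;
    return ref $self->{book} ? $self->{book} : _load_book( $self->{book} );
}

package main;

my $manual = Author->new( book => 7, fetch => 'manual' );
my $lazy   = Author->new( book => 7, fetch => 'lazy' );

print "manual field is an id: ", $manual->book, "\n";
print "manual fetch_book title: ", $manual->fetch_book->{title}, "\n";
print "lazy field is an object: ", ( ref $lazy->book ? "yes" : "no" ), "\n";
```

In this sketch 'manual' never stashes the fetched object, so the field keeps holding the id even after fetch_book has been called.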

>> Without the manual option, you can't specify a relationship at all 
>> without having it define auto-fetching behavior. You can't, for 
>> example, auto-remove an object without having it also auto-fetched 
>> (which I can imagine you might want if you typically only need to 
>> deal with the ID of the secondary object).
>
> But in this case you still have to fetch the dependent object, because 
> it may define its own rules of auto-removal of even more objects.

But the fetch only happens for the purpose of correctly doing the 
remove ... the 'manual' specifier still means that the field holds an 
object id, not an object.

>> Just curious, does your implementation of 'auto' generate a public 
>> 'fetch_myA' method, for example?
>
> See above - even if the fetch method name ('alias' in the current 
> terminology) is not specified in the configuration, it'll be 
> auto-created by using the name given to the target class. (I mean the 
> name of config hash key for the target class, not its Perl name. In 
> the example I sent you, X_alias is such a name.)

The problem I see with this is that it generates clashes when you have 
multiple fields with the same class. We need a method name that is 
unique for the field we want to fetch, not just for the class we use to 
fetch it. I think using the class alias is left-over from the old has_a 
config which used the class as the hash key (which you agreed is 
detestable :-).

I vote once again that we stick with my proposal to use 'fetch_' 
prepended to the name of the field, by default, and allow an option to 
explicitly specify a method name.
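
A small sketch of that naming rule, next to the clash that class-based naming produces. The %has_a layout, the 'name' option key, the Person class and the fetch_method_name helper are assumptions for this example only, not the final configuration syntax.

```perl
use strict;
use warnings;

# Hypothetical specs: two fields in one class that both point at Person.
my %has_a = (
    author_id => { class => 'Person' },
    editor_id => { class => 'Person', name => 'editor' },
);

# Default to 'fetch_' . field name unless a method name is given explicitly.
sub fetch_method_name {
    my ( $field, $spec ) = @_;
    return $spec->{name} // "fetch_$field";
}

my %method_for =
    map { $_ => fetch_method_name( $_, $has_a{$_} ) } keys %has_a;
print "$_ -> $method_for{$_}\n" for sort keys %method_for;

# Naming by target class instead maps both fields to one method name.
my %by_class;
for my $field ( sort keys %has_a ) {
    my $name = 'fetch_' . lc( $has_a{$field}{class} );
    push @{ $by_class{$name} }, $field;
}
for my $name ( sort keys %by_class ) {
    my @fields = @{ $by_class{$name} };
    print "clash on $name: @fields\n" if @fields > 1;
}
```

Field-based names are unique within a class by construction, which is what makes them a safe default.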

>>> OTOH, there are three types of removes - 'auto', 'manual' and 
>>> 'forget'. 'Auto' means complete removal of dependent objects, 
>>> 'forget' - nullifying id fields pointing to the removed objects, and 
>>> 'manual' - no action. The default should logically be 'forget', but 
>>> it may conflict with no autosaving, so I'll have to set it to 
>>> 'manual'.
>> OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' 
>> => 'forget' in a 'has_a' the same as specifying 'remove' => 'forget' 
>> in the corresponding 'has_many'? If so, which one takes precedence if 
>> they are inconsistent? It looks like 'reverse_remove' => 'forget' is 
>> equivalent to what I called 'null_by', right? I personally think that 
>> having multiple (and possibly conflicting) ways/places of defining 
>> the behavior for a single relationship is asking for trouble. I think 
>> it will make it difficult to write correct and clear documentation 
>> and it will create some debugging nightmares. (More on this below)
>
> This should not be a problem, because in my current proposal the 
> programmer specifies either has_a or has_many (which implies the 
> reverse has_a), so no conflicts should be possible. However, if we 
> change the syntax, this issue will go away.

But since you can put a 'has_many' in Book and a 'has_a' in Author, for 
example, where Author has a 'book' field, I think they can be 
inconsistent. In my proposal, for the 'has_a' in Author, you specify 
either a forward or a reverse direction, with no way to specify 
something conflicting in the Book class.

>> Why do you include both 'link_class' and 'link_class_alias'? Aren't 
>> they redundant? (see [1] below).
>
> 'Link_class' refers to the Perl class name, 'link_class_alias' - to 
> the method name used to retrieve its instances (this is your 
> 'list_field' in the 'link' hash).

But can't you always get the one, given the other?

>> And I suppose the 'table' is only necessary if you don't specify the 
>> 'link_class' and vice versa, right?
>
> Yup. I am a little unhappy that in your proposal one has to have a 
> Perl class for the linking table even if one is never going to use it, 
> but I guess this is necessary for the sake of the uniform syntax.

Which is why I wouldn't protest too much if we decided to leave the old 
'links_to' syntax untouched, at least for the time being. You would 
only need to define the linking class if you needed the 
auto-fetching/removing behavior.

>> [1]  I confess I never really did understand the purpose of the 
>> alias. What is the difference between the alias and the class? Isn't 
>> one of them redundant?
>
> The alias is used to generate access methods in other classes 
> referring to this one. In your configuration examples you always give 
> a value to the 'name' key, but if it's omitted, methods are given 
> names like 'fetch_X_alias'.

Ah ... right ... detestable :-) Let's use something tied to the field 
name, not the class, as I mentioned above.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone:  (607) 255-9645