Re: [Openinteract-dev] has_many progress

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Simon,

On Oct 31, 2004, at 8:32 PM, Vsevolod (Simon) Ilyushchenko wrote:
> My current progress - code pretty much finalized, and a .t file with 
> 64 tests (I'll have to add a few more tests to cover all combinations 
> of features). I'd like to do some more internal testing before I send 
> the official patch, but if anyone has time to play with the new 
> features (yeah, right :), I'd be happy to share it.

Cool. I'd like to check it out.

> It's completely backwards compatible.

Is this a good thing?  :-)  (more below in this)

I assume that's why you changed the key values in the 'has_a' spec to 
the class name instead of the field name? But isn't this a problem if 
you have multiple has_a fields of the same class, with different 
behavior for fetching/saving/removing? Or do you just use an arrayref 
of hashrefs instead of the single hashref in that case?  Blech.

For example, I'm thinking of something like a Transfer object that has 
a 'from_account' and a 'to_account' field, both of class 'Account', 
where you want to auto-fetch the 'from_account' and lazy-fetch (or 
manual fetch) the 'to_account'.

This is something I did not like about the original has_a spec. It 
seems so backwards to me. I want to specify relationship behavior for 
each individual field. Whether or not two has_a fields belong (or not) 
to the same class is irrelevant and therefore (to me) it makes no sense 
to organize them by class.

> > Just curious, at what level are you implementing this cache? Is it
> > *always* used for all SPOPS objects? Only for DBI? Only for DBI when 
> you
> > are using the new functionality?
>
> It's only for DBI.

And you say this full cache is necessary for the consistency circular 
references. You mean to avoid infinite loops when you have A set to 
auto-fetch B which is set to auto-fetch A? Seems to me that you should 
be able to detect this type of thing when the classes are configured. 
While I think a cache is a nice option that I may very well use, I 
don't think it should be mandatory unless it's absolutely necessary. 
Can you give me an example where consistency makes it absolutely 
necessary?

I've used only application level caching, never SPOPS-level caching so 
I'm not clear on how this works. Does this mean that SPOPS objects no 
longer go out of scope and get destroyed until the program ends or the 
cache is manually flushed? In a mod_perl environment then, do you flush 
the cache at the end of every request or what? Seems like you could get 
some huge apache children pretty quickly if you're not careful.

> I actually did not implement manual loading - I just can't see where 
> it can be used (and a bit fuzzy on how it would work). So there is 
> only auto and lazy loading.

It was for completeness and to offer a mode that is equivalent to 
current has_a behavior, that is, the field normally just gives you an 
id, but you also have a convenience method for fetching the object as 
well. My idea was that any 'has_a' spec, including 'manual', would 
create convenience methods for fetching the related objects. The 'auto' 
and 'lazy' options would simply call these methods automatically at the 
appropriate time and stash the return values in the object. So in the 
way I was picturing things, implementing 'manual' would simply be the 
first step in implementing 'auto' and 'lazy'.

Without the manual option, you can't specify a relationship at all 
without having it define auto-fetching behavior. You can't, for 
example, auto-remove an object without having it also auto-fetched 
(which I can imagine you might want if you typically only need to deal 
with the ID of the secondary object).

Just curious, does your implementation of 'auto' generate a public 
'fetch_myA' method, for example?

> Autosaving is always off by default, to preserve compatibility.

I'm not sure I follow. Auto-fetching is new so there is no previous 
corresponding save behavior to be backward compatible with. Classes 
defined without any auto-fetch/auto-remove behavior, could behave as 
always. Classes defining new auto behavior could have whatever default 
'save' behavior we think makes sense. So I'm not sure there is a 
backward compatibility issue here.

And the save behavior described in my updated proposal posted 4 Jan 
2002 still seems to be the most consistent and make the most sense to 
me.

> OTOH, there are three types of removes - 'auto', 'manual' and 
> 'forget'. 'Auto' means complete removal of dependent objects, 'forget' 
> - nullifying id fields pointing to the removed objects, and 'manual' - 
> no action. The default should logically be 'forget', but it may 
> conflict with no autosaving, so I'll have to set it to 'manual'.

OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' => 
'forget' in a 'has_a' the same as specifying 'remove' => 'forget' in 
the corresponding 'has_many'? If so, which one takes precedence if they 
are inconsistent? It looks like 'reverse_remove' => 'forget' is 
equvalent to what I called 'null_by', right?. I personally think that 
having multiple (and possibly conflicting) ways/places of defining the 
behavior for a single relationship is asking for trouble. I think it 
will make it difficult to write correct and clear documentation and it 
will create some debugging nightmares. (More on this below)

> As I mentioned, I am flexible on the matter of has_many. If you want, 
> I can create an additional config option (you call it 'auto_by' etc,
> I think) and put it into the other class.

Borrowing from some of your configuration terminology, we could call it 
'reverse_auto', 'reverse_lazy' and 'reverse_manual'. That might be more 
clear. And for removes we could use 'manual_forget' and 'auto_forget' 
in place of my 'manual_null' and 'auto_null' if you prefer. Again, 
having more than one way to do it seems like a bad idea to me. (More on 
this below).

> Links_to is harder, because to preserve backward compatibility it has 
> to stay in one of the edge classes. But if you want to add more 
> variables to the linking class, you'll have to define it for SPOPS 
> anyway, and there I can probably also implement your suggestion as an 
> option.

I suppose if you want to enhance the existing links_to syntax you are 
correct. (More below on why I wouldn't do this).

Why do you include both 'link_class' and 'link_class_alias'? Aren't 
they redundant? (see [1] below).
And I suppose the 'table' is only necessary if you don't specify the 
'link_class' and vice versa, right?

After looking over your configuration syntax I started trying to 
convert my examples to your syntax to see if all the bases are covered. 
Here are some random comments, based on my understanding of what you're 
doing:

* for both forward and reverse direction, auto-remove is not possible 
without auto-fetch (missing 'manual' mode)
* for both forward and reverse direction, auto/lazy-fetch looks fine 
(except for default behavior of save)
* forward direction manual/auto remove looks fine
* reverse direction auto/forget remove looks fine, but I don't see an 
equivalent to manual_null (manual_forget)
* is use of 'alias' and 'reverse_alias' consistent with the way alias 
is normally used in SPOPS? (see [1] below)
* didn't see any mention of the 'name' option for explicitly specifying 
the name of generated methods
* not clear to me what auto/lazy fetching, auto removing, etc options 
are implemented for links_to

In conclusion, here are a few of my perspectives.

I've been trying to keep an open mind toward the differences between 
your implementation and my proposed design, adopting a wait and see 
approach. I hesitate to state my opinions too strongly since you're 
contributing useful code and I'm just talking about design. On the 
other hand, I suppose you probably want honest input.

I think that both links_to and has_many share a fundamental flaw. They 
both allow relationships to be defined from the "other" end, making it 
possible to define conflicting behavior for a single relationship. This 
unnecessarily complicates the error checking or precedence rules and 
the documentation. I think always forcing the relationship to be 
defined by the class with the linking field, and having a single syntax 
(has_a) for defining it, makes for a much simpler, cleaner, more 
consistent, easier to understand/document/implement, less error prone 
design. Including both approaches (e.g. has_many and reverse_auto or 
auto_by), in my opinion, only complicates and confuses things further.

For this reason, I think we should forget trying to make the new syntax 
backward compatible. I think we should leave the old syntax in place 
(but deprecated) as long as necessary for backward compatibility and 
add a completely redesigned new syntax that folks can migrate to 
gradually or as they need the new features. If it were up to me I 
wouldn't touch links_to at all, and I wouldn't touch the old has_a 
either. I'd just add a new configuration handler for the case where the 
key of a 'has_a' spec matches one of the field names, in which case you 
assume it is the new syntax I proposed, otherwise you assume it's a 
class and use the old has_a semantics.

I don't know if Chris has enough spare cycles to give an opinion on 
this at the moment, but I'd be interested in his perspective.

While I appreciate all your work, I have to say that the above issues 
are leading me to a growing preference for my original proposed design.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone:  (607) 255-9645

[1]  I confess I never really did understand the purpose of the alias. 
What is the difference between the alias and the class? Isn't one of 
them redundant?