Re: [Openinteract-dev] has_many progress

On Dec 4, 2004, at 10:57 AM, Vsevolod (Simon) Ilyushchenko wrote:
>> Again, the purpose of this is to allow me to think of my book's  
>> publisher field as a Publisher object.
>
> I don't think you can persuade me that this approach is neater.

Maybe not ... but I hope you don't mind my trying :-)

> One of the best things about SPOPS is that it lets me think accurately  
> in terms of my tables and my objects at the same time. Each database  
> field name corresponds to an object variable name, period. Having the  
> 'publisher' id column correspond to a 'publisher' object will break  
> the neat mental picture.

Your mental picture is a totally valid one, and may often be the best  
one, but IMHO it is not the only valid one. The value in the database  
is the persistent version of the corresponding field in the objects,  
but need not be in an identical form. Developers should be able to  
define how one maps to the other in a way that is most useful to them.  
Take a look at SPOPS::Tool::DateConvert for an example of how SPOPS  
already allows this type of behavior as an option. (Btw, Chris, this is a  
good example of where changing configuration forces you to change what  
you expect to find in that particular field of your object ... which I  
still find quite reasonable).
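To make the idea concrete, here is a minimal sketch in plain Perl, not SPOPS code. The database column holds the publisher's id, while the object field holds a Publisher object: the id is converted to an object on load and back to an id on save, in the same spirit as SPOPS::Tool::DateConvert (but not its implementation). The `Publisher` and `Book` classes, the `from_row()`/`to_row()` hooks and the in-memory `%publisher_table` are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the publisher table (not SPOPS).
my %publisher_table = ( 3 => { id => 3, name => 'Acme Press' } );

package Publisher;

sub fetch {
    my ( $class, $id ) = @_;
    my $row = $publisher_table{$id};
    return unless $row;
    return bless {%$row}, $class;
}

sub id { $_[0]{id} }

package Book;

# Hypothetical load hook: the 'publisher' column stores an id, but the
# object field is inflated into a Publisher object.
sub from_row {
    my ( $class, %row ) = @_;
    my $self = bless {%row}, $class;
    $self->{publisher} = Publisher->fetch( $row{publisher} )
        if defined $row{publisher};
    return $self;
}

# Hypothetical save hook: deflate the object back to its id for storage.
sub to_row {
    my ($self) = @_;
    my %row = %$self;
    $row{publisher} = $self->{publisher}->id if ref $self->{publisher};
    return \%row;
}

package main;

my $book = Book->from_row( id => 1, title => 'Perl', publisher => 3 );
print ref( $book->{publisher} ), ' ', $book->to_row->{publisher}, "\n";    # prints "Publisher 3"
```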

I would say that if you want the "value in database eq value in object"  
mental picture, then all you're saying is that you want to configure  
for manual fetching.

> Plus, there is a huge downside that if I want to find out parent_id  
> from a child (for the auto/lazy fetch), I'll have to call  
> child->parent->id, which means that for the lazy fetch I have to  
> retrieve the parent to get parent's id, though it's present in the  
> child anyway. However, see more below.

Again, I think you're just saying that you want manual fetching. Why  
would you configure it for auto/lazy fetch if all you wanted was the  
id? If you only need the object sometimes and want the parent() method  
to always return an id, you can still always get to the object using  
the fetch_parent() method (which you can even rename if you like).

>>   - What do you call the field where you stash the auto-fetched  
>> object? Always specify explicit name in config?
>
> This can be auto-deduced from class names (config hash keys, not Perl  
> names). Currently SPOPS supports the config key {main_alias} to be  
> used in such cases.

Unless I misunderstand what you're suggesting, I think this goes  
back to our previous discussion about what to use for the config key:  
the alias (old has_a syntax) or the field name (new has_a syntax). An  
alias is unique only to the class, but we need something unique to  
the field in order to handle the case where two fields refer to  
objects of the same class.

>>   - When auto-saving how should SPOPS handle inconsistencies between  
>> the id of the object in the publisher field and the value in the  
>> publisher_id field? Or to put it another way, if I want to re-assign  
>> a book's publisher, do I assign a new id to the publisher_id field or  
>> do I assign a new object to the publisher field?
>
> It's possible to generate methods in such a way that all related id  
> fields and objects are updated when a change happens. However, this is  
> implementing caching all over again. This makes me think that maybe  
> storing objects in fields is not such a good idea after all.  
> (Currently SPOPS does not do it, and the code that I've written before  
> Ray's objections does not do it either.)
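For reference, the "update everything on change" approach described above might look roughly like this minimal sketch. It assumes hypothetical `Book`/`Publisher` classes with a `publisher_id` field and a stashed `publisher` object (plain Perl, not SPOPS-generated methods): assigning an object rewrites the id, and assigning a different id drops the stale object.

```perl
use strict;
use warnings;

package Publisher;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class; }
sub id  { $_[0]{id} }

package Book;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class; }

# Setting the id discards a stashed object whose id no longer matches.
sub publisher_id {
    my ( $self, @new ) = @_;
    if (@new) {
        my $id = $new[0];
        $self->{publisher_id} = $id;
        my $obj = $self->{publisher};
        delete $self->{publisher}
            if $obj && ( !defined $id || $obj->id ne $id );
    }
    return $self->{publisher_id};
}

# Setting the object also rewrites the id, so the two cannot disagree.
sub publisher {
    my ( $self, @new ) = @_;
    if (@new) {
        my $obj = $new[0];
        $self->{publisher}    = $obj;
        $self->{publisher_id} = defined $obj ? $obj->id : undef;
    }
    return $self->{publisher};
}

package main;

my $book = Book->new( id => 1, publisher_id => 3 );
$book->publisher( Publisher->new( id => 5, name => 'Other Press' ) );
print $book->publisher_id, "\n";                               # prints "5"
$book->publisher_id(9);
print defined( $book->publisher ) ? "stale\n" : "cleared\n";   # prints "cleared"
```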

No storing objects in fields?  You mean you want to eliminate  
auto/lazy-fetch as an option?

Ah ... wait a minute ... are you assuming a cache? I suppose with a  
cache you could do auto/lazy-fetching without storing the auto-fetched  
objects in the primary object. But as I mentioned before (and I seem to  
remember Chris saying the same), I think caching and the new has_a (or  
relates_to) functionality should be treated separately. I have  
certainly been thinking of these as completely separate. For the  
purposes of designing this new functionality, I've *always* assumed no  
caching. Caching can be added as an independent option later.
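A minimal sketch of lazy fetching with no cache at all, assuming hypothetical `Book` and `Publisher` classes and an in-memory table (plain Perl, not SPOPS). Without a cache, the only place to keep the fetched object is the primary object itself, so the first call to `publisher()` fetches and stashes it, and later calls reuse it.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the publisher table (not SPOPS).
my %publisher_table = ( 3 => { id => 3, name => 'Acme Press' } );
my $fetch_count = 0;    # counts simulated database reads

package Publisher;

sub fetch {
    my ( $class, $id ) = @_;
    $fetch_count++;
    my $row = $publisher_table{$id};
    return unless $row;
    return bless {%$row}, $class;
}

package Book;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class; }

# Lazy fetch with no cache: the first call fetches the publisher and
# stashes it in the book itself; later calls reuse the stashed object.
sub publisher {
    my ($self) = @_;
    $self->{publisher} //= Publisher->fetch( $self->{publisher_id} );
    return $self->{publisher};
}

package main;

my $book = Book->new( id => 1, publisher_id => 3 );
$book->publisher for 1 .. 3;
print "publisher fetched $fetch_count time(s)\n";    # prints "publisher fetched 1 time(s)"
```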

> This brings up another problem, though. How should save() work in the  
> many-to-many configuration when the parent is saved? Currently  
> links_to calling addChild() method already causes an update of the  
> linking table. Hence, when parent->save() is called, there is no need  
> to update the linking table, though the framework can go ahead and  
> save the children objects if auto-save is specified. Oh, wait! We  
> don't have the children objects because we don't like storing objects  
> in the fields! We have to re-fetch the children and then save them.  
> Fortunately, I have the "cache_only" option for fetch_group() which  
> will only return objects in cache, but we still have to make a  
> database call because we don't know what IDs we want.

In my proposal, auto-saving is off by default for 'link'. So you don't  
save either the link or the child object. Details are in  
<http://sourceforge.net/mailarchive/forum.php?thread_id=110632&forum_id=3222>.

> In the one-to-many configuration, the addChild method updates the  
> parent_id field in the child object and saves it, which amounts to  
> auto-save, even though it may not have been requested. The code can be  
> changed, though, to only save the new parent_id of the child object.

It's only an auto-save if it happens automatically as a result of  
saving the object that fetched it. I view calling an add_*() method as  
an explicit (i.e. "manual") save operation on the child and think it  
should be documented as such.
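A minimal sketch of that reading, using hypothetical `Parent`/`Child` classes and a hypothetical `add_child()` method (not SPOPS's generated add_*() code). The method sets the foreign key and saves the child immediately, so the caller has explicitly asked for that save; nothing depends on the parent being saved later.

```perl
use strict;
use warnings;

my @save_log;    # ids of children written to the (simulated) database

package Child;

sub new  { my ( $class, %args ) = @_; return bless {%args}, $class; }
sub save { my ($self) = @_; push @save_log, $self->{id}; return $self; }

package Parent;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class; }

# add_child() sets the child's foreign key and saves the child right away:
# an explicit save requested by the caller, not an auto-save that runs
# when the parent itself is saved.
sub add_child {
    my ( $self, $child ) = @_;
    $child->{parent_id} = $self->{id};
    $child->save;
    return $child;
}

package main;

my $parent = Parent->new( id => 10 );
my $child  = Child->new( id => 42 );
$parent->add_child($child);
print "parent_id=$child->{parent_id} saved=@save_log\n";    # prints "parent_id=10 saved=42"
```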

> The problem lies in the fact that we perform database saves not just  
> in the save() method, but also in the add/removeChild operations. What  
> happens if the add/remove operations do not access the database? We  
> have to either 1) maintain a hidden list of dependent ids and work  
> with it, or 2) go Ray's way and maintain a list of dependent objects.  
> To maintain consistency we can either A) add logic to generated  
> methods to keep those ids in sync, or B) prohibit saying  
> child->parent_id if it can be obtained as child->parent->id.

I think all of these options are ugly, which is why I propose that we  
stick with saving the child object in the add_*() methods and  
documenting that clearly.

> How do we deal, though, with manual fetch (when a dependent object is  
> not stored) plus auto-save (when a dependent object/id should be  
> stored to be saved)?

This issue is also addressed in my proposal in the e-mail referenced  
above. Since you can't auto-save an object you don't have, auto-saving  
with manual fetch makes no sense, and attempting to configure a field  
that way should throw an error.
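The configuration check could be as simple as this sketch. It assumes a hypothetical `check_relationship_config()` function, a hypothetical `save => { type => ... }` key next to `fetch`, and a hypothetical default of 'manual' for both; none of these come from SPOPS or from the referenced proposal.

```perl
use strict;
use warnings;

# Reject a relationship configured to auto-save while fetching manually:
# under manual fetch the related object is never held, so there is
# nothing to auto-save.
sub check_relationship_config {
    my ( $field, $conf ) = @_;
    my $fetch = $conf->{fetch}{type} // 'manual';
    my $save  = $conf->{save}{type}  // 'manual';
    die "$field: auto-save cannot be combined with manual fetch\n"
        if $save eq 'auto' && $fetch eq 'manual';
    return 1;
}

for my $conf (
    { fetch => { type => 'manual' }, save => { type => 'auto' } },
    { fetch => { type => 'auto' },   save => { type => 'auto' } },
) {
    my $ok = eval { check_relationship_config( publisher_id => $conf ) };
    print $ok ? "accepted\n" : "rejected: $@";
}
```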

>> Seems much more messy to me for something that I consider to be  
>> completely new functionality (no backward compatibility issues to  
>> worry about). While it does eliminate the need for the manual fetch  
>> option, as Simon mentioned earlier (and probably even lazy fetch), it  
>> also eliminates one of the main features for me, which is the ability  
>> to take an existing field and treat it conceptually as an object.
>
> But I like having parent_id() and parent() methods separate, because I  
> should not completely forget that I'm dealing with a database.

So why not just use ...

     relates_to => {
         parent_id => {
             class => 'Parent::Class',
             fetch => {
                 type => 'manual',
                 name => 'parent'
             }
         }
     }

Doesn't that give you what you want? Turn on caching and you even have  
your version of lazy-loading, right? The parent_id() method always  
gives you the id and the parent() method always gives you the (possibly  
cached) object. I think the only thing missing is auto-fetching (into  
cache only) without stashing the objects in a field. This 'auto-cache'  
option, to give it a name, obviously requires that caching be turned on  
and would have to either throw an error or revert to 'manual' if a  
cache were not present. But I honestly don't see the need for this  
option. If you ALWAYS want to auto-fetch the secondary object(s), then  
you're never in the situation where you need to fetch them just to get  
the id. Both are always available. So why not just auto-fetch them into  
the field as I proposed?
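A minimal sketch of the manual-fetch-plus-cache combination, in plain Perl with hypothetical `Parent`/`Child` classes and a hypothetical id-keyed `%cache` (not SPOPS's caching API). `parent_id()` always returns the stored id, `parent()` stores nothing in the child, and only the first call for a given id reaches the simulated database.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (not SPOPS): a parent "table", an object cache
# keyed by id, and a counter of simulated database reads.
my %parent_table = ( 7 => { id => 7, name => 'Acme Press' } );
my %cache;
my $db_hits = 0;

package Parent;

sub fetch {
    my ( $class, $id ) = @_;
    return $cache{$id} if exists $cache{$id};
    $db_hits++;
    my $row = $parent_table{$id};
    return unless $row;
    return $cache{$id} = bless {%$row}, $class;
}

package Child;

sub new       { my ( $class, %args ) = @_; return bless {%args}, $class; }
sub parent_id { $_[0]{parent_id} }    # always the stored id

# Manual fetch method named 'parent': nothing is stashed in the child;
# repeated calls are cheap only because Parent->fetch checks the cache.
sub parent { my ($self) = @_; return Parent->fetch( $self->parent_id ); }

package main;

my @kids = map { Child->new( id => $_, parent_id => 7 ) } 1 .. 3;
$_->parent for @kids;
print "db hits: $db_hits\n";    # prints "db hits: 1"
```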

I think it's clear that there are differences in the way you and I like  
to think about our objects/data and the way we intend to use SPOPS, and  
this new functionality in particular. And this is all perfectly  
reasonable. My hope is that we can find a clean, consistent,  
easy-to-document/understand design that is as flexible as possible, and  
in particular, flexible enough to encompass both of our requirements.  
After all this discussion, I still think my original design, in all of  
its gory detail, accomplishes this. If there is a favorite mental  
picture or usage scenario that is excluded by my proposed design (or  
even made more difficult or confusing to configure), I don't yet see  
it. (And I am trying :-)

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone:  (607) 255-9645