From: Vsevolod (S. I. <si...@cs...> - 2004-10-22 03:32:28
|
Chris and Ray, Now that Chris has some time (hopefully) to consider the changes, I'd like to submit a patch that would add has_many, auto-save/auto-delete (optional) and full cache support (which is necessary for the consistency circular references). Before I do that, I'd like to get Chris's approval on the following architectural decisions: 1. There will be a basic cache which will always return the cached object, not its copy. 2. The fetch_many() now will also use cache. It will still make a call to the database, because it won't know which ids to request from the cache, though. 3. Auto-save and auto-delete are off by default. This is not what Ray requested, but it's the only way to stay backwards compatible. With auto-save on, the graph of object in memory will be traversed to determine what has been changed. (Thus, I have deep_changed() instead of changed().) 4. It will be possible to use a separate class that corresponds to the linking table in links_to (think ClubMembership). The syntax I came up with is: Club => { class => 'Club', ... links_to => { 'Person' => { table => 'ClubMembership', link_class => 'ClubMembership', link_class_alias => 'memberships', alias => 'members', to_id_field => 'member_id', from_id_field => 'club_id' }, The ClubMembership class can have extra attributes (like date), but they will be specified in a regular way in the config file. The new attribute 'link_class_alias' will return an arrayref of the ClubMembership instances. Again, this is not as elegant as Ray proposed, but backwards compatible. An open question is what to do with has_a. Currently it's implemented in ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete functionality requires the has_a code to be DBI-aware. I can add extra functionality into ClassFactory/DBI, but I am not sure which effect it would have on LDAP storage, for example. Oh, and a style question. I have marked all of my changes with "tags": #Simon #/Simon and I commented out, not deleted, all replaced code, for ease of reference. Is this necessary? I will grudgingly remove lc-ing of field names. I love it, but, as pointed out before, it will break existing code. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-10-22 13:04:06
|
Hi Simon, It's good to hear that this work is coming along ... On Oct 21, 2004, at 11:32 PM, Vsevolod (Simon) Ilyushchenko wrote: > 1. There will be a basic cache which will always return the cached > object, not its copy. This sounds like a good idea. This ensures that, within a given process, there is only ever one copy of an object in memory no matter how many times and in what contexts you fetch it, right? Just curious, at what level are you implementing this cache? Is it *always* used for all SPOPS objects? Only for DBI? Only for DBI when you are using the new functionality? > 2. The fetch_many() now will also use cache. It will still make a call > to the database, because it won't know which ids to request from the > cache, though. > 3. Auto-save and auto-delete are off by default. This is not what Ray > requested, but it's the only way to stay backwards compatible. With > auto-save on, the graph of object in memory will be traversed to > determine what has been changed. (Thus, I have deep_changed() instead > of changed().) > 4. It will be possible to use a separate class that corresponds to the > linking table in links_to (think ClubMembership). The syntax I came up > with is: > Club => { > class => 'Club', > ... > links_to => > { 'Person' => > { table => 'ClubMembership', > link_class => 'ClubMembership', > link_class_alias => 'memberships', > alias => 'members', > to_id_field => 'member_id', > from_id_field => 'club_id' }, > > The ClubMembership class can have extra attributes (like date), but > they will be specified in a regular way in the config file. The new > attribute 'link_class_alias' will return an arrayref of the > ClubMembership instances. Again, this is not as elegant as Ray > proposed, but backwards compatible. I'd like to give my feedback, but I'm not sure I have a clear understanding of which pieces of my proposal you implemented exactly, which pieces you implemented with small modifications, and for which pieces you chose an alternative approach entirely. Do you have a documentation detailing the new syntax for the functionality you are adding? Actually, possibly the simplest thing (and probably the most helpful thing for me) would be to just take the examples at the end of my proposal and modify them to your syntax so I could see an example of what each case looks like. I forgot, did you implement any lazy loading features or is it just manual and auto? If I understand correctly, one of the fundamental differences between your implementation and my proposal is that for has_many and links_to relationships the definition of the relationship is no longer specified in the class containing the linking field, but rather in a class that is linked to. I'm a bit concerned that with the way I use SPOPS (not to create classes but simply to define the relationships and persistence of traditionally defined *.pm-file classes, where linking classes have not only extra fields, but also extra functionality), some of the functionality I need which was present in my proposal will be missing from your redesign ... but I guess I won't know until I see the details. In any case, your actual working implementation is far better than any un-implemented proposal, so don't get me wrong, I really appreciate all the work you've done even if it doesn't end up covering everything I would have liked. > An open question is what to do with has_a. Currently it's implemented > in ClassFactory, not ClassFactory/DBI, but adding auto-save and > auto-delete functionality requires the has_a code to be DBI-aware. I > can add extra functionality into ClassFactory/DBI, but I am not sure > which effect it would have on LDAP storage, for example. > > Oh, and a style question. I have marked all of my changes with "tags": > #Simon > #/Simon > and I commented out, not deleted, all replaced code, for ease of > reference. Is this necessary? Might be nice initially for Chris to see this, but I'd think eventually he'll want to remove it. But of course, that's Chris's call. > I will grudgingly remove lc-ing of field names. I love it, but, as > pointed out before, it will break existing code. Thanks, this is appreciated since it was my existing code that was breaking. :-) Thanks again for your contributions, Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Vsevolod (S. I. <si...@cs...> - 2004-11-01 01:32:25
|
Ray, Chris, and others, My current progress - code pretty much finalized, and a .t file with 64 tests (I'll have to add a few more tests to cover all combinations of features). I'd like to do some more internal testing before I send the official patch, but if anyone has time to play with the new features (yeah, right :), I'd be happy to share it. It's completely backwards compatible. > Just curious, at what level are you implementing this cache? Is it > *always* used for all SPOPS objects? Only for DBI? Only for DBI when you > are using the new functionality? It's only for DBI. > Do you have a documentation detailing the new syntax for the > functionality you are adding? Actually, possibly the simplest thing (and > probably the most helpful thing for me) would be to just take the > examples at the end of my proposal and modify them to your syntax so I > could see an example of what each case looks like. See below. > I forgot, did you implement any lazy loading features or is it just > manual and auto? I actually did not implement manual loading - I just can't see where it can be used (and a bit fuzzy on how it would work). So there is only auto and lazy loading. Autosaving is always off by default, to preserve compatibility. OTOH, there are three types of removes - 'auto', 'manual' and 'forget'. 'Auto' means complete removal of dependent objects, 'forget' - nullifying id fields pointing to the removed objects, and 'manual' - no action. The default should logically be 'forget', but it may conflict with no autosaving, so I'll have to set it to 'manual'. > If I understand correctly, one of the fundamental differences between > your implementation and my proposal is that for has_many and links_to > relationships the definition of the relationship is no longer specified > in the class containing the linking field, but rather in a class that is > linked to. I'm a bit concerned that with the way I use SPOPS (not to > create classes but simply to define the relationships and persistence of > traditionally defined *.pm-file classes, where linking classes have not > only extra fields, but also extra functionality), some of the > functionality I need which was present in my proposal will be missing > from your redesign ... but I guess I won't know until I see the details. As I mentioned, I am flexible on the matter of has_many. If you want, I can create an additional config option (you call it 'auto_by' etc, I think) and put it into the other class. Links_to is harder, because to preserve backward compatibility it has to stay in one of the edge classes. But if you want to add more variables to the linking class, you'll have to define it for SPOPS anyway, and there I can probably also implement your suggestion as an option. Below is the usage example that you posted originally, and after *** - my syntax. $CONF = { X_alias => { class => "X", field => [ qw/ x_id x_data myA myB / ], has_a => { myA => { class => "A", fetch => { type => "manual|auto|lazy|manual_by|auto_by|lazy_by", name => "someA", list_field => "list_of_Xs", }, link => { type => "manual|auto|lazy", field => "myB" name => "someB", list_field => "linked_Bs", }, remove => { type => "manual|auto|manual_by|auto_by|manual_null|auto_null" name => "removeSomeA", list_field => "list_of_Xs", } }, myB => { class => "B", fetch => { type => "manual|auto|lazy|auto_by|lazy_by", name => "someB", list_field => "linked_Bs", }, link => { type => "manual|auto|lazy", field => "myA" name => "someA", list_field => "list_of_As", }, remove => { type => "manual|auto|manual_by|auto_by|manual_null|auto_null" name => "removeSomeB", list_field => "list_of_Xs", } }, } }, A_alias => { class => "A", field => [ qw/ a_id a_data / ], list_field => { X => ["list_of_Xs"], B => ["linked_Bs"] } }, B_alias => { class => "B", field => [ qw/ b_id b_data / ], }, } *** $CONF = { X_alias => { class => "X", field => [ qw/ x_id x_data myA myB / ], has_a => { {A => {alias => "mya", fetch => { type => 'auto|lazy', save =>1 }, remove => { type => 'auto|manual' } reverse_remove => {type => 'manual|forget' } } }, has_many => { B => { fetch => { type => 'auto|lazy', save => 1, alias => 'list_of_Xs', reverse_alias => 'myx'}, remove => { type => 'auto|manual|forget' } } }, links_to => { 'D' => { table => 'spops_x_d', alias => 'list_of_Ds', link_class => 'Membership', link_class_alias => 'memberships', } } .. }, Membership_alias => { class => 'Membership', field => ['id, 'x_id', 'd_id', 'start_date', 'end_date'], ... } Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-11-01 23:18:17
|
Hi Simon, On Oct 31, 2004, at 8:32 PM, Vsevolod (Simon) Ilyushchenko wrote: > My current progress - code pretty much finalized, and a .t file with > 64 tests (I'll have to add a few more tests to cover all combinations > of features). I'd like to do some more internal testing before I send > the official patch, but if anyone has time to play with the new > features (yeah, right :), I'd be happy to share it. Cool. I'd like to check it out. > It's completely backwards compatible. Is this a good thing? :-) (more below in this) I assume that's why you changed the key values in the 'has_a' spec to the class name instead of the field name? But isn't this a problem if you have multiple has_a fields of the same class, with different behavior for fetching/saving/removing? Or do you just use an arrayref of hashrefs instead of the single hashref in that case? Blech. For example, I'm thinking of something like a Transfer object that has a 'from_account' and a 'to_account' field, both of class 'Account', where you want to auto-fetch the 'from_account' and lazy-fetch (or manual fetch) the 'to_account'. This is something I did not like about the original has_a spec. It seems so backwards to me. I want to specify relationship behavior for each individual field. Whether or not two has_a fields belong (or not) to the same class is irrelevant and therefore (to me) it makes no sense to organize them by class. > > Just curious, at what level are you implementing this cache? Is it > > *always* used for all SPOPS objects? Only for DBI? Only for DBI when > you > > are using the new functionality? > > It's only for DBI. And you say this full cache is necessary for the consistency circular references. You mean to avoid infinite loops when you have A set to auto-fetch B which is set to auto-fetch A? Seems to me that you should be able to detect this type of thing when the classes are configured. While I think a cache is a nice option that I may very well use, I don't think it should be mandatory unless it's absolutely necessary. Can you give me an example where consistency makes it absolutely necessary? I've used only application level caching, never SPOPS-level caching so I'm not clear on how this works. Does this mean that SPOPS objects no longer go out of scope and get destroyed until the program ends or the cache is manually flushed? In a mod_perl environment then, do you flush the cache at the end of every request or what? Seems like you could get some huge apache children pretty quickly if you're not careful. > I actually did not implement manual loading - I just can't see where > it can be used (and a bit fuzzy on how it would work). So there is > only auto and lazy loading. It was for completeness and to offer a mode that is equivalent to current has_a behavior, that is, the field normally just gives you an id, but you also have a convenience method for fetching the object as well. My idea was that any 'has_a' spec, including 'manual', would create convenience methods for fetching the related objects. The 'auto' and 'lazy' options would simply call these methods automatically at the appropriate time and stash the return values in the object. So in the way I was picturing things, implementing 'manual' would simply be the first step in implementing 'auto' and 'lazy'. Without the manual option, you can't specify a relationship at all without having it define auto-fetching behavior. You can't, for example, auto-remove an object without having it also auto-fetched (which I can imagine you might want if you typically only need to deal with the ID of the secondary object). Just curious, does your implementation of 'auto' generate a public 'fetch_myA' method, for example? > Autosaving is always off by default, to preserve compatibility. I'm not sure I follow. Auto-fetching is new so there is no previous corresponding save behavior to be backward compatible with. Classes defined without any auto-fetch/auto-remove behavior, could behave as always. Classes defining new auto behavior could have whatever default 'save' behavior we think makes sense. So I'm not sure there is a backward compatibility issue here. And the save behavior described in my updated proposal posted 4 Jan 2002 still seems to be the most consistent and make the most sense to me. > OTOH, there are three types of removes - 'auto', 'manual' and > 'forget'. 'Auto' means complete removal of dependent objects, 'forget' > - nullifying id fields pointing to the removed objects, and 'manual' - > no action. The default should logically be 'forget', but it may > conflict with no autosaving, so I'll have to set it to 'manual'. OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' => 'forget' in a 'has_a' the same as specifying 'remove' => 'forget' in the corresponding 'has_many'? If so, which one takes precedence if they are inconsistent? It looks like 'reverse_remove' => 'forget' is equvalent to what I called 'null_by', right?. I personally think that having multiple (and possibly conflicting) ways/places of defining the behavior for a single relationship is asking for trouble. I think it will make it difficult to write correct and clear documentation and it will create some debugging nightmares. (More on this below) > As I mentioned, I am flexible on the matter of has_many. If you want, > I can create an additional config option (you call it 'auto_by' etc, > I think) and put it into the other class. Borrowing from some of your configuration terminology, we could call it 'reverse_auto', 'reverse_lazy' and 'reverse_manual'. That might be more clear. And for removes we could use 'manual_forget' and 'auto_forget' in place of my 'manual_null' and 'auto_null' if you prefer. Again, having more than one way to do it seems like a bad idea to me. (More on this below). > Links_to is harder, because to preserve backward compatibility it has > to stay in one of the edge classes. But if you want to add more > variables to the linking class, you'll have to define it for SPOPS > anyway, and there I can probably also implement your suggestion as an > option. I suppose if you want to enhance the existing links_to syntax you are correct. (More below on why I wouldn't do this). Why do you include both 'link_class' and 'link_class_alias'? Aren't they redundant? (see [1] below). And I suppose the 'table' is only necessary if you don't specify the 'link_class' and vice versa, right? After looking over your configuration syntax I started trying to convert my examples to your syntax to see if all the bases are covered. Here are some random comments, based on my understanding of what you're doing: * for both forward and reverse direction, auto-remove is not possible without auto-fetch (missing 'manual' mode) * for both forward and reverse direction, auto/lazy-fetch looks fine (except for default behavior of save) * forward direction manual/auto remove looks fine * reverse direction auto/forget remove looks fine, but I don't see an equivalent to manual_null (manual_forget) * is use of 'alias' and 'reverse_alias' consistent with the way alias is normally used in SPOPS? (see [1] below) * didn't see any mention of the 'name' option for explicitly specifying the name of generated methods * not clear to me what auto/lazy fetching, auto removing, etc options are implemented for links_to In conclusion, here are a few of my perspectives. I've been trying to keep an open mind toward the differences between your implementation and my proposed design, adopting a wait and see approach. I hesitate to state my opinions too strongly since you're contributing useful code and I'm just talking about design. On the other hand, I suppose you probably want honest input. I think that both links_to and has_many share a fundamental flaw. They both allow relationships to be defined from the "other" end, making it possible to define conflicting behavior for a single relationship. This unnecessarily complicates the error checking or precedence rules and the documentation. I think always forcing the relationship to be defined by the class with the linking field, and having a single syntax (has_a) for defining it, makes for a much simpler, cleaner, more consistent, easier to understand/document/implement, less error prone design. Including both approaches (e.g. has_many and reverse_auto or auto_by), in my opinion, only complicates and confuses things further. For this reason, I think we should forget trying to make the new syntax backward compatible. I think we should leave the old syntax in place (but deprecated) as long as necessary for backward compatibility and add a completely redesigned new syntax that folks can migrate to gradually or as they need the new features. If it were up to me I wouldn't touch links_to at all, and I wouldn't touch the old has_a either. I'd just add a new configuration handler for the case where the key of a 'has_a' spec matches one of the field names, in which case you assume it is the new syntax I proposed, otherwise you assume it's a class and use the old has_a semantics. I don't know if Chris has enough spare cycles to give an opinion on this at the moment, but I'd be interested in his perspective. While I appreciate all your work, I have to say that the above issues are leading me to a growing preference for my original proposed design. Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 [1] I confess I never really did understand the purpose of the alias. What is the difference between the alias and the class? Isn't one of them redundant? |
From: Vsevolod (S. I. <si...@cs...> - 2004-11-02 14:46:39
|
> I've been trying to keep an open mind toward the differences between > your implementation and my proposed design, adopting a wait and see > approach. I hesitate to state my opinions too strongly since you're > contributing useful code and I'm just talking about design. On the other > hand, I suppose you probably want honest input. Ray, Honest input is, of course, appreciated. I'll look over the your questions over the weekend, but it seems that compatibility is the major question, as it drives the syntax that you don't like. I had to stay compatible by default, and I think only Chris can authorize a new syntax. So I would also like to hear what he has to say. My implementation is not written in stone - I personally don't really care what the syntax is as long as it provides the full fuctionality. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Vsevolod (S. I. <si...@cs...> - 2004-11-07 19:15:31
|
Ray, I've mulled over your comments, and I think I'm starting to see the light. :) I've looked at other OOP frameworks (Alzabo, Tangram, Hibernate, XORM), and they all seem to define the relationship in the class that has the ID field which refers to the other class. I continue the discussion of cache and the 'manual' keyword below, but when we converge to an opinion on those, I'll redo the code your way. It's less work than it seems. However, Chris has not replied yet. I'll ask him again about the compatibility issue. However, if I simply extend the has_a syntax the way you propose and smartly determine whether the old or the new syntax is used (I do it anyway now), it should not be a problem. > I assume that's why you changed the key values in the 'has_a' spec to > the class name instead of the field name? But isn't this a problem if > you have multiple has_a fields of the same class, with different > behavior for fetching/saving/removing? Or do you just use an arrayref of > hashrefs instead of the single hashref in that case? Blech. I agree, it's detestable. > And you say this full cache is necessary for the consistency circular > references. You mean to avoid infinite loops when you have A set to > auto-fetch B which is set to auto-fetch A? Seems to me that you should > be able to detect this type of thing when the classes are configured. > While I think a cache is a nice option that I may very well use, I don't > think it should be mandatory unless it's absolutely necessary. Can you > give me an example where consistency makes it absolutely necessary? For example, you have a Book object with many Authors. If the application loads a book with the list of authors, adds another author to this book and asks the new author about its parent book, the current SPOPS implementation will re-fetch the book object, potentially ignoring the changes that were made to original book object. However, I only now realized that you suggest saving both the parent-to-child reference and the reverse reference in the object fields. (I distinguish between $author->{book_id}, a number, and $author->{book}, an object. Let me know if I understood this correctly.) Thus, once $book->{list_of_authors} is populated, adding a new author to the book should add the new object to this list, plus it should set the field $author->{book} to the original book object. This will make the above situation impossible. However, inconsistencies still may occur if: 1) I create a new author-book relationship by setting the field $author->{book_id} instead of saying $book->add_author($author). This can be discouraged as an incorrect way of altering data, of course, but logically both make sense, and I'd like to be able to use them both. Or, 2) if I am working with a second relationship, say books-to-artists (illustrators). In this case, in one place in my code, I could retrieve a book object by saying $artist->book, and then in another place I'll call $author->book, and even though they may refer to the same book, they will always be two different objects. So looks like cache is still necessary. > I've used only application level caching, never SPOPS-level caching so > I'm not clear on how this works. Does this mean that SPOPS objects no > longer go out of scope and get destroyed until the program ends or the > cache is manually flushed? In a mod_perl environment then, do you flush > the cache at the end of every request or what? Seems like you could get > some huge apache children pretty quickly if you're not careful. It has to be flushed, of course. > It was for completeness and to offer a mode that is equivalent to > current has_a behavior, that is, the field normally just gives you an > id, but you also have a convenience method for fetching the object as > well. My idea was that any 'has_a' spec, including 'manual', would > create convenience methods for fetching the related objects. The 'auto' > and 'lazy' options would simply call these methods automatically at the > appropriate time and stash the return values in the object. So in the > way I was picturing things, implementing 'manual' would simply be the > first step in implementing 'auto' and 'lazy'. I feel dumb - I still don't quite get it. However, in your original examples the method X->myA returns the id of A in the case of manual fetch and A itself in the case of lazy/auto fetch, right? In my view, X->myA always return the id and X->fetch_myA always returns the object (I tend to use them like $author->book_id and $author->book in my applications). So there is no need for manual fetching. I think that having X->myA return inconsistent values may be confusing. Let me know what you think. Perhaps I am still missing the utility of manual fetching. > Without the manual option, you can't specify a relationship at all > without having it define auto-fetching behavior. You can't, for example, > auto-remove an object without having it also auto-fetched (which I can > imagine you might want if you typically only need to deal with the ID of > the secondary object). But in this case you still have to fetch the dependent object, because it may define its own rules of auto-removal of even more objects. > Just curious, does your implementation of 'auto' generate a public > 'fetch_myA' method, for example? See above - even if the fetch method name ('alias' in the current terminology') is not specified in the configuration, it'll be auto-created by using the name given to the target class. (I mean the name of config hash key for the target class, not its Perl name. In the example I sent you, X_alias is such a name.) >> Autosaving is always off by default, to preserve compatibility. > > > I'm not sure I follow. Auto-fetching is new so there is no previous > corresponding save behavior to be backward compatible with. Classes > defined without any auto-fetch/auto-remove behavior, could behave as > always. Classes defining new auto behavior could have whatever default > 'save' behavior we think makes sense. So I'm not sure there is a > backward compatibility issue here. > And the save behavior described in my updated proposal posted 4 Jan 2002 > still seems to be the most consistent and make the most sense to me. Yes, if we change the syntax, we will be free to follow you rules. >> OTOH, there are three types of removes - 'auto', 'manual' and >> 'forget'. 'Auto' means complete removal of dependent objects, 'forget' >> - nullifying id fields pointing to the removed objects, and 'manual' - >> no action. The default should logically be 'forget', but it may >> conflict with no autosaving, so I'll have to set it to 'manual'. > > > OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' => > 'forget' in a 'has_a' the same as specifying 'remove' => 'forget' in the > corresponding 'has_many'? If so, which one takes precedence if they are > inconsistent? It looks like 'reverse_remove' => 'forget' is equvalent to > what I called 'null_by', right?. I personally think that having multiple > (and possibly conflicting) ways/places of defining the behavior for a > single relationship is asking for trouble. I think it will make it > difficult to write correct and clear documentation and it will create > some debugging nightmares. (More on this below) This should not be a problem, because in my current proposal the programmer specifies either has_a or has_many (which implies the reverse has_a), so no conflicts should be possible. However, if we change the syntax, this issue will go away. > Why do you include both 'link_class' and 'link_class_alias'? Aren't they > redundant? (see [1] below). 'Link_class' refers to the Perl class name, 'link_class_alias' - to the method name used to retrieve its instances (this is your 'list_field' in the 'link' hash). > And I suppose the 'table' is only necessary if you don't specify the > 'link_class' and vice versa, right? Yup. I am a little unhappy that in your proposal one has to have a Perl class for the linking table even if one is never going to use it, but I guess this is necessary for the sake of the uniform syntax. > * didn't see any mention of the 'name' option for explicitly specifying > the name of generated methods It's called 'alias'. > * not clear to me what auto/lazy fetching, auto removing, etc options > are implemented for links_to If we use your syntax, they will be the same as in the fetch_by case. > [1] I confess I never really did understand the purpose of the alias. > What is the difference between the alias and the class? Isn't one of > them redundant? The alias is used to generate access methods in other classes referring to this one. In your configuration examples you always give a value to the 'name' key, but if it's omitted, methods are given names like 'fetch_X_alias'. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-11-08 23:13:58
|
Simon, >> And you say this full cache is necessary for the consistency circular >> references. You mean to avoid infinite loops when you have A set to >> auto-fetch B which is set to auto-fetch A? Seems to me that you >> should be able to detect this type of thing when the classes are >> configured. While I think a cache is a nice option that I may very >> well use, I don't think it should be mandatory unless it's absolutely >> necessary. Can you give me an example where consistency makes it >> absolutely necessary? > > For example, you have a Book object with many Authors. If the > application loads a book with the list of authors, adds another author > to this book and asks the new author about its parent book, the > current SPOPS implementation will re-fetch the book object, > potentially ignoring the changes that were made to original book > object. > > However, I only now realized that you suggest saving both the > parent-to-child reference and the reverse reference in the object > fields. (I distinguish between $author->{book_id}, a number, and > $author->{book}, an object. Let me know if I understood this > correctly.) I think you've got it right, except I'm not sure what you mean about the reverse reference case. Maybe the myA vs fetch_myA stuff below clarifies what I'm suggesting. > Thus, once $book->{list_of_authors} is populated, adding a new author > to the book should add the new object to this list, plus it should set > the field $author->{book} to the original book object. This will make > the above situation impossible. Right. Except that 'list_of_authors' in $book implies that you've specified a reverse fetch of the book field in author, so $author->{book} is just an id, not an object. More on object vs id below. > However, inconsistencies still may occur if: > > 1) I create a new author-book relationship by setting the field > $author->{book_id} instead of saying $book->add_author($author). This > can be discouraged as an incorrect way of altering data, of course, > but logically both make sense, and I'd like to be able to use them > both. > > Or, 2) if I am working with a second relationship, say > books-to-artists (illustrators). In this case, in one place in my > code, I could retrieve a book object by saying $artist->book, and then > in another place I'll call $author->book, and even though they may > refer to the same book, they will always be two different objects. Right. Without a cache these inconsistencies are always possible. But this is a "global" issue with OOP frameworks like SPOPS. I think Chris has taken the right approach in letting the developer decide at the application level whether s/he needs to always maintain that consistency and if so whether to use SPOPS level caching or application level caching/logic to ensure consistency. (And I understand there was a bug in SPOPS caching which prevented this from working correctly). But I don't think there is anything in the new has_a design that REQUIRES one to maintain consistency though an SPOPS level cache, right? I think the only thing you need to do is ensure that you don't have any auto-fetching loops (might even want to include lazy-fetching) when you do the configuration. In other words, you don't want to have a book auto-fetch it's list of authors AND have the author set to auto-fetch its book, creating an infinite loop. Even if you use a cache this would cause circular references which cause a problem for garbage collection unless you use weak references. I think this checking is important, but I haven't honestly given any thought to how to implement it. > So looks like cache is still necessary. Only if you need to guarantee a single in-memory copy for a process. I argue that this is an arbitrary requirement. In a read-only environment, it really doesn't matter (except for resource usage) if you have multiple copies of the same object in memory. And in a web environment, even using a simple cache doesn't guarantee consistency across multiple processes (apache children), so you still need a higher level synchronization mechanism to ensure consistency. >> It was for completeness and to offer a mode that is equivalent to >> current has_a behavior, that is, the field normally just gives you an >> id, but you also have a convenience method for fetching the object as >> well. My idea was that any 'has_a' spec, including 'manual', would >> create convenience methods for fetching the related objects. The >> 'auto' and 'lazy' options would simply call these methods >> automatically at the appropriate time and stash the return values in >> the object. So in the way I was picturing things, implementing >> 'manual' would simply be the first step in implementing 'auto' and >> 'lazy'. > > I feel dumb - I still don't quite get it. However, in your original > examples the method X->myA returns the id of A in the case of manual > fetch and A itself in the case of lazy/auto fetch, right? In my view, > X->myA always return the id and X->fetch_myA always returns the object > (I tend to use them like $author->book_id and $author->book in my > applications). So there is no need for manual fetching. > > I think that having X->myA return inconsistent values may be confusing. > Let me know what you think. Perhaps I am still missing the utility of > manual fetching. My thought was that specifying 'auto' or 'lazy' are equivalent to saying "this field is an object". Specifying 'manual' is equivalent to saying "this field is an object id". So X->myA always returns the value stored in the field and X->fetch_myA always returns the object. >> Without the manual option, you can't specify a relationship at all >> without having it define auto-fetching behavior. You can't, for >> example, auto-remove an object without having it also auto-fetched >> (which I can imagine you might want if you typically only need to >> deal with the ID of the secondary object). > > But in this case you still have to fetch the dependent object, because > it may define its own rules of auto-removal of even more objects. But the fetch only happens for the purpose of correctly doing the remove ... the 'manual' specifier still means that the field holds an object id, not an object. >> Just curious, does your implementation of 'auto' generate a public >> 'fetch_myA' method, for example? > > See above - even if the fetch method name ('alias' in the current > terminology') is not specified in the configuration, it'll be > auto-created by using the name given to the target class. (I mean the > name of config hash key for the target class, not its Perl name. In > the example I sent you, X_alias is such a name.) The problem I see with this is that it generates clashes when you have multiple fields with the same class. We need a method name that is unique for the field we want to fetch, not just for the class we use to fetch it. I think using the class alias is left-over from the old has_a config which used the class as the hash key (which you agreed is detestable :-). I vote once again that we stick with my proposal to use 'fetch_' prepended to the name of the field, by default, and allow an option to explicitly specify a method name. >>> OTOH, there are three types of removes - 'auto', 'manual' and >>> 'forget'. 'Auto' means complete removal of dependent objects, >>> 'forget' - nullifying id fields pointing to the removed objects, and >>> 'manual' - no action. The default should logically be 'forget', but >>> it may conflict with no autosaving, so I'll have to set it to >>> 'manual'. >> OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' >> => 'forget' in a 'has_a' the same as specifying 'remove' => 'forget' >> in the corresponding 'has_many'? If so, which one takes precedence if >> they are inconsistent? It looks like 'reverse_remove' => 'forget' is >> equvalent to what I called 'null_by', right?. I personally think that >> having multiple (and possibly conflicting) ways/places of defining >> the behavior for a single relationship is asking for trouble. I think >> it will make it difficult to write correct and clear documentation >> and it will create some debugging nightmares. (More on this below) > > This should not be a problem, because in my current proposal the > programmer specifies either has_a or has_many (which implies the > reverse has_a), so no conflicts should be possible. However, if we > change the syntax, this issue will go away. But since you can put a 'has_many' in Book and a 'has_a' in Author, for example, where Author has a 'book' field, I think they can be inconsistent. In my proposal you, for the 'has_a' in Author, you either specify a forward or reverse direction with no way to specify something conflicting in the Book class. >> Why do you include both 'link_class' and 'link_class_alias'? Aren't >> they redundant? (see [1] below). > > 'Link_class' refers to the Perl class name, 'link_class_alias' - to > the method name used to retrieve its instances (this is your > 'list_field' in the 'link' hash). But can't you always get the one, given the other? >> And I suppose the 'table' is only necessary if you don't specify the >> 'link_class' and vice versa, right? > > Yup. I am a little unhappy that in your proposal one has to have a > Perl class for the linking table even if one is never going to use it, > but I guess this is necessary for the sake of the uniform syntax. Which is why I wouldn't protest too much of we decided to leave the old 'links_to' syntax in untouched, at least for the time being. You would only need to define the linking class if you needed the auto-fetching/removing behavior. >> [1] I confess I never really did understand the purpose of the >> alias. What is the difference between the alias and the class? Isn't >> one of them redundant? > > The alias is used to generate access methods in other classes > referring to this one. In your configuration examples you always give > a value to the 'name' key, but if it's omitted, methods are given > names like 'fetch_X_alias'. Ah ... right ... detestable :-) Let's use something tied to the field name, not the class, as I mentioned above. Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Vsevolod (S. I. <si...@cs...> - 2004-11-12 01:40:29
|
Ray, > Right. Except that 'list_of_authors' in $book implies that you've > specified a reverse fetch of the book field in author, so > $author->{book} is just an id, not an object. More on object vs id below. Okay, this is my last stumbling block. I just don't get why you want to sometimes refer to the object and sometimes to the id. If it's implemented this way, I'll have to say $author->book->id instead of $author->book_id for auto/lazy, if I want to to just get the id value. Also, if I switch from auto/lazy to manual fetch, I'll have to change my code. Can you give me an example of manual fetch that cannot be implemented otherwise or that makes life more convenient? Manual remove, OTOH, is clearly different from the other two methods of removal. > But I don't think there is anything in the new has_a design that > REQUIRES one to maintain consistency though an SPOPS level cache, right? Aww, who am I to argue with developers? I'll make sure both cases work. > I think the only thing you need to do is ensure that you don't have any > auto-fetching loops (might even want to include lazy-fetching) when you > do the configuration. In other words, you don't want to have a book > auto-fetch it's list of authors AND have the author set to auto-fetch > its book, creating an infinite loop. Even if you use a cache this would > cause circular references which cause a problem for garbage collection > unless you use weak references. Hmmm... circular references will actually come up even without circular configuration. If an author refers to a list of books, and the books refer back to the author, there you have it... > I think this checking is important, but I haven't honestly given any > thought to how to implement it. I'm currently doing this checking during saving/removing objects. The configuration allows loops - I'm afraid that if this is prohibited at the configuration stage, the potential for error is too great. > I vote once again that we stick with my proposal to use 'fetch_' > prepended to the name of the field, by default, and allow an option to > explicitly specify a method name. Okay. >> This should not be a problem, because in my current proposal the >> programmer specifies either has_a or has_many (which implies the >> reverse has_a), so no conflicts should be possible. However, if we >> change the syntax, this issue will go away. > > > But since you can put a 'has_many' in Book and a 'has_a' in Author, for > example, where Author has a 'book' field, I think they can be > inconsistent. In my proposal you, for the 'has_a' in Author, you either > specify a forward or reverse direction with no way to specify something > conflicting in the Book class. Who's to stop the developer from saying that Author has_a Book and Book has_a Author? >> 'Link_class' refers to the Perl class name, 'link_class_alias' - to >> the method name used to retrieve its instances (this is your >> 'list_field' in the 'link' hash). > > But can't you always get the one, given the other? No, the alias can be overriden by the user. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-11-15 15:50:52
|
On Nov 11, 2004, at 8:40 PM, Vsevolod (Simon) Ilyushchenko wrote: > Okay, this is my last stumbling block. I just don't get why you want > to sometimes refer to the object and sometimes to the id. If it's > implemented this way, I'll have to say $author->book->id instead of > $author->book_id for auto/lazy, if I want to to just get the id value. > Also, if I switch from auto/lazy to manual fetch, I'll have to change > my code. But you'll have to change code anyway if you switch from auto/lazy to manual if you have any code that accesses the book object (since it will no longer be auto/lazy-fetched). I view the specification of the auto/lazy vs manual as part of specifying the 'type' of the field. If you change the 'type' you should expect to have to change code in order to get at the same data (e.g. the id). If I understand your proposal you would always have the field hold the id and have it store the auto/lazy-fetched object into some other key in the parent object? What key? This approach requires you to specify for each auto/lazy fetched field another name by which you access the object. You have to worry about inconsistency between the id and the object. When you go to save, if the id in the field doesn't match the id of the object in corresponding key, which do you save? I just think it is simpler and more straightforward to use the single field and think of the auto/lazy vs manual as part of the type definition for that field. > Can you give me an example of manual fetch that cannot be implemented > otherwise or that makes life more convenient? Manual remove, OTOH, is > clearly different from the other two methods of removal. Well, I think the example that I gave in a previous e-mail ... wait, I now see that that example made no sense. Sorry. For some reason I was thinking that the remove spec appeared inside the fetch, not at the same level. Nevermind. OK, I think I understand your point. IF you use separate field names for the id (persistent field) and the auto/lazy-fetched object (temp non-persistent field), I suppose there is no need for manual fetch. But if you use the same field for the id and the auto-fetched object, then you DO need the manual option. I still think this is a cleaner approach since you're not cluttering up your object with extra fields unnecessarily and it allows you treat some fields as ids (manual) and others as objects (auto/lazy), which I find useful conceptually. >> I think the only thing you need to do is ensure that you don't have >> any auto-fetching loops (might even want to include lazy-fetching) >> when you do the configuration. In other words, you don't want to have >> a book auto-fetch it's list of authors AND have the author set to >> auto-fetch its book, creating an infinite loop. Even if you use a >> cache this would cause circular references which cause a problem for >> garbage collection unless you use weak references. > > Hmmm... circular references will actually come up even without > circular configuration. If an author refers to a list of books, and > the books refer back to the author, there you have it... Right, I agree. After thinking about this a bit more, I think there are two separate issues here. They are related to one another and both affect, or are affected by, the presence or absence of caching. One is the circular configuration which could result in auto-fetching loops and the other is circular references in the objects which prevent automatic garbage collection. I think the first one should be handled by SPOPS and it should be at the earliest possible stage (configuration of the class if possible, run-time if not). I think the second one is outside the scope of SPOPS. The developer needs to be aware of which objects hold references which other objects. It is certainly possible for an object A to reference B which references C which in turn references A causing a circular reference involving more than just two classes. Another thing to keep in mind is that sometimes circular references are needed/desired, but the developer needs to handle the breaking of the circular ref manually before destruction. I don't think it is the job of SPOPS to prevent circular references. When implementing an SPOPS cache it may be possible to handle circular references in an intelligent way during flushing of the cache for example, but I view this as a separate issue that should not be mixed in with the new has-a semantics. > I'm currently doing this checking during saving/removing objects. The > configuration allows loops - I'm afraid that if this is prohibited at > the configuration stage, the potential for error is too great. Are you talking about circular references or auto-fetch loops? If the latter, why is it more error prone at the configuration stage? I think the circular reference issue is fundamentally a run-time issue since it deals with instances of objects referring to one another, and for the reasons stated above, I don't think SPOPS should try to tackle this except maybe w.r.t. caching. On the other hand I think the auto-fetch loop issue is about class (as opposed to instance) behavior and is therefore fundamentally a configuration issue. No class should be able to auto-fetch a field which through further auto-fetch configuration ends up auto-fetching an object of the original class. It may be easier to implement this checking at run-time, but it is still a configuration level issue and I think it should be handled during configuration if possible. In other words, if a developer creates an auto-fetch loop, he has a bug. If the checking is done at the config stage, the bug will be exposed the first time the classes are created. If the checking is done during fetching, the class might be used successfully for a while without the bug be exposed if the particular field in question isn't populated for example. It's always best to detect errors at the earliest possible stage. >> But since you can put a 'has_many' in Book and a 'has_a' in Author, >> for example, where Author has a 'book' field, I think they can be >> inconsistent. In my proposal you, for the 'has_a' in Author, you >> either specify a forward or reverse direction with no way to specify >> something conflicting in the Book class. > > Who's to stop the developer from saying that Author has_a Book and > Book has_a Author? True, true. But I think of a 'has_many' and a reverse 'has_a' as two ways of defining the behavior of a single uni-directional link between the two classes (the Author's book field). In that sense, allowing both makes it possible to define contradicting behavior for that single link. I view your example, on the other hand, as defining the behavior of two different uni-directional (and obviously circular) links. I don't think this is a contradictory configuration, but it does require SPOPS to ensure that it doesn't create an infinite auto-fetch loop, and the developer needs to be aware of the implications for circular references (which will depend on manual vs lazy and cache implementation). Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Chris W. <ch...@cw...> - 2004-12-01 19:22:17
|
> ... > But you'll have to change code anyway if you switch from auto/lazy to > manual if you have any code that accesses the book object (since it > will no longer be auto/lazy-fetched). I view the specification of the > auto/lazy vs manual as part of specifying the 'type' of the field. If > you change the 'type' you should expect to have to change code in order > to get at the same data (e.g. the id). > > If I understand your proposal you would always have the field hold the > id and have it store the auto/lazy-fetched object into some other key > in the parent object? What key? This approach requires you to specify > for each auto/lazy fetched field another name by which you access the > object. You have to worry about inconsistency between the id and the > object. When you go to save, if the id in the field doesn't match the > id of the object in corresponding key, which do you save? > > I just think it is simpler and more straightforward to use the single > field and think of the auto/lazy vs manual as part of the type > definition for that field. > ... > > OK, I think I understand your point. IF you use separate field names > for the id (persistent field) and the auto/lazy-fetched object (temp > non-persistent field), I suppose there is no need for manual fetch. > > But if you use the same field for the id and the auto-fetched object, > then you DO need the manual option. I still think this is a cleaner > approach since you're not cluttering up your object with extra fields > unnecessarily and it allows you treat some fields as ids (manual) and > others as objects (auto/lazy), which I find useful conceptually. Just to be clear what we're talking about here: book: book_id, title, publisher_id publisher: publisher_id, name book_author: book_id, author_id author: author_id, name Book 1 ---> 1 Publisher Book 1 ---> * Author my $book = Book->fetch(42); $book->publisher == object (whether manual/lazy fetch) $book->publisher_id == ID $book->author == list of objects (whether manual/lazy fetch) Is that right? Because it sounds from the discussion as if: $book->publisher == sometimes ID, sometimes object, depending on the configuration which IMO is very confusing. Chris -- Chris Winters (ch...@cw...) Building enterprise-capable snack solutions since 1988. |
From: Vsevolod (S. I. <si...@cs...> - 2004-12-01 19:58:44
|
> > my $book = Book->fetch(42); > $book->publisher == object (whether manual/lazy fetch) > $book->publisher_id == ID > $book->author == list of objects (whether manual/lazy fetch) > > Is that right? Because it sounds from the discussion as if: > > $book->publisher == sometimes ID, sometimes object, > depending on the configuration > > which IMO is very confusing. Yes, yes, yes! My point exactly! Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-12-01 22:05:51
|
On Dec 1, 2004, at 2:58 PM, Vsevolod (Simon) Ilyushchenko wrote: >> my $book = Book->fetch(42); >> $book->publisher == object (whether manual/lazy fetch) >> $book->publisher_id == ID >> $book->author == list of objects (whether manual/lazy fetch) >> Is that right? Because it sounds from the discussion as if: >> $book->publisher == sometimes ID, sometimes object, >> depending on the configuration >> which IMO is very confusing. > > Yes, yes, yes! My point exactly! Aw, c'mon Simon ... I thought I'd convinced you on this one too. Now I have to convince Chris too? :-) Chris, have you gotten through our whole previous discussion? I think I've explained why I think this is NOT confusing (at least to me :-) I can try to restate my reasons if you think it would help. Ray |
From: Chris W. <ch...@cw...> - 2004-12-01 22:19:44
|
> On Dec 1, 2004, at 2:58 PM, Vsevolod (Simon) Ilyushchenko wrote: >>> my $book = Book->fetch(42); >>> $book->publisher == object (whether manual/lazy fetch) >>> $book->publisher_id == ID >>> $book->author == list of objects (whether manual/lazy fetch) >>> Is that right? Because it sounds from the discussion as if: >>> $book->publisher == sometimes ID, sometimes object, >>> depending on the configuration >>> which IMO is very confusing. >> >> Yes, yes, yes! My point exactly! > > Aw, c'mon Simon ... I thought I'd convinced you on this one too. Now I > have to convince Chris too? :-) > > Chris, have you gotten through our whole previous discussion? I think > I've explained why I think this is NOT confusing (at least to me :-) I > can try to restate my reasons if you think it would help. I believe I read through everything. I just think you're overestimating how much time and effort people are willing to spend to sort these out. We who have been using SPOPS for a while see the relationship between the configuration and the code as blindingly obvious. But for new people -- or even people who only use SPOPS occasionally -- it's not. And the idea that the behavior of an existing method could change based on configuration is very disconcerting. People like to think of their objects as somewhat stable -- even Perl people! -- and when we change the meaning under the hood like that it's untrustworthy. OTOH, the idea that something *new* happens -- like adding new methods -- as a result of configuration changes isn't so bad because you have the choice if you want to use the new feature or not. Chris -- Chris Winters (ch...@cw...) Building enterprise-capable snack solutions since 1988. |
From: Ray Z. <rz...@co...> - 2004-12-03 19:43:17
|
Chris, I only got a chance to skim over your e-mails the other day. Now that I've looked at your example more carefully, I don't think you got exactly what I'm proposing yet, so I'll try to clarify and then respond to some of your points. On Dec 1, 2004, at 2:27 PM, Chris Winters wrote: > book: > book_id, title, publisher_id > > publisher: > publisher_id, name > > book_author: > book_id, author_id > > author: > author_id, name > > Book 1 ---> 1 Publisher > Book 1 ---> * Author > > my $book = Book->fetch(42); > $book->publisher == object (whether manual/lazy fetch) > $book->publisher_id == ID > $book->author == list of objects (whether manual/lazy fetch) > > Is that right? Because it sounds from the discussion as if: > > $book->publisher == sometimes ID, sometimes object, > depending on the configuration > > which IMO is very confusing. In this example it appears that book has a 'publisher_id' field only, so there would be no 'publisher' method. For the sake of describing my proposal let me rename the 'publisher_id' field to 'publisher', which will still hold the publisher's id in the database. My proposal is as follows: First, regardless of the configuration ... 1. The publisher column in the database holds the ID of a publisher object. 2. The book object has a fetch_publisher() method for fetching the object (config can optionally specify different method name). $book->fetch_publisher == object 3. The book object has a publisher() method which always returns the value of the field in the book object. i.e. $book->publisher == $book->{publisher} For manual fetch configuration, the publisher field in the book object is ALWAYS the ID stored in publisher column in the database. $book->publisher == ID For auto/lazy fetch configuration, the publisher field in the book object is ALWAYS the publisher OBJECT whose ID is stored in the publisher column in the database. $book->publisher == object In the case of auto-fetch, the assignment, $book->{publisher} = $book->fetch_publisher, is executed when the $book object is fetched. In the lazy-fetch case it happens the first time $book->publisher is called (or $book->{publisher} is accessed). Again, the purpose of this is to allow me to think of my book's publisher field as a Publisher object. On Dec 1, 2004, at 5:24 PM, Chris Winters wrote: > I believe I read through everything. I just think you're overestimating > how much time and effort people are willing to spend to sort these > out. We > who have been using SPOPS for a while see the relationship between the > configuration and the code as blindingly obvious. > > But for new people -- or even people who only use SPOPS occasionally -- > it's not. And the idea that the behavior of an existing method could > change based on configuration is very disconcerting. People like to > think > of their objects as somewhat stable -- even Perl people! -- and when we > change the meaning under the hood like that it's untrustworthy. > > OTOH, the idea that something *new* happens -- like adding new methods > -- > as a result of configuration changes isn't so bad because you have the > choice if you want to use the new feature or not. I agree completely. But we are not changing the behavior of any existing methods. The publisher method has always returned the value held in the publisher field of the Perl object and that is what it still returns. What we are doing is adding the *new* ability to configure a field to be an auto/lazy-fetched object instead of simply the ID. This is a new feature which is why I agree that it is a good idea to use a name other than 'has_a' for this config. I really don't like the idea of a 'publisher_id' field configured to auto-fetch an object into a 'publisher' field. I think using two fields clutters up the object unnecessarily and creates other ambiguities that have to be resolved/documented. Since this wasn't envisioned when I wrote up the detailed description of my original design, I haven't thought through this in detail, but here are two issues off the top of my head: - What do you call the field where you stash the auto-fetched object? Always specify explicit name in config? - When auto-saving how should SPOPS handle inconsistencies between the id of the object in the publisher field and the value in the publisher_id field? Or to put it another way, if I want to re-assign a book's publisher, do I assign a new id to the publisher_id field or do I assign a new object to the publisher field? Seems much more messy to me for something that I consider to be completely new functionality (no backward compatibility issues to worry about). While it does eliminate the need for the manual fetch option, as Simon mentioned earlier (and probably even lazy fetch), it also eliminates one of the main features for me, which is the ability to take an existing field and treat it conceptually as an object. I suppose if someone insisted on being able to auto-fetch into a second field I wouldn't object too strongly to permitting that as an option, but I think it adds unnecessary complexity which would need to be documented. Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Vsevolod (S. I. <si...@cs...> - 2004-12-04 15:57:29
|
Hi, > For auto/lazy fetch configuration, the publisher field in the book > object is ALWAYS the publisher OBJECT whose ID is stored in the > publisher column in the database. > $book->publisher == object > In the case of auto-fetch, the assignment, $book->{publisher} = > $book->fetch_publisher, is executed when the $book object is fetched. In > the lazy-fetch case it happens the first time $book->publisher is called > (or $book->{publisher} is accessed). Again, the purpose of this is to > allow me to think of my book's publisher field as a Publisher object. I don't think you can persuade me that this approach is neater. One of the best things about SPOPS is that it lets me think accurately in terms of my tables and my objects at the same time. Each database field name corresponds to an object variable name, period. Having the 'publisher' id column correspond to a 'publisher' object will break the neat mental picture. Plus, there is a huge downside that if I want to find out parent_id from a child (for the auto/lazy fetch), I'll have to call child->parent->id, which means that for the lazy fetch I have to retrieve the parent to get parent's id, though it's present in the child anyway. However, see more below. > - What do you call the field where you stash the auto-fetched object? > Always specify explicit name in config? This can be auto-deduced from class names (config hash keys, not Perl names). Currently SPOPS supports the config key {main_alias} to be used in such cases. > - When auto-saving how should SPOPS handle inconsistencies between the > id of the object in the publisher field and the value in the > publisher_id field? Or to put it another way, if I want to re-assign a > book's publisher, do I assign a new id to the publisher_id field or do I > assign a new object to the publisher field? It's possible to generate methods in such a way that all related id fields and objects are updated when a change happens. However, this is implementing caching all over again. This makes me think that maybe storing objects in fields is not such a good idea after all. (Currently SPOPS does not do it, and the code that I've written before Ray's objections does not do it either.) This brings up another problem, though. How should save() work in the many-to-many configuration when the parent is saved? Currently links_to calling addChild() method already causes an update of the linking table. Hence, when parent->save() is called, there is no need to update the linking table, though the framework can go ahead and save the children objects if auto-save is specified. Oh, wait! We don't have the children objects because we don't like storing objects in the fields! We have to re-fetch the children and then save them. Fortunately, I have the "cache_only" option for fetch_group() which will only return objects in cache, but we still have to make a database call because we don't know what IDs we want. In the one-to-many configuration, the addChild method updates the parent_id field in the child object and saves it, which amounts to auto-save, even though it may not have been requested. The code can be changed, though, to only save the new parent_id of the child object. The problem lies in the fact that we perform database saves not just in the save() method, but also in the add/removeChild operations. What happens if the add/remove operations do not access the database? We have to either 1) maintain a hidden list of dependent ids and work with it, or 2) go Ray's way and maintain a list of dependent objects. To maintain consistency we can either A) add logic to generated methods to keep those ids in sync, or B) prohibit saying child->parent_id if it can be obtained as child->parent->id. How do we deal, though, with manual fetch (when a dependent object is not stored) plus auto-save (when a dependent object/id should be stored to be saved)? I'll try to install Hibernate this weekend and see what they do there. I've looked at Fowler's "Patterns of Enterprise Application Architechture", but I don't think he brings up the problem of saves. Fowler gave me an idea, though - the various id fields pointing to other objects should be read-only. This way there is less chance of creating confusion. Currently the methods that return objects (parent() or children()) are only getters, but if we want to be really object-oriented, it would be nice to make them setters too, and if we can't set ids directly any more, setters may become manageable. > Seems much more messy to me for something that I consider to be > completely new functionality (no backward compatibility issues to worry > about). While it does eliminate the need for the manual fetch option, as > Simon mentioned earlier (and probably even lazy fetch), it also > eliminates one of the main features for me, which is the ability to take > an existing field and treat it conceptually as an object. But I like having parent_id() and parent() methods separate, because I should not completely forget that I'm dealing with a database. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-12-06 19:36:46
|
On Dec 4, 2004, at 10:57 AM, Vsevolod (Simon) Ilyushchenko wrote: >> Again, the purpose of this is to allow me to think of my book's >> publisher field as a Publisher object. > > I don't think you can persuade me that this approach is neater. Maybe not ... but I hope you don't mind my trying :-) > One of the best things about SPOPS is that it lets me think accurately > in terms of my tables and my objects at the same time. Each database > field name corresponds to an object variable name, period. Having the > 'publisher' id column correspond to a 'publisher' object will break > the neat mental picture. Your mental picture is a totally valid one, and may often be the best one, but IMHO it is not the only valid one. The value in the database is the persistent version of the corresponding field in the objects, but need not be in an identical form. Developers should be able to define how one maps to the other in a way that is most useful to them. Take a look at SPOPS::Tool::DateConvert for an example of how SPOPS already allows this type of behavior as an option. (Btw, Chris, this a good example of where changing configuration forces you to change what you expect to find in that particular field of your object ... which I still find quite reasonable). I would say that if you want the "value in database eq value in object" mental picture, then all you're saying is that you want to configure for manual fetching. > Plus, there is a huge downside that if I want to find out parent_id > from a child (for the auto/lazy fetch), I'll have to call > child->parent->id, which means that for the lazy fetch I have to > retrieve the parent to get parent's id, though it's present in the > child anyway. However, see more below. Again, I think you're just saying that you want manual fetching. Why would you configure it for auto/lazy fetch if all you wanted was the id? If you only need the object sometimes and want the parent() method to always return an id, you can still always get to the object using the fetch_parent() method (which you can even rename if you like). >> - What do you call the field where you stash the auto-fetched >> object? Always specify explicit name in config? > > This can be auto-deduced from class names (config hash keys, not Perl > names). Currently SPOPS supports the config key {main_alias} to be > used in such cases. Unless I don't understand what you're suggesting, I think this goes back to our previous discussion about what to use for the config key, alias (old has_a syntax) or field name (new has_a syntax). Those aliases are only unique to the class, but we need something unique to the field to be able to handle the case where you have two fields which belong to the same class. >> - When auto-saving how should SPOPS handle inconsistencies between >> the id of the object in the publisher field and the value in the >> publisher_id field? Or to put it another way, if I want to re-assign >> a book's publisher, do I assign a new id to the publisher_id field or >> do I assign a new object to the publisher field? > > It's possible to generate methods in such a way that all related id > fields and objects are updated when a change happens. However, this is > implementing caching all over again. This makes me think that maybe > storing objects in fields is not such a good idea after all. > (Currently SPOPS does not do it, and the code that I've written before > Ray's objections does not do it either.) No storing objects in fields? You mean you want to eliminate auto/lazy-fetch as an option? Ah ... wait a minute ... are you assuming a cache? I suppose with a cache you could do auto/lazy-fetching without storing the auto-fetched objects in the primary object. But as I mentioned before (and I seem to remember Chris saying the same), I think caching and the new has_a (or relates_to) functionality should be treated separately. I have certainly been thinking of these as completely separate. For the purposes of designing this new functionality, I've *always* assumed no caching. Caching can be added as an independent option later. > This brings up another problem, though. How should save() work in the > many-to-many configuration when the parent is saved? Currently > links_to calling addChild() method already causes an update of the > linking table. Hence, when parent->save() is called, there is no need > to update the linking table, though the framework can go ahead and > save the children objects if auto-save is specified. Oh, wait! We > don't have the children objects because we don't like storing objects > in the fields! We have to re-fetch the children and then save them. > Fortunately, I have the "cache_only" option for fetch_group() which > will only return objects in cache, but we still have to make a > database call because we don't know what IDs we want. In my proposal, auto-saving is off by default for 'link'. So you don't save either the link or the child object. Details are in <http://sourceforge.net/mailarchive/forum.php? thread_id=110632&forum_id=3222>. > In the one-to-many configuration, the addChild method updates the > parent_id field in the child object and saves it, which amounts to > auto-save, even though it may not have been requested. The code can be > changed, though, to only save the new parent_id of the child object. It's only an auto-save if it happens automatically as a result of saving the object that fetched it. I view calling an add_*() method as an explicit (i.e. "manual") save operation on the child and think it should be documented as such. > The problem lies in the fact that we perform database saves not just > in the save() method, but also in the add/removeChild operations. What > happens if the add/remove operations do not access the database? We > have to either 1) maintain a hidden list of dependent ids and work > with it, or 2) go Ray's way and maintain a list of dependent objects. > To maintain consistency we can either A) add logic to generated > methods to keep those ids in sync, or B) prohibit saying > child->parent_id if it can be obtained as child->parent->id. I think all of these options are ugly, which is why I propose that we stick with saving the child object in the add_*() methods and documenting that clearly. > How do we deal, though, with manual fetch (when a dependent object is > not stored) plus auto-save (when a dependent object/id should be > stored to be saved)? This issue is also addressed in my proposal in the e-mail referenced above. Since you can't auto-save an object you don't have, auto-saving with manual fetch makes no sense, and attempting to configure a field that way should throw an error. >> Seems much more messy to me for something that I consider to be >> completely new functionality (no backward compatibility issues to >> worry about). While it does eliminate the need for the manual fetch >> option, as Simon mentioned earlier (and probably even lazy fetch), it >> also eliminates one of the main features for me, which is the ability >> to take an existing field and treat it conceptually as an object. > > But I like having parent_id() and parent() methods separate, because I > should not completely forget that I'm dealing with a database. So why not just use ... relates_to => { parent_id => { class => 'Parent::Class', fetch => { type => 'manual' name => 'parent' } } } Doesn't that give you what you want? Turn on caching and you even have your version of lazy-loading, right? The parent_id() method always gives you the id and the parent() method always gives you the (possibly cached) object. I think the only thing missing is auto-fetching (into cache only) without stashing the objects in a field. This 'auto-cache' option, to give it a name, obviously requires that caching be turned on and would have to either throw an error or revert to 'manual' if a cache were not present. But, I honestly don't see the need for this option. If you ALWAYS want to auto-fetch the secondary object(s), then you're never in the situation where you need to fetch them just to get the id. Both are always available. So why not just auto-fetch them into the field as I proposed? I think it's clear that there are differences in the way you and I like to think about our objects/data and the way we intend to use SPOPS, and this new functionality in particular. And this is all perfectly reasonable. My hope is that we can find a clean, consistent, easy-to-document/understand design that is as flexible as possible, and in particular, flexible enough to encompass both of our requirements. After all this discussion, I still think my original design, in all of its gory detail, accomplishes this. If there is a favorite mental picture or usage scenario that is excluded by my proposed design (or even made more difficult or confusing to configure), I don't yet see it. ( And I am trying :-) Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Vsevolod (S. I. <si...@cs...> - 2004-12-07 02:11:59
|
Chris, Your approval is eventually needed here. Ray, > Again, I think you're just saying that you want manual fetching. Why > would you configure it for auto/lazy fetch if all you wanted was the > id? If you only need the object sometimes and want the parent() method > to always return an id, you can still always get to the object using > the fetch_parent() method (which you can even rename if you like). I think this finally got through. Even before reading the end of your email, I realized that, indeed, I am describing what you call "manual fetching". I was objecting to the presence of the word 'fetch', but I should be able to rename the method in the config. > Unless I don't understand what you're suggesting, I think this goes > back to our previous discussion about what to use for the config key, > alias (old has_a syntax) or field name (new has_a syntax). Those > aliases are only unique to the class, but we need something unique to > the field to be able to handle the case where you have two fields which > belong to the same class. Well, class alias and field name can be combined. My point is that a field is an integer and an object is an object, and I don't like mixing their names. In fact, the idea of substituting objects for ids (in your auto-save case) still weirds me out, even though I see where you are going with it. I'd like Chris's input on this. Another weirdness-inducing thing is that with your auto- or lazy-fetching, we effectively have a limited cache, as the fetched objects are stored in the hidden fields, but with manual fetching we do not have a hidden cache. Thus, simply changing SPOPS config will change the code behavior, and it may not even be noticed at once during compilation (unlike the id-to-object change), because the lack of caching only rears its head when you fetch the same object more than once. No matter how much you document this, it's not obvious for a beginner, and thus the warnings will be ignored. Your example with DateConvert which is also a configuration change is not quite in the same league, because then you are clearly changing a datum type. But a switch from auto- to manual fetching would look very innocent to a beginner. > No storing objects in fields? You mean you want to eliminate > auto/lazy-fetch as an option? > > Ah ... wait a minute ... are you assuming a cache? No, that would be even more weird. However, note my comment in the previous email that the foreign key ids may be made read-only. This will eliminate a way to create inconsistencies. >> How do we deal, though, with manual fetch (when a dependent object is >> not stored) plus auto-save (when a dependent object/id should be >> stored to be saved)? > > > This issue is also addressed in my proposal in the e-mail referenced > above. Since you can't auto-save an object you don't have, auto-saving > with manual fetch makes no sense, and attempting to configure a field > that way should throw an error. Actually, I can think of a situation when it does make sense. Suppose you edit a web page listing info for an author and all the books he wrote. First you retrieve the old author and book objects from the database, then you replace some values, then you save the author objects and would like the book objects become updated to. Even if they have not been fetched yet, they would get fetched when get an array of children to iterate over. You'll say that this is more suited for auto-fetching, but I may still want to use the manual fetching for the "neatness" reasons. :) Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Ray Z. <rz...@co...> - 2004-12-07 18:24:48
|
Simon, On Dec 6, 2004, at 9:11 PM, Vsevolod (Simon) Ilyushchenko wrote: > Another weirdness-inducing thing is that with your auto- or > lazy-fetching, we effectively have a limited cache, as the fetched > objects are stored in the hidden fields, but with manual fetching we > do not have a hidden cache. Thus, simply changing SPOPS config will > change the code behavior, and it may not even be noticed at once > during compilation (unlike the id-to-object change), because the lack > of caching only rears its head when you fetch the same object more > than once. No matter how much you document this, it's not obvious for > a beginner, and thus the warnings will be ignored. I don't see why this is so difficult, even for beginners. I think you are confusing things by thinking of it as a cache. A cache is something that sits between you and the database and intercepts fetch requests to avoid going to the database. That is not what we have here. We have two types of methods, accessor methods and fetch methods. In both the manual and the auto/lazy case, the object's accessor methods simply return what is stored in the Perl object. In the auto/lazy case, you expect these to be secondary objects that were fetched previously and stashed. In the manual case, you are simply saying that you don't want to store any secondary objects inside your primary object. So you don't expect to have any accessors that return objects. You still have convenience methods to fetch these objects, but these are "fetch" methods, not accessors. (I just looked back at my original proposal and I see that to be consistent with what I just said, I would need to change the 'manual_by' option so that the fetch method does NOT store the retrieved objects in the primary object. We have 'lazy_by' to handle the "fetch into the object on-demand" case.) > Your example with DateConvert which is also a configuration change is > not quite in the same league, because then you are clearly changing a > datum type. But a switch from auto- to manual fetching would look very > innocent to a beginner. I think that putting the "manual implies id, auto/lazy implies object" very prominently in the documentation should be sufficient, but you seem to think not. Can we come up with something other than 'manual/auto/lazy' for the fetch type that captures the differences in data type as well as differences in fetch behavior? We already talked about changing *_by to reverse_*, so how about renaming my proposed fetch types to something like the following? manual => manual auto => auto_object lazy => lazy_object manual_by => reverse_manual auto_by => reverse_auto_object lazy_by => reverse_lazy_object Does that help? >> No storing objects in fields? You mean you want to eliminate >> auto/lazy-fetch as an option? >> Ah ... wait a minute ... are you assuming a cache? > > No, that would be even more weird. However, note my comment in the > previous email that the foreign key ids may be made read-only. This > will eliminate a way to create inconsistencies. OK, now I am confused ... I must have missed what you mean by not storing objects in fields. If you're auto-fetching something that isn't going into cache and you're not storing it in the object, where is it going? Or did you just mean that you store it in the object under a key that isn't a persistent field? If so, I think that having the id in the persistent field and the corresponding object stored under some other key, with the id being read-only leads to more potential confusion than just storing the object in the persistent field like I propose. But I suppose this is a matter of opinion. In this case, if you want to set a new value for the secondary object you have to change the object if it's auto/lazy-fetch (since the id is read only), but you change the id if it's manual (since you don't even have the object). How about this ... would the following change to my proposal allow you to do exactly what you want? Suppose, we change the 'list_field' key to something like 'object_field' and in addition to it's current meaning for reverse direction fetches, we use it for forward direction fetches to optionally specify the key under which to store the auto/lazy-fetched object (i.e. the name of the object accessor). If not specified, the object would be fetched into the same field (just like the original proposal). For saves and removes in the auto/lazy-fetch case, I think the auto-fetched object should still override the id (and we could make the id field read-only in this case as you mentioned). I don't personally see the need for the additional complexity introduced with this option, but I guess it's not so bad ... just more to document and more for new users to digest. I *think* this would let you do what you want without preventing me from doing what I want, right? >>> How do we deal, though, with manual fetch (when a dependent object >>> is not stored) plus auto-save (when a dependent object/id should be >>> stored to be saved)? >> This issue is also addressed in my proposal in the e-mail referenced >> above. Since you can't auto-save an object you don't have, >> auto-saving with manual fetch makes no sense, and attempting to >> configure a field that way should throw an error. > > Actually, I can think of a situation when it does make sense. Suppose > you edit a web page listing info for an author and all the books he > wrote. First you retrieve the old author and book objects from the > database, then you replace some values, then you save the author > objects and would like the book objects become updated to. Even if > they have not been fetched yet, they would get fetched when get an > array of children to iterate over. > > You'll say that this is more suited for auto-fetching, but I may still > want to use the manual fetching for the "neatness" reasons. :) I'm not sure I understand your example (are you modifying any relationships or just the data?), so it's difficult to say which is more suited, but it seems to me if you are going to do manual fetching and updating, then manual saving should not be such a burden. If the book objects haven't been fetched yet, then they haven't been modified and don't need to be saved either, right? If you think the changes I proposed here will satisfy your requirements, I'll be happy to update my proposed spec to incorporate them. For me, having a design written up in nitty gritty detail helps in keeping everyone on the same page and might make it easier for Chris to keep up with all of our discussions. Ray Zimmerman Director, Laboratory for Experimental Economics and Decision Research 428-B Phillips Hall, Cornell University, Ithaca, NY 14853 phone: (607) 255-9645 |
From: Vsevolod (S. I. <si...@cs...> - 2004-12-11 15:14:54
|
Ray, > (I just looked back at my original proposal and I see that to be > consistent with what I just said, I would need to change the 'manual_by' > option so that the fetch method does NOT store the retrieved objects in > the primary object. We have 'lazy_by' to handle the "fetch into the > object on-demand" case.) Right, then it will precisely mirror the way SPOPS currently works. > We already talked about changing *_by to reverse_*, so how about > renaming my proposed fetch types to something like the following? > > manual => manual > auto => auto_object > lazy => lazy_object > manual_by => reverse_manual > auto_by => reverse_auto_object > lazy_by => reverse_lazy_object > > Does that help? This is clearer. > If so, I think that having the id in the persistent field and the > corresponding object stored under some other key, with the id being > read-only leads to more potential confusion than just storing the object > in the persistent field like I propose. But I suppose this is a matter > of opinion. In this case, if you want to set a new value for the > secondary object you have to change the object if it's auto/lazy-fetch > (since the id is read only), but you change the id if it's manual (since > you don't even have the object). That's another example of config-change-influencing-code that bothers me. After looking at Hibernate, I think I appreciate the consistency of the auto/lazy approach. The problem is that I like the manual approach too, but supporting them both makes the framework confusing. I think I'll go ahead and start implementing the dual approach, as things can be thrown out if needed, but this definitely has to run by Chris. I probably did not explain well the read-only id approach in Hibernate. There are no ids that are foreign keys at all. The only ids visible are the proper object ids, and even they are read-only. Also, Hibernate does let you to add/remove children without saving to the database. Once you save the primary object, all the changes to dependent objects and collections are saved, and if collection members have not changed, they are not re-saved. This actually makes the auto/lazy case more consistent, I think. > Suppose, we change the 'list_field' key to something like 'object_field' > and in addition to it's current meaning for reverse direction fetches, > we use it for forward direction fetches to optionally specify the key > under which to store the auto/lazy-fetched object (i.e. the name of the > object accessor). If not specified, the object would be fetched into the > same field (just like the original proposal). For saves and removes in > the auto/lazy-fetch case, I think the auto-fetched object should still > override the id (and we could make the id field read-only in this case > as you mentioned). Sounds good, and see above on ids. > I'm not sure I understand your example (are you modifying any > relationships or just the data?), Just the data. > so it's difficult to say which is more > suited, but it seems to me if you are going to do manual fetching and > updating, then manual saving should not be such a burden. If the book > objects haven't been fetched yet, then they haven't been modified and > don't need to be saved either, right? I'm saying that presumably the framework will generate for me an auto-save method that looks like: sub Author::save() { SUPER::save(); #Calling SPOPS methods for a single object foreach my $book ($self->books) { $book-save(); } } This will only work correctly with cache, though. If the books have been fetched before, I will get the same instances. But if you say that the code should function equally well with and without cache, than this will indeed make no sense in the manual-fetch case. > If you think the changes I proposed here will satisfy your requirements, > I'll be happy to update my proposed spec to incorporate them. For me, > having a design written up in nitty gritty detail helps in keeping > everyone on the same page and might make it easier for Chris to keep up > with all of our discussions. This may not be a bad thing. Simon -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |
From: Chris W. <ch...@cw...> - 2004-12-01 18:48:35
|
>... > I think that both links_to and has_many share a fundamental flaw. They > both allow relationships to be defined from the "other" end, making it > possible to define conflicting behavior for a single relationship. This > unnecessarily complicates the error checking or precedence rules and > the documentation. I think always forcing the relationship to be > defined by the class with the linking field, and having a single syntax > (has_a) for defining it, makes for a much simpler, cleaner, more > consistent, easier to understand/document/implement, less error prone > design. Including both approaches (e.g. has_many and reverse_auto or > auto_by), in my opinion, only complicates and confuses things further. Good point. > For this reason, I think we should forget trying to make the new syntax > backward compatible. I think we should leave the old syntax in place > (but deprecated) as long as necessary for backward compatibility and > add a completely redesigned new syntax that folks can migrate to > gradually or as they need the new features. If it were up to me I > wouldn't touch links_to at all, and I wouldn't touch the old has_a > either. I'd just add a new configuration handler for the case where the > key of a 'has_a' spec matches one of the field names, in which case you > assume it is the new syntax I proposed, otherwise you assume it's a > class and use the old has_a semantics. My main concern is that we don't break existing code. Everyone seems to also be avoiding this so: great. Next is ensuring that new features are easily implemented, testable and maintainable. I suspect Ray's right in that overloading existing syntax will make all of these more difficult. I'm not against adding new syntax/configuration keys as necessary and if it takes care of both of these items, it's probably the best option. > [1] I confess I never really did understand the purpose of the alias. > What is the difference between the alias and the class? Isn't one of > them redundant? Yes, but IME most humans prefer the alias (e.g., 'news') vs. the class ('MyApp::Persist::News'). Plus it's less typing :-) Chris -- Chris Winters (ch...@cw...) Building enterprise-capable snack solutions since 1988. |
From: Chris W. <ch...@cw...> - 2004-12-01 18:39:26
|
Terribly sorry it took so long to respond. > Now that Chris has some time (hopefully) to consider the changes, I'd > like to submit a patch that would add has_many, auto-save/auto-delete > (optional) and full cache support (which is necessary for the > consistency circular references). Before I do that, I'd like to get > Chris's approval on the following architectural decisions: > > 1. There will be a basic cache which will always return the cached > object, not its copy. Ok. > 2. The fetch_many() now will also use cache. It will still make a call > to the database, because it won't know which ids to request from the > cache, though. Ok -- although I'd like to have the current behavior available if the cache is not enabled. (Where it fetches the actual data from the database, not just the IDs.) > 3. Auto-save and auto-delete are off by default. This is not what Ray > requested, but it's the only way to stay backwards compatible. With > auto-save on, the graph of object in memory will be traversed to > determine what has been changed. (Thus, I have deep_changed() instead of > changed().) I'm for backward compatibility. > 4. It will be possible to use a separate class that corresponds to the > linking table in links_to (think ClubMembership). The syntax I came up > with is: > Club => { > class => 'Club', > ... > links_to => > { 'Person' => > { table => 'ClubMembership', > link_class => 'ClubMembership', > link_class_alias => 'memberships', > alias => 'members', > to_id_field => 'member_id', > from_id_field => 'club_id' }, This looks good. > The ClubMembership class can have extra attributes (like date), but they > will be specified in a regular way in the config file. The new attribute > 'link_class_alias' will return an arrayref of the ClubMembership > instances. Again, this is not as elegant as Ray proposed, but backwards > compatible. Ok. > An open question is what to do with has_a. Currently it's implemented in > ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete > functionality requires the has_a code to be DBI-aware. I can add extra > functionality into ClassFactory/DBI, but I am not sure which effect it > would have on LDAP storage, for example. We can tell ClassFactory::DBI to override the has_a from ClassFactory. (I don't remember how to do it off the top of my head, but I'm 98% sure it's already built-in.) > Oh, and a style question. I have marked all of my changes with "tags": > #Simon > #/Simon > and I commented out, not deleted, all replaced code, for ease of > reference. Is this necessary? No -- that's why we have CVS :-) Just use good comments when you commit. Chris -- Chris Winters (ch...@cw...) Building enterprise-capable snack solutions since 1988. |
From: Vsevolod (S. I. <si...@cs...> - 2004-12-01 18:49:08
|
Chris, Please review our subsequent discussion with Ray titled "has_many progress". He has pretty much converted me to his point of view. He proposes 1) keeping the cache optional (so I'll have to test two configuration), 2) storing dependent objects in hashes in objects instead of retrieving them every time (thus, with cache enabled, fetch_many can skip trips to the database), and most importantly, 3) using his syntax for defining relationships. I can still support the current syntax, of course, but forcing the new configs follow the old syntax results in awkward rules. I looked at some other frameworks, and what Ray proposes turns out to be fairly standard. I am sorry it's not as straightforward. So even though I already have the code for the currently-out-of-favor syntax, I'll have to change it to use Ray's syntax. Please let us know what you think. Simon Chris Winters wrote: > Terribly sorry it took so long to respond. > > >>Now that Chris has some time (hopefully) to consider the changes, I'd >>like to submit a patch that would add has_many, auto-save/auto-delete >>(optional) and full cache support (which is necessary for the >>consistency circular references). Before I do that, I'd like to get >>Chris's approval on the following architectural decisions: >> >>1. There will be a basic cache which will always return the cached >>object, not its copy. > > > Ok. > > >>2. The fetch_many() now will also use cache. It will still make a call >>to the database, because it won't know which ids to request from the >>cache, though. > > > Ok -- although I'd like to have the current behavior available if the > cache is not enabled. (Where it fetches the actual data from the database, > not just the IDs.) > > >>3. Auto-save and auto-delete are off by default. This is not what Ray >>requested, but it's the only way to stay backwards compatible. With >>auto-save on, the graph of object in memory will be traversed to >>determine what has been changed. (Thus, I have deep_changed() instead of >>changed().) > > > I'm for backward compatibility. > > >>4. It will be possible to use a separate class that corresponds to the >>linking table in links_to (think ClubMembership). The syntax I came up >>with is: >> Club => { >> class => 'Club', >> ... >> links_to => >> { 'Person' => >> { table => 'ClubMembership', >> link_class => 'ClubMembership', >> link_class_alias => 'memberships', >> alias => 'members', >> to_id_field => 'member_id', >> from_id_field => 'club_id' }, > > > This looks good. > > >>The ClubMembership class can have extra attributes (like date), but they >>will be specified in a regular way in the config file. The new attribute >>'link_class_alias' will return an arrayref of the ClubMembership >>instances. Again, this is not as elegant as Ray proposed, but backwards >>compatible. > > > Ok. > > >>An open question is what to do with has_a. Currently it's implemented in >>ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete >>functionality requires the has_a code to be DBI-aware. I can add extra >>functionality into ClassFactory/DBI, but I am not sure which effect it >>would have on LDAP storage, for example. > > > We can tell ClassFactory::DBI to override the has_a from ClassFactory. (I > don't remember how to do it off the top of my head, but I'm 98% sure it's > already built-in.) > > >>Oh, and a style question. I have marked all of my changes with "tags": >>#Simon >>#/Simon >>and I commented out, not deleted, all replaced code, for ease of >>reference. Is this necessary? > > > No -- that's why we have CVS :-) Just use good comments when you commit. > > Chris > -- Simon (Vsevolod ILyushchenko) si...@cs... http://www.simonf.com Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg. Zbigniew Brzezinski, U.S. national security advisor, 1977-81 |