From: Vsevolod (S. I. <si...@cs...> - 2004-04-25 22:29:48
|
Chris and Ray,

I have made some more progress in writing the new code and even more progress in thinking about it. So here are several points:

1a. The whole thing won't work properly without caching turned on. Assume that an A has many X'es, and the table X has a column 'a_id'. When I pass a list of ids to the function A->list_of_x_add (similar to linksto_add), the application may have various copies of X'es with those ids floating around, some of whose field values may have been changed. Since (at least in the auto-save case) it does not make sense to just update the 'a_id' field in the database and not the rest of the fields, we need to save the relevant X objects. But we don't know about them without caching!

So, Chris, if you have a chance, please review my bug report about caching not working. The description of the issue is verbose, but the fix is trivial.

1b. However, even the current cache is inadequate for the task. Right now, the first time an object is retrieved, it's saved in the cache. If it's retrieved again, a copy of the object is returned. Thus, whoever asked for the object first has the "master" copy, meaning that everybody else will see his changes. But if other requestors make changes and the first requestor's copy is saved, their changes will be lost.

Here is some sample code that illustrates the problem. Assume that A still has many X'es.

    my $a = A->fetch(1);
    my $x = $a->list_of_x->[0];
    my $a1 = $x->myA;

Here, logically, $a and $a1 refer to the same object with the same ID. But they are different Perl objects. If I change $a1 and save $a, my changes to $a1 will be lost.

Is there a reason the cache does not simply return the stored object?

1c. Normally, calling $a->list_of_x_add($x) will make sure that the changes to the 'a_id' field in the X table are saved. There is a fun special case, though - what if $a has been just created and not saved yet? There are two possible behaviors: a) save $a behind the scenes to obtain a_id, or b) throw an error requiring the user to call save() explicitly. Variant a) makes list_of_x_add() behave similarly to the normal case, but does something that the user may not want. Variant b), conversely, exposes some inner workings of SPOPS to the user, but does not do a potentially undesirable save. What is preferable here?

2. You may have noticed that I used 'has_many', not 'has_a' as Ray originally suggested. I do think it's cleaner to separate them, but if you insist, I will eventually roll them back into one - I just separate them now for the ease of coding.

3. For the many-to-many 'links_to' case (where A has-many Bs via the linking table X), Ray suggested having the configuration hash in the X class, not in the A class where 'links_to' lives now. This has the added benefit of adding more fields to X if necessary, but IMO also a major drawback of changing the API. Why don't we try to keep the API as constant as possible and leave the 'links_to' stanza in A? We can add new hash keys to specify extra X fields and to create a Perl class corresponding to X if necessary.

4. Ray also suggested two different APIs for the simple has_a case (an X has one A). If a dependent object is autofetched, $x->myA returns an instance of A. However, if the fetch is manual, $x->myA returns a_id, and only $x->fetch_myA returns an actual object. Is there a reason to do it differently?

5. The issue of avoiding circular saves can be addressed simply by setting a certain flag after an object is saved and checking for this flag each time an object is reached in the relationship graph during the save. (Obviously, this will require full caching as described above.) Let me know if this for some reason won't work.

Simon

--
Simon (Vsevolod ILyushchenko) si...@cs...
http://www.simonf.com

The unknown is honoured, the known is neglected - until all is known.
The Cú Chulaind myth
|
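The difference raised in point 1b, a cache that hands out copies versus one that hands out the stored object, can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: the Thing, CopyCache and IdentityCache packages are hypothetical and are not part of SPOPS, and the "cache" is just an in-memory hash.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a persistent class; NOT the SPOPS API.
package Thing;
sub new { my ($class, %fields) = @_; return bless {%fields}, $class }

# Hypothetical cache that returns a fresh copy on every hit.
package CopyCache;
my %copy_store;
sub put {
    my ($class, $obj) = @_;
    $copy_store{ join '/', ref($obj), $obj->{id} } = $obj;
}
sub get {
    my ($class, $type, $id) = @_;
    my $hit = $copy_store{ join '/', $type, $id } or return;
    return $type->new(%$hit);      # new Perl object with the same field values
}

# Hypothetical cache that returns the stored object itself.
package IdentityCache;
my %identity_store;
sub put {
    my ($class, $obj) = @_;
    $identity_store{ join '/', ref($obj), $obj->{id} } = $obj;
}
sub get {
    my ($class, $type, $id) = @_;
    return $identity_store{ join '/', $type, $id };   # same reference every time
}

package main;

my $first = Thing->new(id => 1, name => 'old');
CopyCache->put($first);
IdentityCache->put($first);

# A change made through a copy never reaches the first requestor's object.
my $copy = CopyCache->get('Thing', 1);
$copy->{name} = 'changed via copy';
my $name_after_copy_edit = $first->{name};

# A change made through the shared reference is visible to everyone.
my $same = IdentityCache->get('Thing', 1);
$same->{name} = 'changed via reference';
my $name_after_shared_edit = $first->{name};
```

With the copying cache, saving $first afterwards would silently discard the edit made through $copy, which is the lost-update case described above; with the identity cache there is only one object to save.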
From: Ray Z. <rz...@co...> - 2004-04-26 15:46:20
|
Hi Simon,

This is going to be a quick, first-pass response to the issues you raise ...

For number 1, let me respond with a general comment about caching and keeping multiple Perl objects in sync, etc, since I think that is the core issue. First, let me say that I have not used caching at the SPOPS level, but I have used caching of SPOPS objects at the application level in my work. In my opinion, there are two ways of handling this issue.

1. SPOPS assumes that the application is keeping track of how many copies of an object are in memory and which ones have unsaved changes, etc. The only MASTER copy of the object is the saved one. In this case SPOPS should not do anything special to try to keep things in sync, that's the job of the application level.

2. SPOPS always assumes a 1 to 1 correspondence between the Perl object and the object in the database. Do caching at the SPOPS level with SPOPS making sure there is never more than one copy of the object in memory. Copies of the object are simply multiple references to a single cached object. This is the approach used by Tangram if I'm not mistaken. Unless I'm missing something, this seems pretty clean and straightforward.

However, it doesn't address all of the consistency issues in contexts where you have multiple processes running simultaneously (e.g. multiple Apache children), where there is one copy of an object in the database, but multiple copies in the memories of various processes, in their individual caches. To address this at the SPOPS caching level you have to use some sort of a shared memory cache with synchronization/locking mechanisms, which in my opinion takes you back to handling the issue at the application level again.

So the bottom line, for me is, unless you are in the context where you only have one process running at a time (not the case for my apps), you ALWAYS have to handle the issue at the application level anyway. Having SPOPS do the caching as in (2) can help you with that, but assumptions can never be made at the SPOPS level that even a single cached object is necessarily in sync with the database since some other process may have changed it behind your back.

On Apr 25, 2004, at 6:29 PM, Vsevolod (Simon) Ilyushchenko wrote:

> 1a. The whole thing won't work properly without caching turned on. Assume that an A has many X'es, and the table X has a column 'a_id'. When I pass a list of ids to the function A->list_of_x_add (similar to linksto_add), the application may have various copies of X'es with those ids floating around, some of whose field values may have been changed. Since (at least in the auto-save case) it does not make sense to just update the 'a_id' field in the database and not the rest of the fields, we need to save the relevant X objects. But we don't know about them without caching!

My proposal here (found in last section under "Fetch" in my 7/3/01 post) was to pass in the X objects, not the ids ...

    For auto_by and lazy_by, two additional methods are created in A, one for adding objects to its list of X's and one for removing objects from it. These can only be used after the A object has been saved. Their primary purpose is to keep the list in memory in sync with what's in the database, so when using auto_by or lazy_by it's a good idea to use only these methods to add or remove corresponding X's. If the 'name' parameter is present, the methods are named add_<name> and remove_<name>. If the 'name' parameter is not present they are named add_to_<list_field> and remove_from_<list_field>. The method to add X's takes an X object or an arrayref of X objects as inputs and returns the same object or arrayref to the objects after saving them. The method to remove X's takes an id or arrayref of ids and returns the number of X's successfully removed.

> 1b. However, even the current cache is inadequate for the task. Right now, the first time an object is retrieved, it's saved in the cache. If it's retrieved again, a copy of the object is returned. Thus, whoever asked for the object first, has the "master" copy, meaning that everybody else will see his changes. But if other requestors make changes and the first requestor's copy is saved, their changes will be lost.

I'm not really familiar with SPOPS caching and admit I haven't paid attention to your previous posts on caching, so educate me here. My understanding is that SPOPS doesn't implement caching, it just provides hooks to do it. So is this returning of a new copy of a cached object instead of a reference to the existing cached object a feature (bug) of the hooks in SPOPS or of a particular implementation of caching?

As I mentioned above, I think any caching at the SPOPS level should make sure there is only ONE copy of the object in memory. I don't see the purpose of having a "master" copy in memory with other copies of it. The only "master" copy of the object is the one in the datastore.

> Here is a sample code that illustrates the problem. Assume that A still has many X'es.
>
> my $a = A->fetch(1);
> my $x = $a->list_of_x->[0];
> my $a1 = $x->myA;
>
> Here, logically, $a and $a1 refer to the same object with the same ID. But they are different Perl objects. If I change $a1 and save $a, my changes to $a1 will be lost.
> Is there a reason the cache does not simply return the stored object?

I agree. An SPOPS level cache should always return the same object, not a copy.

> 1c. Normally, calling $a->list_of_x_add($x) will make sure that the changes to the 'a_id' field in the X table are saved. There is a fun special case, though - what if $a has been just created and not saved yet? There are two possible behaviors: a) save $a behind the scenes to obtain a_id, or b) throw an error requiring the user to call save() explicitly. Variant a) makes list_of_x_add() behave similarly to the normal case, but does something that the user may not want. Variant b), conversely, exposes some inner workings of SPOPS to the user, but does not do a potentially undesirable save. What is preferable here?

I say (b). Quoting from the same paragraph of my proposal again "These can only be used after the A object has been saved." ... implying that it throws an exception otherwise. I don't think this necessarily exposes inner workings of SPOPS to the user. I think it just needs to be documented that these methods throw exceptions if called for objects that are not saved.

> 2. You may have noticed that I used 'has_many', not 'has_a' as Ray originally suggested. I do think it's cleaner to separate them, but if you insist, I will eventually roll them back into one - I just separate them now for the ease of coding.

I'm not sure I've seen how you're using 'has_many' in the configuration. It sounds to me though that it's putting the definition of the relationship at the other end, that's all. Does this then replace the manual_by|auto_by|lazy_by configuration syntax?

I guess I would vote for sticking with only the 'has_a' unless and until I see the full detail of the syntax spelled out and can see that it doesn't bring up new issues. I spent a lot of time on the syntax I proposed and am fairly comfortable that it is general and consistent.

> 3. For the many-to-many 'links_to' case (where A has-many Bs via the linking table X), Ray suggested having the configuration hash in the X class, not in the A class where 'links_to' lives now. This has the added benefit of adding more fields to X if necessary, but IMO also a major drawback of changing the API. Why don't we try to keep the API as constant as possible and leave the 'links_to' stanza in A? We can add new hash keys to specify extra X fields and to create a Perl class corresponding to X if necessary.

On this point (and the previous one now that I think about it), my approach regarding where to put the configuration hash was to put it in the class which has the fields. The configuration hash for a class defines the meaning of each of its fields. It can also add behavior related to those fields to other classes. I think it's essential that we are consistent about where we put configuration. You propose putting the configuration in A ... but why A and not B?

> 4. Ray also suggested two different APIs for the simple has_a case (an X has one A). If a dependent object is autofetched, $x->myA returns an instance of A. However, if the fetch is manual, $x->myA returns a_id, and only $x->fetch_myA returns an actual object. Is there a reason to do it differently?

My thought here was that if myA is an auto or lazy-fetched field, then you always assume that $x->{myA} is an object. Otherwise, you always assume that $x->{myA} is an id. You still have a convenience method to fetch the corresponding object if you need it, but even after fetching the object, $x->{myA} is still just the id. It just seemed the most consistent to me. Otherwise, for manual fetches you end up with the case where you don't know when you access $x->{myA} whether to expect an id or an object, since it depends on whether or not you've done the manual fetch.

> 5. The issue of avoiding circular saves can be addressed simply by setting a certain flag after an object is saved and checking for this flag each time an object is reached in the relationship graph during the save. (Obviously, this will require full caching as described above.) Let me know if this for some reason won't work.

Why does this require full caching? Maybe an example would help. I don't think any of what I proposed requires caching, just the assumption that consistency is being maintained, with or without a cache, by the application level logic.

Thanks again, Simon, for all your work on this area ...

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645 fax: (815) 377-3932
|
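The behavior Ray describes for the add method (accept one X object or an arrayref of X objects, save them, and throw if the parent A has not been saved yet, i.e. variant b) might look roughly like the sketch below. Everything here is hypothetical: the A and X packages, the add_to_list_of_x name, the in-memory save() stand-ins and the error text are assumptions for illustration, not code generated by SPOPS.

```perl
use strict;
use warnings;

# Hypothetical child class; save() only marks the object instead of writing to a database.
package X;
sub new  { my ($class, %fields) = @_; return bless {%fields}, $class }
sub save { my ($self) = @_; $self->{saved} = 1; return $self }

# Hypothetical parent class; save() pretends the database assigned id 1.
package A;
sub new  { my ($class, %fields) = @_; return bless { list_of_x => [], %fields }, $class }
sub save { my ($self) = @_; $self->{id} = 1 unless defined $self->{id}; return $self }

# Takes one X object or an arrayref of X objects and returns what it was given.
sub add_to_list_of_x {
    my ($self, $items) = @_;
    die "A must be saved before X objects can be added\n"
        unless defined $self->{id};
    for my $x (ref($items) eq 'ARRAY' ? @$items : $items) {
        $x->{a_id} = $self->{id};           # point the child at its parent
        $x->save;                           # save the whole child, not just a_id
        push @{ $self->{list_of_x} }, $x;   # keep the in-memory list in sync
    }
    return $items;
}

package main;

my $parent = A->new;
my $child  = X->new(id => 7);

# Unsaved parent: the call throws and nothing is modified.
my $ok    = eval { $parent->add_to_list_of_x($child); 1 };
my $error = $@;

# Saved parent: the same call succeeds.
$parent->save;
$parent->add_to_list_of_x([$child]);
```

The exception keeps the implicit save out of the add method, so the caller decides when the parent is written.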
From: Vsevolod (S. I. <si...@cs...> - 2004-04-29 18:46:03
|
Ray,

Thanks for your comments. I hope you don't construe my comments as criticism - they mostly stem from my ignorance of your intent. You have clearly spent a lot of time thinking this over.

> So the bottom line, for me is, unless you are in the context where you only have one process running at a time (not the case for my apps), you ALWAYS have to handle the issue at the application level anyway. Having SPOPS do the caching as in (2) can help you with that, but assumptions can never be made at the SPOPS level that even a single cached object is necessarily in sync with the database since some other process may have changed it behind your back.

In general, I agree. It's just that it was never explicitly mentioned in the docs that the caching mechanism does not return the cached object itself, but its copy. See SPOPS::get_cached_object(). It returns $class->new($item_data), where $item_data is the cached object. Of course, if the caching mechanism does not store the object itself but its somehow serialized representation, the above code would be correct, but I think such an approach is an overgeneralization.

> My proposal here (found in last section under "Fetch" in my 7/3/01 post) was to pass in the X objects, not the ids ...

I am probably a lazy bum, but the current implementation lets you pass the X objects as well as ids, and I found it convenient.

>> 2. You may have noticed that I used 'has_many', not 'has_a' as Ray originally suggested. I do think it's cleaner to separate them, but if you insist, I will eventually roll them back into one - I just separate them now for the ease of coding.
>
> I'm not sure I've seen how you're using 'has_many' in the configuration. It sounds to me though that it's putting the definition of the relationship at the other end, that's all. Does this then replace the manual_by|auto_by|lazy_by configuration syntax?

I would like to replace the *_by syntax for now with has_many just because it's conceptually clearer for me during coding. Like I said, I have no issues with using has_a in the final version.

> On this point (and the previous one now that I think about it), my approach regarding where to put the configuration hash was to put it in the class which has the fields. The configuration hash for a class defines the meaning of each of its fields. It can also add behavior related to those fields to other classes. I think it's essential that we are consistent about where we put configuration. You propose putting the configuration in A ... but why A and not B?

No reason. I think it's fully symmetrical, so it can go in any of them. It's a tradeoff between a rigorous approach and user convenience, and my feeling is that convenience should win most of the time - I've heard a well-known Perl figure say that SPOPS is difficult to understand already as it is.

> My thought here was that if myA is an auto or lazy-fetched field, then you always assume that $x->{myA} is an object. Otherwise, you always assume that $x->{myA} is an id. You still have a convenience method to fetch the corresponding object if you need it, but even after fetching the object, $x->{myA} is still just the id. It just seemed the most consistent to me. Otherwise, for manual fetches you end up with the case where you don't know when you access $x->{myA} whether to expect an id or an object, since it depends on whether or not you've done the manual fetch.

I see. I guess my problem is that I am not sure why the manual fetch mode is even necessary when lazy fetching is available. Could you please give an example?

> Why does this require full caching? Maybe an example would help. I don't think any of what I proposed requires caching, just the assumption that consistency is being maintained, with or without a cache, by the application level logic.

This issue stems from the same problem with the caching mechanism returning multiple copies of the same object. However, if caching is non-mandatory, then avoiding circular saves needs another approach. It has to be implemented via a hash that is either passed around among the arguments to save() or stored globally, and for each object that has been saved a value corresponding to the object's class name and id is stored in this hash. If the class name and id are already there, then the save operation is not performed. However, I'd rather not mess with the arguments of save(), which leaves the global hash. It's not pretty, but I think it's the only remaining solution.

Simon

--
Simon (Vsevolod ILyushchenko) si...@cs...
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism is equivalent to Roosevelt's declaring war on blitzkrieg.
Zbigniew Brzezinski, U.S. national security advisor, 1977-81
|
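The global-hash guard against circular saves described above can be sketched as follows. This is a minimal illustration under assumptions, not SPOPS code: the Node class, its 'related' list, the %SAVED and @SAVE_LOG package variables and the save_graph() wrapper are all hypothetical names, and the "save" only records a log entry instead of writing to a database.

```perl
use strict;
use warnings;

# Hypothetical class whose objects can refer to each other in a cycle.
package Node;

our %SAVED;      # "Class/id" => count, for every object reached in the current pass
our @SAVE_LOG;   # order in which real saves happened, kept only for illustration

sub new {
    my ($class, $id) = @_;
    return bless { id => $id, related => [] }, $class;
}

sub save {
    my ($self) = @_;
    my $key = join '/', ref($self), $self->{id};
    return $self if $SAVED{$key}++;       # already saved in this pass: stop here
    push @SAVE_LOG, $key;                 # a real save would write to the database
    $_->save for @{ $self->{related} };   # cascade to related objects
    return $self;
}

# Entry point for one save pass; 'local' restores the guard hash afterwards,
# so the arguments of save() stay untouched and the next pass starts clean.
sub save_graph {
    my ($self) = @_;
    local %SAVED;
    return $self->save;
}

package main;

my $first  = Node->new(1);
my $second = Node->new(2);
push @{ $first->{related} },  $second;
push @{ $second->{related} }, $first;     # cycle: first -> second -> first

$first->save_graph;                       # terminates despite the cycle
```

Because the key is built from the class name and id rather than from the Perl reference, the guard also works when two different in-memory copies of the same row are reached during one pass.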
From: Teemu A. <te...@io...> - 2004-04-26 17:13:07
Attachments:
OpenInteract-1.60-sending_filehandle_contents.patch
|
Hi,

I modified the OpenInteract.pm function send_static_file() in OI 1.60 to allow sending a filehandle instead of passing a filename or returning file contents.

I have a system that generates a temporary file with a filehandle and when the temporary filehandle object goes out of scope, the temporary file is destroyed. This was a problem if I wanted to send the contents of the file to the client. I had two options:

1. Modifying $R->{page}{content_type} and returning the file contents from the OI Handler. This is a problem, I don't want to load an entire file into memory.

2. Modifying $R->{page}{content_type} and using $R->{page}{send_file} for returning contents of a file. This only allows the filename and since I'm using a temporary filehandle object that goes out of scope, I had to modify OpenInteract.pm to allow returning filehandles instead of filenames.

Is there a better method? I think OpenInteract was not scalable enough in this kind of fundamental issue. Is this different in OI2?

--
Sincerely,
Teemu Arina
Ionstream Oy / Dicole
Komeetankuja 4 A
02210 Espoo FINLAND
Tel: +358-(0)50 - 555 7636
http://www.dicole.org
|
From: Teemu A. <te...@io...> - 2004-04-27 10:35:58
|
Hello Chris,

sorry about not paying attention to this until now.

> I agree with your points, but my three main objections to using the base language as the key are:
> * What do you do with long strings? (One or more sentences)

I thought about this and there are of course ups and downs to both approaches. Maybe allowing both methods? I mean, short sentences and system messages that usually contain something dynamic like "Hello [_1]" are used as the key. Pros are:

* self-documenting
* easier to translate
* works as gettext was originally designed to work

Very long sentences (context sensitive help for example) are usually static and hard to keep short, so a key like "news.help1" could be used instead.

> * What happens when you change the base language text?

Well, you have a couple of options. One is that you just continue using the "wrong" sentence as the key and just change the translation in the English one to correct the typo. Obviously a better idea is to change all the keys. For this purpose some scripts should be written to rewrite the base key for all translations. I made one for my work in other projects.

> * Many of the base language keys will be quite long.

Actually in my experience most sentences are short. Only in help texts, static content and on some other special occasions do you have very long sentences in your code.

> For the first objection I guess you could just use the first part of the long string. This kind of eliminates the benefit of using the base language as key, it's not used as often.

Maybe my idea to use a short key like "news.help1" in case of long sentences is better.

> The second objection seems like a big problem. How often does it happen?

Not sure. It happens mostly in an intense development process. I tend to write an application almost to the first stable stage before adding any translation. This way I avoid correcting the translation files all the time. Things tend to get added and removed. Changes to strings usually happen only to correct typos (more common if you are a non-native English speaker) and if your string contains dynamic data and you add or change the way things are presented to the user.

While I write the first version, I use the English string as the base key. When I'm ready with the application, I extract the base English .po file with a script from the Perl code I have written. Since I use English strings as the key, I don't have to manually write the base English .po file, I just use a script to extract the strings. After this, changes to the base translation do not occur very often. It is more common that new strings get added than that things get changed or removed. For this purpose I have a script that adds the same string for translation into every .po file of the application, avoiding manual work.

> Okay, I think I could go along with this. I would like the option of putting a package's keys in the global namespace for applications using multiple packages. But the default should be that when requesting a message key you only look in your package's space and the global space if it's not found there.

Good, this is especially useful if one uses English strings as the base key. Even if the English string could be universal all around the system, often a translation is not. Many languages are very much dependent on the context where the string occurs and an English string in two places could have two different translations in one language.

> True, but I think most uses of the messages will be in templates where it's much more difficult to parse groups of text out into individual entries.

In my applications I avoid adding strings into templates. Why? I think the strings are part of the application business logic and should reside as part of my code. I use templates only as widgets of the web interface I'm about to generate. If I later want to use a different system other than TemplateToolkit for web pages, I don't have to move the strings from one interface code to another. For me, only help pages and other static content reside somewhere else. This is simply a matter of taste, I believe many programmers put most of the strings in their templates.

I also created an API for writing applications for OI in a simple manner. This supports the idea of having strings as part of your code. See:

http://www.dicole.fi/docs/dicole_api_overview.html

I look forward to the I18N support for OI. I plan to port my API and applications to OI2 once it becomes available; I'm not willing to write my own I18N support for OI1.

--
Sincerely,
Teemu Arina
Ionstream Oy / Dicole
Komeetankuja 4 A
02210 Espoo FINLAND
Tel: +358-(0)50 - 555 7636
http://www.dicole.org
|
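The lookup order discussed in this message (the package's own message space first, then the global space, with the base-language string doubling as key and fallback text) can be sketched in plain Perl. This is an assumption-laden illustration, not the OI2 implementation: the %catalog layout, the translate() function and the sample Finnish strings are hypothetical, and only the "[_1]" placeholder notation, which Locale::Maketext also uses, comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical in-memory catalogs: message space => language => key => text.
# A package-specific entry shadows the global entry for the same key.
my %catalog = (
    global => { fi => { 'Hello [_1]' => 'Hei [_1]', 'Save' => 'Tallenna' } },
    news   => { fi => { 'Save' => 'Tallenna uutinen' } },
);

sub translate {
    my ($package, $lang, $key, @args) = @_;
    my $text;
    for my $space ($package, 'global') {        # package space first, then global
        $text = $catalog{$space}{$lang}{$key};
        last if defined $text;
    }
    $text = $key unless defined $text;          # untranslated: show the base-language key
    # Replace [_1], [_2], ... with the corresponding argument.
    $text =~ s/\[_(\d+)\]/ defined $args[$1 - 1] ? $args[$1 - 1] : '' /ge;
    return $text;
}

my $in_news  = translate('news',  'fi', 'Save');                  # package entry wins
my $in_forum = translate('forum', 'fi', 'Save');                  # falls back to global
my $greeting = translate('news',  'fi', 'Hello [_1]', 'Teemu');   # global entry with argument
my $no_lang  = translate('news',  'de', 'Hello [_1]', 'Teemu');   # no catalog: key is used
```

The same English key 'Save' yields two different translations depending on the requesting package, which is the context-dependence argument made above; a short key such as "news.help1" would go through the same lookup, only without a readable fallback.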
From: Chris W. <ch...@cw...> - 2004-04-28 01:28:19
|
On Apr 26, 2004, at 1:01 PM, Teemu Arina wrote:
...
> I have a system that generates a temporary file with a filehandle and when the temporary filehandle object goes out of scope, the temporary file is destroyed. This was a problem if I wanted to send the contents of the file to the client. I had two options:
>
> 1. Modifying $R->{page}{content_type} and returning the file contents from the OI Handler. This is a problem, I don't want to load an entire file in the memory.
>
> 2. Modifying $R->{page}{content_type} and using $R->{page}{send_file} for returning contents of a file. This only allows the filename and since I'm using a temporary filehandle object that goes out of scope, I had to modify OpenInteract.pm to allow returning filehandles instead of filenames.

Yes, the second one is definitely better and is a good idea. I'm committing a version of it to CVS. However, there's one thing we may need to change:

If you specify a filehandle in $R->{page}{send_file} you must also specify the content type in '$R->{page}{content_type}', otherwise we won't know what to send the client, and some clients don't like that... I've implemented it with a default MIME type for unspecified content ('application/octet-stream').

> Is there a better method? I think OpenInteract was not scalable enough in this kind of fundamental issue. Is this different in OI2?

Well, AFAIK it hasn't come up before -- most of the time you don't have this temporary file restriction. That said, I can just add 'send_filehandle' to OI2::Response as a companion property to 'send_file'. (You'll still need to set the content type...)

Chris

--
Chris Winters
Creating enterprise-capable snack systems since 1988
|
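The point of sending from a filehandle, that the file never has to be loaded into memory in one piece, can be sketched with core Perl only. This is not the patch attached to the thread or the code committed to CVS: the send_filehandle() helper, the %page hash and the in-memory "client" handle are hypothetical stand-ins, and only the 'application/octet-stream' default comes from the message above.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: copy $in to $out in fixed-size chunks so that at most
# one chunk is held in memory at a time. Returns the number of bytes copied.
sub send_filehandle {
    my ($in, $out, $chunk_size) = @_;
    $chunk_size ||= 8192;
    binmode $in;
    binmode $out;
    my $total = 0;
    while (1) {
        my $read = read($in, my $buffer, $chunk_size);
        die "read failed: $!" unless defined $read;
        last if $read == 0;                       # end of file
        print {$out} $buffer or die "write failed: $!";
        $total += $read;
    }
    return $total;
}

# A temporary file that is deleted once $tmp goes out of scope.
my $tmp = File::Temp->new;
print {$tmp} 'x' x 20_000;
$tmp->flush;
seek $tmp, 0, 0;                                  # rewind before sending

# Hypothetical page settings; fall back to a generic type if none was given.
my %page = ( send_file => $tmp );
my $content_type = $page{content_type} || 'application/octet-stream';

# An in-memory handle stands in for the client connection.
my $sent = '';
open my $client, '>', \$sent or die "cannot open in-memory handle: $!";
my $bytes = send_filehandle($page{send_file}, $client, 4096);
close $client;
```

Because the handle itself is passed along, the temporary file stays alive until the transfer is finished, which sidesteps the out-of-scope deletion problem that a filename-only interface runs into.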
From: Teemu A. <te...@io...> - 2004-04-28 07:59:53
|
> If you specify a filehandle in $R->{page}{send_file} you must specify in '$R->{page}{content_type}' otherwise we won't know what to send the client, and some clients don't like that... I've implemented it with a default MIME type of unspecified ('application/octet-stream').

Thanks Chris for a fast response to this problem.

I was setting the content-type myself anyway. I think no-one would return a filehandle and not set the content-type, since it usually is about returning something for download. application/octet-stream is a good content-type for unknown file downloads, all browsers force the file download dialog with it.

--
Sincerely,
Teemu Arina
Ionstream Oy / Dicole
Komeetankuja 4 A
02210 Espoo FINLAND
Tel: +358-(0)50 - 555 7636
http://www.dicole.fi
|