Thread: [Plastic-devs] Temporary files and IDs for loadFromURL
|
From: John T. <jd...@ro...> - 2006-07-14 12:22:15
|
Hi All,

We have a slight problem with the current ivo://votech.org/votable/loadFromURL message in that it uses the URL of the file as the table's ID. Here's the problem:

The user is running three apps A, B, C. Application A creates a subset and wants to broadcast it to B and C. It does this by creating a temporary file and sending it to B and C. B & C dutifully make their own copies of it. The user then starts application D, and shuts down application A, which clears up its temporary file. How does the user then send the file from B to D and ensure that all three remaining apps think they are referring to the same table?

The solution is to adopt the same argument list as ivo://.../votable/load and have a second argument specifying the ID. I suggest we make this second argument optional (but strongly recommended) to avoid breaking existing apps, and if it's missing assume that the id=url as we have now.

Is anyone actually implementing ivo://.../votable/load? For symmetry's sake we could also make the id optional for this message, and use the table name if it's missing, but I'd prefer not to given the potential for lack of uniqueness.

On the subject of optional arguments. I think it might be a good pattern that any optional args go into a struct tacked on after the mandatory arguments. That way they can be referred to by name rather than position.

John

--
AstroGrid/VOTech & Institute for Astronomy, Edinburgh
Skype:johndavidtaylor <skype:johndavidtaylor?chat>
|
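As a rough illustration of the proposal above (not code from any PLASTIC implementation), a receiving application could treat the second loadFromURL argument as optional and fall back to id=url when it is missing. The function name `parse_load_from_url_args` is hypothetical, and treating an empty string as a missing ID is an assumption of this sketch.

```python
from typing import Sequence, Tuple


def parse_load_from_url_args(args: Sequence[str]) -> Tuple[str, str]:
    """Return (url, table_id) from a loadFromURL argument list.

    Hypothetical helper: the first argument is the URL, the optional
    second argument is the table ID. If the ID is missing (or empty,
    which is an assumption of this sketch), fall back to id=url.
    """
    if not args:
        raise ValueError("loadFromURL needs at least a URL argument")
    url = args[0]
    # Older senders pass only the URL, so keep the current id=url behaviour.
    table_id = args[1] if len(args) > 1 and args[1] else url
    return url, table_id
```

An old-style sender that passes only the URL keeps working, while a sender that supplies both arguments gets its ID respected.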
From: Mark T. <m.b...@br...> - 2006-07-14 14:42:16
|
On Fri, 14 Jul 2006, John Taylor wrote:

> Hi All,
> We have a slight problem with the current ivo://votech.org/votable/loadFromURL message in that it uses the URL of the file as the table's ID. Here's the problem:
> The user is running three apps A, B, C. Application A creates a subset and wants to broadcast it to B and C. It does this by creating a temporary file and sending it to B and C. B & C dutifully make their own copies of it. The user then starts application D, and shuts down application A, which clears up its temporary file. How does the user then send the file from B to D and ensure that all three remaining apps think they are referring to the same table?

Actually, as per the trouble Richard had a short while ago, it's worse than that - TOPCAT at least doesn't wait until shutdown to clear up such temporary files, it does it as soon as the votable/loadFromURL request has returned.

> The solution is to adopt the same argument list as ivo://.../votable/load and have a second argument specifying the ID. I suggest we make this second argument optional (but strongly recommended) to avoid breaking existing apps, and if it's missing assume that the id=url as we have now.

After thinking about it for a while, I agree, but subject to the following comment. I seem to remember that the, or at least a reason we decided to use the URL for the table ID was that it meant if the table had some persistence (i.e. it was a non-temporary file in the filesystem, or something on a remote server) then two applications could know they were talking about the same thing even if they picked it up independently rather than one having got it from the other through PLASTIC. So for a file which is (expected to be) persistent the URL is a good choice for the identifier.

So an application which injects a table into the PLASTIC system by sending a loadFromURL should use the URL itself as the ID (or, equivalently, supply no ID and let apps assume id=url, though I'm happy for clarity's sake to recommend that both arguments are supplied). In this way an application which happens to have loaded the same table from a non-PLASTIC source has a chance of knowing that fact. It also solves the problem of how the sending application is supposed to generate an ID guaranteed not to clash with someone else's. But if you've acquired a table by responding to a loadFromURL, then if you subsequently send a loadFromURL you should propagate the ID that came with it. This may or may not be the same as the URL you send.

> Is anyone actually implementing ivo://.../votable/load?

yes.

> For symmetry's sake we could also make the id optional for this message, and use the table name if it's missing, but I'd prefer not to given the potential for lack of uniqueness.

I agree there's no reason to change this (also could potentially cause backward compatibility problems).

> On the subject of optional arguments. I think it might be a good pattern that any optional args go into a struct tacked on after the mandatory arguments. That way they can be referred to by name rather than position.

Hmm, interesting thought. My initial feeling is that for arguments which are optional but strongly recommended like the one you've suggested above (i.e. ones we should have put in in the first place but didn't realise until too late) there's not much advantage in this. However, if some commands end up with a forest of options it could be a good plan.

--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b...@br... +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
|
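The ID rule described in the message above (use the URL as the ID when you inject a table, propagate the received ID when you pass a table on) might be sketched as follows. The class `TableIds` and its method names are hypothetical and exist only for this illustration; each method returns the [url, id] argument list that would go into a loadFromURL message.

```python
class TableIds:
    """Hypothetical per-application bookkeeping of PLASTIC table IDs."""

    def __init__(self):
        self._ids = {}  # local table name -> table ID shared over PLASTIC

    def inject(self, name, url):
        """The table originates here: its URL doubles as its ID."""
        self._ids[name] = url
        return [url, url]

    def on_load_from_url(self, name, url, table_id=None):
        """The table arrived via loadFromURL: remember the ID that came with it."""
        # A missing ID means an old-style sender, so assume id=url.
        self._ids[name] = table_id if table_id else url

    def resend(self, name, new_url):
        """Pass the table on: the URL may change, the ID must not."""
        return [new_url, self._ids[name]]
```

In the A, B, C, D scenario, B would call `on_load_from_url` when A's message arrives and later `resend` with the URL of its own copy, so D sees the same ID that C already holds even though A's temporary file is gone.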
From: R H. <ric...@co...> - 2006-07-14 17:38:10
|
On Fri, 14 Jul 2006, Mark Taylor wrote:

> On Fri, 14 Jul 2006, John Taylor wrote:
>
>> Hi All,
>> We have a slight problem with the current ivo://votech.org/votable/loadFromURL message in that it uses the URL of the file as the table's ID. Here's the problem:
>> The user is running three apps A, B, C. Application A creates a subset and wants to broadcast it to B and C. It does this by creating a temporary file and sending it to B and C. B & C dutifully make their own copies of it. The user then starts application D, and shuts down application A, which clears up its temporary file. How does the user then send the file from B to D and ensure that all three remaining apps think they are referring to the same table?
>
> Actually, as per the trouble Richard had a short while ago, it's worse than that - TOPCAT at least doesn't wait until shutdown to clear up such temporary files, it does it as soon as the votable/loadFromURL request has returned.

Sorry to keep harking on, but I still think things would be better if the hub had more control. If "A" registers an interesting file with the hub, the hub can maintain the host and file URL (a plasticFileID?) and send messages to third parties, counting how many are interested. "A" can unregister and, as long as other apps are interested and registered -- i.e. the count remains above zero -- only one temporary file need exist. This still leaves D and others who come along later in the dark, unless there is an option to request any 'hub-cached' files or unless say "B" specifically sends it. Either way, the count would increase by one for "D". It might also be possible to automate updates of a given file ID, and the earlier file could be removed, when all other apps have the new version. The 'plasticFileID' might (?) even be able to stay the same.

>> The solution is to adopt the same argument list as ivo://.../votable/load and have a second argument specifying the ID. I suggest we make this second argument optional (but strongly recommended) to avoid breaking existing apps, and if it's missing assume that the id=url as we have now.

Doesn't "load" have a second arg anyway? This seems to be the main difference between two. Help, I'm still confused about this one..

> After thinking about it for a while, I agree, but subject to the following comment. I seem to remember that the, or at least a reason we decided to use the URL for the table ID was that it meant if the table had some persistence (i.e. it was a non-temporary file in the filesystem, or something on a remote server) then two applications could know they were talking about the same thing even if they picked it up independently rather than one having got it from the other through PLASTIC. So for a file which is (expected to be) persistent the URL is a good choice for the identifier.

This seems a bit dicey to me. Two files obtained from different sources at the same time (or vice versa) might not be the same ..??

> So an application which injects a table into the PLASTIC system by sending a loadFromURL should use the URL itself as the ID (or, equivalently, supply no ID and let apps assume id=url, though I'm happy for clarity's sake to recommend that both arguments are supplied). In this way an application which happens to have loaded the same table from a non-PLASTIC source has a chance of knowing that fact. It also solves the problem of how the sending application is supposed to generate an ID guaranteed not to clash with someone else's. But if you've acquired a table by responding to a loadFromURL, then if you subsequently send a loadFromURL you should propagate the ID that came with it. This may or may not be the same as the URL you send.
>
>> Is anyone actually implementing ivo://.../votable/load?
>
> yes.
>
>> For symmetry's sake we could also make the id optional for this message, and use the table name if it's missing, but I'd prefer not to given the potential for lack of uniqueness.

Maybe it's because plastic is so new to me but I don't quite get the logic here. How many apps are going to break if you deprecate one message and introduce a few new ones (with clarification). Surely this is the time to make changes and revisions.

Richard
|
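To make the hub-side counting idea above concrete, here is a minimal in-memory sketch. Nothing like this exists in the PLASTIC hub interface as discussed in this thread; the class `HubFileRegistry`, its methods, and the `delete` callback are all hypothetical. Tracking a set of interested application IDs instead of a bare integer is a choice made for this sketch, so that an application registering twice does not inflate the count.

```python
class HubFileRegistry:
    """Hypothetical hub-side bookkeeping: who is interested in which file."""

    def __init__(self, delete):
        self._delete = delete   # callback that removes the temporary file
        self._interested = {}   # file ID -> set of interested application IDs

    def register(self, file_id, app_id):
        """Record that app_id wants file_id kept around."""
        self._interested.setdefault(file_id, set()).add(app_id)

    def count(self, file_id):
        """Number of applications currently interested in file_id."""
        return len(self._interested.get(file_id, ()))

    def unregister(self, file_id, app_id):
        """Drop app_id's interest; delete the file when nobody is left."""
        apps = self._interested.get(file_id)
        if apps is None:
            return
        apps.discard(app_id)
        if not apps:
            del self._interested[file_id]
            self._delete(file_id)
```

With this scheme "A" can unregister while "B" and "C" are still registered and the file survives; it is removed only when the last interested application unregisters. An application that never unregisters keeps the file alive indefinitely.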
From: Mark T. <m.b...@br...> - 2006-07-17 10:22:56
|
On Fri, 14 Jul 2006, R Holbrey wrote:

> On Fri, 14 Jul 2006, Mark Taylor wrote:
>
> > On Fri, 14 Jul 2006, John Taylor wrote:
> >
> >> Hi All,
> >> We have a slight problem with the current ivo://votech.org/votable/loadFromURL message in that it uses the URL of the file as the table's ID. Here's the problem:
> >> The user is running three apps A, B, C. Application A creates a subset and wants to broadcast it to B and C. It does this by creating a temporary file and sending it to B and C. B & C dutifully make their own copies of it. The user then starts application D, and shuts down application A, which clears up its temporary file. How does the user then send the file from B to D and ensure that all three remaining apps think they are referring to the same table?
> >
> > Actually, as per the trouble Richard had a short while ago, it's worse than that - TOPCAT at least doesn't wait until shutdown to clear up such temporary files, it does it as soon as the votable/loadFromURL request has returned.
>
> Sorry to keep harking on, but I still think things would be better if the hub had more control.

Richard, sorry for not making a response to your previous message, and (in advance) for the nature of the response to this one.

Your ideas are quite reasonable ones, but the reason I'm not very enthusiastic about them is that part of the unwritten philosophy behind PLASTIC (at least in my understanding) is that it's simple in order to be difficult to break. For instance, reference counting to keep track of temporary resources is clearly the Right Thing to do if you have a controlled environment with reliable object destructors etc. But by its nature a hub is dealing with unreliable connections to unreliably implemented clients, and a client that forgets to unregister itself, or its interest in a file, could very easily end up leaving (possibly large) temporary files hanging around for much longer than they ought to be there. This sort of thing would be fairly easy to track down and fix in a single-process application, but practically impossible when you've got no idea what applications you might or might not be talking to and how well or badly they might behave.

That's my take on it anyway - by all means other PLASTICkers chip in with your points of view (including, obviously, right of reply from Richard); I'm quite open to debating either the general or specific points and being persuaded otherwise. Either way, if we reach a consensus on this it might be a good idea to agree on a kind of explicit Manifesto or Philosophy of PLASTIC document which can be referred to to clarify this kind of debate/suggestion in the future.

> Doesn't "load" have a second arg anyway? This seems to be the main difference between two. Help, I'm still confused about this one..

the main difference is that in "loadFromURL" you pass the URL. In "load" the text of the votable is sent as the content of the first argument. Not very scalable, admittedly.

> > After thinking about it for a while, I agree, but subject to the following comment. I seem to remember that the, or at least a reason we decided to use the URL for the table ID was that it meant if the table had some persistence (i.e. it was a non-temporary file in the filesystem, or something on a remote server) then two applications could know they were talking about the same thing even if they picked it up independently rather than one having got it from the other through PLASTIC. So for a file which is (expected to be) persistent the URL is a good choice for the identifier.
>
> This seems a bit dicey to me. Two files obtained from different sources at the same time (or vice versa) might not be the same ..??

[assuming you mean "two files obtained from the same source at different times might not be the same"]: true, but in 99(?)% of cases it probably is the same. Here's a proposal for the manifesto: "A simple solution which does the right thing nearly all the time is better than a complicated one which does the right thing all the time".

> >> For symmetry's sake we could also make the id optional for this message, and use the table name if it's missing, but I'd prefer not to given the potential for lack of uniqueness.
>
> Maybe it's because plastic is so new to me but I don't quite get the logic here. How many apps are going to break if you deprecate one message and introduce a few new ones (with clarification). Surely this is the time to make changes and revisions.

Well I don't think we should/will make this change. However, I think we currently consider ourselves in an early phase in which we can make some backwardly incompatible changes if we consider there are sufficient benefits since the developer community is still quite small and responsive.

Mark

--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b...@br... +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
|
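The distinction drawn above between "load" (VOTable text passed inline) and "loadFromURL" (only a URL passed, which the receiver fetches itself) could look roughly like this on the receiving side. The handler names and return shape are hypothetical. Treating the ID as optional for loadFromURL follows the proposal in this thread, not any settled message definition. Fetching with `urllib.request.urlopen` and decoding as UTF-8 are assumptions of the sketch.

```python
from urllib.request import urlopen


def handle_load(votable_text, table_id):
    """'load': the VOTable document itself arrives as the first argument."""
    return table_id, votable_text


def handle_load_from_url(url, table_id=None):
    """'loadFromURL': only a URL arrives; the receiver fetches the data."""
    with urlopen(url) as response:
        # Assumes the document is UTF-8 encoded text.
        votable_text = response.read().decode("utf-8")
    # Fall back to id=url when the sender supplied no ID.
    return (table_id if table_id else url), votable_text
```

The inline form pushes the whole document through the messaging layer, which is why it scales poorly for large tables. The URL form sends only a short string, but depends on the file still existing when the receiver fetches it.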
From: John T. <jon...@gm...> - 2006-07-17 16:04:14
|
Mark Taylor wrote:

> On Fri, 14 Jul 2006, R Holbrey wrote:
>
>> Doesn't "load" have a second arg anyway? This seems to be the main difference between two. Help, I'm still confused about this one..
>
> the main difference is that in "loadFromURL" you pass the URL. In "load" the text of the votable is sent as the content of the first argument. Not very scalable, admittedly.

FWIW I think this is actually used in the Workbench though...in fact it looks as though they might appreciate a similar "in-line" version of the "load fits" message. Noel: is this for sending data that comes back by value from a CEA app?

John
|
From: Noel W. <Noe...@ma...> - 2006-07-17 16:20:09
|
On 17 Jul 2006, at 17:04, John Taylor wrote:

> Mark Taylor wrote:
>> On Fri, 14 Jul 2006, R Holbrey wrote:
>>
>>> Doesn't "load" have a second arg anyway? This seems to be the main difference between two. Help, I'm still confused about this one..
>>
>> the main difference is that in "loadFromURL" you pass the URL. In "load" the text of the votable is sent as the content of the first argument. Not very scalable, admittedly.
>
> FWIW I think this is actually used in the Workbench though...in fact it looks as though they might appreciate a similar "in-line" version of the "load fits" message. Noel: is this for sending data that comes back by value from a CEA app?

yep. that's it - data is already in memory, and if it's got that far it's obviously not tooo big.

> John
|
From: John T. <jon...@gm...> - 2006-07-17 16:08:50
|
>>>> For symmetry's sake we could also make the id optional for this message, and use the table name if it's missing, but I'd prefer not to given the potential for lack of uniqueness.
>>
>> Maybe it's because plastic is so new to me but I don't quite get the logic here. How many apps are going to break if you deprecate one message and introduce a few new ones (with clarification). Surely this is the time to make changes and revisions.
>
> Well I don't think we should/will make this change. However, I think we currently consider ourselves in an early phase in which we can make some backwardly incompatible changes if we consider there are sufficient benefits since the developer community is still quite small and responsive.

Richard, this was me confusing the issue by thinking aloud. There's no good reason to make this ID optional except to make both loadVOTable messages have the same argument signature. Since (with hindsight) we'd actually prefer both messages to have the ID as mandatory, my raising the idea was daft.
|
From: John T. <jd...@ro...> - 2006-07-17 16:25:36
|
My own view is that this is something we should seriously look at, but not rush into. At the moment we're dealing with files that are not going to exceed the user's disk space, not even close. So, apart from the processing overhead, it's not really a big deal if each application makes its own copy of the data. With remote URLs, there is the issue of multiple downloads of the same data, but the feeling a while back was that we can leave the user's cache to deal with this.

If we do decide to do it, then I think it would have to be an optional extension of a hub, and accessed by messaging. In fact, it needn't be bundled with a hub - we can define a set of messages for a third-party application to "adopt" a file, do the reference counting and clean-up. Perhaps this "cache" will be bundled with a particular hub impl, perhaps it won't. The client application will have to deal with its presence or absence.

I think we should proceed with some caution (contrary to my usual act-first-think-later behaviour!). As Tony pointed out during our meeting (Richard), this sort of thing has been done before and we should probably do some investigating.

John

> Your ideas are quite reasonable ones, but the reason I'm not very enthusiastic about them is that part of the unwritten philosophy behind PLASTIC (at least in my understanding) is that it's simple in order to be difficult to break. For instance, reference counting to keep track of temporary resources is clearly the Right Thing to do if you have a controlled environment with reliable object destructors etc. But by its nature a hub is dealing with unreliable connections to unreliably implemented clients, and a client that forgets to unregister itself, or its interest in a file, could very easily end up leaving (possibly large) temporary files hanging around for much longer than they ought to be there. This sort of thing would be fairly easy to track down and fix in a single-process application, but practically impossible when you've got no idea what applications you might or might not be talking to and how well or badly they might behave.
>
> That's my take on it anyway - by all means other PLASTICkers chip in with your points of view (including, obviously, right of reply from Richard); I'm quite open to debating either the general or specific points and being persuaded otherwise. Either way, if we reach a consensus on this it might be a good idea to agree on a kind of explicit Manifesto or Philosophy of PLASTIC document which can be referred to to clarify this kind of debate/suggestion in the future.

--
AstroGrid/VOTech & Institute for Astronomy, Edinburgh
Skype:johndavidtaylor <skype:johndavidtaylor?chat>
|
From: John T. <jon...@gm...> - 2006-07-18 09:44:12
|
> [assuming you mean "two files obtained from the same source at different times might not be the same"]:
>
> true, but in 99(?)% of cases it probably is the same. Here's a proposal for the manifesto: "A simple solution which does the right thing nearly all the time is better than a complicated one which does the right thing all the time".

I really like the idea of a manifesto describing what we're about. Anyone else care to summarize what we aim to do? Add your contribution to http://eurovotech.org/twiki/bin/view/VOTech/PlasticManifesto

John
|
From: John T. <jon...@gm...> - 2006-07-17 15:58:59
|
Mark Taylor wrote:

>> The solution is to adopt the same argument list as ivo://.../votable/load and have a second argument specifying the ID. I suggest we make this second argument optional (but strongly recommended) to avoid breaking existing apps, and if it's missing assume that the id=url as we have now.
>
> After thinking about it for a while, I agree, but subject to the following comment. I seem to remember that the, or at least a reason we decided to use the URL for the table ID was that it meant if the table had some persistence (i.e. it was a non-temporary file in the filesystem, or something on a remote server) then two applications could know they were talking about the same thing even if they picked it up independently rather than one having got it from the other through PLASTIC. So for a file which is (expected to be) persistent the URL is a good choice for the identifier.

I don't remember the discussion, but the idea seems pretty sensible.

> [snip]
>
>> On the subject of optional arguments. I think it might be a good pattern that any optional args go into a struct tacked on after the mandatory arguments. That way they can be referred to by name rather than position.
>
> Hmm, interesting thought. My initial feeling is that for arguments which are optional but strongly recommended like the one you've suggested above (i.e. ones we should have put in in the first place but didn't realise until too late) there's not much advantage in this. However, if some commands end up with a forest of options it could be a good plan.

I think the id parameter falls into the category of "should be mandatory from now on", so I agree, specifying it by position is easier and clearer. I was really thinking on the lines of (e.g.) if someone wanted to do something very app-specific such as send a color=blue parameter in with the showObjects message, though my original post isn't very clear.
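A small sketch of the "mandatory arguments by position, optional ones in a trailing struct" pattern mentioned above, using the color=blue example. The helper `split_args` is hypothetical, and the showObjects argument layout used in the usage note (a table ID followed by a list of row indices) is an assumption made for illustration only.

```python
def split_args(args, n_mandatory):
    """Split a message argument list into (mandatory, options).

    Hypothetical helper: the first n_mandatory arguments are positional
    and required; at most one further argument is allowed, and it must
    be a struct (a dict here) of named optional values.
    """
    mandatory = list(args[:n_mandatory])
    if len(mandatory) < n_mandatory:
        raise ValueError("expected %d mandatory arguments" % n_mandatory)
    extras = list(args[n_mandatory:])
    if not extras:
        return mandatory, {}
    if len(extras) != 1 or not isinstance(extras[0], dict):
        raise ValueError("optional arguments must be one trailing struct")
    return mandatory, dict(extras[0])
```

For example, `split_args(["table-1", [3, 7], {"color": "blue"}], 2)` yields the two positional arguments plus `{"color": "blue"}`. A receiver that does not understand `color` can ignore the key, and a sender that omits the struct entirely still produces a valid message.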