Thread: [Plastic-devs] Temporary files and IDs for loadFromURL

[Plastic-devs] Temporary files and IDs for loadFromURL

From: John T. <jd...@ro...> - 2006-07-14 12:22:15

Hi All,
We have a slight problem with the current 
ivo://votech.org/votable/loadFromURL message in that it uses the URL of 
the file as the table's ID.    Here's the problem:
The user is running three apps A, B, C.  Application A creates a subset 
and wants to broadcast it to B and C.  It does this by creating a 
temporary file and sending it to B and C.  B & C dutifully make their 
own copies of it.   The user then starts application D, and shuts down 
application A, which clears up its temporary file.  How does the user 
then send the file from B to D and ensure that all three remaining apps 
think they are referring to the same table?

The solution is to adopt the same argument list as 
ivo://.../votable/load and have a second argument specifying the ID.   I 
suggest we make this second argument optional (but strongly recommended) 
to avoid breaking existing apps, and if it's missing assume that the 
id=url as we have now.

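A minimal sketch in Python of how a receiving application might read the proposed two-argument loadFromURL. It assumes the URL stays first and the optional ID comes second. The function name is hypothetical and not part of any PLASTIC library:

```python
from typing import Sequence, Tuple


def resolve_load_from_url_args(args: Sequence[str]) -> Tuple[str, str]:
    """Return (url, table_id) for an incoming loadFromURL request.

    Hypothetical layout: args[0] is the URL and args[1], if present,
    is the table ID.  Older senders that supply only the URL get the
    current behaviour, id = url.
    """
    if not args:
        raise ValueError("loadFromURL needs at least a URL argument")
    url = args[0]
    # Treat a missing or empty second argument as "no ID supplied".
    table_id = args[1] if len(args) > 1 and args[1] else url
    return url, table_id
```

A sender that supplies only the URL gets today's id=url behaviour. A sender that supplies a second argument keeps that ID even after the file behind the URL has been copied or deleted.
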
Is anyone actually implementing ivo://.../votable/load?
For symmetry's sake we could also make the id optional for this message, 
and use the table name if it's missing, but I'd prefer not to given the 
potential for lack of uniqueness.

On the subject of optional arguments.  I think it might be a good 
pattern that any optional args go into a struct tacked on after the 
mandatory arguments.  That way they can be referred to by name rather 
than position.

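To make the idea concrete, here is a rough Python sketch of a receiver splitting such an argument list. It assumes a message has a fixed number of mandatory positional arguments, optionally followed by one struct, which Python's xmlrpc modules represent as a dict. The helper and the example argument layout are hypothetical:

```python
from typing import Any, Dict, List, Sequence, Tuple


def split_args(args: Sequence[Any],
               n_mandatory: int) -> Tuple[List[Any], Dict[str, Any]]:
    """Split an argument list into mandatory positional arguments and
    an optional trailing struct of named options (hypothetical convention).
    """
    if len(args) < n_mandatory:
        raise ValueError(
            f"expected at least {n_mandatory} arguments, got {len(args)}")
    mandatory = list(args[:n_mandatory])
    extra = args[n_mandatory:]
    if not extra:
        # No optional struct supplied: every option takes its default.
        return mandatory, {}
    if len(extra) == 1 and isinstance(extra[0], dict):
        # Copy so callers cannot mutate the original message arguments.
        return mandatory, dict(extra[0])
    raise ValueError("optional arguments must be a single trailing struct")
```

With this convention, split_args(["table-1", [0, 5, 7], {"color": "blue"}], 2) returns the two positional arguments plus {"color": "blue"}. An application that does not understand a given option can simply ignore that key.
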
John



-- 
------------------------------------------------------------------------
AstroGrid/VOTech
&
Institute for Astronomy, Edinburgh
Skype: johndavidtaylor

------------------------------------------------------------------------

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: Mark T. <m.b...@br...> - 2006-07-14 14:42:16

On Fri, 14 Jul 2006, John Taylor wrote:

> Hi All,
> We have a slight problem with the current 
> ivo://votech.org/votable/loadFromURL message in that it uses the URL of 
> the file as the table's ID.    Here's the problem:
> The user is running three apps A, B, C.  Application A creates a subset 
> and wants to broadcast it to B and C.  It does this by creating a 
> temporary file and sending it to B and C.  B & C dutifully make their 
> own copies of it.   The user then starts application D, and shuts down 
> application A, which clears up its temporary file.  How does the user 
> then send the file from B to D and ensure that all three remaining apps 
> think they are referring to the same table?

Actually, as per the trouble Richard had a short while ago, it's worse
than that: TOPCAT, at least, doesn't wait until shutdown to clear up
such temporary files; it does so as soon as the votable/loadFromURL
request has returned.

> The solution is to adopt the same argument list as 
> ivo://.../votable/load and have a second argument specifying the ID.   I 
> suggest we make this second argument optional (but strongly recommended) 
> to avoid breaking existing apps, and if it's missing assume that the 
> id=url as we have now.

After thinking about it for a while, I agree, but subject to the 
following comment.  I seem to remember that the reason (or at least
a reason) we decided to use the URL for the table ID was that
it meant if the table had some persistence (i.e. it was a non-temporary
file in the filesystem, or something on a remote server) then two
applications could know they were talking about the same thing even
if they picked it up independently rather than one having got it from
the other through PLASTIC.  So for a file which is (expected to be) 
persistent the URL is a good choice for the identifier.

So an application which injects a table into the PLASTIC system by
sending a loadFromURL should use the URL itself as the ID
(or, equivalently, supply no ID and let apps assume id=url,
though I'm happy for clarity's sake to recommend that both arguments 
are supplied).  In this way an application which happens to have 
loaded the same table from a non-PLASTIC source has a chance of 
knowing that fact.  It also solves the problem of how the sending 
application is supposed to generate an ID guaranteed not to 
clash with someone else's.  But if you've acquired a table by 
responding to a loadFromURL, then if you subsequently send a 
loadFromURL you should propagate the ID that came with it.  
This may or may not be the same as the URL you send.

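The rule above could be sketched as follows in Python. It assumes the two-argument [url, id] form with both arguments always supplied. The function name and file URLs are hypothetical:

```python
from typing import List, Optional


def outgoing_load_from_url_args(send_url: str,
                                received_id: Optional[str] = None) -> List[str]:
    """Build the [url, id] argument list for an outgoing loadFromURL.

    - If this application injected the table itself (received_id is None),
      the URL being sent doubles as the ID.
    - If the table arrived via an earlier loadFromURL, propagate the ID it
      arrived with, even though send_url may point at a different copy.
    Both arguments are always supplied, for clarity.
    """
    table_id = send_url if received_id is None else received_id
    return [send_url, table_id]
```

In John's scenario, A injects its subset with outgoing_load_from_url_args("file:///tmp/a/subset.xml"), so the ID is that URL. When B later forwards its own copy to D, it passes received_id="file:///tmp/a/subset.xml". As a result, B, C and D all agree on the ID even though A's file has gone.
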
> Is anyone actually implementing ivo://.../votable/load?

yes.

> For symmetry's sake we could also make the id optional for this message, 
> and use the table name if it's missing, but I'd prefer not to given the 
> potential for lack of uniqueness.

I agree there's no reason to change this (also could potentially cause
backward compatibility problems).

> On the subject of optional arguments.  I think it might be a good 
> pattern that any optional args go into a struct tacked on after the 
> mandatory arguments.  That way they can be referred to by name rather 
> than position.

Hmm, interesting thought.  My initial feeling is that for arguments
which are optional but strongly recommended like the one you've 
suggested above (i.e. ones we should have put in in the first 
place but didn't realise until too late) there's not much advantage
in this.  However, if some commands end up with a forest of options 
it could be a good plan.

-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b...@br... +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: R H. <ric...@co...> - 2006-07-14 17:38:10

On Fri, 14 Jul 2006, Mark Taylor wrote:

> On Fri, 14 Jul 2006, John Taylor wrote:
>
>> Hi All,
>> We have a slight problem with the current
>> ivo://votech.org/votable/loadFromURL message in that it uses the URL of
>> the file as the table's ID.    Here's the problem:
>> The user is running three apps A, B, C.  Application A creates a subset
>> and wants to broadcast it to B and C.  It does this by creating a
>> temporary file and sending it to B and C.  B & C dutifully make their
>> own copies of it.   The user then starts application D, and shuts down
>> application A, which clears up its temporary file.  How does the user
>> then send the file from B to D and ensure that all three remaining apps
>> think they are referring to the same table?
>
> Actually, as per the trouble Richard had a short while ago, it's worse
> than that: TOPCAT, at least, doesn't wait until shutdown to clear up
> such temporary files; it does so as soon as the votable/loadFromURL
> request has returned.

Sorry to keep harping on, but I still think things would be better if the
hub had more control.

If "A" registers an interesting file with the hub, the hub can maintain
the host and file URL (a plasticFileID?) and send messages to third
parties, counting how many are interested. "A" can then unregister, and
as long as other apps are still interested and registered (i.e. the count
remains above zero) the file stays available, with only one temporary
copy needing to exist. This still leaves D and others who come along
later in the dark, unless there is an option to request any 'hub-cached'
files or unless, say, "B" specifically sends it. Either way, the count
would increase by one for "D".

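A rough Python sketch of the hub-side bookkeeping being proposed. It assumes the hub holds a local path for each shared file and tracks which registered applications are interested in it. The class and method names are hypothetical. A set of application IDs stands in for a bare counter so that the same client cannot be counted twice:

```python
import os
from typing import Dict, Set


class FileRegistry:
    """Hypothetical hub-side reference counting for shared temporary files."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}         # file ID -> local path
        self._holders: Dict[str, Set[str]] = {}  # file ID -> interested apps

    def register(self, file_id: str, path: str, app_id: str) -> None:
        """The originating app (e.g. "A") hands a file to the hub."""
        self._paths[file_id] = path
        self._holders.setdefault(file_id, set()).add(app_id)

    def acquire(self, file_id: str, app_id: str) -> str:
        """Another app (e.g. a late-joining "D") declares interest."""
        self._holders[file_id].add(app_id)
        return self._paths[file_id]

    def release(self, file_id: str, app_id: str) -> None:
        """Drop one app's interest; delete the file when nobody is left."""
        holders = self._holders.get(file_id)
        if holders is None:
            return
        holders.discard(app_id)
        if not holders:
            path = self._paths.pop(file_id)
            del self._holders[file_id]
            if os.path.exists(path):
                os.remove(path)

    def count(self, file_id: str) -> int:
        return len(self._holders.get(file_id, ()))
```

The file is only deleted once every holder has called release(), so a client that disappears without doing so keeps the file alive indefinitely.
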
It might also be possible to automate updates of a given file ID, with the
earlier file removed once all other apps have the new version.
The 'plasticFileID' might (?) even be able to stay the same.

>
>> The solution is to adopt the same argument list as
>> ivo://.../votable/load and have a second argument specifying the ID.   I
>> suggest we make this second argument optional (but strongly recommended)
>> to avoid breaking existing apps, and if it's missing assume that the
>> id=url as we have now.
>

Doesn't "load" have a second arg anyway? This seems to be the main 
difference between the two. Help, I'm still confused about this one.

> After thinking about it for a while, I agree, but subject to the
> following comment.  I seem to remember that the reason (or at least
> a reason) we decided to use the URL for the table ID was that
> it meant if the table had some persistence (i.e. it was a non-temporary
> file in the filesystem, or something on a remote server) then two
> applications could know they were talking about the same thing even
> if they picked it up independently rather than one having got it from
> the other through PLASTIC.  So for a file which is (expected to be)
> persistent the URL is a good choice for the identifier.

This seems a bit dicey to me. Two files obtained from different sources at 
the same time (or vice versa) might not be the same ..??

>
> So an application which injects a table into the PLASTIC system by
> sending a loadFromURL should use the URL itself as the ID
> (or, equivalently, supply no ID and let apps assume id=url,
> though I'm happy for clarity's sake to recommend that both arguments
> are supplied).  In this way an application which happens to have
> loaded the same table from a non-PLASTIC source has a chance of
> knowing that fact.  It also solves the problem of how the sending
> application is supposed to generate an ID guaranteed not to
> clash with someone else's.  But if you've acquired a table by
> responding to a loadFromURL, then if you subsequently send a
> loadFromURL you should propagate the ID that came with it.
> This may or may not be the same as the URL you send.
>
>
>> Is anyone actually implementing ivo://.../votable/load?
>
> yes.
>
>> For symmetry's sake we could also make the id optional for this message,
>> and use the table name if it's missing, but I'd prefer not to given the
>> potential for lack of uniqueness.
>

Maybe it's because PLASTIC is so new to me, but I don't quite get the
logic here. How many apps are going to break if you deprecate one message
and introduce a few new ones (with clarification)? Surely this is the
time to make changes and revisions.

Richard

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: Mark T. <m.b...@br...> - 2006-07-17 10:22:56

On Fri, 14 Jul 2006, R Holbrey wrote:

> On Fri, 14 Jul 2006, Mark Taylor wrote:
> 
> > On Fri, 14 Jul 2006, John Taylor wrote:
> >
> >> Hi All,
> >> We have a slight problem with the current
> >> ivo://votech.org/votable/loadFromURL message in that it uses the URL of
> >> the file as the table's ID.    Here's the problem:
> >> The user is running three apps A, B, C.  Application A creates a subset
> >> and wants to broadcast it to B and C.  It does this by creating a
> >> temporary file and sending it to B and C.  B & C dutifully make their
> >> own copies of it.   The user then starts application D, and shuts down
> >> application A, which clears up its temporary file.  How does the user
> >> then send the file from B to D and ensure that all three remaining apps
> >> think they are referring to the same table?
> >
> > Actually, as per the trouble Richard had a short while ago, it's worse
> > than that: TOPCAT, at least, doesn't wait until shutdown to clear up
> > such temporary files; it does so as soon as the votable/loadFromURL
> > request has returned.
> 
> Sorry to keep harping on, but I still think things would be better if the
> hub had more control.

Richard,

sorry for not making a response to your previous message, and (in 
advance) for the nature of the response to this one.

Your ideas are quite reasonable ones, but the reason I'm not very
enthusiastic about them is that part of the unwritten philosophy behind
PLASTIC (at least in my understanding) is that it's simple in 
order to be difficult to break.  For instance, reference counting
to keep track of temporary resources is clearly the Right Thing 
to do if you have a controlled environment with reliable object
destructors etc.  But by its nature a hub is dealing with unreliable
connections to unreliably implemented clients, and a client that 
forgets to unregister itself, or its interest in a file, could 
very easily end up leaving (possibly large) temporary files hanging 
around for much longer than they ought to be there.  This sort of 
thing would be fairly easy to track down and fix in a single-process 
application, but practically impossible when you've got no idea
what applications you might or might not be talking to and how
well or badly they might behave.

That's my take on it anyway.  By all means, other PLASTICkers, chip
in with your points of view (including, obviously, right of reply
from Richard); I'm quite open to debating either the general or
specific points and being persuaded otherwise.  Either way, if we 
reach a consensus on this it might be a good idea to agree on a 
kind of explicit Manifesto or Philosophy of PLASTIC document 
which can be referred to to clarify this kind of debate/suggestion 
in the future.

> Doesn't "load" have a second arg anyway? This seems to be the main 
> difference between the two. Help, I'm still confused about this one.

the main difference is that in "loadFromURL" you pass the URL.
In "load" the text of the votable is sent as the content of the 
first argument.  Not very scalable, admittedly.

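A small Python illustration of the difference. It assumes both messages take [payload, id], where the payload is the VOTable text for load and a URL for loadFromURL. The standard xmlrpc.client.dumps is used to measure how long each argument list is once encoded as XML-RPC. The helper functions, the URL and the toy table (not a complete VOTable) are made up for the example:

```python
import xmlrpc.client
from typing import List


def load_args(votable_text: str, table_id: str) -> List[str]:
    """votable/load: the whole VOTable document travels as argument one."""
    return [votable_text, table_id]


def load_from_url_args(url: str, table_id: str) -> List[str]:
    """votable/loadFromURL with an ID: only a reference to the table travels."""
    return [url, table_id]


def payload_size(args: List[str]) -> int:
    """Characters in the XML-RPC <params> encoding of the argument list."""
    return len(xmlrpc.client.dumps((args,)))


def toy_votable(n_rows: int) -> str:
    """A toy single-column VOTable-like document with n_rows rows."""
    rows = "".join(f"<TR><TD>{i}</TD></TR>" for i in range(n_rows))
    return ('<VOTABLE><RESOURCE><TABLE><FIELD name="i" datatype="int"/>'
            f"<DATA><TABLEDATA>{rows}</TABLEDATA></DATA>"
            "</TABLE></RESOURCE></VOTABLE>")


if __name__ == "__main__":
    url = "http://example.org/table.xml"  # hypothetical location
    for n in (10, 10000):
        print(n, payload_size(load_args(toy_votable(n), "t1")),
              payload_size(load_from_url_args(url, "t1")))
```

The load payload grows with the number of rows. The loadFromURL payload stays the same size however big the table is.
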
> > After thinking about it for a while, I agree, but subject to the
> > following comment.  I seem to remember that the reason (or at least
> > a reason) we decided to use the URL for the table ID was that
> > it meant if the table had some persistence (i.e. it was a non-temporary
> > file in the filesystem, or something on a remote server) then two
> > applications could know they were talking about the same thing even
> > if they picked it up independently rather than one having got it from
> > the other through PLASTIC.  So for a file which is (expected to be)
> > persistent the URL is a good choice for the identifier.
> 
> This seems a bit dicey to me. Two files obtained from different sources at 
> the same time (or vice versa) might not be the same ..??

[assuming you mean "two files obtained from the same source at 
different times might not be the same"]:

true, but in 99(?)% of cases it probably is the same.  Here's a
proposal for the manifesto: "A simple solution which does the right
thing nearly all the time is better than a complicated one which
does the right thing all the time".

> >> For symmetry's sake we could also make the id optional for this message,
> >> and use the table name if it's missing, but I'd prefer not to given the
> >> potential for lack of uniqueness.
>
> Maybe it's because PLASTIC is so new to me, but I don't quite get the
> logic here. How many apps are going to break if you deprecate one message
> and introduce a few new ones (with clarification)? Surely this is the
> time to make changes and revisions.

Well, I don't think we should (or will) make this change.  However, I
think we currently consider ourselves to be in an early phase in which
we can make some backwardly incompatible changes if we consider the
benefits sufficient, since the developer community is still quite small
and responsive.

Mark


Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: John T. <jon...@gm...> - 2006-07-17 16:04:14


Mark Taylor wrote:
> On Fri, 14 Jul 2006, R Holbrey wrote:
>   
>   
>> Doesn't "load" have a second arg anyway? This seems to be the main 
>> difference between the two. Help, I'm still confused about this one.
>>     
>
> the main difference is that in "loadFromURL" you pass the URL.
> In "load" the text of the votable is sent as the content of the 
> first argument.  Not very scalable, admittedly.
>   
FWIW, I think this is actually used in the Workbench, though.  In fact, it
looks as though they might appreciate a similar "in-line" version of the
"load fits" message.  Noel: is this for sending data that comes back by
value from a CEA app?

John

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: Noel W. <Noe...@ma...> - 2006-07-17 16:20:09

On 17 Jul 2006, at 17:04, John Taylor wrote:

>
>
> Mark Taylor wrote:
>> On Fri, 14 Jul 2006, R Holbrey wrote:
>>
>>> Doesn't "load" have a second arg anyway? This seems to be the  
>>> main difference between the two. Help, I'm still confused about this
>>> one.
>>>
>>
>> the main difference is that in "loadFromURL" you pass the URL.
>> In "load" the text of the votable is sent as the content of the  
>> first argument.  Not very scalable, admittedly.
>>
> FWIW, I think this is actually used in the Workbench, though.  In
> fact, it looks as though they might appreciate a similar "in-line"
> version of the "load fits" message.  Noel: is this for sending data
> that comes back by value from a CEA app?
>

Yep, that's it: the data is already in memory, and if it's got that far
it's obviously not too big.


> John
>

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: John T. <jon...@gm...> - 2006-07-17 16:08:50

>
>   
>>>> For symmetry's sake we could also make the id optional for this message,
>>>> and use the table name if it's missing, but I'd prefer not to given the
>>>> potential for lack of uniqueness.
>>>>         
>> Maybe it's because PLASTIC is so new to me, but I don't quite get the
>> logic here. How many apps are going to break if you deprecate one message
>> and introduce a few new ones (with clarification)? Surely this is the
>> time to make changes and revisions.
>>     
>
> Well, I don't think we should (or will) make this change.  However, I
> think we currently consider ourselves to be in an early phase in which
> we can make some backwardly incompatible changes if we consider the
> benefits sufficient, since the developer community is still quite small
> and responsive.
>   
Richard, this was me confusing the issue by thinking aloud.  There's no 
good reason to make this ID optional except to make both loadVOTable 
messages have the same argument signature.  Since (with hindsight) we'd 
actually prefer both messages to have the ID as mandatory, my raising 
the idea was daft.

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: John T. <jd...@ro...> - 2006-07-17 16:25:36

My own view is that this is something we should seriously look at, but 
not rush into.  At the moment we're dealing with files that are not 
going to exceed the user's disk space, not even close.  So, apart from 
the processing overhead, it's not really a big deal if each application 
makes its own copy of the data.  With remote URLs, there is the issue of 
multiple downloads of the same data, but the feeling a while back was 
that we can leave the user's cache to deal with this.

If we do decide to do it, then I think it would have to be an optional 
extension of a hub, and accessed by messaging.  In fact, it needn't be 
bundled with a hub - we can define a set of messages for a third-party 
application to "adopt" a file, do the reference counting and clean-up.  
Perhaps this "cache" will be bundled with a particular hub impl, perhaps 
it won't.  The client application will have to deal with its presence or 
absence.

I think we should proceed with some caution (contrary to my usual 
act-first-think-later behaviour!).  As Tony pointed out during our 
meeting (Richard), this sort of thing has been done before and we should 
probably do some investigating.

John


>
> Your ideas are quite reasonable ones, but the reason I'm not very
> enthusiastic about them is that part of the unwritten philosophy behind
> PLASTIC (at least in my understanding) is that it's simple in 
> order to be difficult to break.  For instance, reference counting
> to keep track of temporary resources is clearly the Right Thing 
> to do if you have a controlled environment with reliable object
> destructors etc.  But by its nature a hub is dealing with unreliable
> connections to unreliably implemented clients, and a client that 
> forgets to unregister itself, or its interest in a file, could 
> very easily end up leaving (possibly large) temporary files hanging 
> around for much longer than they ought to be there.  This sort of 
> thing would be fairly easy to track down and fix in a single-process 
> application, but practically impossible when you've got no idea
> what applications you might or might not be talking to and how
> well or badly they might behave.
>
> That's my take on it anyway.  By all means, other PLASTICkers, chip
> in with your points of view (including, obviously, right of reply
> from Richard); I'm quite open to debating either the general or
> specific points and being persuaded otherwise.  Either way, if we 
> reach a consensus on this it might be a good idea to agree on a 
> kind of explicit Manifesto or Philosophy of PLASTIC document 
> which can be referred to to clarify this kind of debate/suggestion 
> in the future.
>
>   
>   


[Plastic-devs] Manifesto [was: Re: Temporary files and IDs for loadFromURL]

From: John T. <jon...@gm...> - 2006-07-18 09:44:12

> [assuming you mean "two files obtained from the same source at 
> different times might not be the same"]:
>
> true, but in 99(?)% of cases it probably is the same.  Here's a
> proposal for the manifesto: "A simple solution which does the right
> thing nearly all the time is better than a complicated one which
> does the right thing all the time".
>   
I really like the idea of a manifesto describing what we're about.

Anyone else care to summarize what we aim to do?  Add your contribution to
http://eurovotech.org/twiki/bin/view/VOTech/PlasticManifesto

John

Re: [Plastic-devs] Temporary files and IDs for loadFromURL

From: John T. <jon...@gm...> - 2006-07-17 15:58:59


Mark Taylor wrote:
>   
>> The solution is to adopt the same argument list as 
>> ivo://.../votable/load and have a second argument specifying the ID.   I 
>> suggest we make this second argument optional (but strongly recommended) 
>> to avoid breaking existing apps, and if it's missing assume that the 
>> id=url as we have now.
>>     
>
> After thinking about it for a while, I agree, but subject to the 
> following comment.  I seem to remember that the reason (or at least
> a reason) we decided to use the URL for the table ID was that
> it meant if the table had some persistence (i.e. it was a non-temporary
> file in the filesystem, or something on a remote server) then two
> applications could know they were talking about the same thing even
> if they picked it up independently rather than one having got it from
> the other through PLASTIC.  So for a file which is (expected to be) 
> persistent the URL is a good choice for the identifier.
>   
I don't remember the discussion, but the idea seems pretty sensible.
> [snip]

>   
>> On the subject of optional arguments.  I think it might be a good 
>> pattern that any optional args go into a struct tacked on after the 
>> mandatory arguments.  That way they can be referred to by name rather 
>> than position.
>>     
>
> Hmm, interesting thought.  My initial feeling is that for arguments
> which are optional but strongly recommended like the one you've 
> suggested above (i.e. ones we should have put in in the first 
> place but didn't realise until too late) there's not much advantage
> in this.  However, if some commands end up with a forest of options 
> it could be a good plan.
>   
I think the id parameter falls into the category of "should be mandatory
from now on", so I agree: specifying it by position is easier and
clearer.  I was really thinking along the lines of someone wanting to do
something very app-specific, such as sending a color=blue parameter
with the showObjects message, though my original post isn't very clear.