From: Stian S. <sso...@cs...> - 2006-06-28 09:37:46
Now that provenance is becoming important, we should also take care of the LSID authority issue.

1) Move the service to, say, http://www.mygrid.org.uk/authority on port 80. Apache rewrite rules can do this even if the underlying Tomcat is on, say, :8081, but I couldn't get it to work for now, as the service returns a URL with port :8081 inside the XML.

2) Add an LSID proxy class in Taverna. If the mygrid authority (or whatever is configured) cannot be used, fall back to locally generated LSIDs.

3) Provide an LSID provider class that generates random LSIDs instead of just incremental ones. Then at least two runs of Taverna won't produce duplicate IDs.

4) We need to define what the LSID of a workflow really identifies.

As it is now, the LSID is generated when a new workflow is created. If a workflow is loaded and then heavily modified, the LSID is still the same. (Unless the user presses the "New LSID" button, but since probably only 2% of our users know what an LSID is, why would they?)

The big problem here is not with different versions of a workflow; you can add :2 etc. to the LSID to specify incremental versions - naively assuming that there are not two forks from, say, version 2 to version 3. One could in that case say that the LSID identifies the abstract workflow the experimenter has in his head, and that the versions are just approximations of this workflow idea. In my opinion, this is not a very good concept, because researchers probably don't do things linearly; their workflow might, even on their own desktop, fork into two different experiments (and workflows) at some point.

The big problem is the sharing of workflows. With the upcoming repository, this problem will be even more evident, and we should expect to see more and more workflows with the very same LSID although they are very different in both internal and external workings. This is of course also an interesting research issue, because the LSIDs show *related* workflows.
The problem is we don't know in what direction the workflows are related, just that they are in the same family.

Having discussed this with Jun and Antoon earlier, I think the simplest way is to just generate a new LSID whenever the workflow is modified structurally. This could include even renaming an input port, as port names have semantics of their own when viewed by the user (compare, say, "gene_id" and "mouse_gene_id"), not to mention the impact on nested workflows.

The "old" LSIDs could just be kept as a history in the scufl file. There is not much preventing us from doing so today. A rough example:

  <s:scufl xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha" version="0.2" log="0">
    <s:workflowdescription
        lsid="urn:lsid:www.mygrid.org.uk:operation:H86ZQCFRPS0"
        author="Daniele Turi"
        title="Nested Workflow Failure Example">
      Nested Workflow Failure Example
    </s:workflowdescription>
    <s:derivedfrom>
      <s:workflowdescription
          lsid="urn:lsid:www.mygrid.org.uk:operation:H86ZQCFRPW3"
          author="Daniele Turi"
          title="Nested Workflow" />
      <s:workflowdescription
          lsid="urn:lsid:www.mygrid.org.uk:operation:P12337721SW"
          author="Tom Oinn"
          title="Nested Workflow" />
      <s:workflowdescription
          lsid="urn:lsid:www.mygrid.org.uk:operation:Q1337P1502X"
          author="Tom Oinn"
          title="Fail-if example" />
    </s:derivedfrom>

At the same time, I would suggest (again):

1) Formalise the XML schema.

2) Introduce versioning in the schema. It currently says "0.1alpha" and "0.2", and has done so for a long time, even though we have added stuff. (Adding is much better than changing things, so this is good.)

By keeping the old identifiers, we will be sure to be able to catch the relations correctly, and this could be really cool for the workflow repository. It doesn't matter that much if an intermediate workflow was never published anywhere; we could still show that there was a workflow which is a parent to these three workflows, and that it was originally derived from this other workflow.
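
To make the "new LSID on every structural change, old LSIDs kept as history" idea a bit more concrete, here is a minimal sketch. It is written in Python purely for brevity and is not Taverna code: the Workflow class, random_lsid() and structural_change() are hypothetical names, the 11-character object id simply mirrors the length of the ids in the example above, and keeping the newest ancestor first is an assumption about how the <s:derivedfrom> list would be ordered.

```python
import secrets
import string
from dataclasses import dataclass, field

# All names here are hypothetical; this is not Taverna's API.
AUTHORITY = "www.mygrid.org.uk"   # authority used in the scufl example
NAMESPACE = "operation"           # namespace used in the scufl example
ALPHABET = string.ascii_uppercase + string.digits


def random_lsid(length: int = 11) -> str:
    """Build an LSID whose object part is random rather than incremental."""
    object_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_id}"


@dataclass
class Workflow:
    title: str
    author: str
    lsid: str = field(default_factory=random_lsid)
    # (lsid, author, title) of each ancestor, newest first (assumed order).
    derived_from: list = field(default_factory=list)

    def structural_change(self) -> None:
        """Record the current identity as an ancestor, then take a fresh LSID."""
        self.derived_from.insert(0, (self.lsid, self.author, self.title))
        self.lsid = random_lsid()


wf = Workflow(title="Fail-if example", author="Tom Oinn")
original = wf.lsid
wf.structural_change()            # e.g. an input port was renamed
print(wf.lsid)                    # the new identity
print(wf.derived_from)            # the history, containing the old LSID
```

Because the history travels with the workflow object (and would travel with the scufl file), a user who does not want to reveal it can simply drop entries from derived_from.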

A simpler idea is that whenever a change is made, a new version is assigned. However, this version assignment could be done by the LSID authority. You could say "I have Q1337P1502X:23 - can you give me a new version?" and get back Q1337P1502X:53. The LSID authority could save (nasty) that :53 is derived from :23.

The bad thing about this is that the scufl doesn't contain the revision history, and so we also kind of hide it from the user. Unless they have clicked lots of OK buttons, this is kind of surveillance. By keeping it in the document, users who have derived from workflows and don't want to show it off could simply delete it from the scufl file, or through some nice provenance GUI showing "This workflow is derived from" with [x] buttons.

We also had some thoughts that uploading your workflow to the repository could "formalize" the LSID and change it if it was not unique. This could create more problems than it would solve, so I'm not so sure about this, especially if we go with the myexperiment.org idea and the researcher will keep and download his own workflow every day.

-------- Original Message --------
Subject: Re: [Taverna-users] LSID Authority on Phoebus Not Reachable
Date: Wed, 28 Jun 2006 10:10:18 +0100
From: Stian Soiland <sso...@cs...>
To: Mark A Fortner <phi...@ya...>
CC: tav...@li...
References: <200...@we...>

Mark A Fortner wrote:
> I've noticed that the LSID authority on Phoebus seems to be down.
> Could someone restart that? I'd like to demonstrate the new
> Provenance plugin capabilities to some colleagues.

As far as I can tell, the service is up. Maybe you are behind a firewall or restricted wireless network?

Unfortunately, the current LSID service runs on a weird port (namely 8081 at phoebus.cs.man.ac.uk). We will try to move this service (among other services) to :80, but probably not before we get new hardware.

If you are unable to reach the LSID authority, you can use another LSID provider for demo purposes.
Editing conf/mygrid.properties, locate this line:

  taverna.lsid.providerclass=org.embl.ebi.escience.baclava.AssigningServiceClient

Then comment it out by using #

  # taverna.lsid.providerclass=...

Then, a few lines earlier in the file, uncomment this line:

  taverna.lsid.providerclass=org.embl.ebi.escience.baclava.StupidLSIDProvider

This will just build incremental LSIDs which are not globally unique, but it should work for the provenance locally and within a single Taverna run.

--
Stian Soiland
School of Computer Science
The University of Manchester
http://www.cs.man.ac.uk/~ssoiland/