From: Tom O. <tm...@eb...> - 2007-01-24 21:58:18
Roger Hyam wrote:
> Hi Everyone,
>
> One of the fundamental features of the LSID spec is the fact that the
> data returned for any LSID must remain byte identical or not be returned
> at all. I can see the reasoning for this in many disciplines but I am
> beginning to find it a pain in the neck for some of the applications we
> would like to use LSIDs for.
>
> I was in a meeting recently talking about an XML Schema to handle
> biographical information of biologists. It would be nice to identify a
> biography with an LSID. It would be good to return some metadata about
> the biography in RDF (so that it can be understood by semantic type
> applications) and then the actual XML text of the biography as the data.
> This seems like an entirely reasonable thing to do to me because:
>
> * The biography is basically a text document and a good example of the
> kind of thing that shouldn't be entirely marked up in RDF. It should
> really be an XHTML extension or something similar.
>
> * The biography is data. It would be possible to return it as another
> format of metadata but then we would have to establish a naming
> convention that applications would need to know.
>
> * To say the biography is metadata about the person seems plain wrong.
> The LSID is for the biography record, not the person.
>
> The trouble is:
>
> * An XML document won't be byte identical unless a hack is put in to
> stream it to file or similar.
>
> * People's biographies change. The subset of biologists that are alive
> may progress their careers and will all eventually die, thus requiring
> an update to their biographies.
>
> So we can't return the biography in the getData call.
>
> Generating a new LSID every time the biography changes seems overkill.
> For consistency one would probably have to have an LSID for the abstract
> notion of the biography that just returned metadata about where the
> latest version was, or maintain the old LSIDs with replacedBy links in
> the metadata that the client would have to navigate.
>
> There are other examples of where it would be nice to return an XML
> document or something else variable as data.
>
> Can anyone see a reason why the spec shouldn't be changed with respect to
> this feature? Should there be a metadata flag to say data is variable?
> It would just make life so much easier!

Please do not change the fundamental nature of LSIDs; this would be a really bad idea[tm]. You can't just change a contract such as immutability and expect everything to carry on working. Immutability is really the most powerful part of the specification for every application I've seen. It would certainly break our provenance capture mechanism and any form of caching, such as that done by the default client implementations, so it would be a big, big alteration. However...

Sounds to me like you could be using the versioning mechanism, though? I mean, that's the whole point of it: to cope with data that has multiple versions...

I think generating a new version of the same LSID every time the biography changes is exactly in the spirit of the specification and shouldn't be too onerous. The exact versioning scheme is up to you as well. You could choose to have versions corresponding to dates in a certain format, which would then give your resolver the ability to serve up the biography for a particular scientist at any moment in time, which is kind of cool.

Off topic, what are you doing with this? We might be very interested, as we're currently working on a system called myExperiment to allow users (primarily biologists and bioinformaticians) to share workflow fragments and methodologies in a principled way, and a biography component would fit perfectly with what we're working on in that area.
Let me know off list if it sounds like there might be an intersection of interests.

Tom
(Taverna, myGrid and some other stuff[tm])