From: William P. <wil...@ya...> - 2012-06-20 20:46:54
On Jun 20, 2012, at 11:44 AM, Hilmar Lapp wrote:

> It seems that the studyID in TreeBASE's NeXML output is given without the leading 'S', although everywhere else the studyID is quoted with the leading 'S'. For example, for study S794:
>
> <meta content="794" datatype="xsd:string" id="meta901" property="tb:identifier.study" xsi:type="nex:LiteralMeta"/>
>
> http://purl.org/phylo/treebase/phylows/study/TB2:S794?format=nexml
>
> I notice that I can search either variant through the UI and it finds the study. However, this URL does *not* work:
>
> http://purl.org/phylo/treebase/phylows/study/TB2:794?format=nexml
>
> What does this mean semantically? Are S794 and 794 as studyIDs synonymous or not? And shouldn't the <meta> attribute give the ID with the leading 'S' if that's also the ID exposed everywhere else?

I guess some of this is a legacy of the older TreeBASE, in which we encouraged people to prefix numbers with a letter so that it's clear what they're referring to, e.g.:

SN12345 = "submission number"
S12345 = study ID number
M12345 = matrix ID number
etc.

With TreeBASE2 the idea was that each study would be identified using a long, globally unique string that itself is resolvable to some metadata about the object, and the chosen format looks like: 'http://purl.org/phylo/treebase/phylows/study/TB2:S794'. That is supposed to be the identifier that will continue to work regardless of any new migrations to TreeBASE3, TreeBASE4, etc. -- i.e. it's our version of a Dryad data DOI without having to pay for one.

If we dissect this identifier, we can see that the ID for the record in the study table is in there (794), with the 'S' tradition retained. Perhaps the UI should be designed to match on /(.+S|S)*\d+$/ so that it hits on both the fully qualified and the semi-qualified versions of the full identifier, e.g.: 'http://purl.org/phylo/treebase/phylows/study/TB2:S794', 'TB2:S794', 'S794', '794'.
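To make the matching idea concrete, here is a minimal Python sketch of how the four variants could be folded into the one fully qualified form. It is only an illustration: the names PURL_PREFIX, STUDY_ID and normalize_study_id are hypothetical, and the pattern is a stricter, anchored variant of the one suggested above, not what the TreeBASE UI actually does. (As written, /(.+S|S)*\d+$/ is not anchored at the start, so it would also hit on strings such as 'M794'.)

```python
import re

# Hypothetical constant: the PURL prefix quoted in this message.
PURL_PREFIX = "http://purl.org/phylo/treebase/phylows/study/"

# Assumption: an anchored pattern that accepts an optional PURL prefix,
# an optional 'TB2:' namespace, an optional 'S', and then the row number.
STUDY_ID = re.compile(
    r"^(?:(?:" + re.escape(PURL_PREFIX) + r")?TB2:)?S?(\d+)$"
)

def normalize_study_id(text):
    """Return the fully qualified study identifier, or None if no match."""
    match = STUDY_ID.match(text.strip())
    if match is None:
        return None
    # Rebuild the long form, always including the 'S' prefix.
    return PURL_PREFIX + "TB2:S" + match.group(1)

if __name__ == "__main__":
    for variant in (PURL_PREFIX + "TB2:S794", "TB2:S794", "S794", "794", "M794"):
        print(variant, "->", normalize_study_id(variant))
```

Normalizing first and then looking up the long form would keep the lookup in line with the idea that the fully qualified string is the real identifier, while still accepting the shorter legacy spellings as input.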

For the NeXML, the xml:base is defined:

xml:base="http://purl.org/phylo/treebase/phylows/study/TB2:"

thus expecting to append 'S794' via an xlink:href value of "S794" or a "#S794" value. So that would be in keeping with the idea that the real identifier is the long, globally unique string.

But separately, as you point out, something called "tb:identifier.study" is defined in the NeXML with just the number "794" -- which is the ID number for the row in the study table. Perhaps this should be changed to the fully qualified 'http://purl.org/phylo/treebase/phylows/study/TB2:S794'?

bp