From: Brian P. <bri...@in...> - 2007-10-16 18:27:44
(First of all, thanks to Frank for shedding more light on the topic - heat, we have already!)

Matt,

You're right about OBO not limiting itself to is_a and part_of, but it appears that PSI has explicitly chosen to do so. I doubt we have the political heft to change that now, or that we should want to do so. Further contortions to turn CV into something to rival the readily available power of XSD are misguided, in my opinion. Frankly it seems to me that the CV doesn't really need to be all that logically consistent: in its current bogus state it doesn't seem to have bothered anyone, including the official validator. PSI clearly never meant for CV to do things like datatyping and range limiting so we should stop pushing on that rope and just allow CV to play its proper role in disambiguating the terms we use in the XSD, by use of accession numbers in the XSD.

The thing to do now is to transfer most of the intelligence in the ms-mapping.xml schema file (for it is indeed a schema, albeit written in a nonstandard format) to the XSD file then add the proper datatyping and range checking. I was happy to see that this second schema contains the work I thought we were going to have to generate from the CV itself, although I was also somewhat surprised to learn of the existence of such a key artifact this late in the discussion. Or maybe I just missed it somehow.

As I've said before we should be braver than we have been so far. The refusal to put useful content in the XSD file simply for fear of being wrong about it is just deplorable and doesn't serve the purposes of the community. And I'm appalled at the disingenuousness of claiming a "stable schema" when many key parts of the spec are in fact expressed in a schema (ms-mapping.xml) which is explicitly unstable.

The charge has been leveled on this list that (paraphrasing here) some old dogs are resisting learning new tricks when it comes to the use of CV.
That's always something to be mindful of, but after careful consideration I really just don't see the advantage of a CV-centric approach, when all the added complexity and reinvention still leaves us well short of where proper use of XSD would get us. Fully realized XSD that references CV to define its terms seems like the obvious choice for a system that wants to gain widespread and rapid adoption.

- Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Tuesday, October 16, 2007 8:27 AM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] mzML 0.99.0 comments

Hi Frank,

I read the Guidelines you linked to and also the paper describing the Relation Ontology (http://genomebiology.com/2005/6/5/R46) which is referenced from the Guidelines. The Relation Ontology does not in any way suggest that reliable OBO CVs should be limited to IS_A and PART_OF relationships! Rather, it does a good job of defining when IS_A and PART_OF should be used and what they really mean. I think if we looked closely we could find quite a few cases in the CV where the use of IS_A and PART_OF is bogus according to the Relation Ontology definition, especially with regard to values being indistinct from categories. Therefore, I take issue with the following text from the Guidelines which has no corresponding rationale and which is currently biting us in the arse:

11. Relations between RU's
As the PSI CV will be developed under the OBO umbrella [3], the relations created between terms MUST ascribe to the definitions and formal requirements provided in the OBO Relations Ontology (RO) paper [7], as the relations 'is_a' and 'part_of'.

It is not clear whether the Relation Ontology recommends or discourages using OBO to typedef new relationship types into existence (my proposed 'value of'), but that won't be necessary.
I think we can accomplish the same effect with the existing relationship, 'instance_of', which IS part of the Relation Ontology. In fact, 'instance_of' is a primitive relation in the Relation Ontology, whereas 'is_a' is not. Here is the Relation Ontology definition for 'instance_of':

p instance_of P - a primitive relation between a process instance and a class which it instantiates holding independently of time

That sounds like a pretty good way to distinguish between values (instances) and categories (classes) to me! Further, the instance_of relationship can be used in addition to the current part_of and is_a relationships and it will serve to disambiguate a branch of the CV where the actual category that a value belongs to is an ancestor instead of a direct parent. For instance:

MS:1000173 "MAT900XP"
is a MS:1000493 "Finnigan MAT"
part of MS:1000483 "Thermo Fisher Scientific"
is a MS:1000031 "model by vendor"
part of MS:1000463 "instrument description"
part of MS:0000000 "MZ controlled vocabularies"

What category does the controlled value "MAT900XP" belong to, i.e. if we used cvParam method B, would it look like:

<cvParam cvLabel="MS" categoryName="Finnigan MAT" categoryAccession="MS:1000493" accession="MS:1000173" name="MAT900XP"/>

Or would it look like:

<cvParam cvLabel="MS" categoryName="model by vendor" categoryAccession="MS:1000031" accession="MS:1000173" name="MAT900XP"/>

Of course I think it should be the latter, but how would you derive that from the CV? You can't, unless you add a new relationship or convention, so I suggest:

MS:1000173 "MAT900XP"
instance of MS:1000031 "model by vendor"
is a MS:1000493 "Finnigan MAT"
part of MS:1000483 "Thermo Fisher Scientific"
is a MS:1000031 "model by vendor"
part of MS:1000463 "instrument description"
part of MS:0000000 "MZ controlled vocabularies"

It would also be good to get rid of the MS:1000483->MS:1000031 relationship at that point because "Thermo Fisher Scientific" is NOT an instrument model.
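As a rough illustration of the derivation problem above, here is a minimal Python sketch. The in-memory layout (NAMES, RELATIONS) and the function names ancestors, category_of, and cv_param are hypothetical and not part of any OBO or PSI tooling; only the accessions, names, and relationships are taken from the branch listed above. Walking is_a/part_of alone yields several candidate ancestors with no way to choose among them, whereas an explicit instance_of edge names the category directly.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical in-memory copy of the CV branch listed above.
NAMES = {
    "MS:1000173": "MAT900XP",
    "MS:1000493": "Finnigan MAT",
    "MS:1000483": "Thermo Fisher Scientific",
    "MS:1000031": "model by vendor",
    "MS:1000463": "instrument description",
    "MS:0000000": "MZ controlled vocabularies",
}

# term -> list of (relationship, target). Includes the suggested instance_of
# edge; the MS:1000483 -> MS:1000031 is_a edge is kept as in the listing.
RELATIONS = {
    "MS:1000173": [("instance_of", "MS:1000031"), ("is_a", "MS:1000493")],
    "MS:1000493": [("part_of", "MS:1000483")],
    "MS:1000483": [("is_a", "MS:1000031")],
    "MS:1000031": [("part_of", "MS:1000463")],
    "MS:1000463": [("part_of", "MS:0000000")],
}


def ancestors(term, relations):
    """Follow is_a/part_of edges upward; return every ancestor found."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, target in relations.get(current, []):
            if rel in ("is_a", "part_of") and target not in found:
                found.append(target)
                stack.append(target)
    return found


def category_of(term, relations):
    """Return the target of an explicit instance_of edge, or None."""
    for rel, target in relations.get(term, []):
        if rel == "instance_of":
            return target
    return None


def cv_param(term, relations=RELATIONS, names=NAMES, cv_label="MS"):
    """Build a 'method B' style cvParam element for a controlled value."""
    category = category_of(term, relations)
    if category is None:
        raise ValueError(f"{term} has no instance_of edge; category is ambiguous")
    attrs = [
        ("cvLabel", cv_label),
        ("categoryName", names[category]),
        ("categoryAccession", category),
        ("accession", term),
        ("name", names[term]),
    ]
    return "<cvParam " + " ".join(f"{k}={quoteattr(v)}" for k, v in attrs) + "/>"


if __name__ == "__main__":
    # Five candidate ancestors: is_a/part_of alone cannot pick the category.
    print(ancestors("MS:1000173", RELATIONS))
    # The instance_of edge resolves it directly.
    print(cv_param("MS:1000173"))
```

With the instance_of edge removed, category_of returns None and cv_param raises, which is the "you can't derive it" case described above.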
I have to disagree with your assertion that OBO does not allow a CV to model datatypes and cardinality. I think the trailing modifiers (which may have been added since you last looked at the OBO language spec) would serve to model those properties quite nicely.

-Matt

frank gibson wrote:
> Hi
>
> I have been following this discussion and there seems to be some
> confusion about the CV, its use, and development. Using the OBO
> language this allows you to record "words" or strings. It does not
> allow you to model what the words represent such as restrictions,
> cardinality or datatypes for values (such as int, double and xml
> datatypes). This is a limitation of the chosen language.
>
> The PSI have developed "Guidelines for the development of Controlled
> Vocabularies" which is a final document and describes the
> recommendations and best practice in designing CVs for the PSI. It
> includes and describes several issues which have been raised on this
> list such as what the relationships of is_a and part_of semantically
> mean. In addition it includes how to normalise the natural language
> definitions for each RA, the maintenance procedures, obsoleting terms
> and the process for term addition.
>
> The Final document can be found at the following URL
> http://psidev.info/index.php?q=node/258
>
>
> I hope these comments and the information contained within this
> document is helpful in the development of the MS CV
>
> Cheers
>
> Frank