From: Brian P. <bri...@in...> - 2007-10-08 18:39:10
Eh, it's even more broken than I thought. I've amended my amendments inline below, with new changes in double parentheses. After a day or so of messing with this, it is now: MANIFESTO TIME!

RESOLVED: The mzML specification process should be schema-centric, and the CV should be generated from the schema (should be a fairly simple matter of XSLT, since XSD is itself XML).

REASON 1: THE CV-CENTRIC APPROACH IS ERROR PRONE.

The kinds of inheritance errors shown below are, if not actually impossible, much harder to make in the context of a W3C schema when using readily available software tools to create and maintain the schema.

REASON 2: OBO/CV IS AN INSUFFICIENT TOOL FOR THE JOB OF PRODUCING A READILY AND THOROUGHLY VALIDATABLE DATA FORMAT.

CV apparently provides no means for specifying range or formatting of instance values. An "isolation width" (MS:1000023) could happily have a value of "-2", "2", "two", or "extra sprinkles, please". You could (and should) certainly put some text in the description along the lines of "this is a non-negative floating point value", but that's no help to a validating parser. XSD, on the other hand, has standardized syntax for enforcing precisely these kinds of restrictions, meaning that validating parsers and code generators (for both read and write) don't need any special-purpose logic added.

There are a handful of places where value range restrictions have been attempted in the MS CV, but these are awkward because of the tools. The reflectron_state, for example, has two children "on" and "off", but this only confuses things, since these are not *values* of reflectron state but rather *are* reflectron states, a distinction which may be meaningless in English but significant when attempting to create a data structure. Picture how this looks in an instance doc:

<cvParam cvLabel="MS" accession="MS:1000105" name="off" value="" />

I can't think of anything nice to say about that.
Better it should read:

<reflectronState accession="MS:1000021" off/>

CONCLUSION: THE CV WORK TO DATE IS IMPORTANT AND USEFUL, BUT SHOULD BE RECAST AS SCHEMA WORK

The CV should not attempt to be a replacement for the schema - it just hasn't got the requisite mechanisms to do the job. The information CV can convey is only a subset of the information that is needed to fully specify a data format. The information in the CV as it stands should be folded into the mzML schema, and maintained therein moving forward. An actual OBO/CV file can be generated as needed.

- Brian

_____

From: Brian Pratt [mailto:bri...@in...]
Sent: Friday, October 05, 2007 11:52 PM
To: 'Mass spectrometry standard development'
Subject: more is_a vs. part_of errors?

There are a handful of other cases where it appears that the authors have gotten "is a" and "part_of" confused. My proposed corrections (IN CAPS) inline:

MS:1000025 "magnetic field strength"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000024 "final MS exponent"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000022 "TOF Total Path Length"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000014 "accuracy"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

((note, these next two are just ugly, see notes at top of message))

MS:1000106 "on"
  is a MS:1000021 "reflectron state"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000105 "off"
  is a MS:1000021 "reflectron state"
  part of ((IS_A)) MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

The following changes would make the Thermo and ABI stuff look like all the other vendors:

MS:1000495 "Applied Biosystems"
  part of (IS_A) MS:1000121 "ABI / SCIEX"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000176 "MAT95XP Trap"
  is a (IS_A) MS:1000493 "Finnigan MAT"
  part of MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000175 "MAT95XP"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000174 "MAT900XP Trap"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000173 "MAT900XP"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000172 "MAT253"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

I still think there's a schema in there, albeit jammed in slightly sideways at the moment. (( I don't think that anymore. I think there's a subset of a schema in there. ))

- Brian
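
To make the contrast drawn under REASON 2 concrete, here is a minimal XSD sketch of the two kinds of restriction discussed there: a non-negative floating point "isolation width" and a reflectron state limited to "on" or "off". The type names isolationWidthType and reflectronStateType are hypothetical and are not taken from the mzML schema; only the xs: constructs are standard W3C XML Schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name, for illustration only.
       Accepts "2" or "0.5"; rejects "-2" (below the minimum)
       and "two" (not a valid xs:double lexical form). -->
  <xs:simpleType name="isolationWidthType">
    <xs:restriction base="xs:double">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type name, for illustration only.
       "on" and "off" are the only permitted values. -->
  <xs:simpleType name="reflectronStateType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="on"/>
      <xs:enumeration value="off"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

With types along these lines, any schema-validating parser would reject "-2", "two", or "extra sprinkles, please" as an isolation width without special-purpose code, which is the property the CV description text alone cannot provide.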