From: Matthew C. <mat...@va...> - 2007-10-11 15:29:47
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Moving on to an appropriate subject so we can reform the Church of Controlled Vocabulary... :)<br> <br> <br> Eric Deutsch wrote: <blockquote cite="mid:5BE...@he..." type="cite"> <meta http-equiv="Content-Type" content="text/html; "> <meta name="Generator" content="Microsoft Word 11 (filtered medium)"> <!--[if !mso]> <style> v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style> <![endif]--> <style> <!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;} @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline;} span.EmailStyle17 {mso-style-type:personal; font-family:Arial; color:windowtext;} span.EmailStyle18 {mso-style-type:personal; font-family:Arial; color:navy;} span.m1 {color:blue;} span.t1 {color:#990000;} span.EmailStyle21 {mso-style-type:personal-reply; font-family:Arial; color:navy;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;} div.Section1 {page:Section1;} /* List Definitions */ @list l0 {mso-list-id:1639410469; mso-list-type:hybrid; mso-list-template-ids:1919608944 1959066812 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;} @list l0:level1 {mso-level-start-at:0; mso-level-number-format:bullet; mso-level-text:-; mso-level-tab-stop:.5in; mso-level-number-position:left; text-indent:-.25in; font-family:Arial; mso-fareast-font-family:"Times New Roman";} ol 
{margin-bottom:0in;} ul {margin-bottom:0in;} --> </style> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Hi everyone, I’ve taken some time to think carefully about what Brian said, and here is my attempt at focusing the discussion:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- First: yes, there are several problems with is_a and part_of in the CV. We agreed at the CV meeting that we will tackle this and try to make it uniform.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- Here are two rules within the CV that may hold true and should be documented:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> - if a term’s direct parent is a “xxxx attribute”, then it must furnish a value within the cvParam element; otherwise it must not<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> - if a term has children, then it cannot be specified as a cvParam (except as a category/parent in option C)<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> Is this correct? 
Counterexamples?<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- Regarding the reflectron example, I think the CV should look like this, even though it does not yet:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> - “reflectron on” is_a “reflectron state” is_a “analyzer attribute”<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> - “reflectron off” is_a “reflectron state” is_a “analyzer attribute”<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> </div> </blockquote> These points do not address the more significant issue: the CV is apparently incapable of defining types for categories with uncontrolled values, and there is no automatic way to distinguish between a category and a controlled value (i.e., an accession number that represents a category vs. an accession number that represents a value). I suggest the convention (as Angel mentions in his reply to this post) in which categories have a pure PART_OF relationship and controlled values have an IS_A relationship to their parent category. I still don't know how to encapsulate the type information for uncontrolled values in the CV, though. Perhaps each type (real, integer, string, etc.) could be given a special accession number that indicates the type and also tells the validator/parser that the value should be taken from the name/text attribute instead of the accession attribute? 
But then I'm not sure how to assign that accession number to the uncontrolled classes, because each type would have an IS_A relationship to multiple categories.<br> <br> <br> <blockquote cite="mid:5BE...@he..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- Thus cvParams would be used like this:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> Option A: &lt;cvParam cvLabel="MS" accession="MS:1000105" name="reflectron off" value="" /&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> Option C+: &lt;cvParam name="reflectron off" cvLabel="MS" accession="MS:1000105" parentAccession="MS:1000021"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> </div> </blockquote> I will regurgitate my preferred version of Option C:<br> Option E: &lt;cvParam name="reflectron state" valueName="off" accession="MS:1000021" valueAccession="MS:1000105"/&gt;<br> Same information, but IMO it is more intuitive and human-readable, and it avoids the potentially nasty pitfall of defining what a "parent" is (i.e., is it one level up the CV branch, all the way up, or part of the way up?).<br> <br> <br> <blockquote cite="mid:5BE...@he..." 
type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- Brian proposed:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;reflectronState accession="MS:1000021" off/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> This does not seem like well-formed XML to me. Or is it?? I assume he meant this:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;reflectronState accession="MS:1000105" name="reflectron off"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- If so, the real dilemma is between:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> 1) &lt;cvParam name="reflectron off" cvLabel="MS" accession="MS:1000105" parentAccession="MS:1000021"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> 2) &lt;reflectronState accession="MS:1000105" name="reflectron off"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> Brian, would you agree that these are the two sides? They both seem fully complete to me. 
If I’ve got it wrong, then the rest would seem premature, but I’ll press on believing I’ve got it right. The reason is that creating a &lt;reflectronState&gt; element in the schema automatically takes the place of { cvLabel="MS" parentAccession="MS:1000021" }.</span></font></p> </div> </blockquote> Yes, that is the real dilemma. I cast my vote for going either ALL CV or ALL schema. I don't like the idea of mixing the two. I am a bit confused, though, and Brian will need to clarify: he previously suggested that the entire schema would be hand-rolled and the CV would be generated FROM the schema. Would that mean that accession numbers would be assigned in the schema and propagated into the CV? I don't recall Brian proposing the &lt;reflectronState ...&gt; method while still filling in the schema from a separately maintained CV - that would be too much hassle.<br> <br> No matter which route we take, though, we should have a fully descriptive XML schema so that standard XML tools can do the semantic validation. In the case of the CV, that schema will be auto-generated every time the CV changes. In the case of the hand-rolled schema, it'll be completely self-contained.<br> <br> <br> <blockquote cite="mid:5BE...@he..." 
type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- So for option 1, we’re essentially there right now (we would need to adjust option A to option 1, but it’s close).<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- For option 2, we would need to find all the CV terms that we think deserve to be promoted to element status and add them to the schema. I don’t know how many there are, but there would be lots. The schema would increase in size manyfold.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- A further complication: where does this element go? Does it go in the instrument description section? Or could the reflectron be turned on and off for different spectra, so that it belongs in the scan element? I have no idea. If we put it in the schema, we’ve got to get it right now. 
If we don’t, then the schema will have to be updated to fix it.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- The current state is a flexible (some might say lazy or dangerous) approach. We acknowledge that we don’t have all the CV terms and we’re not exactly sure where some will be used, so we leave it open. No example instance document yet has reflectron state information in it. I’d be delighted if someone could provide one.</span></font></p> </div> </blockquote> No matter which way we go (CV with an autogenerated schema or a hand-rolled schema, cvParams or explicit elements), changing an element's valid location from one part of the document to another will break backward compatibility with the semantic validation, as well as breaking all but the smartest parsers. We should definitely try to avoid moving terms around once we've released the spec!<br> <br> <br> <blockquote cite="mid:5BE...@he..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- So what we can do today is provide a term “reflectron off” that almost no one really cares much about and let someone out there who does care write some mzML with this annotation in it. When this document is checked against the semantic validator, the validator will complain that you’ve used a child term of “reflectron state” in a place where it’s not allowed. But the writer insists that it should be allowed there. The PSI-MS WG is persuaded it should be. 
So we update the semantic validator (and perhaps the CV), and these new documents are written out with reflectron state information and validate. Most software doesn’t care a hoot about the reflectron state, and that cvParam can be safely ignored or dumbly displayed to the user in case the user cares. All of the above can happen without a revision of the schema.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- But that’s the same thing as updating the schema except in name, you say. Perhaps.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> </div> </blockquote> I also say it's the same as updating the schema, because the schema DOES have to be updated when the CV is updated in order to reflect the new changes. Right now we have a pretty useless schema, because it is inadequate for semantic validation or for writing a parser.<br> <br> <br> <blockquote cite="mid:5BE...@he..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- So, I hope I have helped this discussion rather than confused it. Clearly the current schema has a big element of flexibility/power/danger in it. Some believe that this will allow us to improve the format in minor ways without schema revision and provide a way for producers to express their data with annotations that make sense to them. The only thing standing between flexibility and utter mayhem is the semantic validator. Perhaps in some sense, this is half XML schema and half pseudo RDF. 
Can we pull it off, or are we lunatics for trying?<o:p></o:p></span></font></p> </div> </blockquote> We need to re-evaluate the idea that the schema should be perpetually unchanging. To me, that is an illogical and contradictory requirement when we also have the requirement to do semantic validation with an ever-changing CV. Why should we be afraid of schema revisions in general? What we should be afraid of, more specifically, is removing existing terms, shifting them from one part of the spec to another, and adding new features (like new compression types for the peak lists, new precision types, etc.). And I hope everyone can see that these fears apply equally to a CV-based schema and a hand-rolled schema.<br> <br> <br> <blockquote cite="mid:5BE...@he..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- I am clearly biased here, but I try to keep an open mind.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- To my mind, the most important unconsidered problem that Brian brings up is the data type problem. 
Consider the example:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;cvParam cvLabel="MS" accession="MS:1000285" name="total ion current" value="1.66755e+007" parentAccession="MS:1000499"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> Brian’s proposed alternative is (I hope I’m right):<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;spectrumAttribute accession="MS:1000285" name="total ion current" value="1.66755e+007"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> In principle, this second way would allow me to specify a data type and let XML validators enforce it. However, this may not quite work either, because what if I want this:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;spectrumAttribute accession="MS:1009999" name="spectrum subjective quality" value="10"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">to be allowed? All spectrumAttributes would have to have the same data type for that to work. The example is pretty contrived. 
That is, unless every single attribute got its own element, like:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> &lt;totalIonCurrent value="1.66755e+007"/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- The latter is fully specified and concrete. But if we get anything wrong or want to add anything, then we have to release a new version of the schema. One possible option is to fully specify in the schema everything we can think of now, and then use cvParam for new or later things. If we do that, we still need to apply semantic validation, so we’ve only half-solved the problem. Finally, a dangerous door may be opening. If we want to expand this duality, we have a possible “more than one way to do it” problem. Some might choose to use the cvParam, and some the schema element. The only thing that could prevent that is, again, the semantic validator.</span></font></p> </div> </blockquote> No duality should be possible. A category should be done either with an element or with a cvParam, and I would prefer that all categories be done with one or the other rather than a mix of the two. But certainly no single category should have both an element and a cvParam method for specifying its value.<br> <br> <br> <blockquote cite="mid:5BE...@he..." 
type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- I wonder whether we can add a nice method of datatype validation to option 1 above. Any ideas?<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> </div> </blockquote> First we have to get a data type specification into the CV that is complete and comprehensible to machines (so we can auto-generate a schema from the CV). Let's figure that out first. :) And if we CAN'T do that, we are pretty much forced to go with a hand-rolled schema, because at that point I see very little reason to use the OBO CV at all.<br> <br> <br> <blockquote cite="mid:5BE...@he..." 
type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">I had hoped to focus the discussion, but rereading it, all I did was shake the already-opened can of worms.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Let the commentary ensue.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Regards,<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Eric<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color 
-moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0in 0in 0in 4pt;"> <div> <div class="MsoNormal" style="text-align: center;" align="center"><font face="Times New Roman" size="3"><span style="font-size: 12pt;"> <hr tabindex="-1" align="center" size="2" width="100%"></span></font></div> <p class="MsoNormal"><b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma; font-weight: bold;">From:</span></font></b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma;"> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a> [<a class="moz-txt-link-freetext" href="mailto:psi...@li...">mailto:psi...@li...</a>] <b><span style="font-weight: bold;">On Behalf Of </span></b>Brian Pratt<br> <b><span style="font-weight: bold;">Sent:</span></b> Monday, October 08, 2007 11:38 AM<br> <b><span style="font-weight: bold;">To:</span></b> 'Mass spectrometry standard development'<br> <b><span style="font-weight: bold;">Subject:</span></b> [Psidev-ms-dev] MANIFESTO TIME! (was RE: more is_a vs. part_of errors?)</span></font><o:p></o:p></p> </div> <p class="MsoNormal"><font face="Times New Roman" size="3"><span style="font-size: 12pt;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Eh, it’s even more broken than I thought. I’ve amended my amendments inline below, new changes in double parentheses. 
<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">After a day or so of messing with this, it is now:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">MANIFESTO TIME!<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">RESOLVED:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">The mzML specification process should be schema-centric, and the CV should be generated from the schema (this should be a fairly simple matter of XSLT, since XSD is itself XML). 
<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">REASON 1: THE CV-CENTRIC APPROACH IS ERROR-PRONE.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">The kinds of inheritance errors shown below are, if not actually impossible, much harder to make in the context of a W3C schema when using readily available software tools to create and maintain the schema.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">REASON 2: OBO/CV IS AN INSUFFICIENT TOOL FOR THE JOB OF PRODUCING A READILY AND THOROUGHLY VALIDATABLE DATA FORMAT.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">CV apparently provides no means for specifying the range or format of instance values. An “isolation width” (</span></font><font face="Courier New" size="2"><span style="font-size: 10pt; font-family: &quot;Courier New&quot;;">MS:1000023) </span></font><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">could happily have a value of “-2”, “2”, “two”, or “extra sprinkles, please”. You could (and should) certainly put some text in the description along the lines of “this is a non-negative floating point value”, but that’s no help to a validating parser. 
XSD, on the other hand, has standardized syntax for enforcing precisely these kinds of restrictions, meaning that validating parsers and code generators (for both read and write) don’t need any special-purpose logic added. <o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">There are a handful of places where value range restrictions have been attempted in the MS CV, but these are awkward because of the tools. The reflectron_state term, for example, has two children, “on” and “off”, but this only confuses things, since these are not *<b><span style="font-weight: bold;">values</span></b>* of reflectron state but rather *<b><span style="font-weight: bold;">are</span></b>* reflectron states, a distinction that may be meaningless in English but is significant when attempting to create a data structure. 
Picture how this looks in an instance doc:<o:p></o:p></span></font></p> <p class="MsoNormal" style="text-indent: 0.5in;"><span class="m1"><font color="black" face="Courier New" size="2"><span style="font-size: 10pt; font-family: &quot;Courier New&quot;; color: black;">&lt;</span></font></span><span class="t1"><font color="black" face="Courier New" size="2"><span style="font-size: 10pt; font-family: &quot;Courier New&quot;; color: black;">cvParam</span></font></span><font color="black" face="Courier New" size="2"><span style="font-size: 10pt; font-family: &quot;Courier New&quot;; color: black;"> <span class="t1"><font color="black"><span style="color: black;">cvLabel</span></font></span><span class="m1"><font color="black"><span style="color: black;">="</span></font></span><b><span style="font-weight: bold;">MS</span></b><span class="m1"><font color="black"><span style="color: black;">"</span></font></span><span class="t1"><font color="black"><span style="color: black;"> accession</span></font></span><span class="m1"><font color="black"><span style="color: black;">="</span></font></span><b><span style="font-weight: bold;">MS:1000105</span></b><span class="m1"><font color="black"><span style="color: black;">"</span></font></span><span class="t1"><font color="black"><span style="color: black;"> name</span></font></span><span class="m1"><font color="black"><span style="color: black;">="</span></font></span><b><span style="font-weight: bold;">off</span></b><span class="m1"><font color="black"><span style="color: black;">"</span></font></span><span class="t1"><font color="black"><span style="color: black;"> value</span></font></span><span class="m1"><font color="black"><span style="color: black;">="" /&gt;</span></font></span></span></font><span class="m1"><font color="black" face="Courier New"><span style="font-family: &quot;Courier New&quot;; color: black;"><o:p></o:p></span></font></span></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: 
navy;">I can’t think of anything nice to say about that. Better it should read:</span></font><font color="navy" face="Arial"><span style="font-family: Arial; color: navy;"><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"> </span></font><font color="black" face="Courier New" size="2"><span style="font-size: 10pt; font-family: &quot;Courier New&quot;; color: black;">&lt;reflectronState accession="MS:1000021" off/&gt;<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">CONCLUSION: THE CV WORK TO DATE IS IMPORTANT AND USEFUL, BUT SHOULD BE RECAST AS SCHEMA WORK<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">The CV should not attempt to be a replacement for the schema - it just hasn’t got the requisite mechanisms to do the job. The information a CV can convey is only a subset of the information needed to fully specify a data format. The information in the CV as it stands should be folded into the mzML schema and maintained there moving forward. An actual OBO/CV file can be generated as needed. 
<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">- Brian<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> </div> </div> </blockquote> <br> </body> </html>