From: Matt C. <mat...@va...> - 2007-10-16 01:49:49
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Eric Deutsch wrote: <blockquote cite="mid...@he..." type="cite"> <meta http-equiv="Content-Type" content="text/html; "> <meta name="Generator" content="Microsoft Word 11 (filtered medium)"> <!--[if !mso]> <style> v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style> <![endif]--> <style> <!-- /* Font Definitions */ @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman";} h1 {margin-top:12.0pt; margin-right:0in; margin-bottom:3.0pt; margin-left:0in; text-indent:0in; page-break-after:avoid; mso-list:l0 level1 lfo7; font-size:16.0pt; font-family:Arial;} h3 {margin-top:12.0pt; margin-right:0in; margin-bottom:3.0pt; margin-left:1.0in; text-indent:0in; page-break-after:avoid; mso-list:l0 level3 lfo7; font-size:12.0pt; font-family:Arial;} h4 {margin-top:12.0pt; margin-right:0in; margin-bottom:3.0pt; margin-left:1.5in; text-indent:0in; page-break-after:avoid; mso-list:l0 level4 lfo7; font-size:14.0pt; font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline;} pre {margin:0in; margin-bottom:.0001pt; font-size:10.0pt; font-family:"Courier New";} span.EmailStyle18 {mso-style-type:personal; font-family:Arial; color:windowtext;} span.EmailStyle19 {mso-style-type:personal-reply; font-family:Arial; color:navy;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;} div.Section1 {page:Section1;} /* List Definitions */ @list l0 {mso-list-id:346298316; mso-list-template-ids:67698727;} @list 
l0:level1 {mso-level-number-format:roman-upper; mso-level-style-link:"Heading 1"; mso-level-tab-stop:.25in; mso-level-number-position:left; margin-left:0in; text-indent:0in;} @list l0:level2 {mso-level-number-format:alpha-upper; mso-level-tab-stop:.75in; mso-level-number-position:left; margin-left:.5in; text-indent:0in;} @list l0:level3 {mso-level-style-link:"Heading 3"; mso-level-tab-stop:1.25in; mso-level-number-position:left; margin-left:1.0in; text-indent:0in;} @list l0:level4 {mso-level-number-format:alpha-lower; mso-level-style-link:"Heading 4"; mso-level-text:"%4\)"; mso-level-tab-stop:1.75in; mso-level-number-position:left; margin-left:1.5in; text-indent:0in;} @list l0:level5 {mso-level-text:"\(%5\)"; mso-level-tab-stop:2.25in; mso-level-number-position:left; margin-left:2.0in; text-indent:0in;} @list l0:level6 {mso-level-number-format:alpha-lower; mso-level-text:"\(%6\)"; mso-level-tab-stop:2.75in; mso-level-number-position:left; margin-left:2.5in; text-indent:0in;} @list l0:level7 {mso-level-number-format:roman-lower; mso-level-text:"\(%7\)"; mso-level-tab-stop:3.25in; mso-level-number-position:left; margin-left:3.0in; text-indent:0in;} @list l0:level8 {mso-level-number-format:alpha-lower; mso-level-text:"\(%8\)"; mso-level-tab-stop:3.75in; mso-level-number-position:left; margin-left:3.5in; text-indent:0in;} @list l0:level9 {mso-level-number-format:roman-lower; mso-level-text:"\(%9\)"; mso-level-tab-stop:4.25in; mso-level-number-position:left; margin-left:4.0in; text-indent:0in;} ol {margin-bottom:0in;} ul {margin-bottom:0in;} --> </style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1" /> </o:shapelayout></xml><![endif]--> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Hi Brian, thank you for your continued input and effort. 
I’m sorry I’ve been slow to respond to many of your posts; I have a bunch of other pots boiling over here. However, I think I can answer your questions here and promote further testing.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">1) Regarding 2min.mzML, we’ll fix it, thanks.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">2) Regarding how the validator knows that MS:1000528 is invalid, please download:<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><a href="http://tools.proteomecenter.org/software/mzMLKit/mzML_0.99.0_large.zip">http://tools.proteomecenter.org/software/mzMLKit/mzML_0.99.0_large.zip</a><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">(this is hyperlinked from the main development page <a href="http://www.psidev.info/index.php?q=node/257">http://www.psidev.info/index.php?q=node/257</a>)<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">In it, you will find the semantic validator software. One of the files in the distro is ms-mapping.xml. 
It is this file that encodes these rules and is used by the semantic validator. This file should be more prominently posted and will be.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">3) The semantic validator is FOSS; please see the PSI SVN repository and contribute!<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><a href="https://psidev.svn.sourceforge.net/svnroot/psidev/psi/mzml/">https://psidev.svn.sourceforge.net/svnroot/psidev/psi/mzml/</a><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">(this is hyperlinked from the main development page <a href="http://www.psidev.info/index.php?q=node/257">http://www.psidev.info/index.php?q=node/257</a>)<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">4) So, it turns out that the semantic validator is using an XML file to enforce the semantic rules; it is NOT reading the doc. It should be noted that this software and the mapping mechanism were developed originally for the PSI molecular interactions schema. That format uses the same built-in flexibility with semantic validation. 
We are borrowing that mechanism and software for mzML.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">5) Further, in the doc, the cvParams section for each element is meant to represent “Some examples of allowed cvParams (not necessarily complete)”. I will clarify that in the doc. Further, one of the things I realized we need to do is include in the doc the rules set forth in the ms-mapping.xml file. These rules are NOT currently in the doc, but they should be and will be. The doc is actually autogenerated from the other files, so I just need to include some code that parses this ms-mapping file and includes that information in the doc. This will be done for 0.99.1. Thanks!<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">6) Regarding your Observation Two: It is true that the standard relies on the maintenance of three artifacts: xsd, cv, ms-mapping.xml (not the doc as you had inferred; the doc is essentially autogenerated from the former) (and behind the scenes, the example instance documents also need to be maintained). This translates to the desired-stable schema, the evolving controlled vocabulary, and the evolving ruleset on how you may use the CV within the xsd. 
This is where we are led by the requirement that the schema be stable with provisions for flexibility in annotating many kinds of mass spec data.</span></font></p> </div> </blockquote> From my perspective, it should be possible to hand-maintain only the CV and a templated schema which gets fleshed out by an autogenerator when the CV changes. The mapping file seems like a hack to compensate for two missing features in the CV: 1) the ability to distinguish between values and categories, and 2) the ability to specify the types and ranges of the uncontrolled values. I think it's less contrived to extend the capabilities of the CV format that we use so that it has those features, or alternatively, set conventions in the CV (like how to interpret IS_A and PART_OF relationships) which provide the illusion of those features in a well-defined way.<br> <br> I just looked up the OBO format in more detail and it seems to me that we can very legitimately use convention to solve our CV problems:<br> 1) Distinguish between value/category terms:<br> We can use [Typedef] stanzas to define a new relationship type, "value_of", like:<br> [Typedef]<br> id: value_of<br> name: value_of<br> range: OBO:TERM_OR_TYPE ! there should be some way to say "not a term which has a value_of relationship" but I don't know how and it's not really necessary<br> domain: OBO:TERM_OR_TYPE<br> def: Indicates that the subject term is a controlled value of the object term (which implies the object is a category)<br> <br> 2) Add types, min and max properties:<br> There are several ways to specify data type and min and max ranges, and OBO is even aware of XSD types in some contexts. But because I'm not exactly sure what those contexts are, it would be just as easy to add type, min, and max properties to our category terms (which don't have any value_of relationships pointing to them) as "trailing modifiers", like:<br> <pre>[Term]
id: MS:1000016
name: scan time {type="decimal", min="0"} ! a missing min or max implies no limit (or the limit defined by the type); the "xsd:" prefix is implied for the type
def: "The time taken for an acquisition by scanning analyzers." [PSI:MS]
is_a: MS:1000503 ! scan attribute</pre> <br> With such a CV, combined with a stable, templated schema, we can auto-generate a full-fledged semantic validating schema and avoid using any non-standard approaches.<br> <br> -Matt<font color="navy"><font size="2"><font face="Arial"><br> <br> </font></font></font> <blockquote cite="mid...@he..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;"><o:p> </o:p></span></font></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0in 0in 0in 4pt;"> <div> <div class="MsoNormal" style="text-align: center;" align="center"><font face="Times New Roman" size="3"><span style="font-size: 12pt;"> <hr tabindex="-1" align="center" size="2" width="100%"></span></font></div> <p class="MsoNormal"><b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma; font-weight: bold;">From:</span></font></b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma;"> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a> [<a class="moz-txt-link-freetext" href="mailto:psi...@li...">mailto:psi...@li...</a>] <b><span style="font-weight: bold;">On Behalf Of </span></b>Brian Pratt<br> <b><span style="font-weight: bold;">Sent:</span></b> Monday, October 15, 2007 3:19 PM<br> <b><span style="font-weight: bold;">To:</span></b> 'Mass spectrometry standard development'<br> <b><span style="font-weight: bold;">Subject:</span></b> [Psidev-ms-dev] mzML validator experiences</span></font><o:p></o:p></p> </div> <p class="MsoNormal"><font face="Times New Roman" size="3"><span 
style="font-size: 12pt;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">Hello All,<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">I decided to fool around with the validator at <a href="http://eddie.thep.lu.se/prodac_validator/validator.pl">http://eddie.thep.lu.se/prodac_validator/validator.pl</a> to see how well that can be done in the presence of an inadequately specified file format. My plan was to take a valid file, mess with it, and see if the validator would notice.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">A little hiccup at first - I gave it the automatically generated file <a href="http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/instanceFile/2min.mzML">http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/instanceFile/2min.mzML</a><o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">- it doesn’t actually validate, claiming a missing index element. 
Somebody might want to check that out.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">Then I gave it the handrolled <a href="http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/instanceFile/tiny4_LTQ-FT.mzML0.99.0.mzML">http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/instanceFile/tiny4_LTQ-FT.mzML0.99.0.mzML</a> - this validates fine. So, let the mayhem begin. <o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">I tried removing the selectionWindow element surrounding the cvParams declaring the upper and lower bounds of the selection window, but the validator is XSD aware so it caught that easily. 
<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">Then I tried swapping the accession numbers in the selection window for others that an incautious output module author might honestly mistake them for:<o:p></o:p></span></font></p> <p class="MsoNormal" style="text-indent: 0.5in;"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">accession="MS:1000501" name="scan m/z lower limit"<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">changed to<o:p></o:p></span></font></p> <p class="MsoNormal" style="text-indent: 0.5in;"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">accession="MS:1000528" name="lowest m/z value"<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">The validator caught this as well, flagging the use of accession numbers that were incorrect for that context. But the knowledge behind this doesn’t seem to come from the XSD or the CV file. So, how does the validator know? <o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><b><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; font-weight: bold;">Observation one</span></font></b><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">: the validator doesn’t appear to be open source (or if it is, a prominent link to the source should be provided). 
The use of a closed source tool like this in a standards effort isn’t a good idea, since it’s hard to answer questions like the one above.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">Apparently the author of the validator made excellent use of the documentation at<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><a href="http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/document/mzML0.99.0_specificationDocument.doc">http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/document/mzML0.99.0_specificationDocument.doc</a><o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">which stipulates in English that the only valid cvParams in that context are:<o:p></o:p></span></font></p> <pre><font face="Courier New" size="2"><span style="font-size: 10pt;"><cvParam cvLabel="MS" accession="MS:1000501" name="scan m/z lower limit" value="400.000000"/><o:p></o:p></span></font></pre> <pre><font face="Courier New" size="2"><span style="font-size: 10pt;"><cvParam cvLabel="MS" accession="MS:1000500" name="scan m/z upper limit" value="1800.000000"/><o:p></o:p></span></font></pre> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">Ignore for the moment that this appears to be an example rather than a spec. Do note though that there’s nothing to say that one of each has to be present. 
Of course a reasonable human would probably infer this, but words like “reasonable human” and “infer” are not really what you want to hear when discussing a machine-readable data format standard.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><b><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; font-weight: bold;">Observation two</span></font></b><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">: I’m not at all keen on the idea of a data format that relies on the understanding and simultaneous maintenance of three different artifacts (xsd, cv, doc), one of which (.doc) is not really machine-readable.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">I think (but I can’t be 100% sure without seeing the code) that the author has done a very good job under the circumstances, but probably had a harder time than was necessary given the bizarre construction of the spec. 
He or she probably would have appreciated more xsd content to do the heavy lifting, and certainly had to make a few fairly safe guesses along the way like the “must have one of each of </span></font>MS:1000501 <font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">and </span></font>MS:1000500<font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">” thing.<o:p></o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial;">- Brian<o:p></o:p></span></font></p> </div> </div> </blockquote> <br> </body> </html>