From: Matthew C. <mat...@va...> - 2007-08-07 16:40:18
|
I'm a little confused about the parameters which use the accession number as a kind of value instead of the accession number identifying a variable and then using the value attribute to assign the value. I don't understand why:

<cvParam cvLabel="MS" accession="MS:1000130" name="Positive Scan" value=""/> (from mzML)

Is preferable to:

<cvParam cvLabel="psi" accession="PSI:1000037" name="Polarity" value="positive"/> (from mzData)

There are other examples of this as well. What's the logic here?

-Matt Chambers |
From: Eric D. <ede...@sy...> - 2007-08-07 17:12:46
|
Hi Matt,

The agreed-upon rule here is that cvParams should always refer to the most detailed concept, and the value attribute should *only* be filled if there is a scalar value associated with the concept that cannot be in the CV itself. So:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
<cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number" value="23433"/>

For the first, the term/concept is "LCQ Deca". From the CV, one can learn that an "LCQ Deca" IS A "instrument model", so there's no need (and it is perhaps a little dangerous) to put "LCQ Deca" as a value of "instrument model".

However, "instrument serial number" is the most specific concept in the CV, and thus the actual SN is the value.

This was discussed at some length and this is the new way of doing things, which will be uniform across all PSI and FuGE implementations. At least, that is my understanding. This does mean that parsers need to be a little smarter and be "CV-aware". The parser/interpreter can no longer assume that there will be a term "instrument model" and look for its value. Rather, the parser/interpreter must now look to see if any of the terms provided are a child of "instrument model" in the CV.

Regards,
Eric |
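The rule described in this message (a bare term when the CV term says everything, a term plus value only for a free scalar) can be illustrated with a small Python sketch. It uses only the two cvParam elements quoted in the message; the `<params>` wrapper element and the function name `read_cv_params` are made up for the illustration and are not part of mzML.

```python
import xml.etree.ElementTree as ET

# The two cvParam elements from the message, wrapped in a made-up <params>
# root element so that the fragment parses as a single document.
FRAGMENT = """
<params>
  <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
  <cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number" value="23433"/>
</params>
"""

def read_cv_params(xml_text):
    """Return (accession, name, value) tuples; value is None when empty."""
    params = []
    for elem in ET.fromstring(xml_text).iter("cvParam"):
        value = elem.get("value") or None  # "" and a missing attribute both mean "no scalar"
        params.append((elem.get("accession"), elem.get("name"), value))
    return params

for accession, name, value in read_cv_params(FRAGMENT):
    if value is None:
        print(f"{accession}: the term itself is the statement ({name})")
    else:
        print(f"{accession}: {name} = {value}")
```

Note that nothing in the fragment itself says that "LCQ Deca" is an instrument model; under this rule that knowledge has to come from the CV.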
From: Mike C. <tu...@gm...> - 2007-08-07 18:06:11
|
On 8/7/07, Eric Deutsch <ede...@sy...> wrote:
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> <cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number"
> value="23433"/>
>
> So for the first, the term/concept is "LCQ Deca". For the CV, one can learn
> that an "LCQ Deca" IS A "instrument model", and so there's no need (and is
> perhaps a little dangerous) to put "LCQ Deca" as a value of "instrument
> model".
>
> However, "instrument serial number" is the most specific concept in the CV,
> and thus the actual SN is the value.
>
> This was discussed at some length and this is the new way of doing things,
> that will be uniform across all PSI and FuGE implementations. At least, that
> is my understanding. This does mean that parsers need to be a little smarter
> and be "CV-aware". The parser/interpreter can no longer assume that there
> will be a term "instrument model" and look for its value. But rather, the
> parser/interpreter must now look to see if any of the terms provided are a
> child of "instrument model" in the CV.

Actually, the parser really should not only check whether the term provided *is* a child in the current CV, but also whether it ever *will be* in a future version of the CV. Unfortunately, the technology required to make such a check is not yet available. :-)

I'm not very familiar with how CV is supposed to work, but from this example it appears that the namespaces for different kinds of things have been merged together, and that there is an assumption that there will be no collisions. And that anything that doesn't currently have a name basically doesn't exist.

In the example given of writing a parser, the task of extracting the name of the instrument, given just the mzML file, is changed from being trivial to being essentially impossible. The mzML file becomes meaningless in itself, and only has meaning relative to a particular version of the CV, which the parser must have access to.

Am I misunderstanding something?

Mike |
From: Brian P. <bri...@in...> - 2007-08-07 18:45:49
|
Piling on with Mike, here:

So the first thing any parser must do is load up the OBO file. In practice, such a software system will need to bundle an OBO in some fashion, in the extremely likely event that the OBO used by the mzML file in question is not present. Don't forget to update your distro each time the OBO gets updated, and make sure that in the event the OBO used by the mzML file IS present, you use that instead.

Then, read:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>

then ask yourself, "whazzat?", and look up:

id: MS:1000554
name: LCQ Deca
def: "ThermoFinnigan LCQ Deca." [PSI:MS]
is_a: MS:1000125 ! thermo finnigan

which leads you to:

id: MS:1000125
name: thermo finnigan
def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
is_a: MS:1000483 ! thermo fisher scientific

which leads you to:

id: MS:1000483
name: thermo fisher scientific
def: "Thermo Fisher Scientific. Also known as Thermo Finnigan corporation." [PSI:MS]
related_synonym: "Thermo Scientific" []
is_a: MS:1000031 ! model by vendor

which leads you to:

id: MS:1000031
name: model by vendor
def: "Instrument's model name (everything but the vendor's name) ---Free text ?" [PSI:MS]
relationship: part_of MS:1000463 ! instrument description

which leads you to:

id: MS:1000463
name: instrument description
def: "Device which performs a measurement." [PSI:MS]
relationship: part_of MS:0000000 ! mzOntology

aha! now populate the "instrument description" element in your database.

Which is all fine, in its way, until a new instrument "LCQ Spiff-o" comes out and the OBO isn't immediately updated to match, in which case the parser can't even tell that it's an instrument declaration.

This is a curiously upside-down way to write XML. If I were designing it I'd make the CV stuff an attribute of the instrument info, for anyone who cares to dive into the OBO, but allow the XML to stand alone in the absence of a suitable OBO. I'd make an effort to use the same terminology in the XML element and attribute names as in the OBO just to reduce confusion. I guess what I'm describing is something like mzXML with the addition of CV info as attributes of the existing element types to aid those interested in using OBO to unify data from different sources, without annoying those uninterested in unifying data from different systems.

But, some of you will recall that the use of the CV stuff in lieu of proper XML (in the sense that you have no real hope of making full sense of mzML without access to an external file) is a longstanding crank of mine, and I don't really expect to change it this late in the game.

- Brian |
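The lookup chain walked through in this message can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the stanzas are the ones quoted in the message with the def and synonym lines dropped, the hand-rolled `parse_stanzas` handles just the `id`, `name`, `is_a` and `relationship` tags seen here (it is not a general OBO parser), and `MS:9999999` is a made-up accession standing in for an instrument that is not yet in the CV.

```python
# Stanzas copied from the message above (def and synonym lines omitted).
OBO_TEXT = """
id: MS:1000554
name: LCQ Deca
is_a: MS:1000125 ! thermo finnigan

id: MS:1000125
name: thermo finnigan
is_a: MS:1000483 ! thermo fisher scientific

id: MS:1000483
name: thermo fisher scientific
is_a: MS:1000031 ! model by vendor

id: MS:1000031
name: model by vendor
relationship: part_of MS:1000463 ! instrument description

id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! mzOntology
"""

def parse_stanzas(text):
    """Map each term id to {'name': ..., 'parents': [...]}."""
    terms = {}
    for block in text.strip().split("\n\n"):
        term = {"name": None, "parents": []}
        term_id = None
        for line in block.splitlines():
            tag, _, rest = line.partition(": ")
            rest = rest.split(" ! ")[0].strip()  # drop the trailing "! comment"
            if tag == "id":
                term_id = rest
            elif tag == "name":
                term["name"] = rest
            elif tag == "is_a":
                term["parents"].append(rest)
            elif tag == "relationship":
                # e.g. "part_of MS:1000463": keep only the target accession
                term["parents"].append(rest.split()[1])
        terms[term_id] = term
    return terms

def chain(terms, term_id):
    """Follow the first parent link upward until the term is unknown."""
    names = []
    while term_id in terms:
        names.append(terms[term_id]["name"])
        parents = terms[term_id]["parents"]
        if not parents:
            break
        term_id = parents[0]
    return names

terms = parse_stanzas(OBO_TEXT)
print(" -> ".join(chain(terms, "MS:1000554")))
print(chain(terms, "MS:9999999"))  # a term missing from the CV yields nothing
```

The second call is the "LCQ Spiff-o" case: with no stanza to start from, the reader has no way to tell that the term describes an instrument.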
From: Matthew C. <mat...@va...> - 2007-08-07 18:57:21
|
In addition to Mike's and Brian's concerns, I am wondering how "LCQ Deca" can be called a "term/concept". "Instrument model" is the closest relevant term/concept as I understand those words. Is the cvParam not capable of controlling both the name and possible values of its definitions? Also, why are the different instrument models part of the CV anyway? It seems that the CV should support controlling both terms and the values (or instances) of those terms:

"LCQ Deca" IS A VALID INSTANCE OF "thermo finnigan" IS A "thermo fisher scientific" IS A "instrument model"

I don't really understand the middle two jumps either, i.e. why are they redundant? |
From: Brian P. <bri...@in...> - 2007-08-07 20:00:51
|
Upon reflection, I realize that this is, for me, actually a new objection to mzML. My original problem with the reliance on CV/OBO is that an XML parser for it looks something like this:

for each element {
    if (element.name=="cvParam") then {
        a whole bunch of handrolled logic to pick this apart
    } else {
        there isn't much else
    }
}

That's not really an XML parser, therefore I conclude that mzML isn't really XML. But I have previously beaten that horse to death.

Now we have something new not to like: it's impossible to write a parser that's even remotely future-proof. Or maybe it's not new, and I just missed it before. Either way, this all looks increasingly ill conceived to me. Sorry to be such a downer.

Hey, the horse just twitched: by placing CVparam information in attributes of the elements of a conventionally structured XML schema (a la mzXML) we can make use of the OBO work without adding a lot of unwanted complexity to software systems that aren't really interested in it. An mzML that integrates well with OBO-aware systems is an excellent idea, but an mzML that demands you BE an OBO-aware system seems less likely to achieve widespread adoption.

I do understand the desire to maintain an ontology instead of an ontology and an XML schema, but I'm not sure we can really get away with it. By having a schema that offloads most of its work to an external ontology, we're just pushing the work that having a proper schema saves onto the folks creating the readers and writers, making their job much more complicated than it ought to be - you can't autogenerate a parser or serializer without a fully realized schema. I think we risk them deciding that mzXML and mzData aren't really all that broken after all.

Brian |
From: Matthew C. <mat...@va...> - 2007-08-07 20:44:18
|
As long as the name/value paradigm is used, the loop doesn't get much more complicated than:

if( element.parent == "spectrumDescription" ) {
    for each child {
        if (child.name=="cvParam") then {
            if( child.attrs['name'] == "Polarity" )
                spectrum.polarity = child.attrs['value'];
        }
    }
}

But if you have to do:

if( element.parent == "spectrumDescription" ) {
    for each child {
        if (child.name=="cvParam") then {
            if( child.attrs['name'] == "Positive" )
                spectrum.polarity = "positive";
            else if( child.attrs['name'] == "Negative" )
                spectrum.polarity = "negative";
        }
    }
}

...parsers will be painful to write and adoption will suffer because of it, I think. Not to mention the fact that the idea of adding these things that should really be values as "terms" in the vocabulary is indeed not future-proof. In the future, there might be another IS_A relationship for "LCQ Deca", so that merely by seeing LCQ Deca you won't know that you're looking at an instrument model parameter. Of course, the accession number would tell you uniquely, but then you'll have two accession numbers in the vocabulary with the name "LCQ Deca." Yuck!

I think values for terms should be given a special relationship in the CV. They shouldn't be given an "IS_A" relationship, with the parser expected to look up the implication of that relationship every time a value-as-term is encountered.

-Matt |
From: Eric D. <ede...@sy...> - 2007-08-08 06:35:22
|
Thank you all for the lively discussion.

One proposal I once made in Lyon (which was roundly dismissed, I believe) was something like this. Instead of:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>

Have:

<cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554" name="LCQ Deca" value=""/>

Thus the parser can easily be coded to know that any cvParam with a parentAccession="MS:1000031" is going to be an instrument model whether or not it's in the CV. The mzML semantic validator tool would, of course, check all this. The main argument against this was the potential for inconsistency, I seem to recall.

The decision was made to make individual models CV terms to avoid problems like:

<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/>

I would argue that your code snippet below would better look like:

#define MS_CV_POLARITY_TYPE "MS:1000037"

if( element.parent == "spectrumDescription" ) {
    for each child {
        if (child.name=="cvParam") then {
            if( cv.isChildOf(child.attrs['accession'], MS_CV_POLARITY_TYPE) ) // if a polarity type
                spectrum.polarity = cv.getName(child.attrs['accession']);
        }
    }
}

Note that the cvParam name (should that be "positive" or "Positive" or "positive polarity" or "Polarity" or "polarity"?) is not in the code, just MS:1000037, which can be considered final.

This does require a CV class and some methods:

cv.loadFromFile()
cv.isChildOf()
cv.getName()

but this is not really complicated.

Take cover!

Eric |
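For readers who want to see roughly what such a CV class might involve, here is a minimal Python sketch. It is an illustration, not the PSI implementation: the class is fed from an in-memory dictionary rather than `loadFromFile()`, the method names are Python-style renderings of the ones listed above, and the parent link from "Positive Scan" (MS:1000130) to the polarity term (MS:1000037) is an assumption based on the snippet in this message, not a copy of the real CV.

```python
import xml.etree.ElementTree as ET

MS_CV_POLARITY_TYPE = "MS:1000037"

class CV:
    """Tiny in-memory stand-in for a loaded controlled vocabulary."""

    def __init__(self, terms):
        # terms: accession -> (name, list of parent accessions)
        self.terms = terms

    def get_name(self, accession):
        return self.terms[accession][0]

    def is_child_of(self, accession, ancestor):
        """True if `ancestor` is reachable from `accession` via parent links."""
        seen, stack = set(), [accession]
        while stack:
            current = stack.pop()
            if current in seen or current not in self.terms:
                continue  # already visited, or unknown to this CV version
            seen.add(current)
            for parent in self.terms[current][1]:
                if parent == ancestor:
                    return True
                stack.append(parent)
        return False

# Assumption: "Positive Scan" is filed under the polarity term.
cv = CV({
    "MS:1000037": ("Polarity", []),
    "MS:1000130": ("Positive Scan", ["MS:1000037"]),
})

# The serial-number param is unrelated; it is here only to show it is skipped.
SPECTRUM = """
<spectrumDescription>
  <cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number" value="23433"/>
  <cvParam cvLabel="MS" accession="MS:1000130" name="Positive Scan" value=""/>
</spectrumDescription>
"""

def polarity(xml_text, cv):
    """Name of the first cvParam that descends from the polarity term."""
    for param in ET.fromstring(xml_text).iter("cvParam"):
        accession = param.get("accession")
        if cv.is_child_of(accession, MS_CV_POLARITY_TYPE):
            return cv.get_name(accession)
    return None  # no polarity term present, or the term is unknown to this CV

print(polarity(SPECTRUM, cv))
```

The `current not in self.terms` branch is where the objection raised earlier in the thread shows up: a term newer than the loaded CV is silently treated as unrelated.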
From: Angel P. <an...@ma...> - 2007-08-08 13:04:21
|
On 8/8/07, Eric Deutsch <ede...@sy...> wrote:
> Thank you all for the lively discussion.
>
> One proposal I once made in Lyon (which was roundly dismissed I believe)
> was something like this: instead of:
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> Have:
>
> <cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554"
> name="LCQ Deca" value=""/>
>
> Thus the parser can easily be coded to know that any cvParam with a
> parentAccession="MS:1000031" is going to be an instrument model whether or
> not it's in the CV. The mzML semantic validator tool would, of course, check
> all this. The main argument against this was the potential for
> inconsistency, I seem to recall.

The argument was that MAGE v1 did CV terms this way, and it caused a tremendous amount of confusion for the MAGE producers and the ArrayExpress annotation-checking team alike. It is infinitely easier to deal with nested cvParams than to try to output a term and a parent at the same time. |
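For comparison, the parentAccession proposal quoted above would let a reader pick out instrument models without consulting the CV at all. The Python sketch below shows the idea under stated assumptions: parentAccession is the proposed attribute, not part of the actual mzML schema; the `<params>` wrapper is made up; and `MS:9999999` is a placeholder accession for the hypothetical "LCQ Spiff-o" mentioned earlier in the thread.

```python
import xml.etree.ElementTree as ET

INSTRUMENT_MODEL = "MS:1000031"  # "model by vendor" in the stanzas quoted earlier in the thread

# parentAccession is the *proposed* attribute; MS:9999999 is a placeholder
# accession for an instrument that is not in any CV.
FRAGMENT = """
<params>
  <cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554" name="LCQ Deca" value=""/>
  <cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:9999999" name="LCQ Spiff-o" value=""/>
  <cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number" value="23433"/>
</params>
"""

def instrument_models(xml_text):
    """Names of cvParams that declare the instrument-model term as parent."""
    return [
        param.get("name")
        for param in ET.fromstring(xml_text).iter("cvParam")
        if param.get("parentAccession") == INSTRUMENT_MODEL
    ]

print(instrument_models(FRAGMENT))
```

The cost, as the reply above points out, is that the file now asserts a parent that may disagree with the CV, which is the inconsistency the group wanted to avoid.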
From: Angel P. <an...@ma...> - 2007-08-07 20:10:15
|
On 8/7/07, Brian Pratt <bri...@in...> wrote:
> Hey, the horse just twitched: by placing CVparam information in
> attributes of the elements of a conventionally structured XML schema (ala
> mzXML) we can make use of the OBO work without adding a lot of unwanted
> complexity to software systems that aren't really interested in it. An
> mzML that integrates well with OBO-aware systems is an excellent idea, but
> an mzML that demands you BE an OBO-aware system seems less likely to achieve
> widespread adoption.

Can you name specific attributes, not currently in the schema, whose values you would want to be CV terms?

-angel |
From: Brian P. <bri...@in...> - 2007-08-07 21:20:28
|
Hi Angel,

If I understand your question to be about identifying current mismatches between terminology in the schema and the ontology, I'm not sure there are any - but probably only because the schema has so little actual terminology in it.

Consider this example:

<xs:element name="selectionWindow" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="cvParam" type="dx:CVParamType" minOccurs="2" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

which says absolutely nothing at all about what a selectionWindow element can be expected to contain when you encounter it. It just says it will contain at least two "parameters". Not much of an aid to software development.

The schema, if we can call it that, doesn't even specify what some of the most fundamental information about a scan looks like. For example, it specifies that a scan may have a list of precursors, each of which will contain an ionSelection, but stops short of telling you what an ionSelection looks like:

<xs:element name="ionSelection" type="dx:ParamGroupType">
  <xs:annotation>
    <xs:documentation>This captures the type of ion selection being performed, and trigger m/z (or m/z's), neutral loss criteria etc. for tandem-MS or data dependent scans.</xs:documentation>
  </xs:annotation>
</xs:element>

Nearly all the details of nearly all the elements are just unspecified blobs. Normally with an XML format you can expect to at least start your work by running it through something like XMLSpy that will autogenerate a reader and a writer that you can then polish up (to handle, for example, the necessary weirdness of base64+zlib in the peaklists). But with this, you get no kind of a head start at all, since the vast majority of the syntax is hidden behind blobs like dx:CVParamType and dx:ParamGroupType. It's just not a specification.

The statement that led to your question, I think, was just me saying that if we *did* create an actual schema, we'd want its terminology to agree with the ontology wherever possible. But it has to actually contain some terminology, unlike the current schema.

Brian |
From: Angel P. <an...@ma...> - 2007-08-08 13:00:35
|
On 8/7/07, Brian Pratt <bri...@in...> wrote: > > Hi Angel, > > If I understand your question to be about identifying current mismatches > between terminology in the schema and the ontology, I'm not sure there are > any - but probably only because the schema has so little actual terminology > in it. > My question was more of a pragmatic one, about where would you add specificity into the mzML schema. Your selecitonWindow example below is a good one, in that the specification of of selectWindow is probably a range value and we should have two sub-elements that corresponding to type the cvParam values to define the window (or just a well defined range sub-element, skipping cvParam altogether). I don't think your second example is a good one tho, since there are so many permutations of an ionSelection protocol and that more are certainly one the way, t is better handled by an ontology specification. Yes this does make parsers slightly harder, since now you must pay attention to the incoming ontology, but it is the same amount of work as if everything was in the schema. mzXML could get away with tight specification of these complex and changing annotations, since its sole purpose was support of the ISB pipeline. Its open source status only served to increase the user base, but the schema changes were solely driven by the needs of that pipeline and solely by the community that used it. Tryin to build consensus across many different groups has led to the current version of mzML and that major structure of mzML will not change at this point, so please let's just get to the specifics of going through the schema and identifying where you think an annotation should be promoted to the level of a schema element, and we'll discuss as a group. 
-angel

> Consider this example:
>
> <xs:element name="selectionWindow" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="cvParam" type="dx:CVParamType" minOccurs="2" maxOccurs="unbounded"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> which says absolutely nothing at all about what a selectionWindow
> element can be expected to contain when you encounter it. It just says it
> will contain at least two "parameters". Not much of an aid to software
> development.
>
> The schema, if we can call it that, doesn't even specify what some of the
> most fundamental information about a scan looks like. For example, it
> specifies that a scan may have a list of precursors, each of which will
> contain an ionSelection, but stops short of telling you what an
> ionSelection looks like:
>
> <xs:element name="ionSelection" type="dx:ParamGroupType">
>   <xs:annotation>
>     <xs:documentation>This captures the type of ion selection being performed, and trigger m/z (or m/z's), neutral loss criteria etc. for tandem-MS or data dependent scans.</xs:documentation>
>   </xs:annotation>
> </xs:element>
>
> Nearly all the details of nearly all the elements are just unspecified
> blobs. Normally with an XML format you can expect to at least start your
> work by running it through something like XMLSpy that will autogenerate a
> reader and a writer that you can then polish up (to handle, for example, the
> necessary weirdness of base64+zlib in the peaklists). But with this, you
> get no kind of a head start at all, since the vast majority of the syntax is
> hidden behind blobs like dx:CVParamType and dx:ParamGroupType. It's just
> not a specification.
>
> The statement that led to your question, I think, was just me saying that
> if we *did* create an actual schema, we'd want its terminology to agree with
> the ontology wherever possible. But it has to actually contain
> some terminology, unlike the current schema.
>
> Brian

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004 |
From: Brian P. <bri...@in...> - 2007-08-08 15:49:39
|
If ionSelection is just one of many things that are too complicated and varied and dynamic to actually specify, then just off the top of my head I think it's going to be pretty hard to do a good job of parsing mzML. I take your point about mzXML being too specific, but there's such a thing as too general as well. My fear is that we'll see it balkanized, with most parsers only really able to deal with the mode of mzML usage that the author really cares about, which just leaves us with a bunch of ad hoc standards.

The instrument name example (wherein a parser cannot be made robust enough to read future versions) makes me think that not enough mental energy has gone into considering the practicalities of being a consumer of mzML. I've seen this in other standards efforts I've been involved with in other industries (internet security, circuit board manufacturing) - writers (mostly hardware vendors) love the flexibility because they can just do it their way, but readers (software vendors) bear the brunt of what amounts to one format per vendor, and finally just fall back onto the per-vendor solutions they have already invested in.

>> it is the same amount of work as if everything was in the schema.

There actually *is* an advantage of specifying via schema instead of ontology, which I've already pointed out - W3C schema is itself a standard with a host of tools built up around it that will generate readers and writers from properly formed schemas. If mzML just used elements for everything and each element had an attribute pointing at the ontology I think we'd be better off. The schema and the ontology would need to evolve together, of course.

But, as you say, this thing is more or less nailed down at this point, so I'm wasting the list's time with this schema talk, and I do apologise. I don't blame anyone for being annoyed at me dredging up these fundamental objections yet again so late in the process.

Anyway, off for vacation until the end of next week.
Sorry to start a flame then abandon it. Cheers, Brian |
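The "elements for everything, each with an attribute pointing at the ontology" idea from this message can be sketched roughly as below. This is only an illustration: the element names (`instrument`, `instrumentModel`, `instrumentSerialNumber`) and the `cvRef` attribute are hypothetical and appear in neither mzML nor mzXML. The two accessions are the ones quoted earlier in the thread. The point is that a reader with no interest in the CV can rely on element names alone, while a CV-aware reader can still pick up the accessions.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names and the "cvRef" attribute are made up
# to illustrate the proposal; they are not part of mzML or mzXML.
DOC = """<instrument>
  <instrumentModel cvRef="MS:1000554">LCQ Deca</instrumentModel>
  <instrumentSerialNumber cvRef="MS:1000529">23433</instrumentSerialNumber>
</instrument>"""


def read_plain(xml_text):
    """CV-unaware reader: relies only on element names fixed by the schema."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def read_accessions(xml_text):
    """CV-aware reader: collects the ontology references for each element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.get("cvRef") for child in root}


print(read_plain(DOC))
print(read_accessions(DOC))
```

In a layout like this the schema would carry the structure and the ontology would carry the meaning, which is why the two would have to evolve together.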
From: Matt C. <mat...@va...> - 2007-08-08 13:24:53
|
Eric Deutsch wrote:
>
> The decision was made to make individual models cv terms to avoid
> problems like:
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/>
>

Is this the main/only reason for this usage of terms? This just seems like a great argument for having the ontology control the values of the terms and not just the terms themselves. That way, the simple term/name->value relationship is always maintained, and this problem is eliminated. I am not advocating changing the structure of mzML at this point; I see this as a rather minor change.

> I would argue that your code snippet below would better look like:
>
> #define MS_CV_POLARITY_TYPE "MS:1000037"
>
> if( element.parent == "spectrumDescription" ) {
>   for each child {
>     if (child.name=="cvParam") then {
>       if( cv.isChildOf(child.attrs['accession'], MS_CV_POLARITY_TYPE) ) // if a polarity type
>         spectrum.polarity = cv.getName(child.attrs['accession']);
>     }
>   }
> }
>
> Note that the cvParam name (should that be "positive" or "Positive" or
> "positive polarity" or "Polarity" or "polarity"?) is not in the code,
> just MS:1000037 which can be considered final.
>
> This does require a CV class and some methods:
>
> cv.loadFromFile()
> cv.isChildOf()
> cv.getName()
>
> but this is not really complicated.
>

But it is really relatively complicated. It is more conceptually and computationally complicated than simple string comparison (with the OPTION of checking the CV to see if the value is a controlled one). And worse, it's a complication I don't see a justification for unless there is a better reason than the one you gave above, which has a simpler solution. Why force parsers to create a CV class and methods just to ensure that "LCQ Deca" is spelled right (or that it's given its proper accession number)?

-Matt |
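To make the size of the disputed "CV class" concrete, here is a minimal sketch of the three methods named in the quoted snippet, under several assumptions. The `CV` class and its method names are hypothetical. The OBO-style fragment is hand-written for illustration: the accessions and names are those quoted in this thread, but the `is_a` links are assumed, not taken from the real PSI-MS CV file. A real reader would also have to locate and load the correct version of the CV, which this sketch does not attempt.

```python
from collections import defaultdict

# Hand-written OBO-style fragment for illustration only. Accessions and names
# are the ones quoted in the thread; the is_a links are assumptions.
OBO_TEXT = """
[Term]
id: MS:1000031
name: instrument model

[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000031 ! instrument model

[Term]
id: MS:1000037
name: polarity

[Term]
id: MS:1000130
name: positive scan
is_a: MS:1000037 ! polarity
"""


class CV:
    """Tiny in-memory controlled vocabulary (hypothetical helper class)."""

    def __init__(self):
        self.names = {}                   # accession -> term name
        self.parents = defaultdict(set)   # accession -> direct is_a parents

    def load_from_text(self, text):
        current = None
        for raw in text.splitlines():
            line = raw.strip()
            if line == "[Term]":
                current = None            # a new stanza starts
            elif line.startswith("id:"):
                current = line[3:].strip()
            elif line.startswith("name:") and current:
                self.names[current] = line[5:].strip()
            elif line.startswith("is_a:") and current:
                # Drop the trailing "! comment" that OBO allows after the id.
                parent = line[5:].split("!")[0].strip()
                self.parents[current].add(parent)

    def get_name(self, accession):
        return self.names[accession]

    def is_child_of(self, accession, ancestor):
        """True if `ancestor` is reachable from `accession` via is_a links."""
        stack, seen = [accession], set()
        while stack:
            term = stack.pop()
            for parent in self.parents.get(term, ()):
                if parent == ancestor:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


cv = CV()
cv.load_from_text(OBO_TEXT)
if cv.is_child_of("MS:1000130", "MS:1000037"):
    print("polarity:", cv.get_name("MS:1000130"))
```

The lookup itself is short. What the sketch leaves out, such as finding the right CV file and keeping it in sync with the files being read, is where the objections raised in this message apply.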
From: Mike C. <tu...@gm...> - 2007-08-08 16:24:34
|
On 8/8/07, Matt Chambers <mat...@va...> wrote:
> > This does require a CV class and some methods:
> > cv.loadFromFile()
> > cv.isChildOf()
> > cv.getName()
> >
> > but this is not really complicated.
>
> But it is really relatively complicated. It is more conceptually and
> computationally complicated than simple string comparison (with the
> OPTION of checking the CV to see if the value is a controlled one). And
> worse, it's a complication I don't see a justification for unless there
> is a better reason than the one you gave above which has a more simple
> solution.

I agree with Matt. A call like "isChildOf" looks simple, but what's entailed in that call is that the *correct* CV is available and has been parsed into a tree in memory. There are good reasons to think that this will be fairly difficult to do correctly in practice.

But on top of that, it just seems needlessly difficult. It'd be a little like having products in your grocery store marked with their trademark name, but not a succinct description of what they *are*--which you can only find out with a stock list lookup. ("Shimmer? Is that a floor polish or a dessert topping? Hope my stock list is up to date...")

The alternative here would appear to be very simple. Something like the previously mentioned

<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>

would work fine. As for the differing spellings of "LTQ-FT", there's a canonical spelling available in the CV, and anyone that can't get that right will probably find the complexity of multiple CV versions insurmountable.

Consider also, how should newly created instruments be handled? If our lab invents the "MassMaster2000", do we need to create our own augmented CV in order to handle this? Does everyone who wants to read MassMaster2000 mzML files need a copy of this augmented CV? What if they have twenty other augmented CVs? How are those to be managed?

Mike |
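The name/value alternative described in this message, with the CV check as an option rather than a requirement, might look like the sketch below. The wrapping `instrument` element, the `read_instrument_model` function, and the `KNOWN_MODELS` set are all hypothetical. In practice the canonical spellings would come from the CV, and a reader that skips the check never needs them. The accession MS:1000031 is the one quoted in the thread.

```python
import xml.etree.ElementTree as ET

INSTRUMENT_MODEL = "MS:1000031"   # accession quoted in the thread

# Hypothetical set of canonical spellings, standing in for values from the CV.
KNOWN_MODELS = {"LCQ Deca", "LTQ-FT"}

# Hypothetical document; only the cvParam line follows the thread's example.
DOC = """<instrument>
  <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="MassMaster2000"/>
</instrument>"""


def read_instrument_model(xml_text, known_models=None):
    """Return (value, is_controlled) for the instrument model cvParam.

    is_controlled is None when no set of known values was supplied,
    i.e. when the caller chose not to consult the CV at all.
    """
    root = ET.fromstring(xml_text)
    for param in root.iter("cvParam"):
        if param.get("accession") == INSTRUMENT_MODEL:
            value = param.get("value")
            if known_models is None:
                return value, None
            return value, value in known_models
    return None, None


# A reader without any CV still gets the model name of a brand-new instrument.
print(read_instrument_model(DOC))
# A reader with a list of controlled values can flag the name as uncontrolled.
print(read_instrument_model(DOC, KNOWN_MODELS))
```

Under this layout the "MassMaster2000" file stays readable without an augmented CV. The cost, as argued earlier in the thread, is that nothing forces writers to use one spelling.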