From: Eric D. <ede...@sy...> - 2007-10-02 22:32:26
|
Hi everyone,

I am happy to announce that the mzML 0.99.0 specification document has been submitted to the PSI document process. This is an important milestone in the completion of mzML, but it is most certainly not the end of development and feedback.

The specification document and all related materials are publicly available at:

http://psidev.info/index.php?q=node/257

There are various kits of instance documents, xsds, the controlled vocabulary, validators, etc. listed at that site. Please examine and respond.

The actual specification document is posted at:

http://psidev.info/index.php?q=node/300

You may post comments at that site, or you may send them to this list. We addressed nearly all issues brought up in the preview period in August. The one main issue that remains unresolved is the problem of cvParams and how to handle the inevitable scenario of new terms and older software. This is an important issue. There is a discussion of it in the specification document. Your input is sought.

We encourage you to begin developing (or adapting) software that implements the format if you are comfortable knowing that there will be changes before the 1.0.0 release. I believe that it is primarily by attempting to implement the format that the community will test the format most rigorously and reveal issues that still need to be resolved; this is far more effective than gazing at the specification document.

Regards,
Eric

----------------------------------
Eric Deutsch, Ph.D.
Institute for Systems Biology
1441 North 34th Street
Seattle WA 98103
Tel: 206-732-1397
Fax: 206-732-1260
Email: ede...@sy...
WWW: http://www.systemsbiology.org/Senior_Research_Scientists/Eric_Deutsch
|
From: Brian P. <bri...@in...> - 2007-10-03 16:11:27
|
Looks like most commenting happens on this list, so here goes:

From the spec:

"The mzData format was a far more flexible format than mzXML. The support of new technologies could be added to mzData files by adding new controlled vocabulary terms, while mzXML often required a full schema revision. This is evidenced by mzData still at version 1.05 while mzXML is currently at version 3.1. However, mzData did suffer from a problem of inconsistently used vocabulary terms and there appeared several different dialects of mzData, encoding the same information in subtly different ways. This was not usually a problem for human inspection of the file, but caused difficulty writing and maintaining reader software."

This is specious. The fact that mzData hasn't revved only says to me that it's badly underspecified, which the paragraph in fact goes on to illustrate. The occasional revision of the mzXML schema, to my mind, indicates a well-maintained standard*. A stable schema and evolving ontology produce as much or more reader/writer code maintenance work as an evolving schema alone does. It's not like mzData readers don't have to be updated every time something gets added to the ontology. At least with a schema there are ways to generate code for these kinds of changes automatically, and to easily validate the results. Frankly, when it comes to data formats I think the term "flexible" is synonymous with "trouble": convenient for the writers, hell for the readers, and often a dead end for that reason.

I really think mzML will just perpetuate the issues mzData presented. Better we should figure out a way to generate a proper XML schema based on the ontology document. The rest of the world uses proper XML; I really don't see what makes us special.

Well, hey, you asked.

- Brian

*Note that most of the mzXML revisions had to do with things like adding data compression to peaklists.
It wasn't getting banged around every time somebody came out with a new mass spec, like the ontology will.
|
From: Lennart M. <len...@gm...> - 2007-10-04 10:36:06
|
Hi Brian,

> This is specious. The fact that mzData hasn’t revved only says to me
> that it’s badly underspecified, which the paragraph in fact goes on to
> illustrate. The occasional revision of the mzXML schema, to my mind,
> indicates a well maintained standard*. A stable schema and evolving
> ontology produce as much or more reader/writer code maintenance work as
> an evolving schema-only does.

PRIDE has a stable schema, yet a rapidly evolving CV. We did not need to recode PRIDE whenever we changed the CV. So, from experience: a stable schema plus an evolving (but initially well-organized) CV is not a problem in terms of maintenance. Having to redo the schema every other month is also possible, but nevertheless more hassle.

> It’s not like mzData readers don’t have
> to be updated every time something gets added to the ontology. At least
> with a schema there are ways to generate code for these kinds of changes
> automatically, and to easily validate the results. Frankly when it
> comes to data formats I think the term “flexible” is synonymous for
> “trouble” – convenient for the writers, hell for the readers, and often
> a dead end for that reason.

Let me paint a black-and-white scenario for you: you have everything as attributes in the schema, and you auto-generate parsing code every week since you keep adding or changing attributes. Fine, no worries. Zero backwards compatibility, but hey, who cares about yesterday's data, right? And your generated code will swallow anything that is remotely using the right glyphs in those attributes (e.g., 'I'm not providing sensible information here' as the value for the 'instrument_name' attribute). If your objective is convenience for the programmers (whose job it should be to program), you choose the 'everything in schema' path. If your objective is to transmit meaningful and validated (or validatable) data, you go the current mzML path. Now which one would make the most sense for a standard?
> I really think mzML will just perpetuate the issues mzData presented.
> Better we should figure out a way to generate a proper XML schema based
> on the ontology document. The rest of the world uses proper XML, I
> really don’t see what makes us special.

I do not believe (a) that mzData presents more issues than uses, (b) that, even if (a) were true, mzML blatantly propagates these, or (c) that starting from scratch with a far too rigid, implicitly non-backwards-compatible and unvalidatable (content-wise, which is where it matters) data transmission format is the way forward.

> *note that most of the mzXML revisions had to do with things like adding
> data compression to peaklists. It wasn’t getting banged around every
> time somebody came out with a new mass spec, like the ontology will.

mzML will not get 'banged about' every time a new mass spec is added. That is the whole point. Please do try to understand the relatively simple concept: an addition to the instruments is completely and utterly transparent.

Cheers,

lnnrt.
|
From: Marc S. <st...@in...> - 2007-10-04 08:06:21
|
Hi all,

First of all I would like to thank Eric and all the others in the working group for their effort. Here are my comments:

(1) The new CV term problem
A is clear and simple.
B is simply a bad idea in my opinion. Why not use the child accession if we have it?
C helps the software to know where the new term belongs, but the software does not know what to do with it in most cases. I think most software implements these enum-like CV terms as enum types and thus cannot handle new values anyway. Additionally, it is error-prone (mismatching parent and child).
As C is an extension of A, I vote for A or C, but I don't think that C helps very much.

(2) Semantic validator
The semantic validator is a nice feature, but I think you must publish a file that defines the mapping of CV terms to the schema. This file must answer questions like: Where can I use which term? How often can I repeat a term? etc. With the heavy use of CV terms, such a file is a non-optional part of the format definition. What happened to that format Luisa proposed?

(3) Comments on CV / Schema
- The term MS:1000543 "data processing action" is missing some child terms, I think. What about smoothing, baseline reduction and removal of low-intensity data points?
- Putting the software name in a CV will cause much trouble, I think. There are way too many upcoming tools, and you will be constantly updating that obo file. I really think we should put that into a string attribute.
- I would add a new optional and unbounded element "parameter" with attributes "name", "type", "value" to the dx:dataProcessing element to store the parameters of the software that were used for processing.

(4) General
Finally I'd like to say that I agree with Brian Pratt. There is too much CV and too little XML in the format for my taste. I don't argue against CV in general; it's a nice technique that allows the schema to be stable for a long time. But now everything is in the CV and there are hardly any XML attributes left.
This makes the format hard to implement and impossible to check with an XML validator. And I don't see the advantage in most cases: I have to adapt the software to new terms just as I would adapt it to new XML elements.

Best regards,
Marc
|
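Marc's point (1) about enum-like CV terms can be made concrete. The sketch below assumes a reader that maps spectrum-type accessions onto a Python `Enum`; every accession in it is a hypothetical placeholder, and option C is modelled only as "the parent accession is also available", since the exact XML layout of options A to C is in the specification document rather than in this thread.

```python
from enum import Enum
from typing import Optional

# Hypothetical accessions for illustration only (not real PSI-MS terms).
SPECTRUM_TYPE = "EX:0000"  # stands in for the parent term "spectrum type"


class SpectrumType(Enum):
    """The spectrum types this (hypothetical) reader knew about when it was written."""
    MS1 = "EX:0001"
    SRM = "EX:0002"


def read_spectrum_type(accession: str, parent: Optional[str] = None) -> str:
    """Map a cvParam accession to a label the rest of the program can switch on."""
    try:
        # Option A: only the child accession is available.
        return SpectrumType(accession).name
    except ValueError:
        # The accession was added to the CV after this code was written.
        if parent == SPECTRUM_TYPE:
            # Option C: the parent says where the term belongs, but the code
            # still has no specific handling for it.
            return "UNRECOGNIZED_SPECTRUM_TYPE"
        raise
```

A known accession maps cleanly, a new accession with the expected parent degrades to a generic label, and a new accession with no parent (or a mismatched one) raises, which is the error-prone case Marc mentions.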
From: Lennart M. <len...@gm...> - 2007-10-04 10:45:24
|
Hi Marc,

> (2) Semantic validator
> The semantic validator is a nice feature, but i think you must publish a
> file that defines the mapping of CV terms to the schema.
> This file must answer questions like: Where can i use which term? How
> often can i repeat a term? etc.
> With the heavy use of CV terms such a file is a non-optional part of the
> format definition.
> What happened to that format Luisa proposed?

It is included :). Look in the 'ms-mapping.xml' file. It is (quite literally so) Luisa's file. The whole validator relies on a role-based 'separation of concerns', so that the application is nearly 100% dynamically configured. It is a nice piece of work that we are currently writing up in order to publish it. Meanwhile, I'd be happy to provide more information on how the whole thing works. Just let me know what you want to learn.

> (4) General
> Finally i'd like to say that i agree with Brian Pratt. There is too much
> CV and too little XML in the format for my taste.
> I don't argue against CV in general it's a nice technique that allows
> the schema to be stable for a long time.
> But now everything is in the CV and there are hardly any XML attributes
> left. This makes the format hard to implement and impossible to check
> with an XML validator.
> And i don't see the advantage in most cases: I have to adapt the
> software to new terms just as i would adapt it to new XML elements.

If you could use software that answered simple CV questions like 'what is the parent of X', or 'get children for X', or 'is X one of the children of Y (optionally with maximum Z generations)' (for instance); and if this software is on the net and always up-to-date, would that still mean you always have to redo everything? I at least wouldn't expect so. It just requires a new way of dealing with the content of the file (which again, is what matters).
Also remember that the semantic validator, in series after a schema validator, provides maximum validation for a file like an mzML file: both structure and content are thoroughly verified (and nearly 100% dynamically configured; zero recoding is necessary when new children get added, for instance).

Cheers,

lnnrt.
|
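Lennart's three CV questions map onto a very small API. The sketch below assumes the CV has already been loaded into memory as a dictionary of is_a links; the class name `TinyCV`, its method names and the example term hierarchy are hypothetical and chosen for illustration (the "Super Ion Trap Turbo" leaf is the made-up instrument used later in this thread). A real service would answer the same questions from the published CV rather than from a hard-coded dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class TinyCV:
    """In-memory is_a graph standing in for a (hypothetical) online CV query service."""

    def __init__(self, is_a: Dict[str, List[str]]) -> None:
        # is_a maps each term to its direct parents; OBO allows more than one.
        self._parents: Dict[str, List[str]] = {t: list(ps) for t, ps in is_a.items()}
        self._children: Dict[str, Set[str]] = defaultdict(set)
        for term, parents in is_a.items():
            for parent in parents:
                self._children[parent].add(term)

    def parents_of(self, term: str) -> List[str]:
        """'What is the parent of X?'"""
        return list(self._parents.get(term, []))

    def children_of(self, term: str) -> Set[str]:
        """'Get children for X' (direct children only)."""
        return set(self._children.get(term, ()))

    def is_descendant(self, term: str, ancestor: str,
                      max_generations: Optional[int] = None) -> bool:
        """'Is X below Y, optionally within at most Z generations?'"""
        frontier, seen, generation = {term}, {term}, 0
        while frontier:
            if max_generations is not None and generation >= max_generations:
                return False
            generation += 1
            next_frontier = set()
            for current in frontier:
                for parent in self._parents.get(current, []):
                    if parent == ancestor:
                        return True
                    if parent not in seen:
                        seen.add(parent)
                        next_frontier.add(parent)
            frontier = next_frontier
        return False
```

Because the checks walk the links at run time, adding another leaf under "ion trap" makes it pass `is_descendant(..., "mass analyzer type")` without touching the code; the code only changes if the kind of question changes.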
From: Jones, A. [jonesar] <And...@li...> - 2007-10-04 10:40:05
|
Hi all,

The decision about how to implement CV terms is pretty important and we should try to come up with a coherent policy across PSI if possible. Here are my thoughts:

A while back Luisa and I drafted a proposal for mapping model elements to CV terms that may simplify some of the problems currently being worked through. The draft and sample instance are here: http://www.psidev.info/index.php?q=node/159 (see Mapping between exchange schema and CVs).

I would strongly vote for option A, and in addition maintain a mapping file. This is more work for the CV coordinators (but hopefully can be mainly automated), and would force software implementers to interact with the CV WG when they need new terms, but given the heavy reliance on CV terms in the mzML schema I see no way around this.

If a mapping file is kept updated in parallel to the CV, software can check whether a valid term has been provided for a particular model element. In the example of spectrumType, the mapping file would specify that only child terms of spectrumType are allowed (e.g. for the model element fileContent). If a vendor publishes a file with:

<fileContent>
  <cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/>
</fileContent>

this would automatically be rejected by the validator (or at least a warning issued), as it should be, since there's no point having a CV where the terms are not controlled!

Option B

<cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>

looks particularly bad to me, since there is no check that correct values are given. As was mentioned elsewhere on the list, you run into problems with upper/lower case, spacing, etc. If software is going to rely on particular values being present, those values must be in the CV with persistent identifiers.

I believe OBO does not have the ability to distinguish between ontological classes (i.e.
the branch structure) and instances/individuals (i.e. leaf nodes used as values to annotate data). Again, this could be handled by the mapping file that specifies which terms can be used to annotate model elements.

A related point: in mzData, there is inconsistent usage of the value slot, since the specification has no ability to say whether a value (and a unit) should be given or not; e.g. for the term "sample mass" (MS:1000004), software should know that a value and unit must be given. It is reasonable that software should be able to check whether to expect a value or not for particular CV terms. Logically, this should be part of the CV itself, but as far as I'm aware OBO does not have this capability. One solution would be to add this to the mapping file as two Booleans on the cvTerm (allowsValue = "true/false" and requiresUnit = "true/false").

Cheers
Andy
|
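Andy's two proposals, rejecting terms that are not allowed for an element and checking per-term value/unit flags, can be sketched together. This is a minimal illustration, not the mzML semantic validator: `CvParam`, `TermRule` and `check_params` are hypothetical names, the unit field is an assumption of this sketch, the set of allowed accessions is passed in directly (in practice it would come from resolving the mapping file's rule against the CV), and `EX:gram` is a made-up unit accession. MS:1000004 ("sample mass") and MS:9999999 are taken from the message above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class CvParam:
    accession: str
    value: str = ""
    unit_accession: Optional[str] = None  # assumed field for this sketch


@dataclass
class TermRule:
    """Per-term flags modelled on the proposed allowsValue/requiresUnit Booleans."""
    allows_value: bool
    requires_unit: bool


def check_params(params: List[CvParam], allowed: Set[str],
                 rules: Dict[str, TermRule]) -> List[str]:
    """Return problems found for one model element; an empty list means it passed."""
    problems: List[str] = []
    for p in params:
        if p.accession not in allowed:
            problems.append(f"{p.accession}: term not allowed for this element")
            continue
        rule = rules.get(p.accession)
        if rule is None:
            continue  # no value/unit constraints recorded for this term
        if p.value and not rule.allows_value:
            problems.append(f"{p.accession}: value given but not allowed")
        if rule.requires_unit and p.unit_accession is None:
            problems.append(f"{p.accession}: unit required")
    return problems
```

For example, with `allowed = {"MS:1000004"}` and a rule requiring a unit for that term, a cvParam carrying MS:9999999 is reported as not allowed, and a sample mass of "12.5" without a unit is reported as missing one.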
From: Lennart M. <len...@gm...> - 2007-10-04 10:53:35
|
Hi Andy, > The decision about how to implement CV terms is pretty important and we should try to come up with a coherent policy across PSI if possible. Here are my thoughts: > > A while back Luisa and myself drafted a proposal for mapping model elements to CV terms that may simplify some of the problems currently being worked through. The draft and sample instance are here: http://www.psidev.info/index.php?q=node/159 (see Mapping between exchange schema and CVs). > > I would strongly vote for option A, and in addition maintain a mapping file. This is more work for the CV coordinators (but hopefully can be mainly automated), and would force software implementers to interact with the CV WG when they need new terms, but given the heavy reliance on CV terms in the mzML schema I see no way around this. > > If a mapping file is kept updated in parallel to the CV, software can check whether a valid term has been provided for a particular model element. In the example of spectrumType, the mapping file would specify that only child terms of spectrumType are allowed (e.g. for the model element fileContent). If a vendor publishes a file with: > > <fileContent> > <cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/> > </fileContent> > > This would automatically be rejected by the validator (or at least a warning output), as it should be, since there's no point having a CV where the terms are not controlled! That mapping file is effectively in use by our mzML semantic validator, for exactly the reasons you outlined above! So yes - this has been made available in the larger mzML kit and has also been implemented online (your above example indeed does not validate). Cheers, lnnrt. |
From: Angel P. <an...@ma...> - 2007-10-04 13:22:05
|
So where is this mzML kit that you mention? With the OLS?

-angel

On 10/4/07, Lennart Martens <len...@gm...> wrote:
> So yes - this has been made available in the larger mzML kit and has
> also been implemented online (your above example indeed does not
> validate).
|
From: Matthew C. <mat...@va...> - 2007-10-04 17:05:45
|
I'll comment here on the mzML schema and validation of mzML instances. I do not see why a proper XML schema with semantic significance could not be generated for mzML. XML schemas have the capability to provide robust restrictions on both elements and attributes, and such a schema could be automatically generated from the CV itself (when combined with a skeleton model of mzML). Some people complain that mzML is not true XML. That's rather misleading. Others say it needs a special "semantic" validator with its own mapping file. I say that is duplicative and even overkill. Existing schema technology can handle the format specified here, but I grant that the schema WILL have to be very complicated (you won't just have a single cvParam type or ParamGroupType; each part of the schema will have its own cvParam elements with semantically relevant restrictions on the accession numbers) and almost certainly should be machine-generated. I see nothing wrong with a complicated schema, though, because the variety of data that we are intending to represent is also very complicated! I don't know if existing automatic code generators work for very complicated schemas, but automatic XML validators definitely should, and thus the need for a separate "semantic" validator is unclear to me when the semantic relationships can be encapsulated in an automatically generated XML schema.
For example, the <contact> element could be defined semantically in XML schema like this:

<xs:complexType name="ContactParamGroupType">
  <xs:sequence>
    <xs:element name="paramGroupRef" type="dx:ContactParamGroupRefType" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:attribute name="cvLabel">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="accession">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS:1000586"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="name">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="contact name"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="value" type="xs:string"/>
      </xs:complexType>
    </xs:element>
    <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:attribute name="cvLabel">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="accession">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS:1000587"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="name">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="contact address"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="value" type="xs:string"/>
      </xs:complexType>
    </xs:element>
    <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:attribute name="cvLabel">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="accession">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS:1000588"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="name">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="contact URL"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="value" type="xs:anyURI"/>
      </xs:complexType>
    </xs:element>
    <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:attribute name="cvLabel">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="accession">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="MS:1000589"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="name">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="contact email"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="value" type="dx:email"/>
      </xs:complexType>
    </xs:element>
    <xs:element name="userParam" type="dx:UserParamType" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="contact" type="dx:ContactParamGroupType" minOccurs="0" maxOccurs="unbounded"/>

Like I said, this needs to be machine-generated, but it would create an XML schema that removes the need for any other kind of semantic mapping and any new tool to do the validation with that mapping. Now that I think about it again, this kind of often-updated schema would violate the unchangedness requirement from the specification:

"It was hoped that the actual xsd schema could remain stable for many years while the accompanying controlled vocabulary could be frequently updated to support new technologies, instruments, and methods of acquiring data."

But what is the difference between a frequently updated mapping file which is REQUIRED to get semantic validation, and a frequently updated primary schema which is REQUIRED to get semantic validation?

-Matt

Lennart Martens wrote:
> That mapping file is effectively in use by our mzML semantic validator,
> for exactly the reasons you outlined above!
> So yes - this has been made available in the larger mzML kit and has
> also been implemented online (your above example indeed does not validate).
|
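Matt's machine-generated schema could be produced from the CV tree with a few lines of code. The sketch below is a hypothetical generator, not an existing PSI tool: it emits one named `xs:simpleType` per context that enumerates every accession below a given parent term, which is the "<xs:restriction> enumerations" form Matt describes later in the thread. The function names and `EX:` accessions are placeholders. One reason to prefer a named enumeration over one cvParam declaration per term is that XML Schema 1.0 generally requires same-named elements in one content model to share a single type (the Element Declarations Consistent constraint).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


def descendants(children: Dict[str, List[str]], root: str) -> List[str]:
    """Every term below `root` in the CV tree, sorted for a stable schema diff."""
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


def accession_enumeration(type_name: str, children: Dict[str, List[str]], root: str) -> str:
    """Emit an xs:simpleType restricting an accession attribute to terms below `root`."""
    simple = ET.Element(f"{{{XS}}}simpleType", name=type_name)
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction", base="xs:string")
    for accession in descendants(children, root):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", value=accession)
    return ET.tostring(simple, encoding="unicode")
```

The generated type would be referenced as the type of the accession attribute on the cvParam declaration for that context and regenerated with each CV release, so only the enumerations change between schema versions.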
From: Angel P. <an...@ma...> - 2007-10-04 17:59:32
|
On 10/4/07, Matthew Chambers <mat...@va...> wrote:
>
> I'll comment here on the mzML schema and validation of mzML instances.
> I do not see why a proper XML schema with semantic significance could
> not be generated for mzML. XML schema have the capability to provide
> robust restrictions on both elements and attributes, and such a schema
> could be automatically generated from the CV itself (when combined with
> a skeleton model of mzML).

This is an interesting idea, but as you mention below there are no tools for doing this, so if you have a CS master's student available... ;)

> Some people complain that mzML is not true
> XML. That's rather misleading.

+1 on that. mzML is valid and real XML. It just isn't using the enumerated values of XML.

-angel
|
From: Lennart M. <len...@gm...> - 2007-10-04 22:21:07
|
Hi Matt,

> But what is the different between a frequently updated mapping
> file which is REQUIRED to get semantic validation, and a frequently
> updated primary schema which is REQUIRED to get semantic validation?

The fact that the mapping file most often does not need to be updated to operate correctly after CV changes, since it is based on the CV structure (term-to-term links) rather than the actual accession numbers. Indeed, for many CV param elements, the required (allowed) accession numbers for that element are not even in the CV mapping.

Cheers,

lnnrt.
|
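In an OBO flat file, the term-to-term links Lennart refers to are the `is_a` and `relationship` lines of each `[Term]` stanza. The sketch below pulls those links out of OBO text; it assumes OBO 1.2-style `tag: value` lines, reads only `id`, `is_a` and `relationship: part_of`, ignores escapes, qualifiers and non-Term stanzas, and `parse_obo_links` is a hypothetical name. The accessions in the test are placeholders.

```python
from typing import Dict, List, Optional, Tuple


def parse_obo_links(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each [Term] id to its (relation, target id) links.

    Minimal sketch: reads only the id, is_a and 'relationship: part_of' tags,
    and ignores escapes, qualifiers and every other stanza type.
    """
    links: Dict[str, List[Tuple[str, str]]] = {}
    in_term = False
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing "! term name" comment
        if line.startswith("["):
            in_term = line == "[Term]"  # [Typedef] and other stanzas are skipped
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current = value
            links.setdefault(current, [])
        elif current is not None and tag == "is_a":
            links[current].append(("is_a", value))
        elif current is not None and tag == "relationship":
            relation, _, target = value.partition(" ")
            if relation == "part_of":
                links[current].append(("part_of", target.strip()))
    return links
```

These links record where a term sits in the CV; they do not by themselves say which element a term may annotate or how often it may repeat, the questions Marc listed for the mapping file.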
From: Brian P. <bri...@in...> - 2007-10-04 22:47:39
|
Hi Lennart,

I'm not sure I understand, but my guess is that what's being said here is that most CV additions are just leaves on the inheritance tree, along the lines of our example of the introduction of "Super Ion Trap Turbo", and are minimally disruptive. Such additions would be minimally disruptive to a W3C schema as well, as long as it doesn't bother with restriction elements for things like instrument names, which it really shouldn't (it's not an error to come up with a new instrument name value). Thus the addition of instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of the W3C schema, so that's nothing to worry about if we went that route.

Come to think of it, it sounds a bit like that mapping file is just another dialect of schema? Maybe we're nearly there already.

But I'm pretty sure I didn't understand... perhaps an example would help?

Thanks,

Brian
|
From: Matt C. <mat...@va...> - 2007-10-04 23:20:20
|
I think I may understand him. However, as far as I know there ARE supposed to be restriction elements for instrument names (otherwise you wouldn't have a valid accession number; although like I've already suggested, we could have a special accession number to mean 'not yet in CV' or 'CV entry pending'). With the external mapping file, they've got the following logic: > Given our current parser state in the "spectrum description" section of a spectrum, make sure all cvParams in this section have an accession number in the CV that pertains to describing the spectrum, e.g. the accession number for "SRM Spectrum." It can get more specific than that, of course. So the mapping file could stay the same when terms are added, it would only need to be changed when the schema's structure changed. As far as I know, with an XML schema, there is no way to create an enumeration dynamically, i.e. for a cvParam in the spectrum description section: <xs:restriction><-- dynamically restrict to accession numbers in CV related to spectrum description --></xs:restriction> If I understand this right, I still don't get the advantage. What do we gain by having a stable mapping file which dynamically restricts by looking up to the CV, versus a machine-generated schema which is automatically updated every time the CV changes? In both cases, you can't remove terms from the CV without breaking backward compatibility, but otherwise you should be fine. The only changes between schema versions would be changes to the <xs:restriction> enumerations that define which accession numbers can appear where. -Matt Brian Pratt wrote: > Hi Lennart, > > I'm not sure I understand, but my guess is that what's being said here is > that most CV additions are just leaves on the inheritance tree, along the > lines of our example of the introduction of "Super Ion Trap Turbo", and are > minimally disruptive. 
Such additions would be minimally disruptive to a W3C > schema as well, as long as it doesn't bother with restriction elements for > things like instrument names, which it really shouldn't (it's not an error > to come up with a new instrument name value). Thus the addition of > instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of > the W3C schema, so that's nothing to worry about if we went that route. > > > Come to think of it, it sounds a bit like that mapping file is just another > dialect of schema? Maybe we're nearly there already. > > But I'm pretty sure I didn't understand... perhaps an example would help? > > Thanks, > > Brian > > > > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On Behalf Of Lennart > Martens > Sent: Thursday, October 04, 2007 3:21 PM > To: Matthew Chambers > Cc: psi...@li... > Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process > > Hi Matt, > > >> But what is the difference between a frequently updated mapping >> file which is REQUIRED to get semantic validation, and a frequently >> updated primary schema which is REQUIRED to get semantic validation? >> > > The fact that the mapping file most often does not need to be updated to > operate correctly after CV changes, since it is based on the CV > structure (term-to-term links) rather than the actual accession numbers. > Indeed, for many CV param elements, the required (allowed) accession > numbers for that element are not even in the cv mapping. > > > Cheers, > > lnnrt. > |
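Lennart's point is that a mapping rule names a parent term rather than listing accession numbers, so any term sitting under that parent passes. A minimal Python sketch of that idea follows. It assumes a simplified OBO reader that only understands `id`, `name`, `is_a` and `relationship: part_of` lines inside `[Term]` stanzas; the function names are illustrative, not part of any PSI tool, and the embedded excerpt is the LCQ Deca chain quoted later in this thread.

```python
OBO_EXCERPT = """\
[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000125 ! thermo finnigan

[Term]
id: MS:1000125
name: thermo finnigan
is_a: MS:1000483 ! thermo fisher scientific

[Term]
id: MS:1000483
name: thermo fisher scientific
is_a: MS:1000031 ! model by vendor

[Term]
id: MS:1000031
name: model by vendor
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! mzOntology
"""


def parse_obo(text):
    """Return {accession: {"name": str, "parents": [(relation, accession), ...]}}."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.split(" ! ", 1)[0].strip()  # drop the trailing "! comment"
        if line.startswith("["):
            # Only [Term] stanzas are kept; anything else (e.g. [Typedef]) is skipped.
            current = {"name": None, "parents": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["parents"].append(("is_a", value))
        elif tag == "relationship":
            relation, target = value.split()[:2]
            current["parents"].append((relation, target))
    return terms


def ancestors(terms, accession):
    """Every accession reachable upward from `accession` via is_a or part_of."""
    seen, stack = set(), [accession]
    while stack:
        for _, parent in terms.get(stack.pop(), {}).get("parents", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def allowed_by_rule(terms, accession, allowed_parent):
    """Mapping-file style check: the term must exist and sit under allowed_parent."""
    return accession in terms and allowed_parent in ancestors(terms, accession)
```

Appending a new leaf, such as a hypothetical "LCQ Deca Turbo" stanza with `is_a: MS:1000125`, makes `allowed_by_rule(terms, <new id>, "MS:1000463")` pass without touching the rule itself; the rule would only change if the structure did.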
From: Brian P. <bri...@in...> - 2007-10-05 00:05:46
|
Hi Matt, You're right: to get complete automated validation from standard XML handling tools, you'd want to employ restriction elements in the schema. So, yes, the schema would officially rev every time the CV officially did, which makes sense as it's a tool for checking CV conformance. And in the end, stability of the schema isn't the goal - stability of the code that deals with the data format is the goal. For the kind of leaf-level CV changes we're talking about, most parsers would *not* change, since they generally don't bother validating against restriction lists, for performance reasons. As such, parsers would also function perfectly well on most data that anticipates official CV+schema updates. And the mzML format would be more compact and much more human readable. This external CV mapping file sounds like an artifact that could just as readily be derived on the fly by examining the is_a and part_of fields in the CV itself, yes? Have you got a URL for an example? Brian -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Matt Chambers Sent: Thursday, October 04, 2007 4:18 PM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process |
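Deriving the mapping on the fly from the is_a and part_of fields, as Brian suggests, amounts to inverting the parent links and collecting everything under a category. The sketch below assumes the links are already available as (child, relation, parent) triples; the triples are copied from CV excerpts quoted elsewhere in this thread, and `terms_under` is an illustrative name, not a PSI function.

```python
from collections import defaultdict

# (child, relation, parent) triples copied from CV excerpts quoted in this thread.
LINKS = [
    ("MS:1000554", "is_a", "MS:1000125"),     # LCQ Deca -> thermo finnigan
    ("MS:1000125", "is_a", "MS:1000483"),     # thermo finnigan -> thermo fisher scientific
    ("MS:1000483", "is_a", "MS:1000031"),     # -> model by vendor
    ("MS:1000031", "part_of", "MS:1000463"),  # -> instrument description
    ("MS:1000462", "part_of", "MS:1000463"),  # ion optics -> instrument description
    ("MS:1000246", "is_a", "MS:1000462"),     # delayed extraction -> ion optics
    ("MS:1000246", "part_of", "MS:1000456"),  # delayed extraction -> precursor activation description
    ("MS:1000456", "part_of", "MS:1000442"),  # -> spectrum
]


def children_index(links):
    """Invert parent links: parent accession -> set of direct children."""
    index = defaultdict(set)
    for child, _relation, parent in links:
        index[parent].add(child)
    return index


def terms_under(links, category):
    """Every term reachable downward from category: what a derived mapping would allow."""
    index = children_index(links)
    found, stack = set(), [category]
    while stack:
        for child in index[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

Because the allowed set is recomputed from the CV each time, a new leaf term is picked up automatically; nothing besides the CV itself has to be edited.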
From: Brian P. <bri...@in...> - 2007-10-04 19:05:12
|
To review:
A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>
B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>
C) <cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum" value=""/>
I'd propose option D (or C+ if you prefer):
<cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583" name="SRM spectrum" />
The category (I'd prefer "parent") name is redundant - the parser is going to use the accession number, and the human is going to get meaning from the name itself, with the CV as a fallback. The "value" attribute should default to "" when omitted; writing value="" is just taking up space. Also, for eyeballing purposes it would be nice if the human-readable part came first rather than last, if it's all the same to the parsers. And I'd move the parent to the end, since it's likely it won't be needed. So:
<cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" parentAccession="MS:1000035"/>
- Brian |
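From a reader's point of view, attribute order and an omitted `value` cost nothing. The following minimal Python sketch uses the standard library's `xml.etree.ElementTree` and assumes the attribute names proposed above (`parentAccession`, or `categoryAccession` from options C and D); none of these are settled mzML schema names.

```python
import xml.etree.ElementTree as ET


def read_cv_param(xml_text):
    """Normalize one cvParam element; XML parsers ignore attribute order."""
    elem = ET.fromstring(xml_text)
    return {
        "accession": elem.get("accession"),
        "name": elem.get("name"),
        # An omitted value attribute is read as "", as proposed above.
        "value": elem.get("value", ""),
        # Accept either proposed spelling of the parent/category pointer.
        "parent": elem.get("parentAccession") or elem.get("categoryAccession"),
    }


option_a = '<cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>'
option_d = ('<cvParam cvLabel="MS" categoryAccession="MS:1000035" '
            'accession="MS:1000583" name="SRM spectrum" />')
reordered = ('<cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" '
             'parentAccession="MS:1000035"/>')

for text in (option_a, option_d, reordered):
    print(read_cv_param(text))
```

Options D and the reordered form normalize to the same dictionary, which is why the ordering can stay a convention rather than a requirement.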
From: Mike C. <tu...@gm...> - 2007-10-04 19:34:25
|
F) <cvParam cvLabel="MS" categoryName="spectrum type" name="SRM spectrum"/>? That is, can the accession number be uniquely determined from the name? If so, could these be looked up later if needed? Mike |
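Mike's question comes down to whether names (and synonyms) are unique across the CV. Here is a small Python sketch that assumes a case-insensitive lookup which also indexes synonyms; the term list is copied from excerpts in this thread (including the "DE" and "Thermo Scientific" synonyms), and `accession_for` is a hypothetical helper.

```python
from collections import defaultdict

# (accession, name, synonyms) taken from CV excerpts quoted in this thread.
TERMS = [
    ("MS:1000583", "SRM spectrum", []),
    ("MS:1000035", "spectrum type", []),
    ("MS:1000246", "delayed extraction", ["DE"]),
    ("MS:1000483", "thermo fisher scientific", ["Thermo Scientific"]),
]


def build_name_index(terms):
    """Map case-folded names and synonyms to every accession that uses them."""
    index = defaultdict(set)
    for accession, name, synonyms in terms:
        for label in [name, *synonyms]:
            index[label.casefold()].add(accession)
    return index


def accession_for(index, name):
    """Return the single accession for name, or raise if unknown or ambiguous."""
    matches = index.get(name.casefold(), set())
    if len(matches) != 1:
        raise LookupError(f"{name!r} maps to {sorted(matches)}")
    return next(iter(matches))
```

The lookup is only as safe as the CV's guarantee that no two terms share a name or synonym; a collision introduced in a later CV release would make a name-only file ambiguous, which is one argument for keeping the accession.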
From: Matthew C. <mat...@va...> - 2007-10-04 19:37:31
|
I agree that ordering the attributes the way you have them might be a good convention, and they should appear that way in the examples, but there's no reason to actually require that order, is there? Also, to add my proposal from the other post, I'll call it: E) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/> I feel rather strongly that the "name" of a "parameter" should never be interpreted as a value. A "valueName", on the other hand, can be a text description of the valueAccession, which is what the parser will usually care about. Additionally, this proposal allows the "accession" attribute to consistently refer to a category, instead of sometimes referring to a category and sometimes referring to a value, which is counter-intuitive.
Another thing to discuss with C, D, or E: what exactly is the "category" accession going to refer to? In a previous post of yours, Brian, you wrote: > Piling on with Mike, here: > So the first thing any parser must do is load up the OBO file. In > practice, such a software system will need to bundle an OBO in some > fashion, in the extremely likely event that the OBO used by the mzML > file in question is not present. Don't forget to update your distro > each time the OBO gets updated, and make sure that in the event the > OBO used by the mzML file IS present, you use that instead. > Then, read: > > > <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/> > > then ask yourself, "whazzat?", and look up: > > id: MS:1000554 > name: LCQ Deca > def: "ThermoFinnigan LCQ Deca." [PSI:MS] > is_a: MS:1000125 ! thermo finnigan > > which leads you to: > > id: MS:1000125 > name: thermo finnigan > def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS] > is_a: MS:1000483 ! thermo fisher scientific > > which leads you to: > > id: MS:1000483 > name: thermo fisher scientific > def: "Thermo Fisher Scientific. Also known as Thermo Finnigan > corporation." [PSI:MS] > related_synonym: "Thermo Scientific" [] > is_a: MS:1000031 ! model by vendor > > which leads you to: > > id: MS:1000031 > name: model by vendor > def: "Instrument's model name (everything but the vendor's name) > ---Free text ?" [PSI:MS] > relationship: part_of MS:1000463 ! instrument description > > which leads you to: > > id: MS:1000463 > name: instrument description > def: "Device which performs a measurement." [PSI:MS] > relationship: part_of MS:0000000 ! mzOntology > > aha! now populate the "instrument description" element in your database. > So the main category is MS:1000463, but MS:1000463 is not the parent of MS:1000554 (it is an ancestor, but more specifically it is the root).
Intuitively, the category accession number should of course be the root in this case, but will that always be the case? -Matt |
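Matt's question is whether the category accession should be the immediate parent or the top of the branch. Both are easy to compute once the chain is known. This Python sketch uses the LCQ Deca chain quoted above and, as a simplifying assumption, allows only one parent per term (MS:1000246, discussed later in the thread, has two).

```python
# Parent links (child -> (relation, parent)) from the LCQ Deca chain quoted above.
PARENT = {
    "MS:1000554": ("is_a", "MS:1000125"),     # LCQ Deca
    "MS:1000125": ("is_a", "MS:1000483"),     # thermo finnigan
    "MS:1000483": ("is_a", "MS:1000031"),     # thermo fisher scientific
    "MS:1000031": ("part_of", "MS:1000463"),  # model by vendor
    "MS:1000463": ("part_of", "MS:0000000"),  # instrument description
}
ROOT = "MS:0000000"


def path_to_root(accession):
    """Follow single parent links upward until no parent is recorded."""
    path = [accession]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][1])
    return path


def immediate_parent(accession):
    return PARENT[accession][1] if accession in PARENT else None


def top_level_category(accession):
    """The ancestor sitting directly under ROOT, e.g. 'instrument description'."""
    path = path_to_root(accession)
    if path[-1] != ROOT or len(path) < 2:
        return None
    return path[-2]
```

For MS:1000554, `immediate_parent` gives MS:1000125 (thermo finnigan), while `top_level_category` gives MS:1000463 (instrument description), the term directly under MS:0000000 rather than MS:0000000 itself.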
From: Chris T. <chr...@eb...> - 2007-10-04 19:53:27
|
There is another reason for numerical accessions in classifications, be it a CV or a DB like GenBank or whatever (someone else may already have offered it in the flood today and I missed it), which is kind of trivial but nonetheless worth keeping in mind (and regardless, let us remember that the PSI's CVs are not the only use cases for whatever structure is agreed -- while the MS CV is under PSI control, little else is). The reason is a simple one -- accession _numbers_ are most usually used because they are assigned like tickets for people waiting in line at the store: whatever turns up basically gets the next available number from the stack. Using meaningful strings makes this much more of a pain, as the space of 'nice' names will get used up and you can guess the rest -- names will ultimately get less intuitive (and remember a good CV can take a paragraph to _define_ a concept to avoid misinterpretation, so a word/phrase is not enough to achieve interpretability in many cases anyway); it'll be an increasing pain checking uniqueness before assigning new labels; case-sensitivity issues may perhaps even arise in some contexts (although I know you will tell me that lookup and other processing is unaffected). A nice contrast can be had by comparing DeltaMass (term = accession -- worst-case scenario) to, say, RESID or Unimod. Another thought occurs -- would one need to agree on a naming convention for names as accessions? No white space -- underscores versus CamelHump versus camelHump etc. A world of hurt, as Jesse Ventura once put it ;) Cheers, Chris. P.S. I know none of the above are killer arguments, but maybe strawsForTheCamelsBack? -- ~~~~~~~~~~~~~~~~~~~~~~~~ chr...@eb... http://mibbi.sf.net/ ~~~~~~~~~~~~~~~~~~~~~~~~ |
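Chris's ticket analogy is easy to make concrete: issuing a numeric accession only needs "one past the highest number in use", with no naming convention and no check on meaning. A Python sketch, assuming the seven-digit `MS:` pattern seen in this thread; how the PSI actually allocates numbers is not described here.

```python
import re

# Assumed format, based on accessions quoted in this thread (e.g. MS:1000554).
ACCESSION = re.compile(r"MS:(\d{7})")


def next_accession(existing):
    """Hand out the next ticket: one past the highest number already issued."""
    numbers = [int(m.group(1)) for acc in existing if (m := ACCESSION.fullmatch(acc))]
    return f"MS:{max(numbers, default=0) + 1:07d}"


print(next_accession(["MS:1000554", "MS:1000583", "MS:1000035"]))
```

By contrast, using names as accessions would require every new label to be checked against all existing names (and synonyms) before it could be issued.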
From: Brian P. <bri...@in...> - 2007-10-04 21:11:34
|
Quite right: attribute order ought not to matter syntactically. It's just a convention suggestion. I was thinking that the parentAccession would be the immediate parent in the inheritance tree, so you could begin finding your way up to something you recognize (the root of the tree might be higher than you wanted to go, and finding your way down is even more annoying than finding your way up). Of course, having the immediate parent's accession number is not much help if the parent isn't in the CV, but all we're really hoping to guard against here is failing in the case of things like the new "LCQ Deca Turbo" model coming out, when the data looks otherwise the same as that from the "LCQ Deca" model. There's no magic bullet for dealing with radical additions to the syntax - I think we're really just wrangling about how to deal with new enum values. It still kind of amazes me that this is a problem we're solving from scratch in a world with W3C schema in it, but I'm trying to play nice since the cvParam thing seems to have unstoppable inertia. I'd much prefer this: <InstrumentType name="LCQ Deca" accession="MS:1000554" /> - that's proper XML, to my mind, as opposed to merely valid XML, and it still leverages the power of the CV. A schema generated from and referring to the CV just doesn't seem like a problem - there's a schema in the CV crying to get out, in the form of the is_a and part_of data (and if there isn't, the CV is probably broken, so it's a useful exercise either way). - Brian -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers Sent: Thursday, October 04, 2007 12:38 PM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] Option A, B, or C |
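Brian's fallback for an unrecognized term can be sketched as follows: a reader with an older bundled CV keeps the new term's name for display but treats it as its immediate parent, then walks up from there. The accession `MS:9999999` for "LCQ Deca Turbo" is a hypothetical placeholder, and `parent_accession` assumes the `parentAccession` attribute proposed in this thread.

```python
# A reader's bundled, possibly outdated CV: accession -> (name, parent accession).
# The excerpt stops at "instrument description" (MS:0000000 omitted).
BUNDLED_CV = {
    "MS:1000554": ("LCQ Deca", "MS:1000125"),
    "MS:1000125": ("thermo finnigan", "MS:1000483"),
    "MS:1000483": ("thermo fisher scientific", "MS:1000031"),
    "MS:1000031": ("model by vendor", "MS:1000463"),
    "MS:1000463": ("instrument description", None),
}


def interpret(accession, name, parent_accession=None):
    """Return (display name, nearest accession this reader understands, or None)."""
    if accession in BUNDLED_CV:
        return BUNDLED_CV[accession][0], accession
    if parent_accession in BUNDLED_CV:
        # New leaf term: show its own name, but handle it like its known parent.
        return name, parent_accession
    # Parent unknown too: the hint does not help, as noted above.
    return name, None


def top_category(accession):
    """Walk parent links in the bundled CV up to the top of the excerpt."""
    while BUNDLED_CV[accession][1] is not None:
        accession = BUNDLED_CV[accession][1]
    return accession


shown, known = interpret("MS:9999999", "LCQ Deca Turbo", parent_accession="MS:1000125")
print(shown, known, top_category(known))
```

Only one step of hinting is available. If the immediate parent is also new, the reader is back to treating the term as unknown.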
From: Matthew C. <mat...@va...> - 2007-10-04 21:27:51
|
I am starting to agree with Brian that some of our requirements seem to be mutually exclusive:
- we want a schema that doesn't change -> thus we cannot represent the ever-changing semantics in the schema
- we want a semantic validation tool -> thus we need the tool to keep up with the ever-changing semantics somehow, be it via the schema or via some external mapping file; I don't see the difference!
And what is the point of the schema itself if it doesn't capture the semantics of the specification? -Matt |
From: Angel P. <an...@ma...> - 2007-10-05 00:16:59
|
On 10/4/07, Brian Pratt <bri...@in...> wrote: > > It still kind of amazes me that this is a problem we're > solving from scratch in a world with W3C schema in it, but I'm trying to > play nice since the cvParam thing seems to have unstoppable inertia. I'd > much prefer this: > <InstrumentType name="LCQ Deca" accession="MS:1000554" /> > - that's proper XML, to my mind, as opposed to merely valid XML, and it > still leverages the power of the CV. Actually I would prefer that structure as well and asked on the list for folks to specifically outline places in the schema where this could happen: http://sourceforge.net/mailarchive/message.php?msg_name=e38f4b170708071310m76356fe5g3f81b5eff44ce2c6%40mail.gmail.com See the threads from 8/7 - 8/9 for the full discussion, but let me just put it out there that it is not too late to have these types of changes! That's what the public review process is for! I don't think we did a good enough job of communicating to folks that this type of typed CV structure was an option for schema change proposals. -angel |
From: Matt C. <mat...@va...> - 2007-10-05 01:39:12
|
Two potential problems with this structure: it drops either the value accession number or the category accession number. Given that Brian suggested it, I expect he intended the latter to be dropped, with the element name becoming the unique category name. It also eliminates the possibility of having synonyms for the category names, and we can't change the element/category name without breaking backward compatibility. I don't really mind either of these problems, but I'm under the impression that others do. So what you're asking, Angel, is which places in the schema have a category cvParam that could be set in stone (i.e. not allowed to have synonym category names) and thus converted into this structure instead? -Matt |
From: Brian P. <bri...@in...> - 2007-10-05 01:46:46
|
I'll take a shot at auto-generating a schema from the OBO tomorrow. I'm curious to know if I'm just blowing smoke or not. - Brian _____ From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro Sent: Thursday, October 04, 2007 5:17 PM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] Option A, B, or C |
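Auto-generating a schema from the OBO, combined with Matt's earlier point that only the `<xs:restriction>` enumerations would change between versions, can be sketched in a few lines of Python with `xml.etree.ElementTree`. This assumes the set of allowed accessions has already been computed from the CV (for example by walking is_a/part_of links); the type name `InstrumentModelAccession` is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


def enumeration_type(type_name, accessions):
    """Build an xs:simpleType that restricts xs:string to the given accessions."""
    simple = ET.Element(f"{{{XS}}}simpleType", name=type_name)
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction", base="xs:string")
    for accession in sorted(accessions):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", value=accession)
    return ET.tostring(simple, encoding="unicode")


# Terms under "model by vendor" in the excerpts quoted in this thread.
print(enumeration_type("InstrumentModelAccession",
                       {"MS:1000554", "MS:1000125", "MS:1000483"}))
```

Regenerating this fragment whenever the CV changes is the "machine-generated schema" option. A reader that never validates against it is unaffected, which is the point about parser stability made earlier in the thread.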
From: Brian P. <bri...@in...> - 2007-10-05 19:43:07
|
I think we have some early fruit from my messing around with OBO->W3C schema conversion. In the CV file http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo there is exactly one term that claims both an is_a and a part_of relationship:
[Term]
id: MS:1000246
name: delayed extraction
def: "The application of the accelerating voltage pulse after a time delay in desorption ionization from a surface. The extraction delay can produce energy focusing in a time-of-flight mass spectrometer." [PSI:MS]
exact_synonym: "DE" []
is_a: MS:1000462 ! ion optics
relationship: part_of MS:1000456 ! precursor activation description
Let's follow the inheritance chains:
MS:1000246 "delayed extraction" is_a
MS:1000462 "ion optics" part_of
MS:1000463 "instrument description" part_of
MS:0000000 "MZ controlled vocabularies"
And also:
MS:1000246 "delayed extraction" part_of
MS:1000456 "precursor activation description" part_of
MS:1000442 "spectrum" part_of
MS:0000000 "MZ controlled vocabularies"
So:
A is a kind of B
A is a part of C
B is not a part of C
This would appear to violate the transitive property of the is_a and part_of relationships. Normally in discussing inheritance one views "is a" and "has a" (or, in the topsy-turvy world of OBO, "part of") as distinct and mutually exclusive ideas. Actually, the format itself is a bit of a surprise; I had anticipated "is_a" being an enumerated type of "relationship", as "part_of" is. If MS:1000246 is simply the victim of a clerical error, as I suspect it is, then a tidier representation of inheritance would have helped catch the problem sooner. - Brian |
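The A/B/C check above (A is_a B, A part_of C, but C not reachable from B) can be automated so that it runs every time the CV changes. Here is a Python sketch over the links quoted in this message. It assumes the OBO has already been reduced to is_a and part_of lists per term, and `suspicious_terms` is an illustrative name.

```python
# Links from the psi-ms.obo excerpts above: accession -> {"is_a": [...], "part_of": [...]}.
TERMS = {
    "MS:1000246": {"is_a": ["MS:1000462"], "part_of": ["MS:1000456"]},  # delayed extraction
    "MS:1000462": {"is_a": [], "part_of": ["MS:1000463"]},              # ion optics
    "MS:1000463": {"is_a": [], "part_of": ["MS:0000000"]},              # instrument description
    "MS:1000456": {"is_a": [], "part_of": ["MS:1000442"]},              # precursor activation description
    "MS:1000442": {"is_a": [], "part_of": ["MS:0000000"]},              # spectrum
}


def ancestors(accession):
    """Every accession reachable upward via is_a or part_of."""
    seen, stack = set(), [accession]
    while stack:
        links = TERMS.get(stack.pop(), {})
        for parent in links.get("is_a", []) + links.get("part_of", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def suspicious_terms():
    """Triples (A, B, C) where A is_a B and A part_of C, but C is not above B."""
    report = []
    for accession, links in TERMS.items():
        for b in links["is_a"]:
            for c in links["part_of"]:
                if c not in ancestors(b):
                    report.append((accession, b, c))
    return report


print(suspicious_terms())
```

On this excerpt the only report is delayed extraction (MS:1000246), with ion optics as B and precursor activation description as C.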
From: Chris T. <chr...@eb...> - 2007-10-06 00:11:18
|
Actually, I think the problem here is the overloading of a term: the thing is used in two different ways. There is a description of the physical reality of the ion source (it does DE), and there is a term in a description. Really, the problem is that what is implied is either that a datum is part of the ion optics of the physical instance of a mass spec, or that a description (an abstract that can be manifest in files or whatever) contains a physical entity (DE-source bits). I think that's it, anyway. So really the issue is the combination of two related but different things in one concept. Am I right?

-- 
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~ |