Thread: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

[Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Eric D. <ede...@sy...> - 2007-10-02 22:32:26

Hi everyone, I am happy to announce that the mzML 0.99.0 specification
document has been submitted to the PSI document process. This is an
important milestone in the completion of mzML, but it is most certainly
not the end of development and feedback.

The specification document and all related materials are publicly
available at:

http://psidev.info/index.php?q=node/257

There are various kits of instance documents, xsds, the controlled
vocabulary, validators, etc. listed at that site. Please examine and
respond.

The actual specification document is posted at:

http://psidev.info/index.php?q=node/300

You may post comments at that site, or you may send them to this list.
We addressed nearly all issues brought up in the preview period in
August. The one main issue that remains unresolved is the problem of
cvParams and how to handle the inevitable scenario of new terms and
older software. This is an important issue. There is a discussion of it
in the specification document. Your input is sought.

We encourage you to begin developing (or adapting) software that
implements the format if you are comfortable knowing that there will be
changes before the 1.0.0 release. I believe that it is primarily by
attempting to implement the format that the community will test the
format most rigorously and reveal issues that still need to be resolved;
this is far more effective than gazing at the specification document.

Regards,

Eric

----------------------------------

Eric Deutsch, Ph.D.
Institute for Systems Biology
1441 North 34th Street
Seattle WA 98103
Tel: 206-732-1397
Fax: 206-732-1260
Email: ede...@sy...
WWW:
http://www.systemsbiology.org/Senior_Research_Scientists/Eric_Deutsch

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Brian P. <bri...@in...> - 2007-10-03 16:11:27

Looks like most commenting happens on this list, so here goes:

 

From the spec:

 

"The mzData format was a far more flexible format than mzXML. The support of
new technologies could be added to mzData files by adding new controlled
vocabulary terms, while mzXML often required a full schema revision. This is
evidenced by mzData still at version 1.05 while mzXML is currently at
version 3.1. However, mzData did suffer from a problem of inconsistently
used vocabulary terms and there appeared several different dialects of
mzData, encoding the same information in subtly different ways. This was not
usually a problem for human inspection of the file, but caused difficulty
writing and maintaining reader software."

 

This is specious.  The fact that mzData hasn't revved only says to me that
it's badly underspecified, which the paragraph in fact goes on to
illustrate.  The occasional revision of the mzXML schema, to my mind,
indicates a well maintained standard*.  A stable schema and evolving
ontology produce as much or more reader/writer code maintenance work as an
evolving schema-only does.  It's not like mzData readers don't have to be
updated every time something gets added to the ontology.  At least with a
schema there are ways to generate code for these kinds of changes
automatically, and to easily validate the results.  Frankly when it comes to
data formats I think the term "flexible" is synonymous for "trouble" -
convenient for the writers, hell for the readers, and often a dead end for
that reason.

 

I really think mzML will just perpetuate the issues mzData presented.
Better we should figure out a way to generate a proper XML schema based on
the ontology document.  The rest of the world uses proper XML, I really
don't see what makes us special.

 

Well, hey, you asked.

 

- Brian

 

*note that most of the mzXML revisions had to do with things like adding
data compression to peaklists.  It wasn't getting banged around every time
somebody came out with a new mass spec, like the ontology will.

 

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Lennart M. <len...@gm...> - 2007-10-04 10:36:06

Hi Brian,


> This is specious.  The fact that mzData hasn’t revved only says to me 
> that it’s badly underspecified, which the paragraph in fact goes on to 
> illustrate.  The occasional revision of the mzXML schema, to my mind, 
> indicates a well maintained standard*.  A stable schema and evolving 
> ontology produce as much or more reader/writer code maintenance work as 
> an evolving schema-only does.

PRIDE has a stable schema, yet a rapidly evolving CV. We did not need to 
recode PRIDE whenever we changed the CV. So from experience: a stable 
schema + evolving (but initially well-organized) CV is not a problem in 
terms of maintenance. Having to redo the schema every other month is 
also possible, but nevertheless more hassle.

> It’s not like mzData readers don’t have 
> to be updated every time something gets added to the ontology.  At least 
> with a schema there are ways to generate code for these kinds of changes 
> automatically, and to easily validate the results.  Frankly when it 
> comes to data formats I think the term “flexible” is synonymous for 
> “trouble” – convenient for the writers, hell for the readers, and often 
> a dead end for that reason.

Let me make a black and white scenario for you - you have everything as 
attributes in the schema, and you auto-generate parsing code every week 
since you keep adding or changing attributes. Fine, no worries. Zero 
backwards compatibility, but hey - who cares about yesterday's data, 
right? And your generated code will swallow anything that is remotely 
using the right glyphs in those attributes (e.g.: 'I'm not providing 
sensible information here' as the value for the 'instrument_name' 
attribute). If your objective is convenience for the programmers (whose 
job it should be to program), you choose the 'everything in schema' 
path. If your objective is to transmit meaningful and 
validated/validatable data, you go the current mzML path. Now which one 
would make the most sense for a standard?

> I really think mzML will just perpetuate the issues mzData presented.  
> Better we should figure out a way to generate a proper XML schema based 
> on the ontology document.  The rest of the world uses proper XML, I 
> really don’t see what makes us special.

I do not believe that (a) mzData presents more issues than uses, (b) 
even if (a) were true, that mzML blatantly propagates these, and (c) 
that starting from scratch with a far too rigid, implicitly 
non-backwards compatible and unvalidatable (content-wise, which is where 
it matters) data transmission format is the way to go forward.


> *note that most of the mzXML revisions had to do with things like adding 
> data compression to peaklists.  It wasn’t getting banged around every 
> time somebody came out with a new mass spec, like the ontology will.

mzML will not get 'banged about' every time a new mass spec is added. 
That is the whole point. Please do try to understand the relatively 
simple concept - an addition to the instruments is completely and 
utterly transparent.


Cheers,

lnnrt.

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Marc S. <st...@in...> - 2007-10-04 08:06:21

Hi all,

first of all i would like to thank Eric and all the others in the 
working group for their effort.
Here are my comments:

(1) The new CV term problem
A is clear and simple.
B is simply a bad idea in my opinion. Why not use the child accession if 
we have it?
C helps the software to know where the new term belongs, but the 
software does not know what to do with it in most cases. I think most 
software implements these enum-like CV terms as enum types and thus 
cannot handle new values anyway. Additionally it is error prone 
(mismatching parent and child).

As C is an extension of A, i vote for A or C, but i don't think that C 
helps very much.

(2) Semantic validator
The semantic validator is a nice feature, but i think you must publish a 
file that defines the mapping of CV terms to the schema.
This file must answer questions like: Where can i use which term? How 
often can i repeat a term? etc.
With the heavy use of CV terms such a file is a non-optional part of the 
format definition.
What happened to that format Luisa proposed?

(3) Comments to CV / Schema
- The term MS:1000543 "data processing action" is missing some child 
terms i think. What about smoothing, baseline reduction and removal of 
low-intensity data points?
- Putting the software name in a CV will cause much trouble i think. 
There are way too many upcoming tools and you will be constantly updating 
that obo file. I really think we should put that into a string attribute.
- I would add a new optional and unbounded element "parameter" with 
attributes "name", "type", "value" to the dx:dataProcessing element to 
store the parameters of the software that were used for processing.

(4) General
Finally i'd like to say that i agree with Brian Pratt. There is too much 
CV and too little XML in the format for my taste.
I don't argue against CV in general it's a nice technique that allows 
the schema to be stable for a long time.
But now everything is in the CV and there are hardly any XML attributes 
left. This makes the format hard to implement and impossible to check 
with an XML validator.
And i don't see the advantage in most cases: I have to adapt the 
software to new terms just as i would adapt it to new XML elements.

Best regards,
  Marc

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Lennart M. <len...@gm...> - 2007-10-04 10:45:24

Hi Marc,


> (2) Semantic validator
> The semantic validator is a nice feature, but i think you must publish a 
> file that defines the mapping of CV terms to the schema.
> This file must answer questions like: Where can i use which term? How 
> often can i repeat a term? etc.
> With the heavy use of CV terms such a file is a non-optional part of the 
> format definition.
> What happened to that format Luisa proposed?

It is included :). Look in the 'ms-mapping.xml' file. It is (quite 
literally so) Luisa's file. The whole validator relies on a role-based 
'separation of concerns', so that the application is nearly 100% 
dynamically configured. It is a nice piece of work that we are currently 
writing up in order to publish it. Meanwhile, I'd be happy to provide 
more information on how the whole thing works. Just let me know what you 
want to learn.

> (4) General
> Finally i'd like to say that i agree with Brian Pratt. There is too much 
> CV and too little XML in the format for my taste.
> I don't argue against CV in general it's a nice technique that allows 
> the schema to be stable for a long time.
> But now everything is in the CV and there are hardly any XML attributes 
> left. This makes the format hard to implement and impossible to check 
> with an XML validator.
> And i don't see the advantage in most cases: I have to adapt the 
> software to new terms just as i would adapt it to new XML elements.

If you could use software that answered simple CV questions like 'what 
is the parent of X', or 'get children for X', or 'is X one of the 
children of Y (optionally with maximum Z generations)' (for instance); 
and if this software is on the net and always up-to-date, would that 
still mean you always have to redo everything?
I at least wouldn't expect so. It just requires a new way of dealing 
with the content of the file (which again, is what matters). Also 
remember that the semantic validator, in series after a schema 
validator, provides maximum validation for a file like an mzML file - 
both structure and content are thoroughly verified (and nearly 100% 
dynamically configured - zero recoding necessary when new children get 
added, for instance).
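
The three CV questions mentioned above ('what is the parent of X', 'get children for X', 'is X one of the children of Y within Z generations') can be sketched in a few lines. This is only an illustration under stated assumptions: the `IS_A` table, the `EX:` accessions and the function names are hypothetical and are not taken from the PSI-MS CV or from any existing lookup service.

```python
from collections import deque

# Hypothetical child -> parents table (is_a links). The accessions are
# made up for illustration and are not real PSI-MS CV entries.
IS_A = {
    "EX:0002": ["EX:0001"],  # "ion trap" is_a "mass analyzer type"
    "EX:0003": ["EX:0002"],  # "linear ion trap" is_a "ion trap"
    "EX:0004": ["EX:0001"],  # "time-of-flight" is_a "mass analyzer type"
}


def parents(term):
    """Return the direct parents of a term."""
    return list(IS_A.get(term, []))


def children(term):
    """Return the direct children of a term, sorted for stable output."""
    return sorted(child for child, ps in IS_A.items() if term in ps)


def is_descendant(term, ancestor, max_generations=None):
    """True if `ancestor` is reachable from `term` via is_a links,
    optionally within at most `max_generations` steps."""
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        current, depth = queue.popleft()
        if max_generations is not None and depth >= max_generations:
            continue  # do not climb further than allowed
        for parent in IS_A.get(current, []):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False
```

A reader built on such queries would only need to ask whether an unfamiliar accession falls under a branch it already handles, rather than enumerate every leaf term up front.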


Cheers,

lnnrt.

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Jones, A. [jonesar] <And...@li...> - 2007-10-04 10:40:05

Hi all,
The decision about how to implement CV terms is pretty important and we
should try to come up with a coherent policy across PSI if possible.
Here are my thoughts:

A while back Luisa and myself drafted a proposal for mapping model
elements to CV terms that may simplify some of the problems currently
being worked through. The draft and sample instance are here:
http://www.psidev.info/index.php?q=node/159 (see Mapping between
exchange schema and CVs).

I would strongly vote for option A, and in addition maintain a mapping
file. This is more work for the CV coordinators (but hopefully can be
mainly automated), and would force software implementers to interact
with the CV WG when they need new terms, but given the heavy reliance on
CV terms in the mzML schema I see no way around this.

If a mapping file is kept updated in parallel to the CV, software can
check whether a valid term has been provided for a particular model
element. In the example of spectrumType, the mapping file would specify
that only child terms of spectrumType are allowed (e.g. for the model
element fileContent). If a vendor publishes a file with:

<fileContent>
	<cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/>
</fileContent>

This would automatically be rejected by the validator (or at least a
warning output), as it should be, since there's no point having a CV
where the terms are not controlled!
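
A minimal sketch of the check described above, assuming a mapping rule that ties the fileContent element to the "spectrum type" parent term. The `ALLOWED_PARENT` and `CV_PARENT` tables, the `EX:` child accessions and the `validate` function are hypothetical; this is not the actual semantic validator or the real mapping file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rule: cvParams inside <fileContent> must be children
# of the "spectrum type" term (MS:1000035, as cited in this thread).
ALLOWED_PARENT = {"fileContent": "MS:1000035"}

# Hypothetical child -> parent table; the EX: accessions are placeholders.
CV_PARENT = {
    "EX:0000001": "MS:1000035",
    "EX:0000002": "MS:1000035",
}


def validate(xml_text):
    """Return one message per cvParam whose accession is not a known child
    of the parent term allowed for the enclosing element."""
    root = ET.fromstring(xml_text)
    required_parent = ALLOWED_PARENT.get(root.tag)
    if required_parent is None:
        return []  # no rule for this element, nothing to check
    problems = []
    for param in root.iter("cvParam"):
        accession = param.get("accession")
        if CV_PARENT.get(accession) != required_parent:
            problems.append(f"{root.tag}: {accession} is not an allowed term")
    return problems


vendor_file = """<fileContent>
  <cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/>
</fileContent>"""

known_file = """<fileContent>
  <cvParam cvLabel="MS" accession="EX:0000001" name="placeholder" value=""/>
</fileContent>"""
```

Calling `validate(vendor_file)` reports the unknown accession, while `validate(known_file)` returns an empty list.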

Option B <cvParam cvLabel="MS" accession="MS:1000035"
name="spectrum type" value="SRM spectrum"/> looks particularly bad to
me, since there is no check that correct values are given. As was
mentioned elsewhere on the list, you run into problems with upper/lower
case, spacing etc. If software is going to rely on particular values
being present, those values must be in the CV with persistent
identifiers.

I believe OBO does not have the ability to distinguish between
ontological classes (i.e. terms present as branch structure) and
instances/individuals (i.e. leaf nodes used as values to annotate data).
Again, this could be handled by the mapping file that specifies which
terms can be used to annotate model elements.

A related point: in mzData, there is inconsistent usage of the value
slot, since the specification has no ability to say whether a value (and
a unit) should be given or not, e.g. for term "sample mass (MS:1000004)"
software should know that a value and unit must be given. It is
reasonable that software should be able to check whether to expect a
value or not for particular CV terms. Logically, this should be part of
the CV itself, but as far as I'm aware OBO does not have this
capability. One solution would be to add this to the mapping file as two
Booleans on the cvTerm (allowsValue = "true/false" and requiresUnit
= "true/false").
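
As a rough illustration of how the two proposed Booleans could be enforced, here is a sketch under stated assumptions. The `TERM_RULES` table, the `EX:0000001` term and the `check_param` function are hypothetical. The sketch also assumes that allowsValue means a value is permitted rather than required, which the proposal does not spell out.

```python
# Hypothetical per-term flags, as they might be read from a mapping file.
# Only "sample mass (MS:1000004)" comes from the thread; EX:0000001 and the
# flag values chosen here are illustrative assumptions.
TERM_RULES = {
    "MS:1000004": {"allowsValue": True, "requiresUnit": True},
    "EX:0000001": {"allowsValue": False, "requiresUnit": False},
}


def check_param(accession, value="", unit=None):
    """Return a list of rule violations for one cvParam."""
    rules = TERM_RULES.get(accession)
    if rules is None:
        return [f"{accession}: no rule in mapping file"]
    errors = []
    if value and not rules["allowsValue"]:
        errors.append(f"{accession}: value not allowed")
    if rules["requiresUnit"] and not unit:
        errors.append(f"{accession}: unit required")
    return errors
```

With these rules, a sample mass given with both a value and a unit passes, while the same term without a unit is flagged.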

Cheers
Andy

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Lennart M. <len...@gm...> - 2007-10-04 10:53:35

Hi Andy,


> The decision about how to implement CV terms is pretty important and we should try to come up with a coherent policy across PSI if possible. Here are my thoughts:
> 
> A while back Luisa and myself drafted a proposal for mapping model elements to CV terms that may simplify some of the problems currently being worked through. The draft and sample instance are here: http://www.psidev.info/index.php?q=node/159 (see Mapping between exchange schema and CVs).
> 
> I would strongly vote for option A, and in addition maintain a mapping file. This is more work for the CV coordinators (but hopefully can be mainly automated), and would force software implementers to interact with the CV WG when they need new terms, but given the heavy reliance on CV terms in the mzML schema I see no way around this. 
> 
> If a mapping file is kept updated in parallel to the CV, software can check whether a valid term has been provided for a particular model element. In the example of spectrumType, the mapping file would specify that only child terms of spectrumType are allowed (e.g. for the model element fileContent). If a vendor publishes a file with:
> 
> <fileContent>
> 	<cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/>
> </fileContent>
> 
> This would automatically be rejected by the validator (or at least a warning output), as it should be, since there's no point having a CV where the terms are not controlled!  

That mapping file is effectively in use by our mzML semantic validator, 
for exactly the reasons you outlined above!
So yes - this has been made available in the larger mzML kit and has 
also been implemented online (your above example indeed does not validate).


Cheers,

lnnrt.

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Angel P. <an...@ma...> - 2007-10-04 13:22:05

So where is this mzML kit that you mention? With the OLS? -angel

On 10/4/07, Lennart Martens <len...@gm...> wrote:
>
>
> So yes - this has been made available in the larger mzML kit and has
> also been implemented online (your above example indeed does not
> validate).
>
>

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Matthew C. <mat...@va...> - 2007-10-04 17:05:45

I'll comment here on the mzML schema and validation of mzML instances.  
I do not see why a proper XML schema with semantic significance could 
not be generated for mzML.  XML schema have the capability to provide 
robust restrictions on both elements and attributes, and such a schema 
could be automatically generated from the CV itself (when combined with 
a skeleton model of mzML).  Some people complain that mzML is not true 
XML.  That's rather misleading.  Others say it needs a special 
"semantic" validator with its own mapping file.  I say that is 
duplicative and even overkill.  Existing schema technology can handle 
the format specified here, but I grant that the schema WILL have to be 
very complicated (you won't just have a single cvParam type or 
ParamGroupType, each part of the schema will have its own cvParam 
elements with semantically relevant restrictions on the accession 
numbers) and almost certainly should be machine-generated.  I see 
nothing wrong with a complicated schema though, because the variety of 
data that we are intending to represent is also very complicated!  I 
don't know if existing automatic code generators work for very 
complicated schema, but the automatic XML validators definitely should 
and thus the need for a separate "semantic" validator is unclear to me 
when the semantic relationships can be encapsulated in an automatically 
generated XML schema.  For example, the <contact> element could be 
defined semantically in XML schema like this:

<!-- Illustrative sketch only. Note that XSD 1.0 requires same-named
     elements within one content model to share a type, so the repeated
     cvParam declarations below would need a workaround in a real schema. -->
<xs:complexType name="ContactParamGroupType">
	<xs:sequence>
		<xs:element name="paramGroupRef" type="dx:ContactParamGroupRefType" minOccurs="0" maxOccurs="unbounded"/>

		<xs:element name="cvParam" minOccurs="0" maxOccurs="1">
			<xs:complexType>
				<xs:attribute name="cvLabel">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="accession">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS:1000586"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="name">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="contact name"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="value" type="xs:string"/>
			</xs:complexType>
		</xs:element>

		<xs:element name="cvParam" minOccurs="0" maxOccurs="1">
			<xs:complexType>
				<xs:attribute name="cvLabel">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="accession">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS:1000587"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="name">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="contact address"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="value" type="xs:string"/>
			</xs:complexType>
		</xs:element>

		<xs:element name="cvParam" minOccurs="0" maxOccurs="1">
			<xs:complexType>
				<xs:attribute name="cvLabel">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="accession">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS:1000588"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="name">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="contact URL"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="value" type="xs:anyURI"/>
			</xs:complexType>
		</xs:element>

		<xs:element name="cvParam" minOccurs="0" maxOccurs="1">
			<xs:complexType>
				<xs:attribute name="cvLabel">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="accession">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="MS:1000589"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="name">
					<xs:simpleType>
						<xs:restriction base="xs:string">
							<xs:pattern value="contact email"/>
						</xs:restriction>
					</xs:simpleType>
				</xs:attribute>
				<xs:attribute name="value" type="dx:email"/>
			</xs:complexType>
		</xs:element>

		<xs:element name="userParam" type="dx:UserParamType" minOccurs="0" maxOccurs="unbounded"/>
	</xs:sequence>
</xs:complexType>

<xs:element name="contact" type="dx:ContactParamGroupType" minOccurs="0" maxOccurs="unbounded"/>

Like I said, this needs to be machine generated, but it would create an 
XML schema that removes the need for any other kind of semantic mapping 
and any new tool to do the validation with that mapping.
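
To give a feel for what "machine generated" could mean here, the following sketch emits a restricted accession type from a list of CV terms. It is only an assumption about how such a generator might look: the function name `accession_type` and the type name `ContactAccessionType` are hypothetical, and the sketch produces a single enumeration-based simpleType rather than the full per-element schema discussed above. The four accessions are the contact terms used in the example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


def accession_type(type_name, accessions):
    """Build an xs:simpleType that restricts xs:string to the given
    accession numbers via xs:enumeration facets."""
    simple = ET.Element(f"{{{XS}}}simpleType", {"name": type_name})
    restriction = ET.SubElement(
        simple, f"{{{XS}}}restriction", {"base": "xs:string"}
    )
    for accession in sorted(accessions):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": accession})
    return simple


# Accessions for the contact terms used in the schema example.
contact_terms = ["MS:1000586", "MS:1000587", "MS:1000588", "MS:1000589"]

xsd_fragment = ET.tostring(
    accession_type("ContactAccessionType", contact_terms), encoding="unicode"
)
```

Regenerating the fragment whenever the CV changes would keep the schema and the vocabulary in step, at the cost of a new schema version per CV release.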

Now that I think about it again, this kind of often-updated schema would 
violate the unchangedness requirement from the specification: "It was 
hoped that the actual xsd schema could remain stable for many years 
while the accompanying controlled vocabulary could be frequently updated 
to support new technologies, instruments, and methods of acquiring 
data."  But what is the difference between a frequently updated mapping 
file which is REQUIRED to get semantic validation, and a frequently 
updated primary schema which is REQUIRED to get semantic validation?

-Matt

Lennart Martens wrote:
> That mapping file is effectively in use by our mzML semantic validator, 
> for exactly the reasons you outlined above!
> So yes - this has been made available in the larger mzML kit and has 
> also been implemented online (your above example indeed does not validate).
>
>

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Angel P. <an...@ma...> - 2007-10-04 17:59:32

On 10/4/07, Matthew Chambers <mat...@va...> wrote:
>
> I'll comment here on the mzML schema and validation of mzML instances.
> I do not see why a proper XML schema with semantic significance could
> not be generated for mzML.  XML schema have the capability to provide
> robust restrictions on both elements and attributes, and such a schema
> could be automatically generated from the CV itself (when combined with
> a skeleton model of mzML).



This is an interesting idea, but as you mention below there are no tools for
doing this, so if you have a CS masters student available .... ;)

> Some people complain that mzML is not true
> XML.  That's rather misleading.


+1 on that. mzML is valid and real XML. It just isn't using the enumerated
values of XML.

-angel

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Lennart M. <len...@gm...> - 2007-10-04 22:21:07

Hi Matt,

> But what is the difference between a frequently updated mapping 
> file which is REQUIRED to get semantic validation, and a frequently 
> updated primary schema which is REQUIRED to get semantic validation?

The fact that the mapping file most often does not need to be updated to 
operate correctly after CV changes, since it is based on the CV 
structure (term-to-term links) rather than the actual accession numbers. 
Indeed, for many CV param elements, the required (allowed) accession 
numbers for that element are not even in the cv mapping.
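
The point about structure versus accession numbers can be illustrated with a small sketch. It assumes a mapping rule that names only a parent term, so the set of allowed terms is recomputed from whatever CV version is loaded. The `RULE` dictionary, the `EX:` accessions and the `allowed_terms` function are hypothetical and do not reflect the real ms-mapping.xml layout.

```python
def allowed_terms(cv, parent):
    """Collect every term that sits below `parent` in a child -> parent table."""
    allowed = set()
    frontier = [parent]
    while frontier:
        current = frontier.pop()
        for child, child_parent in cv.items():
            if child_parent == current and child not in allowed:
                allowed.add(child)
                frontier.append(child)
    return allowed


# The hypothetical mapping rule names only the parent term, never the leaves.
RULE = {"element": "instrument", "parent": "EX:1000"}

# Two hypothetical CV releases; the second adds one new leaf term.
cv_v1 = {"EX:1001": "EX:1000"}
cv_v2 = {**cv_v1, "EX:1002": "EX:1000"}

before = allowed_terms(cv_v1, RULE["parent"])
after = allowed_terms(cv_v2, RULE["parent"])
```

Here `RULE` is identical for both releases, yet `after` contains the new term while `before` does not.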


Cheers,

lnnrt.

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Brian P. <bri...@in...> - 2007-10-04 22:47:39

Hi Lennart,

I'm not sure I understand, but my guess is that what's being said here is
that most CV additions are just leaves on the inheritance tree, along the
lines of our example of the introduction of "Super Ion Trap Turbo", and are
minimally disruptive.  Such additions would be minimally disruptive to a W3C
schema as well, as long as it doesn't bother with restriction elements for
things like instrument names, which it really shouldn't (it's not an error
to come up with a new instrument name value).  Thus the addition of
instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of
the W3C schema, so that's nothing to worry about if we went that route.


Come to think of it, it sounds a bit like that mapping file is just another
dialect of schema?  Maybe we're nearly there already.

But I'm pretty sure I didn't understand... perhaps an example would help?

Thanks,

Brian

Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Matt C. <mat...@va...> - 2007-10-04 23:20:20

I think I may understand him.  However, as far as I know there ARE 
supposed to be restriction elements for instrument names (otherwise you 
wouldn't have a valid accession number; although like I've already 
suggested, we could have a special accession number to mean 'not yet in 
CV' or 'CV entry pending').

With the external mapping file, they've got the following logic:
 > Given our current parser state in the "spectrum description" section 
of a spectrum, make sure all cvParams in this section have an accession 
number in the CV that pertains to describing the spectrum, e.g. the 
accession number for "SRM Spectrum."

It can get more specific than that, of course.  So the mapping file 
could stay the same when terms are added, it would only need to be 
changed when the schema's structure changed.  As far as I know, with an 
XML schema, there is no way to create an enumeration dynamically, i.e. 
for a cvParam in the spectrum description section:
<xs:restriction><!-- dynamically restrict to accession numbers in CV 
related to spectrum description --></xs:restriction>
If I understand this right, I still don't get the advantage.  What do we 
gain by having a stable mapping file which dynamically restricts by 
looking up to the CV, versus a machine-generated schema which is 
automatically updated every time the CV changes?  In both cases, you 
can't remove terms from the CV without breaking backward compatibility, 
but otherwise you should be fine.  The only changes between schema 
versions would be changes to the <xs:restriction> enumerations that 
define which accession numbers can appear where.

-Matt
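
A minimal sketch of the "machine-generated schema" idea, assuming the CV has already been loaded into a dictionary of parent links. The type name spectrumTypeAccession, the helper names, and the term MS:9999999 are hypothetical; only MS:1000035 and MS:1000583 come from this thread. Regenerating the enumeration after each CV release is what would produce the new schema version.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical excerpt of the CV: accession -> (name, parent accession).
# MS:9999999 stands in for a term added in a later CV release.
CV = {
    "MS:1000035": ("spectrum type", None),
    "MS:1000583": ("SRM spectrum", "MS:1000035"),
    "MS:9999999": ("hypothetical new spectrum type", "MS:1000035"),
}


def descendants(cv, root):
    """Return the sorted accessions of every term below root."""
    found = set()
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for accession, (_name, parent) in cv.items():
            if parent == current and accession not in found:
                found.add(accession)
                frontier.append(accession)
    return sorted(found)


def restriction_for(cv, root, type_name):
    """Emit an xs:simpleType whose enumeration lists the terms below root."""
    lines = [
        f"<xs:simpleType name={quoteattr(type_name)}>",
        '  <xs:restriction base="xs:string">',
    ]
    for accession in descendants(cv, root):
        lines.append(f"    <xs:enumeration value={quoteattr(accession)}/>")
    lines += ["  </xs:restriction>", "</xs:simpleType>"]
    return "\n".join(lines)


print(restriction_for(CV, "MS:1000035", "spectrumTypeAccession"))
```

Under these assumptions the only difference between two generated schema versions is the list of xs:enumeration entries, which matches the point made above.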


Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

From: Brian P. <bri...@in...> - 2007-10-05 00:05:46

Hi Matt,

You're right, to get complete automated validation from standard XML
handling tools you'd want to employ restriction elements in the schema.  So,
yes, the schema would officially rev every time the CV officially did, which
makes sense as it's a tool for checking CV conformance.  And in the end,
stability of the schema isn't the goal - stability of the code that deals
with the data format is the goal.  For the kind of leaf-level CV changes
we're talking about, most parsers would *not* change, since for performance
reasons they generally do not bother validating against restriction lists.
Such parsers would therefore also function perfectly well on most data
that anticipates official CV+schema updates.  And the mzML format would be
more compact and much more human readable.

This external CV mapping file sounds like an artifact that could just as
readily be derived on the fly by examining the is_a and part_of fields in
the CV itself, yes?  Have you got a URL for an example?

Brian
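
A rough sketch of deriving the check on the fly from is_a and part_of links, rather than from a list of accession numbers. The section names in RULES, the helper names, and the accession MS:9999999 are hypothetical; the parent links for the instrument terms are the ones quoted on this list, and the SRM spectrum link is assumed from the Option A/B/C examples. Each term is assumed to have a single parent.

```python
# Parent links as they would be read from is_a / part_of lines in the OBO file.
PARENTS = {
    "MS:1000554": "MS:1000125",  # LCQ Deca is_a thermo finnigan
    "MS:1000125": "MS:1000483",  # thermo finnigan is_a thermo fisher scientific
    "MS:1000483": "MS:1000031",  # thermo fisher scientific is_a model by vendor
    "MS:1000031": "MS:1000463",  # model by vendor part_of instrument description
    "MS:1000583": "MS:1000035",  # SRM spectrum under spectrum type (assumed)
}

# Hypothetical mapping rules: document section -> category accession.
RULES = {
    "instrument": "MS:1000463",
    "spectrumDescription": "MS:1000035",
}


def is_under(accession, category, parents):
    """True if accession is the category itself or one of its descendants."""
    seen = set()
    current = accession
    while current is not None and current not in seen:
        if current == category:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


def validate(section, accession, parents=PARENTS, rules=RULES):
    """Check a cvParam accession against the rule for its section."""
    return is_under(accession, rules[section], parents)


# A new leaf term (hypothetical "LCQ Deca Turbo") only adds a parent link;
# RULES is left untouched.
extended = dict(PARENTS)
extended["MS:9999999"] = "MS:1000125"
print(validate("instrument", "MS:9999999", parents=extended))
print(validate("spectrumDescription", "MS:9999999", parents=extended))
```

In this sketch the rules change only when a section or category changes, not when leaf terms are added.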


[Psidev-ms-dev] Option A, B, or C

From: Brian P. <bri...@in...> - 2007-10-04 19:05:12

To review:

A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum"
value=""/>

 

B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type"
value="SRM spectrum"/>

 

C) <cvParam cvLabel="MS" categoryAccession="MS:1000035"
categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum"
value=""/>

 

I'd propose option D (or C+ if you prefer):

 

<cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583"
name="SRM spectrum" />

 

The category (I'd prefer "parent") name is redundant - the parser is going
to use the accession number, and the human is going to get meaning from the
name itself with the CV as a fallback.  The value for "value" should
default to ""; as written it's just taking up space.

 

Also, for eyeballing purposes it would be nice if the human readable part
came first rather than last, if it's all the same to the parsers.  And, I'd
move the parent to the end since it's likely it won't be needed.  So,

 

<cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583"
parentAccession="MS:1000035"/>

 

- Brian
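
A minimal sketch of how a reader might consume the last form above, assuming the attribute names from that example (name, cvLabel, accession, parentAccession) and an optional value attribute that defaults to "". The KNOWN table, the function name, and the accession MS:9999999 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Terms this (hypothetical) reader was built against.
KNOWN = {
    "MS:1000035": "spectrum type",
    "MS:1000583": "SRM spectrum",
}


def read_cv_param(xml_text, known=KNOWN):
    """Parse one cvParam and fall back to the parent for unknown terms."""
    elem = ET.fromstring(xml_text)
    accession = elem.get("accession")
    parent = elem.get("parentAccession")
    if accession in known:
        recognized = accession
    elif parent in known:
        recognized = parent  # newer term, but its parent is familiar
    else:
        recognized = None
    return {
        "name": elem.get("name"),
        "accession": accession,
        "value": elem.get("value", ""),  # absent value defaults to ""
        "recognized_as": recognized,
    }


current = read_cv_param(
    '<cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" '
    'parentAccession="MS:1000035"/>'
)
newer = read_cv_param(
    '<cvParam name="hypothetical new spectrum" cvLabel="MS" '
    'accession="MS:9999999" parentAccession="MS:1000035"/>'
)
print(current["recognized_as"], newer["recognized_as"])
```

Attribute order does not matter to the parser here, so the name-first ordering remains purely a readability convention.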

Re: [Psidev-ms-dev] Option A, B, or C

From: Mike C. <tu...@gm...> - 2007-10-04 19:34:25

F) <cvParam cvLabel="MS" categoryName="spectrum type" name="SRM spectrum">

?


That is, can the accession number be uniquely determined from the
name?  If so, could these be looked up later if needed?

Mike
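
Whether a name determines an accession uniquely depends on the CV contents. A small sketch of how that could be checked, assuming the terms have been read into (accession, name) pairs; the helper names are hypothetical and the comparison is case-sensitive.

```python
from collections import defaultdict

# A few terms quoted in this thread.
TERMS = [
    ("MS:1000035", "spectrum type"),
    ("MS:1000583", "SRM spectrum"),
    ("MS:1000554", "LCQ Deca"),
]


def index_by_name(terms):
    """Map each term name to the list of accessions that carry it."""
    index = defaultdict(list)
    for accession, name in terms:
        index[name].append(accession)
    return dict(index)


def accession_for(name, index):
    """Return the single accession for name, or raise if it is not unique."""
    matches = index.get(name, [])
    if len(matches) != 1:
        raise LookupError(f"{name!r} maps to {len(matches)} accessions")
    return matches[0]


index = index_by_name(TERMS)
print(accession_for("SRM spectrum", index))
```

A lookup like this works only as long as no two terms share a name, which is something the CV maintainers would have to guarantee.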

Re: [Psidev-ms-dev] Option A, B, or C

From: Matthew C. <mat...@va...> - 2007-10-04 19:37:31

I agree that ordering the attributes the way you have them might be a good 
convention, and they should appear that way in the examples, but there's no 
reason to actually require them to be in that order, is there? Also, to 
add my proposal from the other post, I'll call it:
E) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" 
valueAccession="MS:1000583" valueName="SRM spectrum"/>

I feel rather strongly that the "name" of a "parameter" should not ever 
be interpreted as a value. A "valueName" on the other hand, can be a 
text description of the valueAccession which is what the parser will 
usually care about. Additionally, this proposal allows the "accession" 
attribute to consistently refer to a category, instead of sometimes 
referring to a category and sometimes referring to a value, which is 
counter-intuitive.

Another thing to discuss with either C, D, or E is what exactly the 
"category" accession is going to refer to. In a previous post of yours, 
Brian, you wrote:
> Piling on with Mike, here:
> So the first thing any parser must do is load up the OBO file. In 
> practice, such a software system will need to bundle an OBO in some 
> fashion, in the extremely likely event that the OBO used by the mzML 
> file in question is not present. Don't forget to update your distro 
> each time the OBO gets updated, and make sure that in the event the 
> OBO used by the mzML file IS present, you use that instead.
> Then, read:
>
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> then ask yourself, "whazzat?", and look up:
>
> id: MS:1000554
> name: LCQ Deca
> def: "ThermoFinnigan LCQ Deca." [PSI:MS]
> is_a: MS:1000125 ! thermo finnigan
>
> which leads you to:
>
> id: MS:1000125
> name: thermo finnigan
> def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
> is_a: MS:1000483 ! thermo fisher scientific
>
> which leads you to:
>
> id: MS:1000483
> name: thermo fisher scientific
> def: "Thermo Fisher Scientific. Also known as Thermo Finnigan 
> corporation." [PSI:MS]
> related_synonym: "Thermo Scientific" []
> is_a: MS:1000031 ! model by vendor
>
> which leads you to:
>
> id: MS:1000031
> name: model by vendor
> def: "Instrument's model name (everything but the vendor's name) 
> ---Free text ?" [PSI:MS]
> relationship: part_of MS:1000463 ! instrument description
>
> which leads you to:
>
> id: MS:1000463
> name: instrument description
> def: "Device which performs a measurement." [PSI:MS]
> relationship: part_of MS:0000000 ! mzOntology
>
> aha! now populate the "instrument description" element in your database.
>
So the main category is MS:1000463, but MS:1000463 is not the parent of 
MS:1000554 (it is an ancestor, but more specifically it is the root). 
Intuitively, the category accession number should of course be the root 
in this case, but will that always be the case?

-Matt
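
The lookup chain quoted above can be sketched as a walk over parent links. This assumes standard OBO [Term] stanzas, a single parent per term, and that both is_a and part_of count as parent links; the stanzas are abbreviated from the ones quoted and the function names are hypothetical.

```python
OBO_TEXT = """\
[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000125 ! thermo finnigan

[Term]
id: MS:1000125
name: thermo finnigan
is_a: MS:1000483 ! thermo fisher scientific

[Term]
id: MS:1000483
name: thermo fisher scientific
is_a: MS:1000031 ! model by vendor

[Term]
id: MS:1000031
name: model by vendor
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! mzOntology
"""


def parse_parents(obo_text):
    """Return (names, parents) dictionaries keyed by accession."""
    names, parents = {}, {}
    current = None
    for raw in obo_text.splitlines():
        line = raw.split("!")[0].strip()  # drop trailing "! comment"
        if line.startswith("id:"):
            current = line[3:].strip()
        elif line.startswith("name:") and current:
            names[current] = line[5:].strip()
        elif line.startswith("is_a:") and current:
            parents[current] = line[5:].strip()
        elif line.startswith("relationship: part_of") and current:
            parents[current] = line.split()[2]
    return names, parents


def ancestors(accession, parents):
    """List ancestors from the immediate parent upward."""
    chain = []
    current = parents.get(accession)
    while current is not None and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return chain


names, parents = parse_parents(OBO_TEXT)
chain = ancestors("MS:1000554", parents)
print("immediate parent:", chain[0])
print("full chain:", " -> ".join(chain))
```

The immediate parent (chain[0]) and the top-level category near the end of the chain are different terms here, which is exactly the ambiguity the question above raises.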

Re: [Psidev-ms-dev] Option A, B, or C

From: Chris T. <chr...@eb...> - 2007-10-04 19:53:27

There is another reason for numerical accessions in 
classifications (that I may have missed someone else offering in 
the flood today) be it a CV or a DB like GenBank or whatever, 
which is kind of trivial but nonetheless worth keeping in mind 
(and regardless, let us remember that not only the PSI's CVs 
constitute use cases for whatever structure is agreed -- while 
the MS CV is under PSI control, little else is): The reason is a 
simple one -- accession _numbers_ are most usually used because 
they are assigned like tickets for people waiting in line at the 
store -- whatever turns up gets the next available number from 
the stack basically. Using meaningful strings makes this much 
more of a pain as the space of 'nice' names will get used up and 
you can guess the rest -- names will ultimately get less 
intuitive (and remember a good CV can take a paragraph to 
_define_ a concept to avoid misinterpretation, so a word/phrase 
is not enough to achieve interpretability in many cases anyway); 
it'll be an increasing pain checking uniqueness before assigning 
new labels; case-sensitivity issues may even arise in some 
contexts perhaps (although I know you will tell me that lookup 
and other processing is unaffected).

A nice contrast can be had by comparing DeltaMass (term = 
accession -- worst case scenario) to say RESID or Unimod. 
Another thought occurs -- would one need to agree a naming 
convention for names as accessions? No white space -- 
underscores versus CamelHump versus camelHump etc. A world of 
hurt as Jesse Ventura once put it  ;)

Cheers, Chris.

P.S. I know none of the above are killer arguments, but maybe 
strawsForTheCamelsBack?



-- 
~~~~~~~~~~~~~~~~~~~~~~~~
  chr...@eb...
  http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~

Re: [Psidev-ms-dev] Option A, B, or C

From: Brian P. <bri...@in...> - 2007-10-04 21:11:34

Quite right, attribute order ought not to matter syntactically.  Just a
convention suggestion.

I was thinking that the parentAccession would be the immediate parent in the
inheritance tree so you could begin finding your way up to something you
recognize (the root of the tree might be higher than you wanted to go, and
finding your way down is even more annoying than finding your way up).

Of course having the immediate parent's accession number is not much help if
the parent isn't in the CV, but all we're really hoping to guard against
here is failing in the case of things like the new "LCQ Deca Turbo" model
coming out, when the data looks otherwise the same as that from the "LCQ
Deca" model.  There's no magic bullet for dealing with radical additions to
the syntax - I think we're really just wrangling about how to deal with new
enum values.  It still kind of amazes me that this is a problem we're
solving from scratch in a world with W3C schema in it, but I'm trying to
play nice since the cvParam thing seems to have unstoppable inertia.  I'd
much prefer this: 
<InstrumentType name="LCQ Deca" accession="MS:1000554" /> 
- that's proper XML, to my mind, as opposed to merely valid XML, and it
still leverages the power of the CV.  A schema generated from and referring
to the CV just doesn't seem like a problem - there's a schema in the CV
crying to get out, in the form of the is_a and part_of data (and if there
isn't, the CV is probably broken, so it's a useful exercise either way).  

- Brian



Re: [Psidev-ms-dev] Option A, B, or C

From: Matthew C. <mat...@va...> - 2007-10-04 21:27:51

I am starting to agree with Brian in that it seems that some of our 
requirements are mutually exclusive:
- we want a schema that doesn't change -> thus we cannot represent the 
ever-changing semantics in the schema
- we want a semantic validation tool -> thus we need the tool to keep up 
with the ever-changing semantics somehow, be it in the schema or some 
external mapping file, I don't see the difference!

And what is the point of the schema itself if it doesn't capture the 
semantics of the specification?

-Matt


Re: [Psidev-ms-dev] Option A, B, or C

From: Angel P. <an...@ma...> - 2007-10-05 00:16:59

On 10/4/07, Brian Pratt <bri...@in...> wrote:
>
>   It still kind of amazes me that this is a problem we're
> solving from scratch in a world with W3C schema in it, but I'm trying to
> play nice since the cvParam thing seems to have unstoppable inertia.  I'd
> much prefer this:
> <InstrumentType name="LCQ Deca" accession="MS:1000554" />
> - that's proper XML, to my mind, as opposed to merely valid XML, and it
> still leverages the power of the CV.

Actually I would prefer that structure as well and asked on the list for
folks to specifically outline places in the schema where this could happen:

http://sourceforge.net/mailarchive/message.php?msg_name=e38f4b170708071310m76356fe5g3f81b5eff44ce2c6%40mail.gmail.com

See the threads from 8/7 - 8/9 for the full discussion, but let me just put
it out there that it is not too late to have these types of changes! That's
what the public review process is for! I don't think we did a good enough
job of communicating to folks that this type of typed CV structure was an
option for schema change proposals.

-angel

Re: [Psidev-ms-dev] Option A, B, or C

From: Matt C. <mat...@va...> - 2007-10-05 01:39:12

Two potential problems with this structure. First, it drops either the 
value accession number or the category accession number. Given that Brian 
suggested it, I expect he intended the latter to be dropped, with the 
element name becoming the unique category name.  Second, it eliminates the 
possibility of having synonyms for the category names, and we can't 
change the element/category name without breaking backward 
compatibility.  I don't really mind about either of these problems, but 
I'm under the impression that others do mind.  So what you're asking, 
Angel, is which places in the schema have a category cvParam that could 
be set in stone, not allowed to have synonym category names, and thus 
converted into this structure instead?

-Matt


Re: [Psidev-ms-dev] Option A, B, or C

From: Brian P. <bri...@in...> - 2007-10-05 01:46:46

I'll take a shot at auto-generating a schema from the OBO tomorrow.  I'm
curious to know if I'm just blowing smoke or not..

 

- Brian

 


[Psidev-ms-dev] CV is broken?

From: Brian P. <bri...@in...> - 2007-10-05 19:43:07

I think we have some early fruit from my messing around with OBO->W3C schema
conversion.

 

In the CV file
http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
there is exactly one term that claims both an is_a and part_of relationship:

 

[Term]
id: MS:1000246
name: delayed extraction
def: "The application of the accelerating voltage pulse after a time delay in desorption ionization from a surface. The extraction delay can produce energy focusing in a time-of-flight mass spectrometer." [PSI:MS]
exact_synonym: "DE" []
is_a: MS:1000462 ! ion optics
relationship: part_of MS:1000456 ! precursor activation description
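
A check like this can be automated. The following is a rough Python sketch, not the converter itself: it assumes the simple tag: value layout shown in the stanza above, handles only the is_a and relationship: part_of tags, and the function name parse_obo_terms is invented for this example. The second stanza in the sample is abbreviated to the one relationship listed for it in this message.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas into dicts with 'id', 'is_a' and 'part_of' lists."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = None
            if line == "[Term]":
                current = {"id": None, "is_a": [], "part_of": []}
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop the trailing "! comment"
        if tag == "id":
            current["id"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship" and value.startswith("part_of "):
            current["part_of"].append(value.split()[1])
    return terms


SAMPLE = """
[Term]
id: MS:1000246
name: delayed extraction
exact_synonym: "DE" []
is_a: MS:1000462 ! ion optics
relationship: part_of MS:1000456 ! precursor activation description

[Term]
id: MS:1000462
name: ion optics
relationship: part_of MS:1000463 ! instrument description
"""

terms = parse_obo_terms(SAMPLE)
# Terms that claim both kinds of parent at once.
mixed = [t["id"] for t in terms if t["is_a"] and t["part_of"]]
```

On this two-stanza sample, mixed contains only MS:1000246.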

 

Let's follow the inheritance chains:

 

MS:1000246 "delayed extraction"  is_a 

MS:1000462 "ion optics" part_of 

MS:1000463 "instrument description"  part_of 

MS:0000000 "MZ controlled vocabularies"

 

And also,

 

MS:1000246 "delayed extraction" part_of 

MS:1000456 "precursor activation description" part_of 

MS:1000442 "spectrum" part_of

MS:0000000 "MZ controlled vocabularies"

 

So:

A is a kind of B

A is a part of C

B is not a part of C
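
The same conclusion can be reached mechanically. This is a small Python sketch under the assumption that the two chains above are the only relevant edges; the EDGES table is transcribed by hand from this message, and the helper name ancestors is made up for the example.

```python
# Edges transcribed from the two chains above: child -> [(relation, parent), ...]
EDGES = {
    "MS:1000246": [("is_a", "MS:1000462"), ("part_of", "MS:1000456")],
    "MS:1000462": [("part_of", "MS:1000463")],
    "MS:1000463": [("part_of", "MS:0000000")],
    "MS:1000456": [("part_of", "MS:1000442")],
    "MS:1000442": [("part_of", "MS:0000000")],
}


def ancestors(term, edges):
    """Return every term reachable from `term` by following any relation upward."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for _relation, parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


A, B, C = "MS:1000246", "MS:1000462", "MS:1000456"

# Does B ("ion optics") ever reach C ("precursor activation description")?
b_reaches_c = C in ancestors(B, EDGES)

# Where do the two branches finally meet?
shared = ancestors(B, EDGES) & ancestors(C, EDGES)
```

With these edges, B never reaches C, and the two branches only meet at the root term MS:0000000.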

 

This would appear to violate the transitive property of the is_a and part_of
relationships.  Normally in discussing inheritance one views "is a" and "has
a" (or in the topsy-turvy world of OBO, "part of") as being distinct and
mutually exclusive ideas.

 

Actually the format itself is a bit of a surprise: I had anticipated "is_a"
being an enumerated type of "relationship", as "part_of" is.  If MS:1000246
is simply a victim of a clerical error, as I suspect it is, then a tidier
representation of inheritance would have helped catch the problem sooner.

 

- Brian

Re: [Psidev-ms-dev] CV is broken?

From: Chris T. <chr...@eb...> - 2007-10-06 00:11:18

Actually I think the problem here is overloading of a term: the
thing is used in two different ways. There is a description of
the physical reality of the ion source (it does DE), and there
is a term in a description. What is implied is that either a
datum is part of the ion optics of the physical instance of a
mass spec, or that a description (an abstraction that can be
manifest in files or whatever) contains a physical entity
(DE-source bits). I think that's it, anyway. So really the issue
is the combination of two related but different things in one
concept. Am I right?


-- 
~~~~~~~~~~~~~~~~~~~~~~~~
  chr...@eb...
  http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
