From: Angel P. <an...@ma...> - 2007-10-04 17:59:32
On 10/4/07, Matthew Chambers <mat...@va...> wrote:
> I'll comment here on the mzML schema and validation of mzML instances. I do not see why a proper XML schema with semantic significance could not be generated for mzML. XML schemas have the capability to provide robust restrictions on both elements and attributes, and such a schema could be automatically generated from the CV itself (when combined with a skeleton model of mzML).

This is an interesting idea, but as you mention below there are no tools for doing this, so if you have a CS masters student available .... ;)

> Some people complain that mzML is not true XML. That's rather misleading.

+1 on that. mzML is valid and real XML. It just isn't using the enumerated values of XML Schema.

-angel
From: Angel P. <an...@ma...> - 2007-10-04 17:56:12
On 10/4/07, Mike Coleman <tu...@gm...> wrote:
> On 10/4/07, Matthew Chambers <mat...@va...> wrote:
> > Oh yes, the userParam. A synonym for the <comment> element ;). Please tell me how to use such an element in a meaningful and deterministic way. If I write a value into a cvParam with the category "instrument model" where the value text is "Super Duper Ion Trap" and the value's accession number is a special accession number which means "not yet in CV", ANY reader software should be able to interpret that parameter and ultimately say that it has no idea what to do with data from such an instrument.
>
> I agree with Matt here. In particular, if I encounter this new "Super Duper Ion Trap" for the first time, it would be completely unacceptable for my software to indicate this by saying that my mzML file is invalid. My software needs to be able to parse this file and tell me that the data came from a new instrument called "Super Duper Ion Trap" that it doesn't know how to deal with.

WRT my point about operational vs. repository data formats: for a repository, it is completely valid (and desirable) for the software to parse this new value and add it to the list of possible values for the ontology category.

-angel
From: Mike C. <tu...@gm...> - 2007-10-04 17:08:07
On 10/4/07, Matthew Chambers <mat...@va...> wrote:
> Oh yes, the userParam. A synonym for the <comment> element ;). Please tell me how to use such an element in a meaningful and deterministic way. If I write a value into a cvParam with the category "instrument model" where the value text is "Super Duper Ion Trap" and the value's accession number is a special accession number which means "not yet in CV", ANY reader software should be able to interpret that parameter and ultimately say that it has no idea what to do with data from such an instrument.

I agree with Matt here. In particular, if I encounter this new "Super Duper Ion Trap" for the first time, it would be completely unacceptable for my software to indicate this by saying that my mzML file is invalid. My software needs to be able to parse this file and tell me that the data came from a new instrument called "Super Duper Ion Trap" that it doesn't know how to deal with.

Mike
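[Editor's note: the tolerant-reader behavior Matt and Mike describe, in which an unknown "instrument model" accession degrades gracefully to the value text rather than a validation failure, can be sketched in a few lines. This is an illustrative sketch only, not code from any mzML toolkit; the lookup table and its accession numbers are placeholders, and the element layout follows the cvParam examples quoted in this thread.]

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table of accessions this reader already understands.
# These accession numbers are placeholders, not actual MS CV entries.
KNOWN_INSTRUMENTS = {
    "MS:0000001": "LCQ Deca",
    "MS:0000002": "LTQ FT",
}

def describe_instrument(fragment: str) -> str:
    """Describe the instrument in a cvParam fragment without ever rejecting it.

    Unknown or "not yet in CV" accessions fall back to the human-readable
    value text, so the data stays usable while the CV catches up.
    """
    root = ET.fromstring(fragment)
    for cv in root.iter("cvParam"):
        if cv.get("name") == "instrument model":
            acc = cv.get("accession", "")
            if acc in KNOWN_INSTRUMENTS:
                return KNOWN_INSTRUMENTS[acc]
            # Surface the value text instead of failing validation.
            return "unknown instrument: " + (cv.get("value") or acc)
    return "no instrument model found"

frag = ('<instrument><cvParam cvLabel="MS" accession="MS:9999999" '
        'name="instrument model" value="Super Duper Ion Trap"/></instrument>')
print(describe_instrument(frag))  # unknown instrument: Super Duper Ion Trap
```

A reader built this way can later be taught the new instrument by value text, exactly as Matt suggests, and switch to the proper accession once the CV adds the term.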
From: Matthew C. <mat...@va...> - 2007-10-04 17:05:45
I'll comment here on the mzML schema and validation of mzML instances. I do not see why a proper XML schema with semantic significance could not be generated for mzML. XML schemas have the capability to provide robust restrictions on both elements and attributes, and such a schema could be automatically generated from the CV itself (when combined with a skeleton model of mzML).

Some people complain that mzML is not true XML. That's rather misleading. Others say it needs a special "semantic" validator with its own mapping file. I say that is duplicative and even overkill. Existing schema technology can handle the format specified here, but I grant that the schema WILL have to be very complicated (you won't just have a single cvParam type or ParamGroupType; each part of the schema will have its own cvParam elements with semantically relevant restrictions on the accession numbers) and almost certainly should be machine-generated.

I see nothing wrong with a complicated schema, though, because the variety of data that we are intending to represent is also very complicated! I don't know if existing automatic code generators work for very complicated schemas, but the automatic XML validators definitely should, and thus the need for a separate "semantic" validator is unclear to me when the semantic relationships can be encapsulated in an automatically generated XML schema.
For example, the <contact> element could be defined semantically in XML schema like this:

  <xs:complexType name="ContactParamGroupType">
    <xs:sequence>
      <xs:element name="paramGroupRef" type="dx:ContactParamGroupRefType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
        <xs:complexType>
          <xs:attribute name="cvLabel">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="accession">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS:1000586"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="name">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="contact name"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="value" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
        <xs:complexType>
          <xs:attribute name="cvLabel">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="accession">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS:1000587"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="name">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="contact address"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="value" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
        <xs:complexType>
          <xs:attribute name="cvLabel">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="accession">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS:1000588"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="name">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="contact URL"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="value" type="xs:anyURI"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="cvParam" minOccurs="0" maxOccurs="1">
        <xs:complexType>
          <xs:attribute name="cvLabel">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="accession">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="MS:1000589"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="name">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="contact email"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="value" type="dx:email"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="userParam" type="dx:UserParamType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="contact" type="dx:ContactParamGroupType" minOccurs="0" maxOccurs="unbounded"/>

Like I said, this needs to be machine-generated, but it would create an XML schema that removes the need for any other kind of semantic mapping and for any new tool to do the validation with that mapping. Now that I think about it again, this kind of often-updated schema would violate the stability requirement from the specification: "It was hoped that the actual xsd schema could remain stable for many years while the accompanying controlled vocabulary could be frequently updated to support new technologies, instruments, and methods of acquiring data." But what is the difference between a frequently updated mapping file which is REQUIRED to get semantic validation, and a frequently updated primary schema which is REQUIRED to get semantic validation?

-Matt

Lennart Martens wrote:
> That mapping file is effectively in use by our mzML semantic validator, for exactly the reasons you outlined above!
> So yes - this has been made available in the larger mzML kit and has also been implemented online (your above example indeed does not validate).
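[Editor's note: the "automatically generated from the CV" step Matt describes amounts to stamping out one pattern-restricted cvParam declaration per CV term. A minimal, hypothetical generator is sketched below; the two terms come from the contact example above, and the template covers only the accession and name attributes for brevity.]

```python
# Sketch: machine-generate pattern-restricted cvParam declarations from
# (accession, name) pairs. The generator itself is hypothetical; the terms
# are the contact-name and contact-address entries used in the email above.
CONTACT_TERMS = [
    ("MS:1000586", "contact name"),
    ("MS:1000587", "contact address"),
]

TEMPLATE = """\
<xs:element name="cvParam" minOccurs="0" maxOccurs="1">
  <xs:complexType>
    <xs:attribute name="accession">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="{accession}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="name">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="{name}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>
</xs:element>"""

def generate_cvparam_decls(terms):
    """Emit one restricted cvParam element declaration per CV term."""
    return "\n".join(TEMPLATE.format(accession=a, name=n) for a, n in terms)

print(generate_cvparam_decls(CONTACT_TERMS))
```

Rerunning such a generator whenever the CV updates is exactly the "frequently updated primary schema" trade-off Matt raises.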
From: Sean L S. <Sey...@ap...> - 2007-10-04 17:02:05
I will be out of the office starting 09/13/2007 and will not return until 10/12/2007. I will have no access to email.
From: Matthew C. <mat...@va...> - 2007-10-04 16:57:25
Thanks Angel, I didn't intend for the discussion to get heated; it just seemed to me that Lennart didn't understand what I posted (which may be my fault, it's hard to know without other replies). Remember I posted that I agree with cvParams and appreciate the flexibility they provide. But there is a difference between cvParams that have meaning without the CV and cvParams that don't, and I much prefer the former. So neither of us is arguing for cvParams to go away. You must be talking to somebody else. :)

-Matt

Angel Pizarro wrote:
> Lennart and Matt,
>
> While I appreciate that this is a topic of great interest to everyone in the community, let's turn the heat down a bit. Let me see if I can play the arbiter here:
>
> cvParams since their introduction have always been contentious. Given the choice, when designing a data format, between attributes (or sub-elements or inner text) encoded with a tight set of enumerated values vs. empty slots, a developer will always choose the former.
>
> Why then did the mzML group choose cvParams? The answer is twofold: 1) the audience, and 2) the intent of the standard.
>
> 1) Name one standard that has received industry support across multiple vendors/tools/institutions that is tightly controlled with enumerated values. Prove me wrong, but I can't think of any.
>
> The reason for this is that consensus building is a slow process, and approval of any change in a data format can take months if not years. You need flexible data formats for standards. This already rules out enumerated values, but you can also make the case that vendors are unwilling to tie their development efforts to projects that are not under their complete control (essentially motivated by risk management). As a vendor, if you officially support even one release of a fast-moving data format, customer expectations are such that you are now expected to support all future releases of that format.
>
> 2) The intent of mzML is data transfer and vendor-independent storage of mass spec experimental data. It is not (officially) meant to be an operational format. Operational formats would put much more weight on the side of enumerated values.
>
> So for these reasons (there are more, though) cvParams are not going to go away. As for actually doing work with mzML files, Matt is absolutely right: this is going to be way more difficult than working with mzXML 2.x (as a developer). While OLS is a fine and dandy project, it is not the end-all be-all solution to our problems. It assumes network connectivity, which is a dubious assumption. Even assuming very fast connectivity, the overhead of SOAP protocols is way too big to accept in your typical use of mzML files, which is signal processing and searches. Please stop equating OLS with mzML (or any other ML) since for most uses outside of a repository it just won't work. -a
From: Mike C. <tu...@gm...> - 2007-10-04 16:52:41
On 10/4/07, Lennart Martens <len...@gm...> wrote:
> This is no use. It immediately breaks down in the face of synonyms. Accession numbers are the way to go. Everybody in the life sciences knows and understands this principle ('9606' is 'human' or 'Homo sapiens' or 'man' or ...)

Hmm. I think what you are saying is that end users are not always able to properly distinguish between canonical *identifiers* (e.g., '9606' or 'human') and descriptive text unless the former happens to look like a meaningless string, such as a string of digits. That may be, but strings of digits have their own problems. It's a lot easier to see that 'humaZ' is probably an invalid identifier than that '9607' is, when looking for (the inevitable) problems. I think that biologists understand the value of having semi-meaningful identifiers. They don't use digit strings for gene identifiers, for example.

> That would make for very poor mzML documents then, as we semantically validate these files now (see the semantic validator in the beefier mzML kit). Your CV-less files would surely not validate, and would NOT be mzML files.

Hmm. How complex is a minimal valid mzML file? If they're not fairly easy to generate, without knowing much about the CV, this seems like a problem.

> Sorry, but you are erroneously jumping to conclusions. The CV allows children to be added dynamically, correct usage of these can be validated, and the list of children can be updated on-the-fly from web resources like the OLS (which auto-update every night).

I'm not sure what this means. A nightly update of terms from the web cannot be on our critical path for processing of spectra. We need to be able to proceed even if the OLS disappears forever.

> Again, you fail to see the point. The correct usage of CV terms can be validated. So if you mistype a number or its prefix, this will be considered an error. We need numbers because we want to be able to deal with synonyms (or even outright changes in the term names; it has happened before). Numbers are robust, numbers are convenient, numbers are strong. Text is not.

Actually, it's the other way around. Character strings are robust and convenient; numbers are not. The string 'human' is clearly not equal to 'humaZ'. The string '123' is clearly not equal to the string '0123'. Is the number 123 the same as or different from 0123? How about 0 and -0, not to mention 123.4 and 123.40, or 0.999999999999 and 1.0? The use of numbers in a context like this seems to be mostly due to history. They may be a little more convenient for programmers, but that's negligible.

> Remember that powerful and extremely user-friendly tools like the OLS take care of updating new terms for you fully automatically.

This phrase "powerful and extremely user-friendly tools" is a little scary. It implies having to learn, debug, etc., another piece of software--one not necessarily under our control. To be truly useful, the spec really has to stand on its own (possibly referencing other specs and data).

> I seem to read in your comments so far that there is a certain reluctance to the use of CV terms because this is new, and doesn't fit well with what you are good at right now. I would ask that you have a look at CVs on OLS (http://www.ebi.ac.uk/ols), and read the developer documentation on how to access the OLS web services using your favourite programming language. After playing with it a bit, you'll notice that incorporating CVs into the parsing is not that much work, yet yields very clear benefits.

I don't even have time to keep up with this list, and the benefits of OLS are far from clear.

Mike
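[Editor's note: Mike's string-vs-number point is easy to demonstrate in any language. The sketch below shows that comparing identifiers as strings keeps distinct spellings distinct, while coercing them to numbers silently merges them; the taxon lookup illustrates his '9607' example.]

```python
# Identifiers compared as strings stay distinct; the same tokens
# coerced to numbers are silently merged -- Mike's point exactly.
assert "123" != "0123"             # distinct as identifiers
assert int("123") == int("0123")   # merged once treated as numbers

# Trailing zeros after a decimal point behave the same way:
assert "123.4" != "123.40"
assert float("123.4") == float("123.40")

# A typo in a text identifier ('humaZ') is visibly wrong, while an
# off-by-one numeric accession ('9607') looks perfectly plausible:
taxa = {"9606": "Homo sapiens"}
print(taxa.get("9607", "no such taxon"))  # no such taxon
```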
From: Angel P. <an...@ma...> - 2007-10-04 16:44:05
Lennart and Matt,

While I appreciate that this is a topic of great interest to everyone in the community, let's turn the heat down a bit. Let me see if I can play the arbiter here:

cvParams since their introduction have always been contentious. Given the choice, when designing a data format, between attributes (or sub-elements or inner text) encoded with a tight set of enumerated values vs. empty slots, a developer will always choose the former.

Why then did the mzML group choose cvParams? The answer is twofold: 1) the audience, and 2) the intent of the standard.

1) Name one standard that has received industry support across multiple vendors/tools/institutions that is tightly controlled with enumerated values. Prove me wrong, but I can't think of any.

The reason for this is that consensus building is a slow process, and approval of any change in a data format can take months if not years. You need flexible data formats for standards. This already rules out enumerated values, but you can also make the case that vendors are unwilling to tie their development efforts to projects that are not under their complete control (essentially motivated by risk management). As a vendor, if you officially support even one release of a fast-moving data format, customer expectations are such that you are now expected to support all future releases of that format.

2) The intent of mzML is data transfer and vendor-independent storage of mass spec experimental data. It is not (officially) meant to be an operational format. Operational formats would put much more weight on the side of enumerated values.

So for these reasons (there are more, though) cvParams are not going to go away. As for actually doing work with mzML files, Matt is absolutely right: this is going to be way more difficult than working with mzXML 2.x (as a developer). While OLS is a fine and dandy project, it is not the end-all be-all solution to our problems. It assumes network connectivity, which is a dubious assumption. Even assuming very fast connectivity, the overhead of SOAP protocols is way too big to accept in your typical use of mzML files, which is signal processing and searches. Please stop equating OLS with mzML (or any other ML) since for most uses outside of a repository it just won't work. -a
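[Editor's note: the standard answer to Angel's connectivity objection is to ship the CV as a local OBO file and load it once, so validation never touches the network. Below is a deliberately minimal, hypothetical OBO stanza parser; real psi-ms.obo files carry many more fields per term. The two sample terms ("instrument model", "spectrum type") are ones quoted elsewhere in this thread.]

```python
def load_obo_terms(text: str) -> dict:
    """Parse minimal OBO [Term] stanzas into {accession: name}.

    A local cache like this removes the OLS network dependency entirely.
    """
    terms, current = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = {}                     # start a new stanza
        elif line.startswith("id: "):
            current["id"] = line[4:]
        elif line.startswith("name: "):
            current["name"] = line[6:]
            if "id" in current:
                terms[current["id"]] = current["name"]
    return terms

SAMPLE = """\
[Term]
id: MS:1000031
name: instrument model

[Term]
id: MS:1000035
name: spectrum type
"""

cv = load_obo_terms(SAMPLE)
print(cv["MS:1000031"])  # instrument model
```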
From: Matthew C. <mat...@va...> - 2007-10-04 16:12:23
Hi Lennart,

Lennart Martens wrote:
> > As for attributes vs. cvParams, I have a compromise to propose between methods A, B and C. I earlier proposed an extension to the structure of the CV which would be intended to force format writers to use certain well-defined values instead of whatever kind of capitalization and spacing they wish. That proposal still stands and I'd like to hear feedback on it.
>
> This is no use. It immediately breaks down in the face of synonyms. Accession numbers are the way to go. Everybody in the life sciences knows and understands this principle ('9606' is 'human' or 'Homo sapiens' or 'man' or ...)

I am a mere computer scientist, and to me all characters on computers are numbers. ;) But I know what you are saying, and I have taken that into consideration. That is why my suggestion was for the CV to CONTROL the synonyms and not let the synonyms be written but one way in VALID mzML. From a technical perspective, this is no different than controlling the accession numbers. From a practical perspective, I appreciate that some users might not be comfortable with having their options for text-based value attributes be controlled like they are for accession numbers, and if that's the majority perspective then I'm fine with using accession numbers for values.

> > But I think we should agree on some basic requirements and then evaluate proposals from there (this was probably done in one of your meetings or teleconferences, but I don't recall such a requirements list being posted on this mailing list). According to the specification document, there is a requirement to have a long-term, unchanging specification, mainly due to vendor interests it seems, which of course in the changing field of MS also means a requirement of a companion CV. I happen to agree with the idea of having a long-term, unchanging specification with a CV, even though I don't intend to use the CV very much, if at all.
>
> That would make for very poor mzML documents then, as we semantically validate these files now (see the semantic validator in the beefier mzML kit). Your CV-less files would surely not validate, and would NOT be mzML files.

Um, excuse me, but I'm perfectly capable of writing and reading valid mzML without using a CV web service or any kind of external validation. It may take a /bit/ of manual effort, but it's entirely possible. Of course, if you go with method A for the cvParams and the manual parser has to have an else/if for every possible value's accession number, then you're talking about a LOT of manual effort. But with method B or C, not much at all.

> > From a previous post by Eric Deutsch in this thread:
> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/>
> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/>
> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/>
> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>
> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/>
> > OK, so because of this legitimate concern we have another requirement: the spec must allow defining a restricted value set for categories like "instrument model."
>
> Sorry, but you are erroneously jumping to conclusions. The CV allows children to be added dynamically, correct usage of these can be validated, and the list of children can be updated on-the-fly from web resources like the OLS (which auto-update every night).

I don't understand what you're saying here. Are you saying that we do NOT have a requirement of the spec needing to restrict the value set for a given cvParam category? I don't understand the relevance of the updatability of the CV in this context.

> > I do not see a reason for a requirement that the spec must use accession numbers to enumerate those values. Consider, for example, that we have not specified whether the cvLabel parameter is case sensitive or not. Suppose a naughty writer starts using lowercase instead of uppercase for the cvLabel, or for the cvLabel prefix on the accession number. Even worse, suppose the case sensitivity between the accession number's prefix and the cvLabel don't match. The best we can do is specify things like case sensitivity for these issues or force a certain case in certain contexts. We can't prevent people from writing broken instances of the specification.
>
> Again, you fail to see the point. The correct usage of CV terms can be validated. So if you mistype a number or its prefix, this will be considered an error. We need numbers because we want to be able to deal with synonyms (or even outright changes in the term names; it has happened before). Numbers are robust, numbers are convenient, numbers are strong. Text is not.

Ah, so you WANT support for synonyms. I don't really understand that in the context of writing a standard data-representation format, but OK.

> > Based on the above requirement, one concern that I have (and I think many others do too, because frankly I get a strong impression that many people who want to use this spec don't care about being CV aware) is that a writer should be able to write a cvParam with a value that is not in the allowed value set of the CV without making readers have no clue what the value is actually indicating. In other words, regardless of whether a reader is CV aware or not, a (machine OR human) reader should be able to glean the purpose of an unknown value in a cvParam via some kind of category specification (e.g. "instrument model", or by the category's accession number). If this is accepted as a requirement, it practically eliminates method A as an option because it provides no indication of what category the unknown cvParam's value belongs to.
>
> There is the option to include userParams. Alternatively, you take the productive approach and signal the need to add the term to the CV. Remember that powerful and extremely user-friendly tools like the OLS take care of updating new terms for you fully automatically. If you need to know the context of a term, referring to the CV should be your first and most prominent approach.

Oh yes, the userParam. A synonym for the <comment> element ;). Please tell me how to use such an element in a meaningful and deterministic way. If I write a value into a cvParam with the category "instrument model" where the value text is "Super Duper Ion Trap" and the value's accession number is a special accession number which means "not yet in CV", ANY reader software should be able to interpret that parameter and ultimately say that it has no idea what to do with data from such an instrument. The reader software can even be updated to know how to deal with that instrument by its value text instead of the value accession number, and once that's done some usable data already exists. Nobody had to wait for that instrument model to be added to the CV for the data to be usable. After that instrument model is added to the CV, of course, the writer should be updated to use the proper accession number. If a reader is using the CV tools, their parser will be capable of reading such data automatically, and any reader that chose to manually update in order to deal with the value text while the value accession indicated "not yet in CV" can then choose whether to keep that support intact in order to deal with the data that was already generated, or it can remove it and return to using the pure CV. If a primary goal is "flexibility," then forcing people to add a web service to their XML parser in order to get the CV is seriously stretching that goal.

> > There are perhaps other requirements for the cvParam, but I'll let others fill them in. My new proposed compromise is to split values into a valueAccession and a valueName, just like the optional unitAccession and unitName. The two value attributes would not be optional like the unit attributes, though. A special CV accession number could be allocated to indicate an "unrestricted" value, in which case the reader would use the valueName as the value. Alternatively, the reader could read the accession attribute (which in this compromise would always indicate a category's accession number) and choose based on that whether to look up the valueAccession in the CV or to use the valueName verbatim. So the SRM spectrum example would become:
> > <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/>
>
> For various complex reasons, this will wreak havoc. Because now the two (accession and value accession) run the (unnecessary!) risk of being able to go out of sync.

I see you have not elected to enumerate these various complex reasons, or describe what on earth you mean by having the accession numbers go out of sync. I think you failed to notice that this compromise is very similar to method C, which in a recent post you put in your (tied) vote for. In my opinion, it looks better and is more intuitive than the syntax in method C, but the semantics are exactly the same. In method C it would look like:

<cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum"/>

You see? Straight out of the specification document. Were you perhaps referring to the special accession numbers? I proposed one that would mean that the value is "unrestricted" and another that would mean that the current value is not yet added to the CV but has been (or soon will be) submitted for adding (patent pending, if you will).

> I seem to read in your comments so far that there is a certain reluctance to the use of CV terms because this is new, and doesn't fit well with what you are good at right now. I would ask that you have a look at CVs on OLS (http://www.ebi.ac.uk/ols), and read the developer documentation on how to access the OLS web services using your favourite programming language. After playing with it a bit, you'll notice that incorporating CVs into the parsing is not that much work, yet yields very clear benefits.

You read correctly. The clear benefits that the CV provides are not having to update the parser manually to deal with new CV terms and having a unified set of categories and values from which to generate data models. Excuse my rudeness, but: Whoopdeedoo! The vast majority of development effort is NOT in the parser, regardless of whether the parser is automatically or manually written. The vast majority of development is in the PROCESSING of the data that gets parsed, and unless I'm missing something big, the CV provides no benefit at all for processing new kinds of data. I'm NOT suggesting that the CV should provide such a benefit, of course, only trying to convey the reason for my reluctance. In other words, I have no qualms about writing a new "else if" block in my parser every time a new kind of data comes out, considering that I will always have to add 500 other lines of code elsewhere in my software to actually process the new kind of data in a meaningful way.

-Matt
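[Editor's note: Matt's valueAccession/valueName compromise implies a specific reader behavior, sketched below. The cvParam attributes are taken from his example; the "unrestricted" accession and the lookup table are hypothetical placeholders.]

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder for the proposed "unrestricted value" accession.
UNRESTRICTED = "MS:0000000"

def resolve_value(cvparam: ET.Element, cv_lookup: dict) -> str:
    """Resolve a cvParam under Matt's proposal: use the CV name when the
    valueAccession is known, otherwise fall back to valueName verbatim."""
    acc = cvparam.get("valueAccession", UNRESTRICTED)
    if acc != UNRESTRICTED and acc in cv_lookup:
        return cv_lookup[acc]
    return cvparam.get("valueName", "")

el = ET.fromstring('<cvParam cvLabel="MS" accession="MS:1000035" '
                   'name="spectrum type" valueAccession="MS:1000583" '
                   'valueName="SRM spectrum"/>')
print(resolve_value(el, {"MS:1000583": "SRM spectrum"}))  # SRM spectrum
print(resolve_value(el, {}))  # SRM spectrum (verbatim fallback, no CV needed)
```

Note that either path yields usable output, which is the core of the argument: a CV-unaware reader degrades to the text, not to a failure.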
From: Angel P. <an...@ma...> - 2007-10-04 13:22:05
|
so where is this mzML kit that you mention? With the OLS? -angel On 10/4/07, Lennart Martens <len...@gm...> wrote: > > > So yes - this has been made available in the larger mzML kit and has > also been implemented online (your above example indeed does not > validate). > > |
From: Lennart M. <len...@gm...> - 2007-10-04 10:53:35
|
Hi Andy, > The decision about how to implement CV terms is pretty important and we should try to come up with a coherent policy across PSI if possible. Here are my thoughts: > > A while back Luisa and myself drafted a proposal for mapping model elements to CV terms that may simplify some of the problems currently being worked through. The draft and sample instance are here: http://www.psidev.info/index.php?q=node/159 (see Mapping between exchange schema and CVs). > > I would strongly vote for option A, and in addition maintain a mapping file. This is more work for the CV coordinators (but hopefully can be mainly automated), and would force software implementers to interact with the CV WG when they need new terms, but given the heavy reliance on CV terms in the mzML schema I see no way around this. > > If a mapping file is kept updated in parallel to the CV, software can check whether a valid term has been provided for a particular model element. In the example of spectrumType, the mapping file would specify that only child terms of spectrumType are allowed (e.g. for the model element fileContent). If a vendor publishes a file with: > > <fileContent> > <cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/> > </fileContent> > > This would automatically be rejected by the validator (or at least a warning output), as it should be, since there's no point having a CV where the terms are not controlled! That mapping file is effectively in use by our mzML semantic validator, for exactly the reasons you outlined above! So yes - this has been made available in the larger mzML kit and has also been implemented online (your above example indeed does not validate). Cheers, lnnrt. |
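The mapping-file check that Andy describes and Lennart's validator applies (only child terms of "spectrum type" allowed under fileContent, so "MS:9999999" is rejected) reduces to a small membership test. The parent map below is a toy subset with invented entries; the real validator reads ms-mapping.xml and the full CV.

```python
# Minimal sketch of the mapping-file idea: fileContent may only carry
# child terms of "spectrum type" (MS:1000035). The is_a map is a toy
# fragment, not the real PSI-MS CV.
PARENT = {
    "MS:1000583": "MS:1000035",  # SRM spectrum  is_a  spectrum type
    "MS:1000580": "MS:1000035",  # MSn spectrum  is_a  spectrum type
}

# Hypothetical rendering of one mapping-file rule.
ALLOWED_PARENT = {"fileContent": "MS:1000035"}

def validate(element, accession):
    """True iff the accession is a known child of the element's allowed parent."""
    return PARENT.get(accession) == ALLOWED_PARENT.get(element)

print(validate("fileContent", "MS:1000583"))  # True
print(validate("fileContent", "MS:9999999"))  # False: unknown term rejected
```

Because the rule names a parent term rather than enumerating children, newly added CV children validate with no code change, which is the property Lennart emphasizes.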
From: Lennart M. <len...@gm...> - 2007-10-04 10:45:24
|
Hi Marc, > (2) Semantic validator > The semantic validator is a nice feature, but I think you must publish a > file that defines the mapping of CV terms to the schema. > This file must answer questions like: Where can I use which term? How > often can I repeat a term? etc. > With the heavy use of CV terms such a file is a non-optional part of the > format definition. > What happened to that format Luisa proposed? It is included :). Look in the 'ms-mapping.xml' file. It is (quite literally so) Luisa's file. The whole validator relies on a role-based 'separation of concerns', so that the application is nearly 100% dynamically configured. It is a nice piece of work that we are currently writing up in order to publish it. Meanwhile, I'd be happy to provide more information on how the whole thing works. Just let me know what you want to learn. > (4) General > Finally I'd like to say that I agree with Brian Pratt. There is too much > CV and too little XML in the format for my taste. > I don't argue against CV in general; it's a nice technique that allows > the schema to be stable for a long time. > But now everything is in the CV and there are hardly any XML attributes > left. This makes the format hard to implement and impossible to check > with an XML validator. > And I don't see the advantage in most cases: I have to adapt the > software to new terms just as I would adapt it to new XML elements. If you could use software that answered simple CV questions like 'what is the parent of X', or 'get children for X', or 'is X one of the children of Y (optionally with maximum Z generations)' (for instance); and if this software is on the net and always up-to-date, would that still mean you always have to redo everything? I at least wouldn't expect so. It just requires a new way of dealing with the content of the file (which again, is what matters).
Also remember that the semantic validator, in series after a schema validator, provides maximum validation for a file like an mzML file - both structure and content are thoroughly verified (and nearly 100% dynamically configured - zero recoding necessary when new children get added, for instance). Cheers, lnnrt. |
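The hierarchy queries Lennart lists ('is X one of the children of Y, optionally with maximum Z generations') amount to walking is_a links upward. A self-contained sketch over a generic, invented parent map (assumed acyclic), standing in for what an OLS client or OBO parser would provide:

```python
# Sketch of an is_a-hierarchy query over a term -> parent map.
# The map is assumed acyclic, as OBO is_a hierarchies are.
def is_descendant(term, ancestor, parents, max_generations=None):
    """True if `ancestor` is reachable from `term` via is_a links,
    optionally within `max_generations` steps."""
    generation = 0
    while term in parents:
        term = parents[term]
        generation += 1
        if max_generations is not None and generation > max_generations:
            return False
        if term == ancestor:
            return True
    return False

parents = {"C": "B", "B": "A"}  # toy fragment, not the real PSI-MS CV
print(is_descendant("C", "A", parents))                     # True
print(is_descendant("C", "A", parents, max_generations=1))  # False
```

With the map refreshed from the CV (e.g. nightly, as the OLS does), this kind of check keeps working when new terms are added below an existing parent.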
From: Jones, A. [jonesar] <And...@li...> - 2007-10-04 10:40:05
|
Hi all, The decision about how to implement CV terms is pretty important and we should try to come up with a coherent policy across PSI if possible. Here are my thoughts: A while back Luisa and myself drafted a proposal for mapping model elements to CV terms that may simplify some of the problems currently being worked through. The draft and sample instance are here: http://www.psidev.info/index.php?q=node/159 (see Mapping between exchange schema and CVs). I would strongly vote for option A, and in addition maintain a mapping file. This is more work for the CV coordinators (but hopefully can be mainly automated), and would force software implementers to interact with the CV WG when they need new terms, but given the heavy reliance on CV terms in the mzML schema I see no way around this. If a mapping file is kept updated in parallel to the CV, software can check whether a valid term has been provided for a particular model element. In the example of spectrumType, the mapping file would specify that only child terms of spectrumType are allowed (e.g. for the model element fileContent). If a vendor publishes a file with: <fileContent> <cvParam cvLabel="MS" accession="MS:9999999" name="SRM spectrum" value=""/> </fileContent> This would automatically be rejected by the validator (or at least a warning output), as it should be, since there's no point having a CV where the terms are not controlled! Option B <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/> looks particularly bad to me, since there is no check that correct values are given. As was mentioned elsewhere on the list, you run into problems with upper/lower case, spacing etc. If software is going to rely on particular values being present, those values must be in the CV with persistent identifiers. I believe OBO does not have the ability to distinguish between ontological classes (i.e.
the branch structure) and instances/individuals (i.e. leaf nodes used as values to annotate data). Again, this could be handled by the mapping file that specifies which terms can be used to annotate model elements. A related point: in mzData, there is inconsistent usage of the value slot, since the specification has no ability to say whether a value (and a unit) should be given or not, e.g. for the term "sample mass (MS:1000004)" software should know that a value and unit must be given. It is reasonable that software should be able to check whether to expect a value or not for particular CV terms. Logically, this should be part of the CV itself, but as far as I'm aware OBO does not have this capability. One solution would be to add this to the mapping file as two Booleans on the cvTerm (allowsValue = "true/false" and requiresUnit = "true/false"). Cheers Andy > -----Original Message----- > From: psi...@li... [mailto:psidev-ms-dev- > bo...@li...] On Behalf Of Marc Sturm > Sent: 04 October 2007 09:06 > To: psi...@li... > Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process > > Hi all, > > first of all I would like to thank Eric and all the others in the > working group for their effort. > Here are my comments: > > (1) The new CV term problem > A is clear and simple. > B is simply a bad idea in my opinion. Why not use the child accession if > we have it? > C helps the software to know where the new term belongs, but the > software does not know what to do with it in most cases. I think most > software implements these enum-like CV terms as enum types and thus > cannot handle new values anyway. Additionally it is error prone > (mismatching parent and child). > > As C is an extension of A, I vote for A or C, but I don't think that C > helps very much.
> > (2) Semantic validator > The semantic validator is a nice feature, but I think you must publish a > file that defines the mapping of CV terms to the schema. > This file must answer questions like: Where can I use which term? How > often can I repeat a term? etc. > With the heavy use of CV terms such a file is a non-optional part of the > format definition. > What happened to that format Luisa proposed? > > (3) Comments to CV / Schema > - The term MS:1000543 "data processing action" is missing some child > terms, I think. What about smoothing, baseline reduction and removal of > low-intensity data points? > - Putting the software name in a CV will cause much trouble, I think. > There are way too many upcoming tools and you will be constantly updating > that obo file. I really think we should put that into a string attribute. > - I would add a new optional and unbounded element "parameter" with > attributes "name", "type", "value" to the dx:dataProcessing element to > store the parameters of the software that were used for processing. > > (4) General > Finally I'd like to say that I agree with Brian Pratt. There is too much > CV and too little XML in the format for my taste. > I don't argue against CV in general; it's a nice technique that allows > the schema to be stable for a long time. > But now everything is in the CV and there are hardly any XML attributes > left. This makes the format hard to implement and impossible to check > with an XML validator. > And I don't see the advantage in most cases: I have to adapt the > software to new terms just as I would adapt it to new XML elements. > > Best regards, > Marc > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
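Andy's allowsValue/requiresUnit proposal above implies that software reads two Booleans per cvTerm from the mapping file. A hypothetical sketch of what consuming such an entry could look like; the element and attribute names are illustrative only and are not taken from the actual draft mapping format.

```python
# Hypothetical mapping-file entry carrying Andy's two proposed Booleans.
# The XML shape shown here is an assumption for illustration.
import xml.etree.ElementTree as ET

entry = ET.fromstring(
    '<cvTerm accession="MS:1000004" name="sample mass" '
    'allowsValue="true" requiresUnit="true"/>'
)

# A validator would use these flags to decide whether a cvParam for this
# term must carry a value attribute and a unit.
allows_value = entry.get("allowsValue") == "true"
requires_unit = entry.get("requiresUnit") == "true"
print(allows_value, requires_unit)
```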
From: Lennart M. <len...@gm...> - 2007-10-04 10:36:06
|
Hi Brian, > This is specious. The fact that mzData hasn’t revved only says to me > that it’s badly underspecified, which the paragraph in fact goes on to > illustrate. The occasional revision of the mzXML schema, to my mind, > indicates a well maintained standard*. A stable schema and evolving > ontology produce as much or more reader/writer code maintenance work as > an evolving schema-only does. PRIDE has a stable schema, yet a rapidly evolving CV. We did not need to recode PRIDE whenever we changed the CV. So from experience: a stable schema + evolving (but initially well-organized) CV is not a problem in terms of maintenance. Having to redo the schema every other month is also possible, but nevertheless more hassle. > It’s not like mzData readers don’t have > to be updated every time something gets added to the ontology. At least > with a schema there are ways to generate code for these kinds of changes > automatically, and to easily validate the results. Frankly when it > comes to data formats I think the term “flexible” is synonymous with > “trouble” – convenient for the writers, hell for the readers, and often > a dead end for that reason. Let me make a black and white scenario for you - you have everything as attributes in the schema, and you auto-generate parsing code every week since you keep adding or changing attributes. Fine, no worries. Zero backwards compatibility, but hey - who cares about yesterday's data, right? And your generated code will swallow anything that is remotely using the right glyphs in those attributes (e.g.: 'I'm not providing sensible information here' as the value for the 'instrument_name' attribute). If your objective is convenience for the programmers (whose job it should be to program), you choose the 'everything in schema' path. If your objective is to transmit meaningful and validated/validatable data, you go the current mzML path. Now which one would make the most sense for a standard?
> I really think mzML will just perpetuate the issues mzData presented. > Better we should figure out a way to generate a proper XML schema based > on the ontology document. The rest of the world uses proper XML, I > really don’t see what makes us special. I do not believe (a) that mzData presents more issues than uses, (b) even if (a) were true, that mzML blatantly propagates these, or (c) that starting from scratch with a far too rigid, implicitly non-backwards-compatible and unvalidatable (content-wise, which is where it matters) data transmission format is the way to go forward. > *note that most of the mzXML revisions had to do with things like adding > data compression to peaklists. It wasn’t getting banged around every > time somebody came out with a new mass spec, like the ontology will. mzML will not get 'banged about' every time a new mass spec is added. That is the whole point. Please do try to understand the relatively simple concept - an addition to the instruments is completely and utterly transparent. Cheers, lnnrt. |
From: Lennart M. <len...@gm...> - 2007-10-04 10:24:52
|
Hi Matt, > Time to reopen this can of worms! I like the specification document. > It's clearly written. Unfortunately there is no clear way that I know > of to capture the semantically valid cvParam relationships in a flat > written document, but that can be done externally and it doesn't bother > me. I have one comment before discussing cvParams though: where is the > rationale for having "referenceable" paramGroups? I'm not disagreeing > with the idea, I think it's good, but it does need a rationale because > it's not typical XML practice. For example, why not use the xlink > standard to do the referencing? Also, do we guarantee the order of the > elements so that "referenceableParamGroupList" is always known to come > before the first "run" element (which if I read correctly is the first > element to make use of "paramGroupRef"s)? The order of the elements is fixed. ReferenceableParamGroups can be referenced from any 'normal' paramgroup (which consists of any number of such refs, user params and cv params), as is clearly evident from the schema and schemadoc. > As for attributes vs. cvParams, I have a compromise to propose between > methods A, B and C. I earlier proposed an extension to the structure of > the CV which would be intended to force format writers to use certain > well-defined values instead of whatever kind of capitalization and > spacing they wish. That proposal still stands and I'd like to hear > feedback on it. This is of no use. It immediately breaks down in the face of synonyms. Accession numbers are the way to go. Everybody in the life sciences knows and understands this principle ('9606' is 'human' or 'Homo sapiens' or 'man' or ...) > But I think we should agree on some basic requirements and then evaluate > proposals from there (this was probably done in one of your meetings or > teleconferences, but I don't recall such a requirements list being > posted on this mailing list).
According to the specification document, > there is a requirement to have a long-term, unchanging specification, > mainly due to vendor interests it seems, which of course in the changing > field of MS also means a requirement of a companion CV. I happen to > agree with the idea of having a long-term, unchanging specification with > a CV, even though I don't intend to use the CV very much, if at all. That would make for very poor mzML documents then, as we semantically validate these files now (see the semantic validator in the beefier mzML kit). Your CV-less files would surely not validate, and would NOT be mzML files. > From a previous post by Eric Deutsch in this thread: > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" > value="LCQ Deca"/> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" > value="LCQ DECA"/> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" > value="LTQ FT"/> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" > value="LTQ-FT"/> > <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" > value="LTQFT"/> > OK, so because of this legitimate concern we have another requirement: > the spec must allow defining a restricted value set for categories like > "instrument model." Sorry, but you are erroneously jumping to conclusions. The CV allows children to be added dynamically, correct usage of these can be validated and the list of children can be updated on-the-fly from web resources like the OLS (which auto-update every night). > I do not see a reason for a requirement that the > spec must use accession numbers to enumerate those values. Consider, for > example, that we have not specified whether the cvLabel parameter is > case sensitive or not. Suppose a naughty writer starts using lowercase > instead of uppercase for the cvLabel, or for the cvLabel prefix on the > accession number. 
Even worse, suppose the case sensitivity between the > accession number's prefix and the cvLabel don't match. The best we can > do is specify things like case sensitivity for these issues or force a > certain case in certain contexts. We can't prevent people from writing > broken instances of the specification. Again, you fail to see the point. The correct usage of CV terms can be validated. So if you mistype a number or its prefix, this will be considered an error. We need numbers because we want to be able to deal with synonyms (or even outright changes in the term names; it has happened before). Numbers are robust, numbers are convenient, numbers are strong. Text is not. > Based on the above requirement, one concern that I have (and I think > many others do too, because frankly I get a strong impression that many > people who want to use this spec don't care about being CV aware) is > that a writer should be able to write a cvParam with a value that is not > in the allowed value set of the CV without making readers have no clue > what the value is actually indicating. In other words, regardless of > whether a reader is CV aware or not, a (machine OR human) reader should > be able to glean the purpose of an unknown value in a cvParam via some > kind of category specification (e.g. "instrument model", or by the > category's accession number). If this is accepted as a requirement, it > practically eliminates method A as an option because it provides no > indication of what category the unknown cvParam's value belongs to. There is the option to include userparams. Alternatively, you take the productive approach and signal the need to add the term to the CV. Remember that powerful and extremely user-friendly tools like the OLS take care of updating new terms for you fully automatically. If you need to know the context of a term, referring to the CV should be your first and most prominent approach.
> There are perhaps other requirements for the cvParam, but I'll let > others fill them in. My new proposed compromise is to split values into > a valueAccession and a valueName, just like the optional unitAccession > and unitName. The two value attributes would not be optional like the > unit attributes, though. A special CV accession number could be > allocated to indicate an "unrestricted" value, in which case the reader > would use the valueName as the value. Alternatively, the reader could > read the accession attribute (which in this compromise would always > indicate a category's accession number) and choose based on that whether > to look up the valueAccession in the CV or to use the valueName > verbatim. So the SRM spectrum example would become: > <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" > valueAccession="MS:1000583" valueName="SRM spectrum"/> For various complex reasons, this will wreak havoc. Because now the two (accession and value accession) run the (unnecessary!) risk of being able to go out of sync. I seem to read in your comments so far that there is a certain reluctance to the use of CV terms because this is new, and doesn't fit well with what you are good at right now. I would ask that you have a look at CVs on OLS (http://www.ebi.ac.uk/ols), and read the developer documentation on how to access the OLS web services using your favourite programming language. After playing with it a bit, you'll notice that incorporating CVs into the parsing is not that much work, yet yields very clear benefits. Cheers, lnnrt. |
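Lennart's '9606' example ("human", "Homo sapiens", "man" all naming one species) is the crux of the accession-vs-text argument: identifier comparison survives synonymy, string comparison does not. A toy sketch; the synonym table is illustrative, not the real NCBI taxonomy:

```python
# Why accessions beat free text: every synonym spelling maps to one id.
# '9606' is the NCBI taxonomy id for human that Lennart cites; the
# synonym table itself is a toy for illustration.
SYNONYM_TO_ID = {
    "human": "9606",
    "homo sapiens": "9606",
    "man": "9606",
}

def same_concept(a, b):
    """Compare via identifiers rather than raw strings."""
    ia = SYNONYM_TO_ID.get(a.lower())
    ib = SYNONYM_TO_ID.get(b.lower())
    return ia is not None and ia == ib

print("human" == "Homo sapiens")              # False: text is fragile
print(same_concept("human", "Homo sapiens"))  # True: ids are robust
```

The same reasoning applies to CV term names that get renamed over time: the accession stays stable while the label changes.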
From: Marc S. <st...@in...> - 2007-10-04 08:06:21
|
Hi all, first of all I would like to thank Eric and all the others in the working group for their effort. Here are my comments: (1) The new CV term problem A is clear and simple. B is simply a bad idea in my opinion. Why not use the child accession if we have it? C helps the software to know where the new term belongs, but the software does not know what to do with it in most cases. I think most software implements these enum-like CV terms as enum types and thus cannot handle new values anyway. Additionally it is error prone (mismatching parent and child). As C is an extension of A, I vote for A or C, but I don't think that C helps very much. (2) Semantic validator The semantic validator is a nice feature, but I think you must publish a file that defines the mapping of CV terms to the schema. This file must answer questions like: Where can I use which term? How often can I repeat a term? etc. With the heavy use of CV terms such a file is a non-optional part of the format definition. What happened to that format Luisa proposed? (3) Comments to CV / Schema - The term MS:1000543 "data processing action" is missing some child terms, I think. What about smoothing, baseline reduction and removal of low-intensity data points? - Putting the software name in a CV will cause much trouble, I think. There are way too many upcoming tools and you will be constantly updating that obo file. I really think we should put that into a string attribute. - I would add a new optional and unbounded element "parameter" with attributes "name", "type", "value" to the dx:dataProcessing element to store the parameters of the software that were used for processing. (4) General Finally I'd like to say that I agree with Brian Pratt. There is too much CV and too little XML in the format for my taste. I don't argue against CV in general; it's a nice technique that allows the schema to be stable for a long time. But now everything is in the CV and there are hardly any XML attributes left.
This makes the format hard to implement and impossible to check with an XML validator. And I don't see the advantage in most cases: I have to adapt the software to new terms just as I would adapt it to new XML elements. Best regards, Marc |
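Marc's point about enum-like CV terms can be put in miniature: an enum-style reader parses an unknown term without crashing, but still needs a code change before it can do anything meaningful with it. The accessions below are toy values for illustration, not guaranteed to match the real PSI-MS CV.

```python
# Sketch of the enum-style handling Marc describes. A new CV term lands
# in the fallback branch until the software is updated.
from enum import Enum

class SpectrumType(Enum):
    MS1 = "MS:1000579"  # toy accessions, illustrative only
    MSN = "MS:1000580"

def classify(accession):
    try:
        return SpectrumType(accession)
    except ValueError:
        # Unknown term: parseable, but the software cannot act on it.
        return None

print(classify("MS:1000580"))
print(classify("MS:1000999"))  # a newly minted term falls through
```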
From: Brian P. <bri...@in...> - 2007-10-03 16:11:27
|
Looks like most commenting happens on this list, so here goes: From the spec: "The mzData format was a far more flexible format than mzXML. The support of new technologies could be added to mzData files by adding new controlled vocabulary terms, while mzXML often required a full schema revision. This is evidenced by mzData still at version 1.05 while mzXML is currently at version 3.1. However, mzData did suffer from a problem of inconsistently used vocabulary terms and there appeared several different dialects of mzData, encoding the same information in subtly different ways. This was not usually a problem for human inspection of the file, but caused difficulty writing and maintaining reader software." This is specious. The fact that mzData hasn't revved only says to me that it's badly underspecified, which the paragraph in fact goes on to illustrate. The occasional revision of the mzXML schema, to my mind, indicates a well maintained standard*. A stable schema and evolving ontology produce as much or more reader/writer code maintenance work as an evolving schema-only does. It's not like mzData readers don't have to be updated every time something gets added to the ontology. At least with a schema there are ways to generate code for these kinds of changes automatically, and to easily validate the results. Frankly when it comes to data formats I think the term "flexible" is synonymous with "trouble" - convenient for the writers, hell for the readers, and often a dead end for that reason. I really think mzML will just perpetuate the issues mzData presented. Better we should figure out a way to generate a proper XML schema based on the ontology document. The rest of the world uses proper XML, I really don't see what makes us special. Well, hey, you asked. - Brian *note that most of the mzXML revisions had to do with things like adding data compression to peaklists.
It wasn't getting banged around every time somebody came out with a new mass spec, like the ontology will. _____ From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch Sent: Tuesday, October 02, 2007 3:32 PM To: psi...@li... Cc: Eric Deutsch Subject: [Psidev-ms-dev] mzML 0.99.0 submitted to document process Hi everyone, I am happy to announce that the mzML 0.99.0 specification document has been submitted to the PSI document process. This is an important milestone in the completion of mzML, but it is most certainly not the end of development and feedback. The specification document and all related materials are publicly available at: http://psidev.info/index.php?q=node/257 There are various kits of instance documents, xsds, the controlled vocabulary, validators, etc. listed at that site. Please examine and respond. The actual specification document is posted at: http://psidev.info/index.php?q=node/300 You may post comments at that site, or you may send them to this list. We addressed nearly all issues brought up in the preview period in August. The one main issue that remains unresolved is the problem of cvParams and how to handle the inevitable scenario of new terms and older software. This is an important issue. There is a discussion of it in the specification document. Your input is sought. We encourage you to begin developing (or adapting) software that implements the format if you are comfortable knowing that there will be changes before the 1.0.0 release. I believe that it is primarily by attempting to implement the format that the community will test the format most rigorously and reveal issues that still need to be resolved; this is far more effective than gazing at the specification document. Regards, Eric ---------------------------------- Eric Deutsch, Ph.D. Institute for Systems Biology 1441 North 34th Street Seattle WA 98103 Tel: 206-732-1397 Fax: 206-732-1260 Email: ede...@sy... 
WWW: http://www.systemsbiology.org/Senior_Research_Scientists/Eric_Deutsch |
From: Matthew C. <mat...@va...> - 2007-10-03 15:34:18
|
Hi all, Time to reopen this can of worms! I like the specification document. It's clearly written. Unfortunately there is no clear way that I know of to capture the semantically valid cvParam relationships in a flat written document, but that can be done externally and it doesn't bother me. I have one comment before discussing cvParams though: where is the rationale for having "referenceable" paramGroups? I'm not disagreeing with the idea, I think it's good, but it does need a rationale because it's not typical XML practice. For example, why not use the xlink standard to do the referencing? Also, do we guarantee the order of the elements so that "referenceableParamGroupList" is always known to come before the first "run" element (which if I read correctly is the first element to make use of "paramGroupRef"s)? As for attributes vs. cvParams, I have a compromise to propose between methods A, B and C. I earlier proposed an extension to the structure of the CV which would be intended to force format writers to use certain well-defined values instead of whatever kind of capitalization and spacing they wish. That proposal still stands and I'd like to hear feedback on it. But I think we should agree on some basic requirements and then evaluate proposals from there (this was probably done in one of your meetings or teleconferences, but I don't recall such a requirements list being posted on this mailing list). According to the specification document, there is a requirement to have a long-term, unchanging specification, mainly due to vendor interests it seems, which of course in the changing field of MS also means a requirement of a companion CV. I happen to agree with the idea of having a long-term, unchanging specification with a CV, even though I don't intend to use the CV very much, if at all. 
From a previous post by Eric Deutsch in this thread: <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/> OK, so because of this legitimate concern we have another requirement: the spec must allow defining a restricted value set for categories like "instrument model." I do not see a reason for a requirement that the spec must use accession numbers to enumerate those values. Consider, for example, that we have not specified whether the cvLabel parameter is case sensitive or not. Suppose a naughty writer starts using lowercase instead of uppercase for the cvLabel, or for the cvLabel prefix on the accession number. Even worse, suppose the case sensitivity between the accession number's prefix and the cvLabel don't match. The best we can do is specify things like case sensitivity for these issues or force a certain case in certain contexts. We can't prevent people from writing broken instances of the specification. Based on the above requirement, one concern that I have (and I think many others do too, because frankly I get a strong impression that many people who want to use this spec don't care about being CV aware) is that a writer should be able to write a cvParam with a value that is not in the allowed value set of the CV without making readers have no clue what the value is actually indicating. In other words, regardless of whether a reader is CV aware or not, a (machine OR human) reader should be able to glean the purpose of an unknown value in a cvParam via some kind of category specification (e.g. "instrument model", or by the category's accession number). 
If this is accepted as a requirement, it practically eliminates method A as an option because it provides no indication of what category the unknown cvParam's value belongs to. There are perhaps other requirements for the cvParam, but I'll let others fill them in. My new proposed compromise is to split values into a valueAccession and a valueName, just like the optional unitAccession and unitName. The two value attributes would not be optional like the unit attributes, though. A special CV accession number could be allocated to indicate an "unrestricted" value, in which case the reader would use the valueName as the value. Alternatively, the reader could read the accession attribute (which in this compromise would always indicate a category's accession number) and choose based on that whether to look up the valueAccession in the CV or to use the valueName verbatim. So the SRM spectrum example would become: <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/> I like ketchup on my worms, how bout you? -Matt Chambers Vanderbilt MSRC For reference, AFAIK this is the last post in this thread: Joshua Tasman wrote: > Hi all, > > Actually, I agree that we'd be better served if more structure was > applied at the xml schema level, but since design decisions have > already been made and it seems we're past the point of changing them, > I think we should stick to a consistent flavor. > > I'd propose finding most instances in the schema where attributes and > values are defined by the xml schema and replacing them with cvParams. > If we're reliant on the OBO, let's completely get away from any > parsing of human-readable elements. In the OBO, we already have > inconsistent capitalization for source file types: "mzData File" vs > "wiff file". Let's simplify things and rely on the nice clean accession. 
> > From a look through the instance document, some examples:
> >
> > I'd like to see sourceFileType as a sub cvParam with a specific
> > accession reference, vs an attribute:
> > <sourceFile id="1" sourceFileName="tiny1.RAW"
> > sourceFileLocation="file://F:/data/Exp01" sourceFileType="Xcalibur RAW file">
> >
> > contactInfo could use value'd cvParams for name, institution, etc., or
> > any other added features like email, phone, etc.
> >
> > fileChecksum's type should be a cv accession, instead of:
> > <fileChecksum type="Sha1">
> >
> > In spectrum, spectrumType should be a cvParam, not an attribute:
> > <spectrum id="S19" scanNumber="19" spectrumType="MSn" msLevel="1">
> >
> > In binaryDataArray, attributes compressionType and dataType should be
> > cvParams:
> > <binaryDataArray dataType="64-bit float" compressionType="none"
> > arrayLength="43" encodedLength="5000" dataProcessingRef="Xcalibur Processing">
> >
> > Josh
|
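The reader-side logic of Matt's valueAccession/valueName compromise can be sketched briefly. This is a minimal illustration, not proposed spec text: the "unrestricted" accession MS:9999999 and the in-memory term table are hypothetical stand-ins, since no such accession was actually allocated in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical accession marking an "unrestricted" (free-text) value,
# as floated in the compromise above; NOT a real PSI-MS term.
UNRESTRICTED = "MS:9999999"

def read_cv_value(param, cv_terms):
    """Return the effective value of a cvParam under the proposed
    valueAccession/valueName split. A CV-aware reader prefers the
    canonical spelling from the CV; everyone else falls back to
    valueName and still knows the category from name/accession."""
    acc = param.get("valueAccession")
    name = param.get("valueName")
    if acc == UNRESTRICTED or acc not in cv_terms:
        return name          # CV-unaware (or unknown term): use name verbatim
    return cv_terms[acc]     # CV-aware: canonical spelling wins

xml = ('<cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" '
       'valueAccession="MS:1000583" valueName="SRM spectrum"/>')
param = ET.fromstring(xml)
cv = {"MS:1000583": "SRM spectrum"}  # illustrative term table
print(read_cv_value(param, cv))  # -> SRM spectrum
```

Note that even when the valueAccession is unknown, the reader still learns *what kind* of value it is holding from the category's name and accession, which is the requirement argued for above.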
From: Eric D. <ede...@sy...> - 2007-10-02 22:32:26
|
Hi everyone, I am happy to announce that the mzML 0.99.0 specification document has been submitted to the PSI document process. This is an important milestone in the completion of mzML, but it is most certainly not the end of development and feedback.

The specification document and all related materials are publicly available at:

http://psidev.info/index.php?q=node/257

There are various kits of instance documents, xsds, the controlled vocabulary, validators, etc. listed at that site. Please examine and respond.

The actual specification document is posted at:

http://psidev.info/index.php?q=node/300

You may post comments at that site, or you may send them to this list. We addressed nearly all issues brought up in the preview period in August. The one main issue that remains unresolved is the problem of cvParams and how to handle the inevitable scenario of new terms and older software. This is an important issue. There is a discussion of it in the specification document. Your input is sought.

We encourage you to begin developing (or adapting) software that implements the format if you are comfortable knowing that there will be changes before the 1.0.0 release. I believe that it is primarily by attempting to implement the format that the community will test the format most rigorously and reveal issues that still need to be resolved; this is far more effective than gazing at the specification document.

Regards,
Eric

----------------------------------
Eric Deutsch, Ph.D.
Institute for Systems Biology
1441 North 34th Street
Seattle WA 98103
Tel: 206-732-1397
Fax: 206-732-1260
Email: ede...@sy...
WWW: http://www.systemsbiology.org/Senior_Research_Scientists/Eric_Deutsch
|
From: Matthew C. <mat...@va...> - 2007-08-08 19:36:25
|
> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-
> bo...@li...] On Behalf Of Joshua Tasman
> Sent: Wednesday, August 08, 2007 1:58 PM
> To: psi...@li...
> Subject: [Psidev-ms-dev] attributes vs cvParams
>
> Hi all,
>
> Actually, I agree that we'd be better served if more structure was
> applied at the xml schema level, but since design decisions have already
> been made and it seems we're past the point of changing them, I think we
> should stick to a consistent flavor.

I'm not terribly concerned about the flavor of the XML I consume, and I don't feel strongly one way or the other about most of the cvParam/schema issues. I do feel strongly that parsers should not be required to look at the CV to get basic meaning out of the file.

> I'd propose finding most instances in the schema where attributes and
> values are defined by the xml schema and replacing them with cvParams.
> If we're reliant on the OBO, let's completely get away from any parsing
> of human-readable elements. In the OBO, we already have inconsistent
> capitalization for source file types: "mzData File" vs "wiff file".
> Let's simplify things and rely on the nice clean accession.
>
> From a look through the instance document, some examples:
>
> I'd like to see sourceFileType as a sub cvParam with a specific accession
> reference, vs an attribute:
> <sourceFile id="1" sourceFileName="tiny1.RAW"
> sourceFileLocation="file://F:/data/Exp01" sourceFileType="Xcalibur RAW
> file">

I'm happy with:

<sourceFile id="1" sourceFileName="tiny1.RAW" sourceFileLocation="file://F:/data/Exp01">
  <cvParam cvLabel="MS" accession="MS:xxxxxxx" name="Source file type" value="Xcalibur RAW file" />
</sourceFile>

This must be accompanied by adding specific valid values to the ontology, not just unique accession numbers.
I am not happy with:

<sourceFile id="1" sourceFileName="tiny1.RAW" sourceFileLocation="file://F:/data/Exp01">
  <cvParam cvLabel="MS" accession="MS:xxxxxxx" name="Xcalibur RAW file" value="" />
</sourceFile>

The idea of values being represented as unique accession numbers is against common sense and possibly carcinogenic. ;)

> contactInfo could use value'd cvParams for name, institution, etc, or
> any other added features like email, phone, etc.
>
> fileChecksum's type should be a cv accession, instead of:
> <fileChecksum type="Sha1">

What exactly are you suggesting here?

<fileChecksum accession="MS:xx(sha1)xx">71be39fb2700ab2f3c8b2234b91274968b6899b1</fileChecksum>

Or

<fileChecksum>71be39fb2700ab2f3c8b2234b91274968b6899b1<cvParam cvLabel="MS" accession="MS:xx(checksumType)xx" name="Checksum type" value="Sha1" /></fileChecksum> <!-- ewwww -->

Or

<fileChecksum>71be39fb2700ab2f3c8b2234b91274968b6899b1<cvParam cvLabel="MS" accession="MS:xx(sha1)xx" name="Sha1" value="" /></fileChecksum> <!-- double ewww! -->

I don't think any of these is better than leaving it as an attribute (and possibly giving the checksum type attribute a schema type instead of putting it in the ontology). I don't think the cvParam paradigm works well on elements which only have text nodes for children or which have no children at all.

> In spectrum, spectrumType should be a cvParam, not an attribute:
> <spectrum id="S19" scanNumber="19" spectrumType="MSn" msLevel="1">

I agree with this one.

> In binaryDataArray, attributes compressionType and dataType should be
> cvParams:
> <binaryDataArray dataType="64-bit float" compressionType="none"
> arrayLength="43" encodedLength="5000" dataProcessingRef="Xcalibur
> Processing">

I agree with this as well.

-Matt
|
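However the checksum type ends up being encoded, the digest itself is straightforward to produce on the writer side. A minimal sketch of computing the hex SHA-1 string that populates the text node of <fileChecksum> (the path and chunk size are arbitrary; this is illustrative, not part of any mzML specification):

```python
import hashlib

def file_checksum_sha1(path, chunk_size=8192):
    """Stream a file and return the hex SHA-1 digest, i.e. the text a
    writer would emit inside <fileChecksum type="Sha1">...</fileChecksum>.
    Streaming in chunks keeps memory flat even for multi-GB raw files."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The reader side is symmetric: recompute the digest over the referenced source file and compare strings, which works identically whether the type is carried as an attribute or as a cvParam.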
From: Joshua T. <jt...@sy...> - 2007-08-08 18:58:23
|
Hi all,

Actually, I agree that we'd be better served if more structure was applied at the xml schema level, but since design decisions have already been made and it seems we're past the point of changing them, I think we should stick to a consistent flavor.

I'd propose finding most instances in the schema where attributes and values are defined by the xml schema and replacing them with cvParams. If we're reliant on the OBO, let's completely get away from any parsing of human-readable elements. In the OBO, we already have inconsistent capitalization for source file types: "mzData File" vs "wiff file". Let's simplify things and rely on the nice clean accession.

From a look through the instance document, some examples:

I'd like to see sourceFileType as a sub cvParam with a specific accession reference, vs an attribute:

<sourceFile id="1" sourceFileName="tiny1.RAW" sourceFileLocation="file://F:/data/Exp01" sourceFileType="Xcalibur RAW file">

contactInfo could use value'd cvParams for name, institution, etc., or any other added features like email, phone, etc.

fileChecksum's type should be a cv accession, instead of:

<fileChecksum type="Sha1">

In spectrum, spectrumType should be a cvParam, not an attribute:

<spectrum id="S19" scanNumber="19" spectrumType="MSn" msLevel="1">

In binaryDataArray, attributes compressionType and dataType should be cvParams:

<binaryDataArray dataType="64-bit float" compressionType="none" arrayLength="43" encodedLength="5000" dataProcessingRef="Xcalibur Processing">

Josh
|
From: Mike C. <tu...@gm...> - 2007-08-08 16:24:34
|
On 8/8/07, Matt Chambers <mat...@va...> wrote:
> > This does require a CV class and some methods:
> > cv.loadFromFile()
> > cv.isChildOf()
> > cv.getName()
> >
> > but this is not really complicated.
>
> But it is really relatively complicated. It is more conceptually and
> computationally complicated than simple string comparison (with the
> OPTION of checking the CV to see if the value is a controlled one). And
> worse, it's a complication I don't see a justification for unless there
> is a better reason than the one you gave above which has a more simple
> solution.

I agree with Matt. A call like "isChildOf" looks simple, but what's entailed in that call is that the *correct* CV is available and has been parsed into a tree in memory. There are good reasons to think that this will be fairly difficult to do correctly in practice.

But on top of that, it just seems needlessly difficult. It'd be a little like having products in your grocery store marked with their trademark name, but not a succinct description of what they *are*--which you can only find out with a stock list lookup. ("Shimmer? Is that a floor polish or a dessert topping? Hope my stock list is up to date...")

The alternative here would appear to be very simple. Something like the previously mentioned

<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>

would work fine. As for the differing spellings of "LTQ-FT", there's a canonical spelling available in the CV, and anyone who can't get that right will probably find the complexity of multiple CV versions insurmountable.

Consider also: how should newly created instruments be handled? If our lab invents the "MassMaster2000", do we need to create our own augmented CV in order to handle this? Does everyone who wants to read MassMaster2000 mzML files need a copy of this augmented CV? What if they have twenty other augmented CVs? How are those to be managed?

Mike
|
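For scale, the CV helper that Eric's pseudocode assumes (loadFromFile / isChildOf / getName) is not enormous, but it is real machinery every reader would have to carry, along with the correct CV file. A minimal sketch follows; the OBO snippet is hand-written from terms quoted in this thread, and only the small subset of OBO syntax needed here is parsed:

```python
class CV:
    """Tiny in-memory controlled vocabulary: names plus is_a links."""
    def __init__(self):
        self.names = {}    # accession -> term name
        self.parents = {}  # accession -> set of parent accessions

    def load_from_text(self, obo_text):
        # Handles only the id:/name:/is_a: lines of an OBO [Term] stanza.
        term_id = None
        for line in obo_text.splitlines():
            line = line.strip()
            if line.startswith("id: "):
                term_id = line[4:]
            elif line.startswith("name: ") and term_id:
                self.names[term_id] = line[6:]
            elif line.startswith("is_a: ") and term_id:
                parent = line[6:].split(" ")[0]  # drop trailing "! comment"
                self.parents.setdefault(term_id, set()).add(parent)

    def get_name(self, acc):
        return self.names.get(acc)

    def is_child_of(self, acc, ancestor):
        """Walk is_a links upward; True if `ancestor` is reachable."""
        stack, seen = list(self.parents.get(acc, ())), set()
        while stack:
            p = stack.pop()
            if p == ancestor:
                return True
            if p not in seen:
                seen.add(p)
                stack.extend(self.parents.get(p, ()))
        return False

OBO = """\
[Term]
id: MS:1000031
name: instrument model

[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000031 ! instrument model
"""

cv = CV()
cv.load_from_text(OBO)
print(cv.get_name("MS:1000554"))                   # -> LCQ Deca
print(cv.is_child_of("MS:1000554", "MS:1000031"))  # -> True
```

The code itself is short; Mike's point stands that the hard part is operational — making sure the *right* version of the CV is on hand for every file ever read.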
From: Brian P. <bri...@in...> - 2007-08-08 15:49:39
|
If ionSelection is just one of many things that are too complicated and varied and dynamic to actually specify, then just off the top of my head I think it's going to be pretty hard to do a good job of parsing mzML. I take your point about mzXML being too specific, but there's such a thing as too general as well. My fear is that we'll see it balkanized, with most parsers only really able to deal with the mode of mzML usage that the author really cares about, which just leaves us with a bunch of ad hoc standards.

The instrument name example (wherein a parser cannot be made robust enough to read future versions) makes me think that not enough mental energy has gone into considering the practicalities of being a consumer of mzML. I've seen this in other standards efforts I've been involved with in other industries (internet security, circuit board manufacturing) - writers (mostly hardware vendors) love the flexibility because they can just do it their way, but readers (software vendors) bear the brunt of what amounts to one format per vendor, and finally just fall back onto the per-vendor solutions they have already invested in.

> it is the same amount of work as if everything was in the schema.

There actually *is* an advantage of specifying via schema instead of ontology, which I've already pointed out - W3C schema is itself a standard with a host of tools built up around it that will generate readers and writers from properly formed schemas. If mzML just used elements for everything and each element had an attribute pointing at the ontology I think we'd be better off. The schema and the ontology would need to evolve together, of course.

But, as you say, this thing is more or less nailed down at this point, so I'm wasting the list's time with this schema talk, and I do apologise. I don't blame anyone for being annoyed at me dredging up these fundamental objections yet again so late in the process. Anyway, off for vacation until the end of next week.
Sorry to start a flame then abandon it.

Cheers,

Brian

_____

From: del...@gm... [mailto:del...@gm...] On Behalf Of Angel Pizarro
Sent: Wednesday, August 08, 2007 6:01 AM
To: Brian Pratt
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] cvParams using name attribute as value

On 8/7/07, Brian Pratt <bri...@in...> wrote:

> Hi Angel, If I understand your question to be about identifying current
> mismatches between terminology in the schema and the ontology, I'm not
> sure there are any - but probably only because the schema has so little
> actual terminology in it.

My question was more of a pragmatic one, about where you would add specificity into the mzML schema. Your selectionWindow example below is a good one, in that the specification of selectionWindow is probably a range value and we should have two sub-elements corresponding to the cvParam values that define the window (or just a well-defined range sub-element, skipping cvParam altogether). I don't think your second example is a good one though: since there are so many permutations of an ionSelection protocol, and more are certainly on the way, it is better handled by an ontology specification. Yes, this does make parsers slightly harder, since now you must pay attention to the incoming ontology, but it is the same amount of work as if everything was in the schema.

mzXML could get away with tight specification of these complex and changing annotations, since its sole purpose was support of the ISB pipeline. Its open source status only served to increase the user base, but the schema changes were solely driven by the needs of that pipeline and solely by the community that used it.
Trying to build consensus across many different groups has led to the current version of mzML, and the major structure of mzML will not change at this point, so please let's just get to the specifics of going through the schema and identifying where you think an annotation should be promoted to the level of a schema element, and we'll discuss as a group.

-angel

> Consider this example:
>
> <xs:element name="selectionWindow" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="cvParam" type="dx:CVParamType" minOccurs="2" maxOccurs="unbounded"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> which says absolutely nothing at all about what a selectionWindow element can be expected to contain when you encounter it. It just says it will contain at least two "parameters". Not much of an aid to software development.
>
> The schema, if we can call it that, doesn't even specify what some of the most fundamental information about a scan looks like. For example, it specifies that a scan may have a list of precursors, each of which will contain an ionSelection, but stops short of telling you what an ionSelection looks like:
>
> <xs:element name="ionSelection" type="dx:ParamGroupType">
>   <xs:annotation>
>     <xs:documentation>This captures the type of ion selection being performed, and trigger m/z (or m/z's), neutral loss criteria etc. for tandem-MS or data dependent scans.</xs:documentation>
>   </xs:annotation>
> </xs:element>
>
> Nearly all the details of nearly all the elements are just unspecified blobs. Normally with an XML format you can expect to at least start your work by running it through something like XMLSpy that will autogenerate a reader and a writer that you can then polish up (to handle, for example, the necessary weirdness of base64+zlib in the peaklists). But with this, you get no kind of a head start at all, since the vast majority of the syntax is hidden behind blobs like dx:CVParamType and dx:ParamGroupType. It's just not a specification.
The statement that led to your question, I think, was just me saying that if we *did* create an actual schema, we'd want its terminology to agree with the ontology wherever possible. But it has to actually contain some terminology, unlike the current schema.

Brian

_____

From: del...@gm... [mailto:del...@gm...] On Behalf Of Angel Pizarro
Sent: Tuesday, August 07, 2007 1:10 PM
To: Brian Pratt
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] cvParams using name attribute as value

On 8/7/07, Brian Pratt <bri...@in...> wrote:

> Hey, the horse just twitched: by placing CVparam information in
> attributes of the elements of a conventionally structured XML schema
> (ala mzXML) we can make use of the OBO work without adding a lot of
> unwanted complexity to software systems that aren't really interested
> in it. An mzML that integrates well with OBO-aware systems is an
> excellent idea, but an mzML that demands you BE an OBO-aware system
> seems less likely to achieve widespread adoption.

Can you name specific attributes that you want to have cv terms be the value for that are currently not in the schema?

-angel

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736 F: 215-573-9004
|
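The "necessary weirdness of base64+zlib in the peaklists" that Brian mentions is at least mechanical. A minimal round-trip sketch for a peak array, assuming little-endian 64-bit floats (the byte order is an assumption here, not something this thread pins down):

```python
import base64
import struct
import zlib

def encode_peak_array(values):
    """Writer side: pack little-endian 64-bit floats, zlib-compress,
    then base64 -- the payload of a <binaryDataArray> text node."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def decode_peak_array(text, compressed=True):
    """Reader side: the inverse. `compressed` corresponds to whatever
    the compressionType annotation says for this array."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))

mz = [445.12, 446.13, 447.14]
assert decode_peak_array(encode_peak_array(mz)) == mz  # lossless round trip
```

Because the floats are stored as full 64-bit IEEE doubles, the round trip is exact; the only reader-side decisions are byte order, float width, and whether the array is compressed, which is precisely the metadata the dataType/compressionType annotations carry.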
From: Matt C. <mat...@va...> - 2007-08-08 13:24:53
|
Eric Deutsch wrote:
> The decision was made to make individual models cv terms to avoid
> problems like:
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/>

Is this the main/only reason for this usage of terms? This just seems like a great argument for having the ontology control the values of the terms and not just the terms themselves. That way, the simple term/name->value relationship is always maintained, and this problem is eliminated. I am not advocating changing the structure of mzML at this point; I see this as a rather minor change.

> I would argue that your code snippet below would better look like:
>
> #define MS_CV_POLARITY_TYPE "MS:1000037"
>
> if( element.parent == "spectrumDescription" ) {
>   for each child {
>     if (child.name == "cvParam") then {
>       if( cv.isChildOf(child.attrs['accession'], MS_CV_POLARITY_TYPE) )  // if a polarity type
>         spectrum.polarity = cv.getName(child.attrs['accession']);
>     }
>   }
> }
>
> Note that the cvParam name (should that be "positive" or "Positive" or
> "positive polarity" or "Polarity" or "polarity"?) is not in the code,
> just MS:1000037, which can be considered final.
>
> This does require a CV class and some methods:
>
> cv.loadFromFile()
> cv.isChildOf()
> cv.getName()
>
> but this is not really complicated.

But it is really relatively complicated. It is more conceptually and computationally complicated than simple string comparison (with the OPTION of checking the CV to see if the value is a controlled one).
And worse, it's a complication I don't see a justification for unless there is a better reason than the one you gave above, which has a simpler solution. Why force parsers to create a CV class and methods just to ensure that "LCQ Deca" is spelled right (or that it's given its proper accession number)?

-Matt
|
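The simpler, CV-optional reading that Matt is defending could look like the following. The accession MS:1000037 and the element/attribute names mirror Eric's quoted pseudocode; treat the exact term names as illustrative rather than authoritative:

```python
import xml.etree.ElementTree as ET

POLARITY_ACC = "MS:1000037"  # per Eric's pseudocode; illustrative

def read_polarity(spectrum_description):
    """CV-unaware reading: identify the category by its accession (or
    its human-readable name) and take the value attribute verbatim.
    No ontology file needs to be loaded or walked."""
    for child in spectrum_description.findall("cvParam"):
        if (child.get("accession") == POLARITY_ACC
                or child.get("name") == "polarity"):
            return child.get("value")
    return None

doc = ET.fromstring(
    '<spectrumDescription>'
    '<cvParam cvLabel="MS" accession="MS:1000037" '
    'name="polarity" value="positive"/>'
    '</spectrumDescription>'
)
print(read_polarity(doc))  # -> positive
```

A parser written this way can still *optionally* validate the value against the CV when one is available, which is exactly the trade-off being argued: CV validation as an option for tools that care, not a prerequisite for extracting basic meaning.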
From: Angel P. <an...@ma...> - 2007-08-08 13:04:21
|
On 8/8/07, Eric Deutsch <ede...@sy...> wrote:
> Thank you all for the lively discussion.
>
> One proposal I once made in Lyon (which was roundly dismissed, I believe)
> was something like this: instead of:
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> have:
>
> <cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> Thus the parser can easily be coded to know that any cvParam with a
> parentAccession="MS:1000031" is going to be an instrument model whether or
> not it's in the CV. The mzML semantic validator tool would, of course, check
> all this. The main argument against this was the potential for
> inconsistency, I seem to recall.

The argument was that MAGE v1 did cv terms this way and caused a tremendous amount of confusion for the MAGE producers and the ArrayExpress annotation-checking team alike. It is infinitely easier to deal with nested cvParams than trying to output a term and a parent at the same time.
|