Thread: [Psidev-ms-dev] cvParams using name attribute as value

[Psidev-ms-dev] cvParams using name attribute as value

From: Matthew C. <mat...@va...> - 2007-08-07 16:40:18

I'm a little confused about parameters that use the accession number as a
kind of value, instead of using the accession number to identify a variable
and the value attribute to assign its value.  I don't understand why:

<cvParam cvLabel="MS" accession="MS:1000130" name="Positive Scan" value=""/>
(from mzML)

Is preferable to:

<cvParam cvLabel="psi" accession="PSI:1000037" name="Polarity"
value="positive"/> (from mzData)

 

There are other examples of this as well.  What's the logic here?

 

-Matt Chambers

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Eric D. <ede...@sy...> - 2007-08-07 17:12:46

Hi Matt, the agreed-upon rule here is that the cvParams should always
refer to the most detailed concept, and the value attribute should
*only* be filled if there is a scalar value associated with the concept
that cannot be in the CV itself.  So:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>

<cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number"
value="23433"/>

So for the first, the term/concept is "LCQ Deca".  From the CV, one can
learn that an "LCQ Deca" IS A "instrument model", so there's no need
(and it is perhaps a little dangerous) to put "LCQ Deca" as a value of
"instrument model".

However, "instrument serial number" is the most specific concept in the
CV, and thus the actual SN is the value.

This was discussed at some length, and this is the new way of doing
things, which will be uniform across all PSI and FuGE implementations. At
least, that is my understanding. This does mean that parsers need to be
a little smarter and be "CV-aware". The parser/interpreter can no longer
assume that there will be a term "instrument model" and look for its
value.  Rather, the parser/interpreter must now check whether any
of the terms provided is a child of "instrument model" in the CV.
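
A minimal Python sketch of that "CV-aware" check, under stated assumptions: the is_a links are hand-copied from the OBO stanzas Brian quotes later in this thread, each term is assumed to have a single is_a parent, and MS:1000031 (named "model by vendor" in that OBO excerpt) stands in for "instrument model". The names `IS_A` and `is_a_descendant` are hypothetical.

```python
# Hypothetical is_a links, copied from the OBO excerpt quoted later in the thread.
IS_A = {
    "MS:1000554": "MS:1000125",  # LCQ Deca -> thermo finnigan
    "MS:1000125": "MS:1000483",  # thermo finnigan -> thermo fisher scientific
    "MS:1000483": "MS:1000031",  # thermo fisher scientific -> model by vendor
}

INSTRUMENT_MODEL = "MS:1000031"


def is_a_descendant(accession, ancestor):
    """Walk single-parent is_a links upward; True if `ancestor` is reached."""
    while accession in IS_A:
        accession = IS_A[accession]
        if accession == ancestor:
            return True
    return False


# The two cvParams from the example above, as (accession, value) pairs.
params = [("MS:1000554", ""), ("MS:1000529", "23433")]
models = [acc for acc, _ in params if is_a_descendant(acc, INSTRUMENT_MODEL)]
```

A term that the loaded CV does not know yet simply fails the test, which is the limitation raised in the replies that follow.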

Regards,

Eric


Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Mike C. <tu...@gm...> - 2007-08-07 18:06:11

On 8/7/07, Eric Deutsch <ede...@sy...> wrote:
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> <cvParam cvLabel="MS" accession="MS:1000529" name="Instrument Serial Number"
> value="23433"/>
>
>
> So for the first, the term/concept is "LCQ Deca".  For the CV, one can learn
> that an "LCQ Deca" IS A "instrument model", and so there's no need (and is
> perhaps a little dangerous) to put "LCQ Deca" as a value of "instrument
> model".
>
>
> However, "instrument serial number" is the most specific concept in the CV,
> and thus the actual SN is the value.
>
>
> This was discussed at some length and this is the new way of doing things,
> that will be uniform across all PSI and FuGE implementations. At least, that
> is my understanding. This does mean that parsers need to be a little smarter
> and be "CV-aware". The parser/interpreter can no longer assume that there
> will be a term "instrument model" and look for its value.  But rather, the
> parser/interpreter must now look to see if any of the terms provided are a
> child of "instrument model" in the CV.

Actually, the parser really should not only check whether the term
provided *is* a child in the current CV, but also whether it ever
*will be* in a future version of the CV.  Unfortunately, the
technology required to make such a check is not yet available.  :-)

I'm not very familiar with how the CV is supposed to work, but from this
example it appears that the namespaces for different kinds of things
have been merged together, that there is an assumption that there
will be no collisions, and that anything that doesn't currently have
a name basically doesn't exist.

In the example given of writing a parser, the task of extracting the
name of the instrument, given just the mzML file, is changed from
being trivial to being essentially impossible.  The mzML file becomes
meaningless in itself, and only has meaning relative to a particular
version of the CV, which the parser must have access to.

Am I misunderstanding something?

Mike

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Brian P. <bri...@in...> - 2007-08-07 18:45:49

Piling on with Mike, here:
 
So the first thing any parser must do is load up the OBO file.  In practice,
such a software system will need to bundle an OBO in some fashion, in the
extremely likely event that the OBO used by the mzML file in question is not
present.  Don't forget to update your distro each time the OBO gets updated,
and make sure that if the OBO used by the mzML file IS present,
you use that instead.
 
Then, read:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>

 

then ask yourself, "whazzat?", and look up:

id: MS:1000554
name: LCQ Deca
def: "ThermoFinnigan LCQ Deca." [PSI:MS]
is_a: MS:1000125 ! thermo finnigan

which leads you to:

id: MS:1000125
name: thermo finnigan
def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
is_a: MS:1000483 ! thermo fisher scientific

which leads you to:

id: MS:1000483
name: thermo fisher scientific
def: "Thermo Fisher Scientific. Also known as Thermo Finnigan corporation." [PSI:MS]
related_synonym: "Thermo Scientific" []
is_a: MS:1000031 ! model by vendor

which leads you to:

id: MS:1000031
name: model by vendor
def: "Instrument's model name (everything but the vendor's name) ---Free text ?" [PSI:MS]
relationship: part_of MS:1000463 ! instrument description

which leads you to:

id: MS:1000463
name: instrument description
def: "Device which performs a measurement." [PSI:MS]
relationship: part_of MS:0000000 ! mzOntology

aha!  now populate the "instrument description" element in your database.
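
The lookup walk above can be sketched in Python. This assumes a drastically simplified OBO reader that only understands `id`, `name`, `is_a` and `part_of` lines (a real OBO file needs a proper parser), and the helpers `parse_terms` and `lineage` are hypothetical names.

```python
# The stanzas quoted above, trimmed to the fields this sketch reads.
OBO_TEXT = """\
[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000125 ! thermo finnigan

[Term]
id: MS:1000125
name: thermo finnigan
is_a: MS:1000483 ! thermo fisher scientific

[Term]
id: MS:1000483
name: thermo fisher scientific
is_a: MS:1000031 ! model by vendor

[Term]
id: MS:1000031
name: model by vendor
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! mzOntology
"""


def parse_terms(text):
    """Tiny OBO reader: keeps only id, name, is_a and part_of lines."""
    terms = {}
    for block in text.split("[Term]"):
        term_id, name, parents = None, None, []
        for line in block.splitlines():
            key, _, rest = line.partition(": ")
            rest = rest.split(" !")[0].strip()  # drop the trailing "! comment"
            if key == "id":
                term_id = rest
            elif key == "name":
                name = rest
            elif key == "is_a":
                parents.append(rest)
            elif key == "relationship" and rest.startswith("part_of "):
                parents.append(rest.split()[1])
        if term_id:
            terms[term_id] = {"name": name, "parents": parents}
    return terms


def lineage(terms, accession):
    """Follow the first parent link upward, collecting term names."""
    names, seen = [], set()
    while accession in terms and accession not in seen:
        seen.add(accession)
        names.append(terms[accession]["name"])
        parents = terms[accession]["parents"]
        if not parents:
            break
        accession = parents[0]
    return names


terms = parse_terms(OBO_TEXT)
# prints: LCQ Deca -> thermo finnigan -> thermo fisher scientific -> model by vendor -> instrument description
print(" -> ".join(lineage(terms, "MS:1000554")))
```

An accession that the loaded OBO does not contain yields an empty lineage, which is the "LCQ Spiff-o" situation described next.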

Which is all fine, in its way, until a new instrument "LCQ Spiff-o" comes
out and the OBO isn't immediately updated to match, in which case the parser
can't even tell that it's an instrument declaration.  This is a curiously
upside-down way to write XML.  If I were designing it I'd make the CV stuff
an attribute of the instrument info, for anyone who cares to dive into the
OBO, but allow the XML to stand alone in the absence of a suitable OBO.  I'd
make an effort to use the same terminology in the XML element and attribute
names as in the OBO, just to reduce confusion.  I guess what I'm describing
is something like mzXML with the addition of CV info as attributes of the
existing element types, to aid those interested in using OBO to unify data
from different sources without annoying those uninterested in unifying data
from different systems.

But, some of you will recall that the use of the CV stuff in lieu of proper
XML (in the sense that you have no real hope of making full sense of mzML
without access to an external file) is a longstanding crank of mine, and I
don't really expect to change it this late in the game.

- Brian

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Matthew C. <mat...@va...> - 2007-08-07 18:57:21

In addition to Mike's and Brian's concerns, I am wondering why "LCQ Deca" is
called a "term/concept".  "Instrument model" is the closest relevant
term/concept as I understand those words.  Is the cvParam not capable of
controlling both the name and the possible values of its definitions?  Also, why
are the different instrument models part of the CV anyway?  It seems that
the CV should support controlling both terms and the values (or instances)
of those terms:

"LCQ Deca" IS A VALID INSTANCE OF "thermo finnigan" IS A "thermo fisher
scientific" IS A "instrument model"

I don't really understand the middle two jumps either; aren't they
redundant?

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Brian P. <bri...@in...> - 2007-08-07 20:00:51

Upon reflection, I realize that this is, for me, actually a new objection to
mzML.  My original problem with the reliance on CV/OBO is that an XML parser
for it looks something like this:
 
for each element {
    if (element.name=="cvParam") then {
       a whole bunch of handrolled logic to pick this apart
    } else {
      there isn't much else
    }
 }
 
That's not really an XML parser, therefore I conclude that mzML isn't really
XML.  But I have previously beaten that horse to death.  
 
Now we have something new not to like: it's impossible to write a parser
that's even remotely future-proof.  Or maybe it's not new, and I just missed
it before.  Either way, this all looks increasingly ill-conceived to me.
Sorry to be such a downer.
 
Hey, the horse just twitched:  by placing cvParam information in attributes
of the elements of a conventionally structured XML schema (à la mzXML) we can
make use of the OBO work without adding a lot of unwanted complexity to
software systems that aren't really interested in it.  An mzML that
integrates well with OBO-aware systems is an excellent idea, but an mzML
that demands you BE an OBO-aware system seems less likely to achieve
widespread adoption.
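
One way to picture that proposal, as a Python sketch: the element and attribute names (`instrument`, `model`, `modelAccession`, `serialNumber`) are invented for illustration and are not taken from mzXML, mzData or any mzML draft. The reader gets the meaning from the element itself, and the accession is an optional extra for OBO-aware consumers.

```python
import xml.etree.ElementTree as ET


def read_instrument(xml_text):
    """Read a hypothetical instrument element whose meaning lives in its own attributes."""
    el = ET.fromstring(xml_text)
    return {
        "model": el.get("model"),                     # usable with no OBO at all
        "model_accession": el.get("modelAccession"),  # optional hook for OBO users
        "serial_number": el.get("serialNumber"),
    }


known = read_instrument(
    '<instrument model="LCQ Deca" modelAccession="MS:1000554" serialNumber="23433"/>'
)
# A model the CV does not cover yet is still readable; only the accession is missing.
unknown = read_instrument('<instrument model="LCQ Spiff-o" serialNumber="1"/>')
```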
 
I do understand the desire to maintain an ontology instead of an ontology
and an XML schema, but I'm not sure we can really get away with it.  By
having a schema that offloads most of its work to an external ontology,
we're just pushing the work that a proper schema would save onto the folks
creating the readers and writers, making their job much more complicated
than it ought to be - you can't autogenerate a parser or serializer without
a fully realized schema.  I think we risk them deciding that mzXML and
mzData aren't really all that broken after all.
 
Brian

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Matthew C. <mat...@va...> - 2007-08-07 20:44:18

 

As long as the name/value paradigm is used, the loop doesn't get much more
complicated than:

if( element.parent == "spectrumDescription" )  {
   for each child {
      if (child.name=="cvParam") then {
         if( child.attrs['name'] == "Polarity" )
           spectrum.polarity = child.attrs['value'];
      }
   }
}

 

But if you have to do:

if( element.parent == "spectrumDescription" )  {
   for each child {
      if (child.name=="cvParam") then {
         if( child.attrs['name'] == "Positive" )
           spectrum.polarity = "positive";
         else if( child.attrs['name'] == "Negative" )
           spectrum.polarity = "negative";
      }
   }
}

...parsers will be painful to write, and I think adoption will suffer because
of it.  Not to mention that the idea of adding these things, which should
really be values, as "terms" in the vocabulary is indeed not
future-proof.  In the future, there might be another IS_A relationship for
"LCQ Deca", so that merely by seeing LCQ Deca you won't know that you're
looking at an instrument model parameter.  Of course, the accession number
would tell you uniquely, but then you'll have two accession numbers in the
vocabulary with the name "LCQ Deca."  Yuck!
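
Runnable Python versions of the two loops above, using the standard-library `xml.etree.ElementTree` on small hypothetical fragments. The "Negative Scan" name is an assumption (only "Positive Scan" appears in this thread); the point is that a CV-unaware term-only reader has to enumerate every possible name, while the name/value reader looks for one.

```python
import xml.etree.ElementTree as ET

NAME_VALUE = ('<spectrumDescription><cvParam cvLabel="psi" accession="PSI:1000037" '
              'name="Polarity" value="positive"/></spectrumDescription>')
TERM_ONLY = ('<spectrumDescription><cvParam cvLabel="MS" accession="MS:1000130" '
             'name="Positive Scan" value=""/></spectrumDescription>')


def polarity_name_value(xml_text):
    """mzData style: look for one known name and read its value attribute."""
    for param in ET.fromstring(xml_text).iter("cvParam"):
        if param.get("name") == "Polarity":
            return param.get("value")
    return None


# Assumed term names; a CV-unaware reader must list every one of them.
POLARITY_TERMS = {"Positive Scan": "positive", "Negative Scan": "negative"}


def polarity_term_only(xml_text):
    """mzML style without a CV: map each known term name to a polarity."""
    for param in ET.fromstring(xml_text).iter("cvParam"):
        if param.get("name") in POLARITY_TERMS:
            return POLARITY_TERMS[param.get("name")]
    return None
```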

 

I think values for terms should be given a special relationship in the CV;
they shouldn't be given an "IS_A" relationship that expects the parser to look
up the implication of that relationship every time a value-as-term is
encountered.
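
A rough Python sketch of that suggestion, assuming a hypothetical `value_of` relationship (not an existing OBO or PSI-MS construct) that links each value-term directly to the property it fills, so the parser needs one lookup per term rather than an is_a walk.

```python
# Hypothetical CV fragment: value-term accession -> (property, value).
# The value_of relationship is invented here to illustrate the proposal.
VALUE_OF = {
    "MS:1000130": ("polarity", "positive"),         # "Positive Scan"
    "MS:1000554": ("instrument model", "LCQ Deca"),
}


def apply_params(accessions):
    """Fill a record from value-terms with one dictionary lookup each."""
    record = {}
    for accession in accessions:
        if accession in VALUE_OF:
            prop, value = VALUE_OF[accession]
            record[prop] = value
    return record


spectrum = apply_params(["MS:1000130", "MS:1000554"])
```

The parser still needs the CV to recognise a value-term at all, so this reduces the traversal work rather than the dependence on an up-to-date OBO.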

 

-Matt

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Eric D. <ede...@sy...> - 2007-08-08 06:35:22

Thank you all for the lively discussion.

One proposal I once made in Lyon (which was roundly dismissed, I believe)
was something like this. Instead of:

<cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>

Have:

<cvParam cvLabel="MS" parentAccession="MS:1000031"
accession="MS:1000554" name="LCQ Deca" value=""/>

Thus the parser can easily be coded to know that any cvParam with
parentAccession="MS:1000031" is going to be an instrument model, whether
or not it's in the CV. The mzML semantic validator tool would, of
course, check all this. The main argument against this was the potential
for inconsistency, I seem to recall.
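
A minimal Python sketch of a reader for that proposal. `parentAccession` is the attribute proposed here (which, as noted, was dismissed), and the `instrumentDescription` wrapper element is hypothetical; the check itself needs no CV lookup.

```python
import xml.etree.ElementTree as ET

INSTRUMENT_MODEL = "MS:1000031"


def instrument_model(xml_text):
    """Return the name of the first cvParam whose declared parent is instrument model."""
    for param in ET.fromstring(xml_text).iter("cvParam"):
        if param.get("parentAccession") == INSTRUMENT_MODEL:
            return param.get("name")
    return None


doc = ('<instrumentDescription>'
       '<cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554" '
       'name="LCQ Deca" value=""/>'
       '</instrumentDescription>')
```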

The decision was made to make individual models CV terms to avoid
problems like:

<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ Deca"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LCQ DECA"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQFT"/>

I would argue that your code snippet would better look like:

#define MS_CV_POLARITY_TYPE "MS:1000037"

if( element.parent == "spectrumDescription" )  {
   for each child {
      if (child.name=="cvParam") then {
         if( cv.isChildOf(child.attrs['accession'], MS_CV_POLARITY_TYPE) )    // if a polarity type
           spectrum.polarity = cv.getName(child.attrs['accession']);
      }
   }
}

Note that the cvParam name (should that be "positive" or "Positive" or
"positive polarity" or "Polarity" or "polarity"?) is not in the code,
just MS:1000037, which can be considered final.

This does require a CV class and some methods:

cv.loadFromFile()

cv.isChildOf()

cv.getName()

but this is not really complicated.
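
A minimal Python sketch of such a CV class together with the loop above, assuming the CV has already been read into a dictionary (file loading is left out). The method names are snake_case versions of the ones listed, and both the name "polarity" for MS:1000037 and the parent link from MS:1000130 to MS:1000037 are assumptions made to mirror the snippet, not values checked against a CV release.

```python
import xml.etree.ElementTree as ET

MS_CV_POLARITY_TYPE = "MS:1000037"


class CV:
    """In-memory CV: accession -> (name, list of parent accessions)."""

    def __init__(self, terms):
        self.terms = terms

    def get_name(self, accession):
        return self.terms[accession][0]

    def is_child_of(self, accession, ancestor):
        """True if `ancestor` is reachable from `accession` via parent links."""
        pending = list(self.terms.get(accession, ("", []))[1])
        seen = set()
        while pending:
            current = pending.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                pending.extend(self.terms.get(current, ("", []))[1])
        return False


def read_polarity(cv, xml_text):
    """The code names an accession, never a cvParam name string."""
    for param in ET.fromstring(xml_text).iter("cvParam"):
        accession = param.get("accession")
        if cv.is_child_of(accession, MS_CV_POLARITY_TYPE):
            return cv.get_name(accession)
    return None


# Assumed CV fragment, mirroring the snippet rather than a real CV release.
cv = CV({
    "MS:1000037": ("polarity", []),
    "MS:1000130": ("positive scan", ["MS:1000037"]),
})
fragment = ('<spectrumDescription><cvParam cvLabel="MS" accession="MS:1000130" '
            'name="Positive Scan" value=""/></spectrumDescription>')
```

The returned name comes from the CV rather than from the file's `name` attribute, so a file writing "Positive Scan" and the CV saying "positive scan" do not conflict.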

Take cover!

Eric

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Angel P. <an...@ma...> - 2007-08-08 13:04:21

On 8/8/07, Eric Deutsch <ede...@sy...> wrote:
>  Thank you all for the lively discussion.
>
> One proposal I once made in Lyon (which was roundly dismissed I believe)
> was something like this: instead of:
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> Have:
>
> <cvParam cvLabel="MS" parentAccession="MS:1000031" accession="MS:1000554"
> name="LCQ Deca" value=""/>
>
> Thus the parser can easily be coded to know that any cvParam with a
> parentAccession="MS:1000031" is going to be an instrument model whether or
> not it's in the CV. The mzML semantic validator tool would, of course, check
> all this. The main argument against this was the potential for
> inconsistency, I seem to recall.
>

The argument was that MAGE v1 handled CV terms this way, and it caused a
tremendous amount of confusion for the MAGE producers and the ArrayExpress
annotation-checking team alike.  It is infinitely easier to deal with nested
cvParams than to try to output a term and a parent at the same time.

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Angel P. <an...@ma...> - 2007-08-07 20:10:15

On 8/7/07, Brian Pratt <bri...@in...> wrote:
>
>
> Hey, the horse just twitched:  by placing CVparam information in
> attributes of the elements of a conventionally structured XML schema (ala
> mzXML) we can make use of the OBO work without adding a lot of unwanted
> complexity to software systems that aren't really interested in it.  An
> mzML that integrates well with OBO-aware systems is an excellent idea, but
> an mzML that demands you BE an OBO-aware system seems less likely to achieve
> widespread adoption.
>

Can you name specific attributes, not currently in the schema, whose values
you want to be CV terms?
-angel

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Brian P. <bri...@in...> - 2007-08-07 21:20:28

Hi Angel,
 
If I understand your question to be about identifying current mismatches
between terminology in the schema and the ontology, I'm not sure there are
any - but probably only because the schema has so little actual terminology
in it.  Consider this example:
 
<xs:element name="selectionWindow" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="cvParam" type="dx:CVParamType" minOccurs="2"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
 
which says absolutely nothing at all about what a selectionWindow element
can be expected to contain when you encounter it.  It just says it will
contain at least two "parameters".  Not much of an aid to software
development.
 
The schema, if we can call it that, doesn't even specify what some of the
most fundamental information about a scan looks like.  For example, it
specifies that a scan may have a list of precursors, each of which will
contain an ionSelection, but stops short of telling you what an ionSelection
looks like:
<xs:element name="ionSelection" type="dx:ParamGroupType">
<xs:annotation>
<xs:documentation>This captures the type of ion selection being performed,
and trigger m/z (or m/z's), neutral loss criteria etc. for tandem-MS or data
dependent scans.</xs:documentation>
</xs:annotation>
</xs:element>

 Nearly all the details of nearly all the elements are just unspecified
blobs.  Normally with an XML format you can expect to at least start your
work by running it through something like XMLSpy that will autogenerate a
reader and a writer that you can then polish up (to handle, for example, the
necessary weirdness of base64+zlib in the peaklists).  But with this, you
get no kind of a head start at all, since the vast majority of the syntax is
hidden behind blobs like dx:CVParamType and dx:ParamGroupType.  It's just
not a specification.
 
The statement that led to your question, I think, was just me saying that if
we *did* create an actual schema, we'd want its terminology to agree with
the ontology wherever possible.  But it has to actually contain some
terminology, unlike the current schema.
 
Brian
 
 
  _____  

From: del...@gm... [mailto:del...@gm...] On Behalf Of Angel Pizarro
Sent: Tuesday, August 07, 2007 1:10 PM
To: Brian Pratt
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] cvParams using name attribute as value




On 8/7/07, Brian Pratt <bri...@in...> wrote: 


Hey, the horse just twitched: by placing cvParam information in attributes
of the elements of a conventionally structured XML schema (à la mzXML) we can
make use of the OBO work without adding a lot of unwanted complexity to
software systems that aren't really interested in it.  An mzML that
integrates well with OBO-aware systems is an excellent idea, but an mzML
that demands you BE an OBO-aware system seems less likely to achieve
widespread adoption.


Can you name specific attributes that you want to have cv terms be the value
for that are currently not in the schema? 
-angel

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Angel P. <an...@ma...> - 2007-08-08 13:00:35

On 8/7/07, Brian Pratt <bri...@in...> wrote:
>
>  Hi Angel,
>
> If I understand your question to be about identifying current mismatches
> between terminology in the schema and the ontology, I'm not sure there are
> any - but probably only because the schema has so little actual terminology
> in it.
>

My question was more of a pragmatic one, about where you would add
specificity to the mzML schema. Your selectionWindow example is a
good one, in that the specification of selectionWindow is probably a range
value, and we should have two sub-elements that correspond to typed
cvParam values defining the window (or just a well-defined range
sub-element, skipping cvParam altogether).
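For contrast, here is a sketch of reading the "well-defined range sub-element" option. It assumes a hypothetical `<range lower="..." upper="..."/>` child. Those element and attribute names are invented for illustration and are not part of mzML. Because the schema would fix the shape, the reader can address values by name. This is the kind of class a schema-driven code generator could emit.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SelectionWindow:
    lower_mz: float
    upper_mz: float


def read_selection_window(elem):
    """Read a selectionWindow whose shape is fixed by a (hypothetical) schema."""
    rng = elem.find("range")
    if rng is None:
        raise ValueError("selectionWindow has no <range> child")
    # Values are addressed by attribute name; no CV lookup is needed.
    return SelectionWindow(float(rng.get("lower")), float(rng.get("upper")))


doc = ET.fromstring('<selectionWindow><range lower="400" upper="1600"/></selectionWindow>')
print(read_selection_window(doc))  # SelectionWindow(lower_mz=400.0, upper_mz=1600.0)
```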

I don't think your second example is a good one, though: there are so many
permutations of an ionSelection protocol, and more are certainly on the
way, that it is better handled by an ontology specification. Yes, this does
make parsers slightly harder, since now you must pay attention to the
incoming ontology, but it is the same amount of work as if everything was in
the schema.

mzXML could get away with tight specification of these complex and changing
annotations, since its sole purpose was support of the ISB pipeline. Its
open source status only served to increase the user base, but the schema
changes were solely driven by the needs of that pipeline and solely by the
community that used it. Trying to build consensus across many different
groups has led to the current version of mzML, and the major structure of
mzML will not change at this point, so please let's just get to the
specifics of going through the schema and identifying where you think an
annotation should be promoted to the level of a schema element, and we'll
discuss as a group.

-angel


> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>
>


-- 
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Brian P. <bri...@in...> - 2007-08-08 15:49:39

If ionSelection is just one of many things that are too complicated and
varied and dynamic to actually specify, then just off the top of my head I
think it's going to be pretty hard to do a good job of parsing mzML.  I take
your point about mzXML being too specific, but there's such a thing as too
general as well.  My fear is that we'll see it balkanized, with most parsers
only really able to deal with the mode of mzML usage that the author really
cares about, which just leaves us with a bunch of ad hoc standards.  The
instrument name example (wherein a parser cannot be made robust enough to
read future versions) makes me think that not enough mental energy has gone
into considering the practicalities of being a consumer of mzML.  I've seen
this in other standards efforts I've been involved with in other industries
(internet security, circuit board manufacturing) - writers (mostly hardware
vendors) love the flexibility because they can just do it their way, but
readers (software vendors) bear the brunt of what amounts to one format per
vendor, and finally just fall back onto the per-vendor solutions they have
already invested in.
 
>> it is the same amount of work as if everything was in the schema. 
There actually *is* an advantage to specifying via schema instead of
ontology, which I've already pointed out: W3C schema is itself a standard
with a host of tools built up around it that will generate readers and
writers from properly formed schemas.  If mzML just used elements for
everything and each element had an attribute pointing at the ontology, I
think we'd be better off.  The schema and the ontology would need to evolve
together, of course.
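Here is a rough sketch of the hybrid described above. It uses hypothetical element names (`instrumentModel`, `serialNumber`) that are not part of mzML. The accession MS:1000554 and the serial number 23433 come from Eric's example earlier in the thread. A reader that does not care about the ontology takes the element text, and an OBO-aware reader can follow the accession attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a conventional, schema-declared element that also carries
# a pointer into the ontology. These element names are illustrative only.
doc = ET.fromstring(
    '<instrument>'
    '<instrumentModel accession="MS:1000554">LCQ Deca</instrumentModel>'
    '<serialNumber>23433</serialNumber>'
    '</instrument>'
)


def read_instrument(elem):
    model = elem.find("instrumentModel")
    return {
        # A reader that ignores the ontology just takes the element text...
        "model": model.text,
        # ...while an OBO-aware reader can use the accession if it wants to.
        "model_accession": model.get("accession"),
        "serial": elem.findtext("serialNumber"),
    }


print(read_instrument(doc))
# {'model': 'LCQ Deca', 'model_accession': 'MS:1000554', 'serial': '23433'}
```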
 
But, as you say, this thing is more or less nailed down at this point, so
I'm wasting the list's time with this schema talk, and I do apologise.  I
don't blame anyone for being annoyed at me dredging up these fundamental
objections yet again so late in the process.
 
Anyway, off for vacation until the end of next week.  Sorry to start a flame
then abandon it.
 
Cheers,
 
Brian

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Matt C. <mat...@va...> - 2007-08-08 13:24:53

Eric Deutsch wrote:
>
> The decision was made to make individual models cv terms to avoid 
> problems like:
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" 
> value="LCQ Deca"/>
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" 
> value="LCQ DECA"/>
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" 
> value="LTQ FT"/>
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" 
> value="LTQ-FT"/>
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" 
> value="LTQFT"/>
>
Is this the main/only reason for this usage of terms? This just seems 
like a great argument for having the ontology control the values of the 
terms and not just the terms themselves. That way, the simple 
term/name->value relationship is always maintained, and this problem is 
eliminated. I am not advocating changing the structure of mzML at this 
point; I see this as a rather minor change.
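Here is a sketch contrasting the two positions. It first shows the problem Eric describes: exact string comparison treats the quoted spellings as five different instruments. It then shows Matt's alternative, which keeps a plain name/value read on the "instrument model" term (MS:1000031) and makes a check against a CV-published list of permitted values optional. The permitted-value set and `read_instrument_model` are hypothetical, and no CV defines that list today.

```python
# The five "instrument model" value spellings quoted above.
VARIANTS = ["LCQ Deca", "LCQ DECA", "LTQ FT", "LTQ-FT", "LTQFT"]

# Hypothetical: permitted values a CV could publish for MS:1000031.
# This set is illustrative, not taken from any CV release.
ALLOWED_INSTRUMENT_MODELS = {"LCQ Deca", "LTQ FT"}


def read_instrument_model(cv_param, validate=False):
    """Plain name/value read; checking against the CV is optional."""
    if cv_param["accession"] != "MS:1000031":
        return None
    value = cv_param["value"]
    if validate and value not in ALLOWED_INSTRUMENT_MODELS:
        raise ValueError(f"{value!r} is not a controlled instrument model name")
    return value


# Without control, exact string comparison sees five different instruments.
print(len(set(VARIANTS)))  # 5

# With the optional check, only the permitted spellings get through.
accepted = []
for v in VARIANTS:
    try:
        accepted.append(read_instrument_model(
            {"accession": "MS:1000031", "name": "instrument model", "value": v},
            validate=True))
    except ValueError:
        pass
print(accepted)  # ['LCQ Deca', 'LTQ FT']
```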

> I would argue that your code snippet below would better look like:
>
> #define MS_CV_POLARITY_TYPE "MS:1000037"
>
> if (element.parent == "spectrumDescription") {
>     for each child {
>         if (child.name == "cvParam") {
>             // if a polarity type
>             if (cv.isChildOf(child.attrs['accession'], MS_CV_POLARITY_TYPE))
>                 spectrum.polarity = cv.getName(child.attrs['accession']);
>         }
>     }
> }
>
> Note that the cvParam name (should that be “positive” or “Positive” or 
> “positive polarity” or “Polarity” or “polarity”?) is not in the code, 
> just MS:1000037 which can be considered final.
>
> This does require a CV class and some methods:
>
> cv.loadFromFile()
> cv.isChildOf()
> cv.getName()
>
> but this is not really complicated.
>

But it is really relatively complicated. It is more conceptually and 
computationally complicated than simple string comparison (with the 
OPTION of checking the CV to see if the value is a controlled one). And 
worse, it's a complication I don't see a justification for unless there 
is a better reason than the one you gave above which has a more simple 
solution. Why force parsers to create a CV class and methods just to 
ensure that "LCQ Deca" is spelled right (or that it's given its proper 
accession number)?

-Matt
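For reference, here is a self-contained Python version of the CV class Eric outlines, together with the polarity loop from his snippet. It is a sketch that rests on several assumptions. A `load_from_text` method stands in for `cv.loadFromFile()`. The OBO fragment is hand-written rather than the real PSI-MS file. The accessions MS:1000129 and MS:1000130 are used for negative and positive scan for illustration only. The class handles `[Term]` stanzas with `id`, `name` and `is_a` lines and ignores everything else.

```python
from collections import defaultdict

# Polarity parent term, as in the quoted snippet.
MS_CV_POLARITY_TYPE = "MS:1000037"

# A tiny hand-written OBO fragment standing in for the real CV file.
# Child accessions and names are assumptions for illustration.
OBO = """format-version: 1.2

[Term]
id: MS:1000037
name: polarity

[Term]
id: MS:1000129
name: negative scan
is_a: MS:1000037 ! polarity

[Term]
id: MS:1000130
name: positive scan
is_a: MS:1000037 ! polarity
"""


class CV:
    """Minimal controlled-vocabulary lookup built from OBO-format text."""

    def __init__(self):
        self.names = {}                  # accession -> term name
        self.parents = defaultdict(set)  # accession -> direct is_a parents

    def load_from_text(self, obo_text):
        in_term, current = False, None
        for raw in obo_text.splitlines():
            line = raw.strip()
            if line.startswith("["):
                # Only [Term] stanzas define terms; skip [Typedef] and others.
                in_term, current = line == "[Term]", None
            elif not in_term:
                continue
            elif line.startswith("id:"):
                current = line[3:].strip()
            elif current and line.startswith("name:"):
                self.names[current] = line[5:].strip()
            elif current and line.startswith("is_a:"):
                # "is_a: MS:1000037 ! polarity" -> keep only the accession
                self.parents[current].add(line[5:].split("!")[0].strip())

    def get_name(self, accession):
        return self.names[accession]

    def is_child_of(self, accession, ancestor):
        """True if ancestor is reachable from accession via is_a links."""
        stack, seen = [accession], set()
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent == ancestor:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


def spectrum_polarity(cv, cv_params):
    """Return the name of the first cvParam whose term is a kind of polarity."""
    for p in cv_params:
        if cv.is_child_of(p["accession"], MS_CV_POLARITY_TYPE):
            return cv.get_name(p["accession"])
    return None


cv = CV()
cv.load_from_text(OBO)
params = [{"accession": "MS:1000130", "name": "Positive Scan", "value": ""}]
print(spectrum_polarity(cv, params))  # positive scan
```

Most of the code is OBO loading and the is_a walk; the polarity logic itself is only a few lines.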

Re: [Psidev-ms-dev] cvParams using name attribute as value

From: Mike C. <tu...@gm...> - 2007-08-08 16:24:34

On 8/8/07, Matt Chambers <mat...@va...> wrote:
> > This does require a CV class and some methods:
> > cv.loadFromFile()
> > cv.isChildOf()
> > cv.getName()
> >
> > but this is not really complicated.
>
> But it is really relatively complicated. It is more conceptually and
> computationally complicated than simple string comparison (with the
> OPTION of checking the CV to see if the value is a controlled one). And
> worse, it's a complication I don't see a justification for unless there
> is a better reason than the one you gave above which has a more simple
> solution.

I agree with Matt.  A call like "isChildOf" looks simple, but what's
entailed in that call is that the *correct* CV is available and has
been parsed into a tree in memory.  There are good reasons to think
that this will be fairly difficult to do correctly in practice.
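Here is a small illustration of this concern, assuming the reader loaded an older copy of the CV. A term added in a later release, represented here by the made-up accession MS:9999999, has no recorded parents. A naive `is_child_of` therefore quietly answers False rather than "unknown". The table and the `classify` helper are hypothetical.

```python
# Hypothetical is_a table as loaded from an OLDER copy of the CV.
# MS:9999999 is a made-up accession standing in for a term that a newer
# CV release added and this copy does not contain.
OLD_CV_PARENTS = {
    "MS:1000037": set(),           # polarity (root for this example)
    "MS:1000129": {"MS:1000037"},  # negative scan is_a polarity
    "MS:1000130": {"MS:1000037"},  # positive scan is_a polarity
}


def is_child_of(parents, accession, ancestor):
    """Follow is_a links upward; an unknown accession simply has no parents."""
    stack, seen = [accession], set()
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def classify(parents, accession, ancestor):
    """Like is_child_of, but return None for terms the loaded CV lacks."""
    if accession not in parents:
        return None  # cannot decide: the reader's CV is missing this term
    return is_child_of(parents, accession, ancestor)


print(is_child_of(OLD_CV_PARENTS, "MS:1000130", "MS:1000037"))  # True
# A naive lookup silently answers False for a term from a newer CV...
print(is_child_of(OLD_CV_PARENTS, "MS:9999999", "MS:1000037"))  # False
# ...so a careful reader has to distinguish "not a child" from "unknown".
print(classify(OLD_CV_PARENTS, "MS:9999999", "MS:1000037"))     # None
```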

But on top of that, it just seems needlessly difficult.  It'd be a
little like having products in your grocery store marked with their
trademark name, but not a succinct description of what they
*are*--which you can only find out with a stock list lookup.
("Shimmer?  Is that a floor polish or a dessert topping?  Hope my
stock list is up to date...")

The alternative here would appear to be very simple.  Something like
the previously mentioned

   <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="LTQ-FT"/>

would work fine.  As for the differing spellings of "LTQ-FT", there's
a canonical spelling available in the CV, and anyone that can't get
that right will probably find the complexity of multiple CV versions
insurmountable.

Consider also, how should newly created instruments be handled?  If
our lab invents the "MassMaster2000", do we need to create our own
augmented CV in order to handle this?  Does everyone who wants to read
MassMaster2000 mzML files need a copy of this augmented CV?  What if
they have twenty other augmented CVs?  How are those to be managed?
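Here is one possible way a reader might handle lab-specific extensions, sketched with Python's `collections.ChainMap`. The reader consults the standard CV first, then any extension CVs, and finally falls back to the cvParam's own `name` attribute. The `LAB` label, the accession LAB:0000001 and this lookup policy are all hypothetical.

```python
from collections import ChainMap

# Term names from the standard CV (both accessions appear earlier in the thread).
STANDARD_CV = {"MS:1000031": "instrument model", "MS:1000554": "LCQ Deca"}

# Hypothetical lab-specific extension CV defining a newly invented instrument.
LAB_CV = {"LAB:0000001": "MassMaster2000"}


def resolve_name(cvs, cv_param):
    """Look the accession up in each loaded CV in order, else trust the file."""
    name = cvs.get(cv_param["accession"])
    return name if name is not None else cv_param.get("name")


with_extension = ChainMap(STANDARD_CV, LAB_CV)
without_extension = ChainMap(STANDARD_CV)

param = {"cvLabel": "LAB", "accession": "LAB:0000001", "name": "MassMaster2000", "value": ""}
print(resolve_name(with_extension, param))     # MassMaster2000 (found in LAB_CV)
print(resolve_name(without_extension, param))  # MassMaster2000 (name attribute)
print("LAB:0000001" in without_extension)      # False: reader cannot classify it
```

The reader without the extension gets a display name only because the file carried one. It has no way to tell that MassMaster2000 is an instrument model.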

Mike