Re: [Psidev-pi-dev] Issue 42 in psi-pi: Issues with the CV

Jones, Andy wrote:
>  Hi all,
>
>  The issues list is getting a bit messy with essentially a mailing
>  list discussion so I'll shift the discussion back here :-)
>
>  There are two points up for discussion.
>
>  1) Use of identifiers for input spectra
>  2) CV terms shared between psi-ms and psi-pi
>
>  In terms of 1) I've worked through Matt's argument and I'm in general
>  agreement that we would like to use the same system for identifying
>  the input spectrum - these CV terms have only been added relatively
>  recently. I did not realise that the nativeID attribute had been
>  specified to this level, since there is no documentation about this
>  in the XSD or mzML specification document.
>
>  I don't think we should change the name of the attribute though,
>  since nativeID makes sense for an element called <Spectrum> in mzML
>  but not for an element <SpectrumIdentificationResult> in analysisXML.
>  For referencing mzML spectra, I'm still not sure which attribute we
>  should choose to reference since the "true" (and guaranteed unique)
>  spectrum identifier in mzML is actually the ID attribute. I can
>  envisage a case where instruments output mzML directly and the
>  nativeID is not implemented sensibly. The xs:ID datatype on "ID"
>  guarantees that these will always be unique whatever changes happen
>  to documentation in the future or whatever tools are used to create
>  the file.
I contest the term "guaranteed unique" since the one doing the 
guaranteeing is the schema and there is no guarantee that somebody runs 
their output through a schema validator. :) If you take the validation 
step to the semantic validator (which is what the standard demands), the 
nativeID term is also guaranteed to be unique (and must be "implemented 
sensibly"), and as David suggested earlier, it should be possible to add 
a uniqueness constraint to the nativeID attribute in the schema even 
though it is xsd:string (but uniqueness is not so helpful when the 
actual form of a Thermo RAW id must be: "controller=xsd:positiveInteger 
scan=xsd:nonZeroInteger"). The name of the attribute doesn't bother me, 
but I don't understand your reasoning for not changing it. :)
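
To make the difference between uniqueness and format concrete, here is a minimal Python sketch, not taken from any PSI tool. It assumes the Thermo form quoted above and reads "xsd:nonZeroInteger" as a positive integer, which is an assumption. The names THERMO_NATIVE_ID and check_native_ids are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: "controller=<positive int> scan=<positive int>".
# Treating xsd:nonZeroInteger as a positive integer is an assumption.
THERMO_NATIVE_ID = re.compile(r"controller=([1-9]\d*) scan=([1-9]\d*)")


def check_native_ids(native_ids):
    """Return (malformed, duplicated) lists for a sequence of nativeID strings."""
    # Format check: the whole string must match the assumed Thermo pattern.
    malformed = [nid for nid in native_ids if not THERMO_NATIVE_ID.fullmatch(nid)]
    # Uniqueness check: any value seen more than once is reported once.
    counts = Counter(native_ids)
    duplicated = sorted(nid for nid, n in counts.items() if n > 1)
    return malformed, duplicated


ids = ["controller=1 scan=1", "controller=1 scan=2", "scan 3", "controller=1 scan=2"]
malformed, duplicated = check_native_ids(ids)
print(malformed)
print(duplicated)
```

The two checks are independent: a file can pass the uniqueness check with ids that do not follow the expected form, which is why a schema-level uniqueness constraint alone would not be enough.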

>  So I agree with Matt but I don't want to change the schema :-) I'm
>  happy to add something to the documentation specifying how different
>  identifiers should be implemented, following the rules in the psi-ms
>  CV.
If the attribute name doesn't change, only the xsd documentation needs 
to be updated to reflect which attribute the spectrumID points to and 
that it can be used even if the input spectra file is not mzML!

>  In terms of 2), we had made a decision in the past that we would
>  simply create terms as we need them in PSI-PI, rather than worrying
>  if they should be common between psi-ms and psi-pi and trying to
>  coordinate updates across groups. If a term is present in psi-ms with
>  the exact meaning that we want (taking into account its position in
>  the hierarchy), I think we should just use it and update the mapping
>  file to reference it. Are there many terms from psi-ms that we want
>  to use?
It's looking like scan time (aka retention time) will be useful in 
analysisXML as an "alternative identifier" for the special use case of 
converting existing search results to analysisXML where a reliable 
nativeID to the original vendor format has been lost. Presumably, even 
in this use case a nativeID could be provided to point back to a 
spectrum in the search engine's immediate spectra input file (i.e. 
MGF).  If not even that is possible, either spectrumID has to be 
optional or the use case is rather suspect. :)

Additionally, if your "spectrumID" attribute matches the "nativeID" 
attribute in mzML, the mapping file must require one of the nativeID 
format terms in the file header. The specific place is TBD in 
analysisXML; in mzML it's mapped to the fileDescription element. 
Remember, nativeID is always available from any input spectra file, so 
there's no problem requiring it as long as decent references to the 
input spectra are maintained.
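
As a rough illustration of that mapping-file rule, the following Python sketch checks that a header carries at least one nativeID format term. The term names are only an illustrative subset (the authoritative list is the psi-ms CV), and the dict layout for CV params and the function names are hypothetical, not an analysisXML or mzML API.

```python
# Illustrative subset of nativeID format term names; the psi-ms CV is the
# authoritative source and this set is an assumption for the example.
NATIVE_ID_FORMAT_TERMS = {
    "Thermo nativeID format",
    "Waters nativeID format",
    "WIFF nativeID format",
}


def native_id_format_terms(header_cv_params):
    """Return the nativeID format term names found among the header CV params."""
    return [p["name"] for p in header_cv_params if p["name"] in NATIVE_ID_FORMAT_TERMS]


def header_declares_native_id_format(header_cv_params):
    """True if the header declares at least one nativeID format term."""
    return len(native_id_format_terms(header_cv_params)) >= 1


# Hypothetical header contents, represented as plain dicts.
header = [
    {"name": "MSn spectrum"},
    {"name": "Thermo nativeID format"},
]
print(header_declares_native_id_format(header))
```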

The scan time as an "alternative identifier" issue makes me wonder if a 
"scan time native spectrum identifier" term is called for. It still 
wouldn't solve all of the problems with David's use case (i.e. if the 
MGF was missing RTINSECONDS attributes), but it seems potentially useful.
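
A minimal Python sketch of how scan time might serve as a fallback identifier, and where it breaks down. It assumes each MGF block has a TITLE line and that RTINSECONDS, when present, holds a single number. The function names and the 0.5 second tolerance are hypothetical choices for illustration.

```python
def parse_mgf_scan_times(mgf_text):
    """Map each spectrum's TITLE to its RTINSECONDS value (None if absent)."""
    scan_times = {}
    title, rt = None, None
    for line in mgf_text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            title, rt = None, None
        elif line.startswith("TITLE="):
            title = line[len("TITLE="):]
        elif line.startswith("RTINSECONDS="):
            # Assumes a single numeric value, not a range.
            rt = float(line[len("RTINSECONDS="):])
        elif line == "END IONS" and title is not None:
            scan_times[title] = rt
    return scan_times


def find_by_scan_time(scan_times, target_seconds, tolerance=0.5):
    """Return the single title within tolerance, or None if missing or ambiguous."""
    hits = [t for t, rt in scan_times.items()
            if rt is not None and abs(rt - target_seconds) <= tolerance]
    return hits[0] if len(hits) == 1 else None


mgf = """BEGIN IONS
TITLE=spec_a
RTINSECONDS=120.4
END IONS
BEGIN IONS
TITLE=spec_b
END IONS
"""
times = parse_mgf_scan_times(mgf)
print(find_by_scan_time(times, 120.5))
```

The second spectrum has no RTINSECONDS line, so it can never be found by scan time, which is the gap mentioned above.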

-Matt

>  I am working on the spec document today and would like to get all
>  issues tidied up ASAP...
>
>  Cheers
>  Andy
>
> > -----Original Message-----
> > From: cod...@go... [mailto:cod...@go...]
> > Sent: 30 November 2008 19:36
> > To: psi...@li...
> > Subject: [Psidev-pi-dev] Issue 42 in psi-pi: Issues with the CV
> >
> >
> > Comment #56 on issue 42 by matthew....@vanderbilt.edu: Issues with
> > the CV
> > http://code.google.com/p/psi-pi/issues/detail?id=42
> >
> > Yes, I was at that meeting too. :) The one (important, IMO) use
> > case we did not consider at that time is output of analysisXML
> > without a corresponding mzML document. In such a case, the mzML
> > arbitrary id does not exist, but the nativeID does. This fact
> > convinces me that nativeID is a better reference than the arbitrary
> > id.
> >
> > The change of attribute name to nativeID is not so critical, but I
> > think the risk of confusing the spectrumID with the id attribute
> > when it actually points to the nativeID attribute is worse than the
> > risk of confusing the nativeID attribute with some property of the
> > search engine. I think the documentation for the nativeID attribute
> > can easily make it clear what it's supposed to reference,
> > especially since it's on a spectrum-centric element; you can copy
> > it from the mzML schema (although I think this documentation could
> > be improved upon):
> >
> > <xs:documentation>The native identifier for the spectrum, used by
> > the acquisition software.</xs:documentation>
> >
> > It's good to know about the header information. The nativeID (or
> > whatever it's called in analysisXML) format term would go in the
> > spectra input definition as a CV Param required by the mapping
> > file.
>