Re: [Psidev-pi-dev] Fragment ion information

Yes, and the answer is:
a library is a result.
mzML does not annotate spectra, as that is the task of an interpretation
analysis. Therefore, a library should be stored in an AnalysisXML
format, not mzML...
Pierre-Alain

Angel Pizarro wrote:
> Matt, have the mzML group thought about annotated spectra libs, like 
> those provided by NIST, yet?
> -angel
>
> On Thu, Jul 3, 2008 at 9:21 AM, Matt Chambers 
> <mat...@va... 
> <mailto:mat...@va...>> wrote:
>
>     Hi Lennart,
>
>     If you think about storing fragments in as verbose a format as Andy
>     Jones's suggestion, for every result in every spectrum (keeping in
>     mind
>     that more than the top result/s is/are written out per spectrum), it
>     would represent an intolerable (to me) bloat to the file. As I
>     understand it, we want to mention the algorithm, its parameters,
>     and its
>     version in the CV. We would then recommend to search engine developers
>     that when they implement support for analysisXML they provide an
>     online
>     script for generating fragments based on the controlled parameters and
>     the algorithm version. I do not think this reconstruction of the
>     fragment information is "next to impossible" as long as such a
>     script is
>     provided.
>
>     Alternatively, I think we could come up with a much briefer format to
>     store the fragments in, something like:
>     <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
>
>     It's ugly as sin, but we can come up with a controlled pattern to
>     store
>     the ion types in. The numbers are mostly redundant: the expected m/z
>     values can be recalculated from the ion type and the sequence, and the
>     observed m/z values can be looked up in the spectrum according to some
>     rules regarding mass/m/z tolerances and whatever data processing was
>     applied to the original spectrum by the search engine (which
>     again, is a
>     good reason to have search engines write out the results of their
>     preprocessing to an mzML file).
>
>     -Matt
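
To illustrate the claim that the numbers are mostly redundant, here is a minimal sketch of how the compact annotation could be expanded again. Everything in it is an assumption made for illustration: the token grammar is inferred only from the single example line above, the function names (parse_token, expected_mz, closest_peak) are hypothetical, only b and y ions with NH3 or H2O losses are handled, and the residue table is a partial list of standard monoisotopic masses. It is not part of analysisXML or of any search engine.

```python
import re

# Monoisotopic residue masses in Da (partial table, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
LOSS_MASS = {"NH3": 17.026549, "H2O": WATER}

# Assumed grammar: series letter, fragment index, optional neutral loss,
# optional charge in the form "(+n)". Example: "y6-NH3(+2)".
TOKEN = re.compile(r"^([by])(\d+)(?:-(NH3|H2O))?(?:\(\+(\d+)\))?$")


def parse_token(token):
    """Split one annotation such as 'y6-NH3(+2)' into its parts."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognised ion annotation: {token!r}")
    series, index, loss, charge = match.groups()
    return series, int(index), loss, int(charge) if charge else 1


def expected_mz(sequence, token):
    """Recalculate the theoretical m/z from the sequence and ion type."""
    series, index, loss, charge = parse_token(token)
    if not 1 <= index < len(sequence):
        raise ValueError(f"fragment index out of range: {token!r}")
    if series == "b":
        # b ions: N-terminal residues only.
        neutral = sum(RESIDUE_MASS[r] for r in sequence[:index])
    else:
        # y ions: C-terminal residues plus one water.
        neutral = sum(RESIDUE_MASS[r] for r in sequence[-index:]) + WATER
    if loss:
        neutral -= LOSS_MASS[loss]
    return (neutral + charge * PROTON) / charge


def closest_peak(observed_mzs, target, tolerance):
    """Return the observed m/z nearest to target, or None if outside tolerance."""
    best = min(observed_mzs, key=lambda mz: abs(mz - target), default=None)
    if best is None or abs(best - target) > tolerance:
        return None
    return best


if __name__ == "__main__":
    peptide = "PEPTIDEK"  # hypothetical example sequence
    for tok in "b2 y2 y6-NH3 y6-NH3(+2)".split():
        print(tok, round(expected_mz(peptide, tok), 4))
```

The peak lookup here uses a fixed absolute tolerance; a real search engine may use ppm tolerances, intensity rules, or its own preprocessing, which is exactly the algorithm-specific part under discussion.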
>
>
>     Lennart Martens wrote:
>     > Dear PSI-PI'ers,
>     >
>     >
>     > I recently came across a discussion related to the inclusion of
>     fragment
>     > ions (as called by the search engine during identification) in the
>     > analysisXML format (see issue 28 on the Google tracker, direct link:
>     > http://code.google.com/p/psi-pi/issues/detail?id=28).
>     >
>     > It somehow seems that popular opinion is against inclusion of
>     this vital
>     > piece of information, and that makes me very worried. One of the
>     > comments on the issue page in fact is that fragment ion calling is
>     > algorithm specific (which is true), and therefore should not be
>     a part
>     > of analysisXML.
>     > I'd actually like to use this same datum to strongly argue the other
>     > way: since the calling is algorithm specific, it is next to
>     impossible
>     > to reconstruct the original calling after analysisXML export. So
>     > essentially, a vital piece of information (the ability of the
>     spectrum
>     > to support the peptide identification as judged by the algorithm) is
>     > thrown away during analysisXML conversion or output.
>     >
>     > I also believe that the difficulty in annotating which fragments are
>     > called from the spectrum is definitely not insurmountable. The
>     link with
>     > mzML should be there anyway (otherwise you would not even be able to
>     > retrieve the spectrum the identification was made from, an
>     unthinkable
>     > scenario), so inclusion of this is trivial (as in: already there).
>     > Additionally, the unambiguous reference to the exact peak called
>     in the
>     > spectrum is also trivial: simply copy in the actual mass - or more
>     > likely: m/z - in the analysisXML tag. Ion type should be easy
>     enough to
>     > annotate (there are only so many ion types, and these can be
>     modelled in
>     > CV), while charge state is a call made by the algorithm anyway,
>     and can
>     > therefore also be included easily. So this essentially fully
>     backs up
>     > Andy Jones' suggested tag format on the issue 28 page. And Andy has
>     > included some other information, such as 'subsequence' and
>     'theoretical
>     > mass' which people are free to discuss the usefulness of (as it
>     probably
>     > constitutes redundant information).
>     >
>     > So my conclusion is: it's relatively easy to do, will capture vital
>     > information about the identification and how it was established, and
>     > conserves irreplaceable data.
>     > So consider any weight I might have to be formally thrown behind
>     > including this in version 1.0!
>     >
>     > Let the argument (re-)commence!
>     >
>     >
>     > Cheers,
>     >
>     > lnnrt.
>     >
>
>
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>