Re: [Psidev-pi-dev] Fragment ion information

Matt, have the mzML group thought about annotated spectra libs, like those
provided by NIST, yet?
-angel

On Thu, Jul 3, 2008 at 9:21 AM, Matt Chambers <
mat...@va...> wrote:

> Hi Lennart,
>
> If you think about storing fragments in as verbose a format as Andy
> Jones's suggestion, for every result in every spectrum (keeping in mind
> that more than just the top result is written out per spectrum), it
> would represent an intolerable (to me) bloat to the file. As I
> understand it, we want to mention the algorithm, its parameters, and its
> version in the CV. We would then recommend to search engine developers
> that when they implement support for analysisXML they provide an online
> script for generating fragments based on the controlled parameters and
> the algorithm version. I do not think this reconstruction of the
> fragment information is "next to impossible" as long as such a script is
> provided.
>
> Alternatively, I think we could come up with a much briefer format to
> store the fragments in, something like:
> <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
>
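A minimal sketch of how a compact string like the one above could be read back into structured records. The grammar used here (series letter, fragment index, optional "-NH3" or "-H2O" loss, optional "(+n)" charge, default charge 1) is only an assumption inferred from the single example; no controlled pattern has been agreed on, and the names FragmentIon and parse_fragment_ions are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class FragmentIon(NamedTuple):
    series: str                  # ion series letter, e.g. "b" or "y"
    index: int                   # fragment index within the series
    neutral_loss: Optional[str]  # "NH3", "H2O", or None
    charge: int                  # assumed to be 1 when not written out


# Assumed token grammar: <series><index>[-<loss>][(+<charge>)]
_ION_PATTERN = re.compile(r"^([abcxyz])(\d+)(?:-(NH3|H2O))?(?:\(\+(\d+)\))?$")


def parse_fragment_ions(text: str) -> List[FragmentIon]:
    """Split a whitespace-separated ion list and parse each label."""
    ions = []
    for token in text.split():
        match = _ION_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognized ion label: {token!r}")
        series, index, loss, charge = match.groups()
        ions.append(
            FragmentIon(series, int(index), loss, int(charge) if charge else 1)
        )
    return ions


if __name__ == "__main__":
    for ion in parse_fragment_ions("b2 y2 y6-NH3 y6-NH3(+2)"):
        print(ion)
```

Rejecting unknown labels outright, rather than skipping them, is a deliberate choice in this sketch: a controlled pattern is only useful if readers can rely on every token conforming to it.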
> It's ugly as sin, but we can come up with a controlled pattern to store
> the ion types in. The numbers are mostly redundant: the expected m/z
> values can be recalculated from the ion type and the sequence, and the
> observed m/z values can be looked up in the spectrum according to some
> rules regarding mass/m/z tolerances and whatever data processing was
> applied to the original spectrum by the search engine (which again, is a
> good reason to have search engines write out the results of their
> preprocessing to an mzML file).
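To illustrate why the numbers are mostly redundant, here is a rough sketch of recalculating an expected m/z from the ion type and the sequence, then looking up the observed peak within a tolerance. It assumes unmodified peptides, monoisotopic masses, only b and y series, and a "closest peak within an absolute tolerance" rule; a real search engine may apply different rules, which is exactly why its parameters and preprocessing would need to be recorded. The function names expected_mz and match_peak are hypothetical.

```python
from typing import Optional, Sequence

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
NEUTRAL_LOSS_MASS = {"NH3": 17.026549, "H2O": 18.010565}


def expected_mz(sequence: str, series: str, index: int,
                neutral_loss: Optional[str] = None, charge: int = 1) -> float:
    """Expected m/z of a b or y ion for an unmodified peptide (assumption)."""
    if not 1 <= index < len(sequence):
        raise ValueError("fragment index out of range for this sequence")
    if series == "b":
        # b ions cover the first `index` residues (N-terminal side).
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    elif series == "y":
        # y ions cover the last `index` residues and retain one water.
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    else:
        raise ValueError("only b and y series are handled in this sketch")
    if neutral_loss is not None:
        mass -= NEUTRAL_LOSS_MASS[neutral_loss]
    return (mass + charge * PROTON) / charge


def match_peak(peaks_mz: Sequence[float], target_mz: float,
               tolerance: float) -> Optional[float]:
    """Return the observed m/z closest to target_mz, or None if none is in range."""
    candidates = [mz for mz in peaks_mz if abs(mz - target_mz) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda mz: abs(mz - target_mz))


if __name__ == "__main__":
    target = expected_mz("PEPTIDE", "y", 6, neutral_loss="NH3", charge=2)
    print(target, match_peak([227.10, 343.65, 500.0], target, tolerance=0.5))
```

The lookup only reproduces the original call if the tolerance and the peak list match what the search engine actually used, so the sketch depends on the preprocessed spectrum being available.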
>
> -Matt
>
>
> Lennart Martens wrote:
> > Dear PSI-PI'ers,
> >
> >
> > I recently came across a discussion related to the inclusion of fragment
> > ions (as called by the search engine during identification) in the
> > analysisXML format (see issue 28 on the Google tracker, direct link:
> > http://code.google.com/p/psi-pi/issues/detail?id=28).
> >
> > It somehow seems that popular opinion is against inclusion of this vital
> > piece of information, and that makes me very worried. One of the
> > comments on the issue page in fact is that fragment ion calling is
> > algorithm specific (which is true), and therefore should not be a part
> > of analysisXML.
> > I'd actually like to use this same datum to strongly argue the other
> > way: since the calling is algorithm specific, it is next to impossible
> > to reconstruct the original calling after analysisXML export. So
> > essentially, a vital piece of information (the ability of the spectrum
> > to support the peptide identification as judged by the algorithm) is
> > thrown away during analysisXML conversion or output.
> >
> > I also believe that the difficulty in annotating which fragments are
> > called from the spectrum is definitely not insurmountable. The link with
> > mzML should be there anyway (otherwise you would not even be able to
> > retrieve the spectrum the identification was made from, an unthinkable
> > scenario), so inclusion of this is trivial (as in: already there).
> > Additionally, the unambiguous reference to the exact peak called in the
> > spectrum is also trivial: simply copy in the actual mass - or more
> > likely: m/z - in the analysisXML tag. Ion type should be easy enough to
> > annotate (there are only so many ion types, and these can be modelled in
> > CV), while charge state is a call made by the algorithm anyway, and can
> > therefore also be included easily. So this essentially fully backs up
> > Andy Jones' suggested tag format on the issue 28 page. And Andy has
> > included some other information, such as 'subsequence' and 'theoretical
> > mass' which people are free to discuss the usefulness of (as it probably
> > constitutes redundant information).
> >
> > So my conclusion is: it's relatively easy to do, will capture vital
> > information about the identification and how it was established, and
> > conserves irreplaceable data.
> > So consider any weight I might have to be formally thrown behind
> > including this in version 1.0!
> >
> > Let the argument (re-)commence!
> >
> >
> > Cheers,
> >
> > lnnrt.
> >
>
>