Re: [Psidev-pi-dev] Fragment ion information

Hi Lennart,

If you think about storing fragments in as verbose a format as Andy 
Jones's suggestion, for every result in every spectrum (keeping in mind 
that more than just the top result is written out per spectrum), it 
would represent an intolerable (to me) bloat to the file. As I 
understand it, we want to mention the algorithm, its parameters, and its 
version in the CV. We would then recommend to search engine developers 
that when they implement support for analysisXML they provide an online 
script for generating fragments based on the controlled parameters and 
the algorithm version. I do not think this reconstruction of the 
fragment information is "next to impossible" as long as such a script is 
provided.

Alternatively, I think we could come up with a much briefer format to 
store the fragments in, something like:
<FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>

It's ugly as sin, but we can come up with a controlled pattern to store 
the ion types in. The numbers are mostly redundant: the expected m/z 
values can be recalculated from the ion type and the sequence, and the 
observed m/z values can be looked up in the spectrum according to some 
rules regarding mass or m/z tolerances and whatever data processing was 
applied to the original spectrum by the search engine (which again, is a 
good reason to have search engines write out the results of their 
preprocessing to an mzML file).
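
To make the recalculation argument concrete, here is a rough Python sketch 
of how a reader of such a compact string might expand it again. Everything 
in it is an assumption for illustration only: the token pattern 
(series, index, optional -NH3 or -H2O loss, optional "(+z)" charge), the 
function names, the 0.5 m/z default tolerance, and the restriction to b/y 
ions of unmodified peptides. It is not a proposed part of analysisXML.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
LOSS_MASS = {"NH3": 17.026549, "H2O": 18.010565}

# Hypothetical controlled pattern: b2, y6-NH3, y6-NH3(+2), ...
TOKEN = re.compile(r"^([by])(\d+)(?:-(NH3|H2O))?(?:\(\+(\d+)\))?$")


def parse_token(token):
    """Split one ion token into (series, index, loss, charge)."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognised ion token: {token!r}")
    series, index, loss, charge = match.groups()
    # Assumption: a token without an explicit charge is singly charged.
    return series, int(index), loss, int(charge) if charge else 1


def expected_mz(sequence, token):
    """Recalculate the theoretical m/z from the ion type and the sequence."""
    series, index, loss, charge = parse_token(token)
    if not 1 <= index < len(sequence):
        raise ValueError(f"ion index out of range for {sequence}: {token}")
    # b ions hold the first `index` residues, y ions the last `index`.
    residues = sequence[:index] if series == "b" else sequence[-index:]
    mass = sum(RESIDUE_MASS[aa] for aa in residues)
    if series == "y":
        mass += WATER  # y ions keep the C-terminal OH plus an extra H
    if loss:
        mass -= LOSS_MASS[loss]
    return (mass + charge * PROTON) / charge


def closest_peak(expected, observed_mzs, tolerance=0.5):
    """Return the observed m/z nearest to `expected`, or None if none is
    within `tolerance`."""
    candidates = [mz for mz in observed_mzs if abs(mz - expected) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda mz: abs(mz - expected))


def annotate(sequence, ion_string, observed_mzs, tolerance=0.5):
    """Map each token to (expected m/z, matched observed m/z or None)."""
    result = {}
    for token in ion_string.split():
        mz = expected_mz(sequence, token)
        result[token] = (mz, closest_peak(mz, observed_mzs, tolerance))
    return result
```

Calling annotate("PEPTIDE", "b2 y2 y6-NH3(+2)", peaks) with the peak list 
from the search engine's preprocessed spectrum would return, per token, the 
recalculated m/z and the nearest observed peak. Whether that reproduces the 
engine's own calls depends entirely on using the same tolerance rules and 
preprocessing, which is the point made above.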

-Matt

Lennart Martens wrote:
> Dear PSI-PI'ers,
>
>
> I recently came across a discussion related to the inclusion of fragment 
> ions (as called by the search engine during identification) in the 
> analysisXML format (see issue 28 on the Google tracker, direct link: 
> http://code.google.com/p/psi-pi/issues/detail?id=28).
>
> It somehow seems that popular opinion is against inclusion of this vital 
> piece of information, and that makes me very worried. One of the 
> comments on the issue page in fact is that fragment ion calling is 
> algorithm specific (which is true), and therefore should not be a part 
> of analysisXML.
> I'd actually like to use this same datum to strongly argue the other 
> way: since the calling is algorithm specific, it is next to impossible 
> to reconstruct the original calling after analysisXML export. So 
> essentially, a vital piece of information (the ability of the spectrum 
> to support the peptide identification as judged by the algorithm) is 
> thrown away during analysisXML conversion or output.
>
> I also believe that the difficulty in annotating which fragments are 
> called from the spectrum is definitely not insurmountable. The link with 
> mzML should be there anyway (otherwise you would not even be able to 
> retrieve the spectrum the identification was made from, an unthinkable 
> scenario), so inclusion of this is trivial (as in: already there). 
> Additionally, the unambiguous reference to the exact peak called in the 
> spectrum is also trivial: simply copy in the actual mass - or more 
> likely: m/z - in the analysisXML tag. Ion type should be easy enough to 
> annotate (there are only so many ion types, and these can be modelled in 
> CV), while charge state is a call made by the algorithm anyway, and can 
> therefore also be included easily. So this essentially fully backs up 
> Andy Jones' suggested tag format on the issue 28 page. Andy has also 
> included some other information, such as 'subsequence' and 'theoretical 
> mass', whose usefulness people are free to discuss (as it probably 
> constitutes redundant information).
>
> So my conclusion is: it's relatively easy to do, will capture vital 
> information about the identification and how it was established, and 
> conserves irreplaceable data.
> So consider any weight I might have to be formally thrown behind 
> including this in version 1.0!
>
> Let the argument (re-)commence!
>
>
> Cheers,
>
> lnnrt.
>