[Psidev-pi-dev] Fragment ion information

Dear PSI-PI'ers,

I recently came across a discussion related to the inclusion of fragment 
ions (as called by the search engine during identification) in the 
analysisXML format (see issue 28 on the Google tracker, direct link: 
http://code.google.com/p/psi-pi/issues/detail?id=28).

It somehow seems that popular opinion is against inclusion of this vital 
piece of information, and that makes me very worried. In fact, one of 
the comments on the issue page argues that fragment ion calling is 
algorithm specific (which is true), and therefore should not be a part 
of analysisXML.
I'd actually like to use this same datum to strongly argue the other 
way: since the calling is algorithm specific, it is next to impossible 
to reconstruct the original calling after analysisXML export. So 
essentially, a vital piece of information (the ability of the spectrum 
to support the peptide identification as judged by the algorithm) is 
thrown away during analysisXML conversion or output.

I also believe that the difficulty in annotating which fragments are 
called from the spectrum is definitely not insurmountable. The link with 
mzML should be there anyway (otherwise you would not even be able to 
retrieve the spectrum the identification was made from, an unthinkable 
scenario), so inclusion of this is trivial (as in: already there). 
Additionally, the unambiguous reference to the exact peak called in the 
spectrum is also trivial: simply copy in the actual mass - or more 
likely: m/z - in the analysisXML tag. Ion type should be easy enough to 
annotate (there are only so many ion types, and these can be modelled in 
CV), while charge state is a call made by the algorithm anyway, and can 
therefore also be included easily. So this essentially fully backs up 
Andy Jones' suggested tag format on the issue 28 page. And Andy has 
included some other information, such as 'subsequence' and 'theoretical 
mass' which people are free to discuss the usefulness of (as it probably 
constitutes redundant information).
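
To make the point about triviality concrete, here is a rough sketch (in 
Python, purely for illustration) of the minimal record described above and 
of how a reader could map it back onto the peak list retrieved from mzML. 
The names FragmentIonCall and find_called_peak, the field names, and the 
tolerance value are my own assumptions; they are not taken from Andy's 
proposal or from any schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FragmentIonCall:
    """One fragment ion as called by the search engine (hypothetical record)."""
    spectrum_ref: str  # reference to the source spectrum in the mzML file
    ion_type: str      # e.g. "b" or "y"; would be a CV term in practice
    charge: int        # charge state as called by the algorithm
    mz: float          # observed m/z, copied from the called peak


def find_called_peak(call: FragmentIonCall,
                     peak_mzs: Sequence[float],
                     tolerance: float = 1e-4) -> Optional[int]:
    """Return the index of the peak closest to call.mz, or None if no peak
    lies within the (assumed) tolerance."""
    best_index = None
    best_delta = tolerance
    for index, mz in enumerate(peak_mzs):
        delta = abs(mz - call.mz)
        if delta <= best_delta:
            best_index, best_delta = index, delta
    return best_index


# Made-up peak list standing in for a spectrum read from mzML.
peaks = [175.119, 262.151, 375.235]
call = FragmentIonCall(spectrum_ref="scan=1", ion_type="y", charge=1, mz=262.151)
called_index = find_called_peak(call, peaks)
```

Because the m/z is copied straight from the spectrum, the lookup is a 
nearest-peak match and needs no knowledge of the search engine's own 
calling rules.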

So my conclusion is: it's relatively easy to do, will capture vital 
information about the identification and how it was established, and 
conserves irreplaceable data.
So consider any weight I might have to be formally thrown behind 
including this in version 1.0!

Let the argument (re-)commence!

Cheers,

lnnrt.