From: Lennart M. <len...@eb...> - 2008-07-03 12:57:11
Dear PSI-PI'ers,

I recently came across a discussion about including fragment ions (as called by the search engine during identification) in the analysisXML format (see issue 28 on the Google tracker, direct link: http://code.google.com/p/psi-pi/issues/detail?id=28). It seems that popular opinion is against including this vital piece of information, and that makes me very worried.

One of the comments on the issue page is that fragment ion calling is algorithm specific (which is true), and therefore should not be a part of analysisXML. I'd actually like to use this same datum to argue strongly the other way: since the calling is algorithm specific, it is next to impossible to reconstruct the original calling after analysisXML export. So a vital piece of information (the ability of the spectrum to support the peptide identification, as judged by the algorithm) is thrown away during analysisXML conversion or output.

I also believe that the difficulty of annotating which fragments are called from the spectrum is definitely not insurmountable. The link with mzML should be there anyway (otherwise you would not even be able to retrieve the spectrum the identification was made from, an unthinkable scenario), so including the spectrum reference is trivial (as in: already there). The unambiguous reference to the exact peak called in the spectrum is also trivial: simply copy the actual mass (or more likely: m/z) into the analysisXML tag. Ion type should be easy enough to annotate (there are only so many ion types, and these can be modelled in CV), while charge state is a call made by the algorithm anyway, and can therefore also be included easily.

So this essentially fully backs up Andy Jones' suggested tag format on the issue 28 page. Andy has also included some other information, such as 'subsequence' and 'theoretical mass', whose usefulness people are free to discuss (as it probably constitutes redundant information).
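To make the point about triviality concrete, here is a minimal sketch in Python of what a single fragment ion call would need to carry: the spectrum reference, the observed m/z, the ion type, and the charge. Everything named here is an assumption for illustration only: the element name FragmentIon, the attribute names, the ionIndex field, and the plain-letter ion types (which a real file would replace with CV terms) come neither from Andy's proposed tag format nor from the analysisXML schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a controlled vocabulary of ion types.
ION_TYPES = {"a", "b", "c", "x", "y", "z"}


@dataclass(frozen=True)
class FragmentIonCall:
    """One fragment ion as called by the search engine (illustrative only)."""
    spectrum_ref: str  # id of the spectrum in the linked mzML file (assumed format)
    mz: float          # observed m/z of the peak the algorithm called
    ion_type: str      # series letter; a real file would carry a CV term
    ion_index: int     # position in the series, e.g. 3 for y3 (assumed field)
    charge: int        # charge state assigned by the algorithm

    def __post_init__(self):
        # Reject values that could not describe a real fragment ion call.
        if self.ion_type not in ION_TYPES:
            raise ValueError(f"unknown ion type: {self.ion_type!r}")
        if self.ion_index < 1:
            raise ValueError("ion index must be a positive integer")
        if self.charge < 1:
            raise ValueError("charge must be a positive integer")


def to_xml(call: FragmentIonCall) -> ET.Element:
    """Serialise one call to a hypothetical <FragmentIon> element."""
    return ET.Element("FragmentIon", {
        "spectrumRef": call.spectrum_ref,
        "mz": repr(call.mz),
        "ionType": call.ion_type,
        "ionIndex": str(call.ion_index),
        "charge": str(call.charge),
    })


if __name__ == "__main__":
    call = FragmentIonCall("scan=1234", 375.2, "y", 3, 1)
    print(ET.tostring(to_xml(call), encoding="unicode"))
```

Note that nothing in the record has to be recomputed by the reader of the file: every field is something the search engine already knows at the moment it makes the call.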
So my conclusion is: it's relatively easy to do, will capture vital information about the identification and how it was established, and conserves irreplaceable data. So consider any weight I might have to be formally thrown behind including this in version 1.0! Let the argument (re-)commence!

Cheers,

lnnrt.