Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)


Hi Andy,

As we have both said, it's important to determine the use cases for this 
information. :) The only reasonable use case that doesn't take up oodles 
of disk space is simply knowing the ion types that were predicted.

Unless I planned to reproduce the search engine's comparison exactly, I 
don't see the point in knowing the exact mass(es) that the search engine 
expected and the observed ion(s) that it matched to. And if I plan to 
reproduce the score, that probably means I have access to the search 
engine's algorithm, so I'd just regenerate the comparison.
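
To make "regenerate" concrete, here is a minimal sketch of recomputing an expected m/z from just the ion label and the peptide sequence. It assumes monoisotopic residue masses and the textbook b/y formulas; a given search engine may well use average masses, different rounding, or modified residues, which is exactly the open question. The function name fragment_mz and its parameters are made up for illustration.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_mz(sequence, ion_type, index, charge=1, neutral_loss=0.0):
    """Return the expected m/z of a b or y ion (hypothetical helper).

    index is the ion number (b3 -> 3), neutral_loss is a mass in Da
    that is subtracted before the charge is applied (e.g. WATER for -H2O).
    """
    if not 1 <= index <= len(sequence):
        raise ValueError("ion index out of range for this peptide")
    if ion_type == "b":
        # b ions: the first `index` residues from the N-terminus.
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    elif ion_type == "y":
        # y ions: the last `index` residues plus one water.
        mass = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    else:
        raise ValueError("only b and y ions are handled in this sketch")
    return (mass - neutral_loss + charge * PROTON) / charge


if __name__ == "__main__":
    print(round(fragment_mz("PEPTIDE", "b", 2), 4))
    print(round(fragment_mz("PEPTIDE", "y", 1), 4))
    print(round(fragment_mz("PEPTIDE", "b", 3, neutral_loss=WATER), 4))
```

If two tools disagree on these numbers by more than rounding, the compact ion-label form alone would not be enough to reproduce a score, but it would still be fine for annotation.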

As for mapping to the observed ion(s), I think it's not relevant for the 
purposes of basic annotation. For clarity of presentation, viewers 
usually show the ion as either a logical point in the spectrum 
independent of the data itself, or they map it to the most abundant peak 
in the window. These approaches can be combined by changing the 
annotation when the user zooms in.
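
A rough sketch of the two viewer behaviours, assuming peaks arrive as (m/z, intensity) pairs and that the tolerance is a plain m/z window; the name annotate_peak and the 0.5 default are only for illustration, not anything a particular viewer does.

```python
def annotate_peak(peaks, expected_mz, tolerance=0.5):
    """Pick where to draw an ion label (hypothetical helper).

    peaks: iterable of (mz, intensity) tuples.
    Returns the most abundant peak within +/- tolerance of expected_mz,
    or (expected_mz, None) when no peak falls in the window, i.e. the
    label is drawn at the logical position independent of the data.
    """
    window = [(mz, inten) for mz, inten in peaks
              if abs(mz - expected_mz) <= tolerance]
    if not window:
        return (expected_mz, None)
    # Most abundant peak in the window wins.
    return max(window, key=lambda peak: peak[1])


if __name__ == "__main__":
    spectrum = [(227.05, 10.0), (227.12, 80.0), (227.40, 35.0), (300.0, 500.0)]
    print(annotate_peak(spectrum, 227.1026))   # a peak from the window
    print(annotate_peak(spectrum, 500.0))      # no peak nearby
```

Note that this never claims the chosen peak is the one the search engine matched; it is only a display choice.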

So yes, in this approach we have information loss. But I think it's 
better than not having the information at all (and depending on a 
vendor-supplied and version-dependent script to regenerate it) and 
certainly better than choking on 10 GB analysis files. ;)

-Matt

Jones, Andy wrote:
> Hi all,
>
>   
>> An example to show how compact it could be:
>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>>     
>
> I have a couple of queries about this proposal...
>
> Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses? 
>
> We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess, i.e. there is an observed peak within the tolerance window matching the expected mass, but this doesn't help when there are multiple peaks within the window. I don't think we could correctly assume it would always be the most abundant peak...?
>
> In other words, we still have information loss. Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere?
>
> I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version 1.
>
> Cheers
> Andy