From: Matt C. <mat...@va...> - 2008-07-21 13:21:12
Hi Andy,

As we have both said, it's important to determine the use cases for this information. :)

The only reasonable use case that doesn't take up oodles of disk space is simply knowing the ion types that were predicted. Unless I planned to reproduce the search engine's comparison exactly, I don't see the point in knowing the exact mass(es) that the search engine expected and the observed ion(s) that it matched to. And if I plan to reproduce the score, that probably means I have access to the search engine's algorithm, so I'd just regenerate the comparison.

As for mapping to the observed ion(s), I think it's not relevant for the purposes of basic annotation. For clarity of presentation, viewers usually show the ion as either a logical point in the spectrum independent of the data itself, or they map it to the most abundant peak in the window. These approaches can be combined by changing the annotation when the user zooms in.

So yes, in this approach we have information loss. But I think it's better than not having the information at all (and depending on a vendor-supplied and version-dependent script to regenerate it) and certainly better than choking on 10 GB analysis files. ;)

-Matt

Jones, Andy wrote:
> Hi all,
>
>> An example to show how compact it could be:
>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> I have a couple of queries about this proposal...
>
> Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses?
>
> We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess, i.e. there is an observed peak within the tolerance window matching the expected mass, but this doesn't help when there are multiple peaks within the window - I don't think we could correctly assume it would always be the most abundant peak...?
>
> In other words, we still have information loss. Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere?
>
> I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version 1.
>
> Cheers
> Andy
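
To make the discussion concrete, here is a rough Python sketch of how a reader might consume the compact fragmentIons string quoted above and then map an ion to the most abundant peak in a tolerance window. The token grammar is only inferred from the single example in this thread (series letter, index, optional "-LOSS", optional ",+CHARGE" attached to the preceding token, charge 1 when omitted), so treat it as an assumption rather than a specification. The names FragmentIon, parse_fragment_ions and most_abundant_in_window are hypothetical, and the expected m/z is assumed to be computed elsewhere.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class FragmentIon(NamedTuple):
    series: str          # ion series letter, e.g. "b" or "y"
    index: int           # position in the series, e.g. 7 for y7
    loss: Optional[str]  # neutral loss label such as "H2O", or None
    charge: int          # charge state; assumed to be 1 when omitted


# Assumed grammar, inferred from the one example string in the thread.
_TOKEN = re.compile(r"^([a-cx-z])(\d+)(?:-([A-Za-z0-9]+))?(?:,\+(\d+))?$")


def parse_fragment_ions(text: str) -> List[FragmentIon]:
    """Split a space-separated fragmentIons string into structured ions."""
    ions = []
    for token in text.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised fragment ion token: {token!r}")
        series, index, loss, charge = match.groups()
        ions.append(FragmentIon(series, int(index), loss,
                                int(charge) if charge else 1))
    return ions


def most_abundant_in_window(
    peaks: List[Tuple[float, float]],  # (m/z, intensity) pairs
    expected_mz: float,
    tolerance: float,
) -> Optional[Tuple[float, float]]:
    """Return the most intense peak within +/- tolerance, or None."""
    candidates = [p for p in peaks if abs(p[0] - expected_mz) <= tolerance]
    return max(candidates, key=lambda p: p[1]) if candidates else None


example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
ions = parse_fragment_ions(example)
```

Note that picking the most abundant peak is the viewer convention Matt describes, not a reconstruction of what the search engine actually matched, which is exactly the information loss Andy points out.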