From: Angel P. <an...@ma...> - 2008-07-03 13:36:30
Matt, have the mzML group thought about annotated spectra libs, like those provided by NIST, yet?

-angel

On Thu, Jul 3, 2008 at 9:21 AM, Matt Chambers <mat...@va...> wrote:
> Hi Lennart,
>
> If you think about storing fragments in as verbose a format as Andy
> Jones's suggestion, for every result in every spectrum (keeping in mind
> that more than the top result/s is/are written out per spectrum), it
> would represent an intolerable (to me) bloat to the file. As I
> understand it, we want to mention the algorithm, its parameters, and its
> version in the CV. We would then recommend to search engine developers
> that when they implement support for analysisXML they provide an online
> script for generating fragments based on the controlled parameters and
> the algorithm version. I do not think this reconstruction of the
> fragment information is "next to impossible" as long as such a script is
> provided.
>
> Alternatively, I think we could come up with a much briefer format to
> store the fragments in, something like:
> <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
>
> It's ugly as sin, but we can come up with a controlled pattern to store
> the ion types in. The numbers are mostly redundant: the expected m/z
> values can be recalculated from the ion type and the sequence, and the
> observed m/z values can be looked up in the spectrum according to some
> rules regarding mass/m/z tolerances and whatever data processing was
> applied to the original spectrum by the search engine (which again, is a
> good reason to have search engines write out the results of their
> preprocessing to an mzML file).
>
> -Matt
>
>
> Lennart Martens wrote:
> > Dear PSI-PI'ers,
> >
> >
> > I recently came across a discussion related to the inclusion of fragment
> > ions (as called by the search engine during identification) in the
> > analysisXML format (see issue 28 on the Google tracker, direct link:
> > http://code.google.com/p/psi-pi/issues/detail?id=28).
> >
> > It somehow seems that popular opinion is against inclusion of this vital
> > piece of information, and that makes me very worried. One of the
> > comments on the issue page in fact is that fragment ion calling is
> > algorithm specific (which is true), and therefore should not be a part
> > of analysisXML.
> > I'd actually like to use this same datum to strongly argue the other
> > way: since the calling is algorithm specific, it is next to impossible
> > to reconstruct the original calling after analysisXML export. So
> > essentially, a vital piece of information (the ability of the spectrum
> > to support the peptide identification as judged by the algorithm) is
> > thrown away during analysisXML conversion or output.
> >
> > I also believe that the difficulty in annotating which fragments are
> > called from the spectrum is definitely not insurmountable. The link with
> > mzML should be there anyway (otherwise you would not even be able to
> > retrieve the spectrum the identification was made from, an unthinkable
> > scenario), so inclusion of this is trivial (as in: already there).
> > Additionally, the unambiguous reference to the exact peak called in the
> > spectrum is also trivial: simply copy in the actual mass - or more
> > likely: m/z - in the analysisXML tag. Ion type should be easy enough to
> > annotate (there are only so many ion types, and these can be modelled in
> > CV), while charge state is a call made by the algorithm anyway, and can
> > therefore also be included easily. So this essentially fully backs up
> > Andy Jones' suggested tag format on the issue 28 page. And Andy has
> > included some other information, such as 'subsequence' and 'theoretical
> > mass' which people are free to discuss the usefulness of (as it probably
> > constitutes redundant information).
> >
> > So my conclusion is: it's relatively easy to do, will capture vital
> > information about the identification and how it was established, and
> > conserves irreplaceable data.
> > So consider any weight I might have to be formally thrown behind
> > including this in version 1.0!
> >
> > Let the argument (re-)commence!
> >
> >
> > Cheers,
> >
> > lnnrt.
> >
> >
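
To make Matt's point about redundancy concrete, here is a rough sketch of how a compact token such as `y6-NH3(+2)` could be parsed and its expected m/z recalculated from the peptide sequence alone. The token grammar (`<series><index>[-<loss>][(+<charge>)]`), the function names `parse_fragment` and `expected_mz`, and the restriction to b/y ions with NH3 or H2O losses are all assumptions made for illustration; nothing here is defined by analysisXML or by the thread. Modifications and other ion series are ignored.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)
NEUTRAL_LOSS = {"NH3": 17.026549, "H2O": WATER}

# Hypothetical token grammar: <series><index>[-<loss>][(+<charge>)]
TOKEN = re.compile(r"^([by])(\d+)(?:-(NH3|H2O))?(?:\(\+(\d+)\))?$")


def parse_fragment(token):
    """Split a token like 'y6-NH3(+2)' into (series, index, loss, charge)."""
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"unrecognised fragment token: {token!r}")
    series, index, loss, charge = m.groups()
    # A missing charge suffix is assumed to mean singly charged.
    return series, int(index), loss, int(charge) if charge else 1


def expected_mz(sequence, token):
    """Recalculate the theoretical m/z of a b or y fragment of `sequence`."""
    series, index, loss, charge = parse_fragment(token)
    if not 1 <= index < len(sequence):
        raise ValueError(f"fragment index {index} out of range for {sequence}")
    if series == "b":
        # b ions: the N-terminal residues.
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    else:
        # y ions: the C-terminal residues plus one water.
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    if loss:
        neutral -= NEUTRAL_LOSS[loss]
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    for tok in "b2 y2 y6-NH3 y6-NH3(+2)".split():
        print(tok, round(expected_mz("PEPTIDE", tok), 4))
```

Looking up the observed peak would still need the search engine's tolerance and preprocessing rules, which is the part Lennart argues cannot be reliably reconstructed after export.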