From: Pierre-Alain B. <pie...@is...> - 2008-07-05 12:19:32
Yes, and the answer is: a library is a result. mzML does not annotate spectra, as that is the task of an interpretation analysis. Therefore, a library should be stored in an AnalysisXML format, not mzML...

Pierre-Alain

Angel Pizarro wrote:
> Matt, have the mzML group thought about annotated spectra libs, like those provided by NIST, yet?
> -angel
>
> On Thu, Jul 3, 2008 at 9:21 AM, Matt Chambers <mat...@va...> wrote:
> > Hi Lennart,
> >
> > If you think about storing fragments in as verbose a format as Andy Jones's suggestion, for every result in every spectrum (keeping in mind that more than just the top result(s) may be written out per spectrum), it would represent an intolerable (to me) bloat to the file. As I understand it, we want to mention the algorithm, its parameters, and its version in the CV. We would then recommend to search engine developers that when they implement support for analysisXML they provide an online script for generating fragments based on the controlled parameters and the algorithm version. I do not think this reconstruction of the fragment information is "next to impossible" as long as such a script is provided.
> >
> > Alternatively, I think we could come up with a much briefer format to store the fragments in, something like:
> > <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
> >
> > It's ugly as sin, but we can come up with a controlled pattern to store the ion types in. The numbers are mostly redundant: the expected m/z values can be recalculated from the ion type and the sequence, and the observed m/z values can be looked up in the spectrum according to some rules regarding mass/m/z tolerances and whatever data processing was applied to the original spectrum by the search engine (which, again, is a good reason to have search engines write out the results of their preprocessing to an mzML file).
> >
> > -Matt
> >
> > Lennart Martens wrote:
> > > Dear PSI-PI'ers,
> > >
> > > I recently came across a discussion related to the inclusion of fragment ions (as called by the search engine during identification) in the analysisXML format (see issue 28 on the Google tracker, direct link: http://code.google.com/p/psi-pi/issues/detail?id=28).
> > >
> > > It somehow seems that popular opinion is against inclusion of this vital piece of information, and that makes me very worried. One of the comments on the issue page in fact argues that fragment ion calling is algorithm-specific (which is true) and therefore should not be part of analysisXML.
> > > I'd actually like to use this same point to argue strongly the other way: since the calling is algorithm-specific, it is next to impossible to reconstruct the original calling after analysisXML export. So essentially, a vital piece of information (the ability of the spectrum to support the peptide identification as judged by the algorithm) is thrown away during analysisXML conversion or output.
> > >
> > > I also believe that the difficulty in annotating which fragments are called from the spectrum is definitely not insurmountable. The link with mzML should be there anyway (otherwise you would not even be able to retrieve the spectrum the identification was made from, an unthinkable scenario), so inclusion of this is trivial (as in: already there). Additionally, the unambiguous reference to the exact peak called in the spectrum is also trivial: simply copy the actual mass (or, more likely, the m/z) into the analysisXML tag. Ion type should be easy enough to annotate (there are only so many ion types, and these can be modelled in the CV), while charge state is a call made by the algorithm anyway and can therefore also be included easily. So this essentially fully backs up Andy Jones' suggested tag format on the issue 28 page.
> > > And Andy has included some other information, such as 'subsequence' and 'theoretical mass', whose usefulness people are free to discuss (as it probably constitutes redundant information).
> > >
> > > So my conclusion is: it's relatively easy to do, will capture vital information about the identification and how it was established, and conserves irreplaceable data.
> > > So consider any weight I might have to be formally thrown behind including this in version 1.0!
> > >
> > > Let the argument (re-)commence!
> > >
> > > Cheers,
> > >
> > > lnnrt.
>
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
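Matt's argument that the numbers behind a compact annotation such as `b2 y2 y6-NH3 y6-NH3(+2)` are mostly redundant can be illustrated with a small Python sketch. It is not part of analysisXML or mzML, and it rests on several assumptions: a hypothetical token grammar (b or y series, fragment length, optional -NH3/-H2O loss, optional `(+z)` charge), unmodified monoisotopic residue masses, and a fixed absolute tolerance in Da. Real search engines use their own ion types, modifications and preprocessing, which is exactly the algorithm-specific detail Lennart worries about losing.

```python
import bisect
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids, no modifications.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
NEUTRAL_LOSS = {"NH3": 17.026549, "H2O": 18.010565}

# Hypothetical grammar for one token of the compact notation:
# series (b or y), fragment length, optional neutral loss, optional charge "(+z)".
ION_TOKEN = re.compile(r"^([by])(\d+)(?:-(NH3|H2O))?(?:\(\+(\d+)\))?$")


def theoretical_mz(token, peptide):
    """Recalculate the expected m/z of one ion token from the peptide sequence."""
    match = ION_TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognised ion token: {token!r}")
    series, length, loss, charge = match.groups()
    length = int(length)
    # Assumption: a token without a "(+z)" suffix is singly charged.
    charge = int(charge) if charge else 1
    if not 1 <= length < len(peptide):
        raise ValueError(f"fragment length out of range in {token!r}")
    if series == "b":
        # b ions: N-terminal residues only.
        neutral = sum(RESIDUE_MASS[aa] for aa in peptide[:length])
    else:
        # y ions: C-terminal residues plus one water.
        neutral = sum(RESIDUE_MASS[aa] for aa in peptide[-length:]) + WATER
    if loss:
        neutral -= NEUTRAL_LOSS[loss]
    return (neutral + charge * PROTON) / charge


def find_peak(observed_mz, target, tolerance):
    """Return the observed m/z closest to target within +/- tolerance (Da), or None.

    observed_mz must be sorted in ascending order.
    """
    i = bisect.bisect_left(observed_mz, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [observed_mz[j] for j in (i - 1, i) if 0 <= j < len(observed_mz)]
    best = min(candidates, key=lambda mz: abs(mz - target), default=None)
    if best is None or abs(best - target) > tolerance:
        return None
    return best


def annotate(compact, peptide, observed_mz, tolerance=0.5):
    """Expand a compact ion string into (token, expected m/z, observed m/z or None) rows."""
    rows = []
    for token in compact.split():
        expected = theoretical_mz(token, peptide)
        rows.append((token, expected, find_peak(observed_mz, expected, tolerance)))
    return rows
```

For example, `annotate("b2 y2 y6-NH3 y6-NH3(+2)", "SAMPLERK", sorted_peak_mzs)` would expand Matt's example string for a made-up peptide, SAMPLERK, against a placeholder list of sorted peak m/z values. Storing only the labels and recomputing every number is the space saving Matt describes; the catch, as Lennart points out, is that the observed-peak lookup depends on the tolerance and on whatever preprocessing the search engine applied, whereas copying the observed m/z into the file does not.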