From: David C. <dc...@ma...> - 2008-07-03 13:43:34
Hi Lennart,

>>> Let the argument (re-)commence!

Good to hear from you - maybe ;)

Just to clarify, without this, there is no information haemorrhaging.
It's just rather hard to reconstruct some of the information... which
you may say is the same thing.

What's the problem with a separate tool to generate the information from
the (less huge) analysisXML file and the mzML file?

David

Lennart Martens wrote:
> Hi Matt,
>
> Just to clarify, I do not care about the actual formatting, I only care
> about preventing information loss.
> In that respect, your less verbose format would be fine, with the one
> caveat that it doesn't explicitly point to a peak in the spectrum. As a
> result, one would have to calculate the theoretical m/z for the fragment
> ion, then apply the fragment ion mass threshold used, and then somehow
> select a single peak from all candidates in this m/z window in the
> spectrum, introducing an arbitrary component in the process (e.g., I
> might simply choose the largest peak in the interval, you might pick the
> one with the best fitting isotopic envelope, while the actual search
> engine originally chose the peak with the smallest mass delta -- so we'd
> all end up with a different opinion once again).
>
> But anyway, the format is definitely open to any suggestions. I simply
> want to stop the information haemorrhaging that results from the
> exclusion of these data.
>
> Cheers,
>
> lnnrt.
>
> Matt Chambers wrote:
>> Hi Lennart,
>>
>> If you think about storing fragments in as verbose a format as Andy
>> Jones's suggestion, for every result in every spectrum (keeping in mind
>> that more than the top result/s is/are written out per spectrum), it
>> would represent an intolerable (to me) bloat to the file. As I
>> understand it, we want to mention the algorithm, its parameters, and its
>> version in the CV. We would then recommend to search engine developers
>> that when they implement support for analysisXML they provide an online
>> script for generating fragments based on the controlled parameters and
>> the algorithm version. I do not think this reconstruction of the
>> fragment information is "next to impossible" as long as such a script is
>> provided.
>>
>> Alternatively, I think we could come up with a much briefer format to
>> store the fragments in, something like:
>> <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
>>
>> It's ugly as sin, but we can come up with a controlled pattern to store
>> the ion types in. The numbers are mostly redundant: the expected m/z
>> values can be recalculated from the ion type and the sequence, and the
>> observed m/z values can be looked up in the spectrum according to some
>> rules regarding mass/m/z tolerances and whatever data processing was
>> applied to the original spectrum by the search engine (which again, is a
>> good reason to have search engines write out the results of their
>> preprocessing to an mzML file).
>>
>> -Matt
>>
>> Lennart Martens wrote:
>>> Dear PSI-PI'ers,
>>>
>>> I recently came across a discussion related to the inclusion of fragment
>>> ions (as called by the search engine during identification) in the
>>> analysisXML format (see issue 28 on the Google tracker, direct link:
>>> http://code.google.com/p/psi-pi/issues/detail?id=28).
>>>
>>> It somehow seems that popular opinion is against inclusion of this vital
>>> piece of information, and that makes me very worried. One of the
>>> comments on the issue page in fact is that fragment ion calling is
>>> algorithm specific (which is true), and therefore should not be a part
>>> of analysisXML.
>>> I'd actually like to use this same datum to strongly argue the other
>>> way: since the calling is algorithm specific, it is next to impossible
>>> to reconstruct the original calling after analysisXML export. So
>>> essentially, a vital piece of information (the ability of the spectrum
>>> to support the peptide identification as judged by the algorithm) is
>>> thrown away during analysisXML conversion or output.
>>>
>>> I also believe that the difficulty in annotating which fragments are
>>> called from the spectrum is definitely not insurmountable. The link with
>>> mzML should be there anyway (otherwise you would not even be able to
>>> retrieve the spectrum the identification was made from, an unthinkable
>>> scenario), so inclusion of this is trivial (as in: already there).
>>> Additionally, the unambiguous reference to the exact peak called in the
>>> spectrum is also trivial: simply copy in the actual mass - or more
>>> likely: m/z - in the analysisXML tag. Ion type should be easy enough to
>>> annotate (there are only so many ion types, and these can be modelled in
>>> CV), while charge state is a call made by the algorithm anyway, and can
>>> therefore also be included easily. So this essentially fully backs up
>>> Andy Jones' suggested tag format on the issue 28 page. And Andy has
>>> included some other information, such as 'subsequence' and 'theoretical
>>> mass' which people are free to discuss the usefulness of (as it probably
>>> constitutes redundant information).
>>>
>>> So my conclusion is: it's relatively easy to do, will capture vital
>>> information about the identification and how it was established, and
>>> conserves irreplaceable data.
>>> So consider any weight I might have to be formally thrown behind
>>> including this in version 1.0!
>>>
>>> Let the argument (re-)commence!
>>>
>>> Cheers,
>>>
>>> lnnrt.
>>>
>>
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
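
For anyone following the thread, the two technical points above (Matt's compact ion labels and Lennart's worry about arbitrary peak selection) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from analysisXML or any search engine: the label grammar (series letter, index, optional "-LOSS", optional "(+N)", charge defaulting to 1) is guessed from Matt's single example, the function names are hypothetical, and the peak list and theoretical m/z are invented numbers.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

Peak = Tuple[float, float]  # (observed m/z, intensity)


class IonLabel(NamedTuple):
    series: str                  # ion series letter, e.g. "b" or "y"
    index: int                   # position in the series
    neutral_loss: Optional[str]  # e.g. "NH3", or None
    charge: int                  # ASSUMPTION: 1 when no "(+N)" suffix is given


# Hypothetical grammar inferred from "b2 y2 y6-NH3 y6-NH3(+2)".
_LABEL = re.compile(r"([abcxyz])(\d+)(?:-(\w+))?(?:\(\+(\d+)\))?")


def parse_labels(text: str) -> List[IonLabel]:
    """Split a whitespace-separated label string into structured labels."""
    labels = []
    for token in text.split():
        m = _LABEL.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised ion label: {token!r}")
        series, index, loss, charge = m.groups()
        labels.append(
            IonLabel(series, int(index), loss, int(charge) if charge else 1)
        )
    return labels


def in_window(peaks: List[Peak], theo_mz: float, tol: float) -> List[Peak]:
    """All peaks within +/- tol of the theoretical m/z."""
    return [p for p in peaks if abs(p[0] - theo_mz) <= tol]


def pick_most_intense(cands: List[Peak]) -> Peak:
    """One possible rule: take the largest peak in the window."""
    return max(cands, key=lambda p: p[1])


def pick_smallest_delta(cands: List[Peak], theo_mz: float) -> Peak:
    """Another possible rule: take the peak closest to the theoretical m/z."""
    return min(cands, key=lambda p: abs(p[0] - theo_mz))


# Invented illustrative values, not real data.
demo_peaks = [(500.10, 120.0), (500.28, 900.0), (500.45, 300.0)]
demo_theo_mz = 500.12
demo_tol = 0.5

demo_cands = in_window(demo_peaks, demo_theo_mz, demo_tol)
by_intensity = pick_most_intense(demo_cands)
by_delta = pick_smallest_delta(demo_cands, demo_theo_mz)
```

With these made-up numbers all three peaks fall inside the window, and the two rules select different peaks (500.28 by intensity, 500.10 by mass delta). That is the disagreement Lennart describes when only the ion label is stored; a separate tool of the kind David suggests would have to implement exactly the rule the original search engine used in order to reproduce its calls.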