From: Matthew C. <mat...@va...> - 2007-11-26 20:47:32
That's a neat idea, Fredrik, but it would mean that MS/MS search engines would have to parse every MS1 that led to an MS2 (and somehow cache it if the MS1 led to multiple MS2s), which is even worse than jumping around the file to find the MS1. The charge state array is interesting too, but as I understand it, charge state determination is even more ambiguous than monoisotopic peak determination when the selection window is crowded. And although your approach would remove the need to store anything with the MS2 except the selection window itself, some processors might want to assign multiple charge states to individual peaks in the selection window because the determination is uncertain.

-Matt

Fredrik Levander wrote:
> This is certainly an interesting case, and the mzML format is not
> perfect for it. To separate acquisition parameters from analysis
> results, one (XSD-valid) possibility would be that the ionSelection
> element is used for representing only the selection window. In the
> precursor element there is a direct reference to the MS scan that was
> used for the ion selection. This referenced spectrum can in the mzML
> file be represented as a peak list with all the peaks and their accurate
> masses in the selection window. One could imagine three binary arrays,
> with m/z, intensity and charge, respectively. This would clearly
> distinguish the machine selection window from the post-run peak analysis.
> However, this is not an optimal solution for search engines, which would
> have to jump around in the file to find the precursor masses to consider.
>
> Fredrik
>
> Matthew Chambers wrote:
>
>> I think most writers of the nearest-instrument-format mzML will have
>> done some processing step(s) to determine the precursor mass to write
>> into the file.
>> To take the Thermo RAW format as an example, it seems to
>> only be able to store one precursor m/z and charge state per MS2, but
>> since MS2s are fragmented from a relatively wide m/z isolation window
>> (e.g. +/- 2.5 Da/z), it's entirely possible in a complex sample for an
>> MS2 to be a multiplex (aka chimeric) spectrum, in that it represents the
>> fragmentation of multiple precursors of different masses and charge
>> states. So storing one precursor for such a spectrum is utterly
>> inadequate if high mass accuracy is used and expected. Add to that the
>> fact that Thermo's processing step(s) are not optimal (in other words:
>> Thermo's monoisotopic mass determination and charge state determination
>> leave room for improvement).
>>
>> You may be aware that it is common practice in shotgun proteomics to
>> determine whether an MS2+ came from a singly or multiply charged precursor,
>> and if multiply charged, to treat it as both +2 and +3 (and on some
>> data, even higher charge states). With higher mass accuracy instruments
>> coming online, and in the absence of better precursor estimation from the
>> vendor software, it will be increasingly common practice to treat a scan
>> as coming from multiple precursor masses, not just multiple precursor
>> charge states. The multiple precursor masses can be due to uncertain
>> isotopic variants in the precursor's isotopic envelope and/or to multiple
>> precursor species in the same isolation window.
>>
>> I too would like your take on how to represent this in mzML.
>>
>> Thanks,
>> Matt
>>
>>
>> Angel Pizarro wrote:
>>
>>
>>> Interesting. Here is how I understand matters (keep in mind I don't
>>> actually perform experiments).
>>>
>>> I thought <precursorList> was a list of selection windows for an MS^n
>>> experiment. In other words, an MS2 would only have one precursor
>>> selection window, an MS3 would have two, and so on.
>>>
>>> The experiment as described actually sounds to me like there is an FT
>>> MS1 scan independent of the selection window for the MS2 spectrum.
>>> You then run an analysis to determine a more accurate precursor for
>>> the MS2 spectra, building a set of relationships and assigning a score
>>> to the likely candidates.
>>>
>>> To me that sounds like a processing step or analysis, not part of
>>> the data acquisition experiment. So before we start discussing
>>> structure: is this kind of use (e.g. spectral processing) part of the
>>> mzML format's role? And if so, is the result encoded as a new file, or
>>> as part of the export from the original data format (e.g. should the
>>> intermediate, unprocessed conversion of the original format to mzML
>>> also be output in the process)?
>>>
>>> -angel
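To make the chimeric-spectrum problem concrete, here is a minimal Python sketch, assuming the referenced MS1 scan is stored as Fredrik's three parallel arrays (m/z, intensity, charge). It enumerates candidate precursor masses for one MS2 scan: every peak inside the isolation window (±2.5 m/z by default, as in Matt's Thermo example), every plausible charge state, and an optional isotope-offset correction for a misassigned monoisotopic peak. It assumes the singly-vs-multiply-charged decision has already been made, and that a charge of 0 means "multiply charged, exact charge unknown", which is expanded to +2 and +3 following the practice Matt describes. The names `precursor_candidates` and `PrecursorCandidate` and all parameters are hypothetical illustrations, not part of mzML or any search engine's API.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276   # Da, mass of a proton
C13_C12_DIFF = 1.003355  # Da, spacing between adjacent isotopic peaks (13C - 12C)


@dataclass(frozen=True)
class PrecursorCandidate:
    mz: float            # observed MS1 peak m/z
    charge: int          # assumed charge state
    isotope_shift: int   # assumed number of 13C peaks above the monoisotopic peak
    neutral_mass: float  # candidate monoisotopic neutral mass, Da


def precursor_candidates(mz_array, intensity_array, charge_array, target_mz,
                         half_width=2.5, default_charges=(2, 3),
                         max_isotope_shift=1, min_intensity=0.0):
    """Enumerate candidate precursor masses for one MS2 scan.

    The arrays are parallel peak data from the MS1 scan the MS2 references
    (hypothetical layout). A charge of 0 is assumed to mean "multiply charged,
    exact charge unknown" and is expanded to default_charges. Every peak inside
    [target_mz - half_width, target_mz + half_width] is treated as a possible
    co-isolated precursor, so one MS2 can yield many candidates.
    """
    candidates = []
    for mz, intensity, charge in zip(mz_array, intensity_array, charge_array):
        if abs(mz - target_mz) > half_width or intensity < min_intensity:
            continue  # outside the isolation window or below the noise floor
        charges = (charge,) if charge > 0 else default_charges
        for z in charges:
            for k in range(max_isotope_shift + 1):
                # z * (m/z - proton) is the neutral mass of the observed peak;
                # subtracting k isotope spacings guesses the monoisotopic mass.
                mass = z * (mz - PROTON_MASS) - k * C13_C12_DIFF
                candidates.append(PrecursorCandidate(mz, z, k, mass))
    return candidates


if __name__ == "__main__":
    # Made-up peaks: three fall inside the 500.5 +/- 2.5 window, one does not.
    for cand in precursor_candidates(
        mz_array=[499.0, 500.2, 501.0, 510.0],
        intensity_array=[100.0, 1000.0, 500.0, 800.0],
        charge_array=[0, 2, 0, 2],
        target_mz=500.5,
    ):
        print(cand)
```

Keeping this enumeration out of the file lets a search engine apply its own window, charge, and isotope rules, but it has to load the referenced MS1 scan first, which is the cost Matt points out for search engines.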