From: Wilfred H T. <Ta...@ap...> - 2008-08-08 06:43:48
For a given mass spectrum, there are a large number of possible ways to generate m/z vs. intensity data arrays, depending on the answer to questions such as:

- Entire m/z vs. intensity profile or peaks only?
- All peaks or monoisotopic peaks only?
- All peaks or "prominent" peaks only (must be above some signal-to-noise threshold, for example)?
- "De-charge" peaks or not?
- And so on...

I can think of 2 main ways to handle these variations (as well as options in-between that are a hybrid of these 2 extremes):

(1) The software that generates mzML asks the user in detail what his or her preferences are, and then the software generates the appropriate m/z vs. intensity data arrays.

(2) The software that generates mzML doesn't ask the user anything. Rather the software writes out all the data with bit masks and other auxiliary data. That is, rather than having just m/z vs. intensity data arrays, there are extra data arrays carrying more info. Example:

    m/z      Intensity   Monoisotopic?   S/N   Charge   ...
    501.00   1000        Yes             50    3
    501.33   500         No              20    3
    501.66   100         No              10    3
    751.00   5000        Yes             150   2
    751.50   2000        No              70    2

Option (1) has the advantage of being very simple for any software consuming mzML to interpret the data arrays without any ambiguity. The disadvantage of option (1) is that any change in data requirements requires that one go back to the original raw data and re-generate a completely new mzML file.

By contrast, option (2) has a lot more data embedded in the file and is much less likely to require re-generation of mzML, but requires more intelligence from the software consuming the mzML - for example, if only the monoisotopic peaks are desirable, the consuming software must nevertheless understand that it can't just read the m/z and intensity data arrays but it MUST ALSO read the monoisotopic data array and throw out some data values based on the contents of the monoisotopic bitmask array; otherwise the results can be bad.
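
To make the burden on an option (2) consumer concrete, here is a rough Python sketch of the filtering step, using the example values above. The names (AnnotatedSpectrum, select_peaks, monoisotopic_only, min_sn) are made up for illustration and are not part of mzML or of any existing reader; the sketch assumes the parallel arrays have already been decoded from the file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedSpectrum:
    """Hypothetical in-memory form of option (2): parallel arrays of equal length."""
    mz: List[float]
    intensity: List[float]
    monoisotopic: List[bool]      # the "bit mask" array
    signal_to_noise: List[float]
    charge: List[int]


def select_peaks(spec: AnnotatedSpectrum,
                 monoisotopic_only: bool = False,
                 min_sn: float = 0.0) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) pairs that pass the consumer's own criteria."""
    return [
        (mz, inten)
        for mz, inten, mono, sn in zip(spec.mz, spec.intensity,
                                       spec.monoisotopic, spec.signal_to_noise)
        # Keep a peak unless the caller wants monoisotopic peaks and this is not one.
        if (mono or not monoisotopic_only) and sn >= min_sn
    ]


# The example table from the message, as parallel arrays.
example = AnnotatedSpectrum(
    mz=[501.00, 501.33, 501.66, 751.00, 751.50],
    intensity=[1000, 500, 100, 5000, 2000],
    monoisotopic=[True, False, False, True, False],
    signal_to_noise=[50, 20, 10, 150, 70],
    charge=[3, 3, 3, 2, 2],
)

monoisotopic_peaks = select_peaks(example, monoisotopic_only=True)
prominent_peaks = select_peaks(example, min_sn=50)
```

A consumer that reads only the mz and intensity arrays and skips this step would silently treat all five peaks as wanted, which is exactly the failure mode described for option (2).
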
Is there any guidance for how these things should be handled in mzML? What are the assumptions made by existing mzML consumers?

Thanks,
Wilfred