psidev-ms-dev Mailing List for Proteomics Standards Initiative (Page 86)


For a given mass spectrum, there are many possible ways to 
generate m/z vs. intensity data arrays, depending on the answers to 
questions such as:
 - Entire m/z vs. intensity profile or peaks only?
 - All peaks or monoisotopic peaks only?
 - All peaks or "prominent" peaks only (must be above some signal-to-noise 
threshold, for example)?
 - "De-charge" peaks or not?
 - And so on...

I can think of two main ways to handle these variations (as well as 
hybrid options that fall between these two extremes):
(1) The software that generates mzML asks the user in detail what his or 
her preferences are, and then the software generates the appropriate m/z 
vs. intensity data arrays.
(2) The software that generates mzML doesn't ask the user anything. Rather 
the software writes out all the data with bit masks and other auxiliary 
data. That is, rather than having just m/z vs. intensity data arrays, 
there are extra data arrays carrying more info. Example:
        m/z     Intensity       Monoisotopic?   S/N     Charge ...
        501.00  1000            Yes             50      3
        501.33  500             No              20      3
        501.66  100             No              10      3
        751.00  5000            Yes             150     2
        751.50  2000            No              70      2

Option (1) has the advantage that any software consuming mzML can 
interpret the data arrays simply and without ambiguity. Its 
disadvantage is that any change in data requirements means going back to 
the original raw data and re-generating a completely new mzML file. By 
contrast, option (2) embeds a lot more data in the file and is much less 
likely to require re-generation of mzML, but it requires more intelligence 
from the software consuming the mzML. For example, if only the 
monoisotopic peaks are wanted, the consuming software must understand 
that it can't just read the m/z and intensity data arrays; it MUST ALSO 
read the monoisotopic data array and throw out some data values based on 
the contents of that bitmask array. Otherwise it would silently process 
peaks that should have been discarded.
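
To make the option (2) burden concrete, here is a rough Python sketch of 
what a consumer would have to do with the example table above. It is only 
an illustration: the array names, the select_peaks helper and its 
parameters are hypothetical, and nothing here is an mzML API or a 
convention the mzML specification defines.

```python
# Parallel arrays mirroring the example table (option 2).
# All names are hypothetical; this is not an mzML library API.
mz = [501.00, 501.33, 501.66, 751.00, 751.50]
intensity = [1000.0, 500.0, 100.0, 5000.0, 2000.0]
monoisotopic = [True, False, False, True, False]
signal_to_noise = [50.0, 20.0, 10.0, 150.0, 70.0]


def select_peaks(mz, intensity, monoisotopic, signal_to_noise,
                 monoisotopic_only=False, min_sn=0.0):
    """Return (m/z, intensity) pairs that pass the requested filters."""
    # The auxiliary arrays are only meaningful if they line up
    # one-to-one with the m/z and intensity arrays.
    lengths = {len(mz), len(intensity), len(monoisotopic), len(signal_to_noise)}
    if len(lengths) != 1:
        raise ValueError("data arrays must all have the same length")

    selected = []
    for m, i, mono, sn in zip(mz, intensity, monoisotopic, signal_to_noise):
        if monoisotopic_only and not mono:
            continue  # drop non-monoisotopic peaks
        if sn < min_sn:
            continue  # drop peaks below the signal-to-noise threshold
        selected.append((m, i))
    return selected


# A naive consumer that ignores the auxiliary arrays keeps every peak.
naive_peaks = list(zip(mz, intensity))

# A consumer that honors the monoisotopic mask keeps only flagged peaks.
mono_peaks = select_peaks(mz, intensity, monoisotopic, signal_to_noise,
                          monoisotopic_only=True)

# The same arrays can also serve a different requirement without
# regenerating the file, e.g. "prominent" peaks only.
prominent_peaks = select_peaks(mz, intensity, monoisotopic, signal_to_noise,
                               min_sn=50.0)
```

The naive consumer ends up with all five peaks, while the mask-aware one 
keeps only the two monoisotopic peaks. That difference is exactly the 
ambiguity I am worried about.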

Is there any guidance for how these things should be handled in mzML? What 
are the assumptions made by existing mzML consumers?

Thanks,
Wilfred



psidev-ms-dev — Mass spectroscopy standard development