psidev-ms-dev Mailing List for Proteomics Standards Initiative (Page 112)


Hi Matt,

OK, I see the disconnect - you aren't using an API for reading mass spec
data, you're using an API for reading XML (expat - an excellent choice).
You're speaking in terms of "the parser", but the APIs we're concerned with
(RAMP, JRAP) are front ends to multiple parsers and they abstract the mass
spec file format choice away from the logic that deals with mass spec data,
which keeps us from needing to change a couple dozen programs (along with
others we don't even know about, since RAMP and JRAP are open source) when a
new format pops up.  So yes, you'll certainly need to make extensive parser
changes to deal with mzML, as will RAMP and JRAP.  And, if you want to
retain the ability to read the mass spec files you're already reading,
you'll need to somehow deal with using multiple parsers inside your code.
In short, you'll need to create your own mass spec file reader API.
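Below is a minimal sketch of the kind of front end described above. It is written in Python for brevity and is not RAMP's or JRAP's actual API. One entry point detects the file format from the XML root element and delegates to a format-specific backend, so callers only ever see format-neutral scans. The names `read_scans`, `Scan`, and `BACKENDS` are hypothetical, and the mzXML/mzData element layouts are simplified to the few attributes used here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class Scan:
    """Format-neutral view of one scan; callers never see format details."""
    num: int
    ms_level: int


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix so detection works with or without xmlns.
    return tag.rsplit("}", 1)[-1]


def _read_mzxml(root: ET.Element) -> Iterator[Scan]:
    # Simplified mzXML: <scan num=".." msLevel=".."> elements, possibly nested.
    for el in root.iter():
        if _local(el.tag) == "scan":
            yield Scan(int(el.get("num")), int(el.get("msLevel")))


def _read_mzdata(root: ET.Element) -> Iterator[Scan]:
    # Simplified mzData: <spectrum id=".."> whose <spectrumInstrument> has msLevel.
    for el in root.iter():
        if _local(el.tag) == "spectrum":
            inst = next(
                (c for c in el.iter() if _local(c.tag) == "spectrumInstrument"), None
            )
            level = int(inst.get("msLevel")) if inst is not None else 1
            yield Scan(int(el.get("id")), level)


# Hypothetical backend table keyed by root element name. Supporting a new
# format means adding an entry here; callers of read_scans() stay unchanged.
BACKENDS: Dict[str, Callable[[ET.Element], Iterator[Scan]]] = {
    "mzXML": _read_mzxml,
    "mzData": _read_mzdata,
}


def read_scans(xml_bytes: bytes) -> List[Scan]:
    """Single entry point: detect the format, then delegate to its backend."""
    root = ET.fromstring(xml_bytes)
    backend = BACKENDS.get(_local(root.tag))
    if backend is None:
        raise ValueError(f"unsupported format: {root.tag}")
    return list(backend(root))
```

The design choice being illustrated is that the format switch lives in one place. A program written against `read_scans` picks up a new backend without any change to its own logic. That property only holds while every backend can still map one file onto one run, which is exactly the assumption at issue in this thread.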

So why all the excitement about API stability?  Consider this: originally,
RAMP read mzXML only.  Then we added the ability to read mzData.  Now, all
of the many programs that employ RAMP suddenly could read both mzData and
mzXML with nothing more than a recompilation (OK, that first time actually
required a small RAMP API tweak - using a RAMPFILE handle instead of a FILE
handle).  Later we added mzXML 3.0 with its compressed peak lists, and RAMP
users only needed to recompile to get this additional capability - no
"downstream" changes needed.  There have even been unreleased versions of
RAMP that read intermediate proposed forms of mzML.  Such ease of adoption
is very powerful when trying to establish a new data standard.  But guess
what?  RAMP can't be made to transparently handle the current proposed mzML
format due to the breaking of the one file / one run mapping.

Truly new mass spec behaviors will eventually make it necessary to expand or
even break the current mass spec data reader APIs.  Multiple precursors are
actually a good example of this (as an expansion, hopefully).  But, breaking
the one run / one file relationship isn't driven by new mass spec behaviors
that I know of.  What is the use case for this feature, anyway?   What's so
compelling about having multiple runs in a single mzML file that everyone
will want to massively rejigger their code to implement this?  Seems like
we're just creating an orphan feature that will only serve to trip up unwary
mzML output writers ("nice multi-run output ya got there - too bad nobody
can read it"), which I think is exactly the kind of thing the committee said
they wanted to avoid.

- Brian

-----Original Message-----
From: Matthew Chambers [mailto:mat...@va...] 
Sent: Thursday, August 02, 2007 3:11 PM
To: 'Joshua Tasman'
Cc: 'Brian Pratt'; psi...@li...
Subject: RE: [Psidev-ms-dev] mzML 0.93 ready for first review

> -----Original Message-----
> From: Joshua Tasman [mailto:jt...@sy...]
> Sent: Thursday, August 02, 2007 5:03 PM
> To: Matthew Chambers
> Cc: 'Brian Pratt'; psi...@li...
> Subject: Re: [Psidev-ms-dev] mzML 0.93 ready for first review
> 
> Hi Matt,
> 
> As the person writing both the writers and readers (at least for 
> now)-- a brief comment:
> 
> > Parameter groups, multiple runs, multiple precursors, and
> >> compressed binary data are all major "completely predictable 
> >> trouble spots."
> 
> Brian is correct-- neither parameter groups nor compressed binary data 
> change the expected relationship of scans-to-file.  (Nor do multiple 
> precursors, which will require downstream code changes but only apply 
> to MS level > 2 scans; it's important info to get in there anyhow, 
> so it's a good downstream change to make.)  RAMP already reads compressed 
> mzXML, for example.
> 
> -Josh

I didn't mean to imply that parameter groups or multiple precursors changed
the relationship of scans-to-file, only that they change the parser and
underlying data structure quite significantly.  I don't use RAMP in my C++
code; I wrote my own simple expat-based parser, but I know changes will have
to be made with the new format.  When I said "readers develop faster than
the file writers" I meant all the way downstream to the UI (i.e. supporting
multiple sources in a single input file).  In any case, even before those
downstream changes are made, a parser that depends on RunList::count == 1 can
simply read only the first run from a multi-run file, just as some parsers
read only MS>=2 spectra from a file.
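The fallback suggested above can be sketched as follows: a reader that assumes a single run takes the first `<run>` it finds, warns that later runs are being skipped, and does not fail. This is a hypothetical Python illustration. The `runList`/`run`/`spectrum` layout used in it is a simplified stand-in, not the mzML 0.93 schema, and `first_run_spectrum_ids` is an illustrative name. A production reader would stream the file (for example with `ElementTree.iterparse`) instead of parsing it whole.

```python
import warnings
import xml.etree.ElementTree as ET
from typing import List


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix so matching ignores namespaces.
    return tag.rsplit("}", 1)[-1]


def first_run_spectrum_ids(xml_bytes: bytes) -> List[str]:
    """Return spectrum ids from the first <run> only, warning if more exist."""
    root = ET.fromstring(xml_bytes)
    runs = [el for el in root.iter() if _local(el.tag) == "run"]
    if not runs:
        raise ValueError("no <run> element found")
    if len(runs) > 1:
        # Degrade gracefully rather than fail; runs after the first are skipped.
        warnings.warn(f"file contains {len(runs)} runs; reading only the first")
    return [el.get("id") for el in runs[0].iter() if _local(el.tag) == "spectrum"]
```

This keeps a one-run reader usable on multi-run files, but the warning is the only sign that data was dropped. That silent loss is the kind of "nobody can read it" outcome Brian raises earlier in the thread.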

-Matt



psidev-ms-dev — Mass spectroscopy standard development