From: Brian P. <bri...@in...> - 2007-10-05 00:46:51
|
Well, ASAPRatio (like most ISB tools) uses the RAMP API, which, we are promised, will read mzML, so it's just a recompile. I just hope it's not a big performance hit due to the nature of the mzML format. For closed source tools, yes, at least until the vendors get on board, intermediate conversion steps are a fact of life. But each translation step introduces the possibility of error (I'm a very suspicious guy when it comes to software: more code means more bugs) and is best avoided when possible, so starting with a format that pretty much expects you to convert away from it for operational purposes spooks me. Bad for throughput, too.

- Brian

_____
From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Thursday, October 04, 2007 5:29 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] honey vs vinegar

On 10/4/07, Brian Pratt <bri...@in...> wrote:

Hi Angel, I fear I may be misunderstanding your point, though? It might be read as implying, for example, that converting from mzML back to mzXML for the purposes of ASAPRatio and its elution profiling is a proper thing to do, but I don't expect that's what you meant to say. Can you clarify?

Yep, that's exactly what I was proposing, but maybe ASAP ratio is a bad example since ASAP ratio is open source and controlled by the TPP folks ;) A better example would be sequest and bioworks, which use a binary file format for storing processed peaks and the result in one file. The conversion would be mzML -> RAW/SRF -> SRF -> whatever you want here. The pay-off for bioworks to do something like this is fine-tuned random access for spectral processing. Plus the code investment in supporting mzML is relatively small and restricted to in/out of their format. Actually I take it back, ASAPR is a good example b/c using this model of translating an archive format to/from operational formats allows the ISB to put its development effort on newer algorithms, and prevent older projects from being put out to pasture.

-angel

Thanks, Brian

_____
From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Thursday, October 04, 2007 1:10 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] honey vs vinegar

On 10/4/07, Brian Pratt <bri...@in...> wrote:

These are interesting questions about how folks will use the format. I'm not comfortable with the idea that the format is intended for repositories instead of processing. I'd think you'd want a repository to contain exactly the same artifacts that were processed lest anyone wonder later what differences may have existed in the various representations of the data.

I think we agree here but are coming from different perspectives. In my mind, in order for a repository to have the most accurate representation of the data, the standard has to be purposed for data archival and flexible experimental annotation. Data processing routines would then take that format and do whatever they will for peak detection, noise reduction, base-line correction, etc. to give a final set of values (that typically go into the search algorithms). All of the intermediate steps in the processing should in theory be able to be represented by the same format.

I think that mzML as it stands is able to track the data and the processes that were applied to it, but it will certainly not be the most efficient way to represent the data *as the processing is being done*. A special purpose format for the algorithm at hand will always win in terms of engineering ease / speed / performance / interoperability (within a set of tools). This I think is at the heart of the whole discussion, and why I think cvParam is always getting hammered on the list. So while it seems that we are talking cross purposes, I really don't think we are.

-angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III, 421 Curie Blvd., Philadelphia, PA 19104-6160
P: 215-573-3736 F: 215-573-9004
|
From: Angel P. <an...@ma...> - 2007-10-05 00:28:49
|
On 10/4/07, Brian Pratt <bri...@in...> wrote: > > Hi Angel, > > I fear I may be misunderstanding your point, though? It might be read as > implying, for example, that converting from mzML back to mzXML for the > purposes of ASAPRatio and its elution profiling is a proper thing to do, but > I don't expect that's what you meant to say. Can you clarify? > Yep, that's exactly what I was proposing, but maybe ASAP ration is a bad example since ASAP ratio is open source and controlled by the TPP folks ;) A better example would be sequest and bioworks, which uses a binary file format for storing processed peaks and the result in one file. The conversion would be mzML -> RAW/SRF -> SRF -> whatever you want here. The pay-off for bioworks to do something like this is fine-tuned random access for spectral processing. Plus the code investment in supporting mzML is relatively small and restricted to in/out of their format. Actually I take it back, ASAPR is a good example b/c using this model of translating an archive format to/from operational formats allows the ISB to put its development effort on newer algorithms, and prevent older projects from being put out to pasture. -angel Thanks, > > > > Brian > > > > > ------------------------------ > > *From:* psi...@li... [mailto: > psi...@li...] *On Behalf Of *Angel Pizarro > *Sent:* Thursday, October 04, 2007 1:10 PM > *To:* Mass spectrometry standard development > *Subject:* Re: [Psidev-ms-dev] honey vs vinegar > > > > On 10/4/07, *Brian Pratt* <bri...@in...> wrote: > > These are interesting questions about how folks will use the format. I'm > not comfortable with the idea that the format is intended for repositories > instead of processing. I'd think you'd want a repository to contain > exactly > the same artifacts that were processed lest anyone wonder later what > differences may have existed in the various representations of the data. > > > I think we agree here but are coming from different perspectives. In my > mind in order for a repository to have the most accurate representation of > the data, the standard has to be purposed for data archival and flexible > experimental annotation. Data processing routines would then take that > format and do whatever it will for peak detection, noise reduction, > base-line correction, etc. to give a final set of values (that typically go > into the search algorithms). All of the intermediate steps in the processing > should in theory be able to be represented by the same format. > > I think that mzML as it stands is able to do track the data and the > processes that where applied to it, but it will certainly not be the most > efficient way to represent the data *as the processing is being done*. A > special purpose format for the algorithm at hand will always win in terms of > engineering ease / speed / performance / interoperability (within a set of > tools). > > This I think is at the heart of the whole discussion, and why I think > cvParam is always getting hammered on the list. So while it seems that we > are talking cross purposes, I really don't think we are. > > -angel > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... 
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev > > -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 |
From: Angel P. <an...@ma...> - 2007-10-05 00:16:59
|
On 10/4/07, Brian Pratt <bri...@in...> wrote: > > It still kind of amazes me that this is a problem we're > solving from scratch in a world with W3C schema in it, but I'm trying to > play nice since the cvParam thing seems to have unstoppable inertia. I'd > much prefer this: > <InstrumentType name="LCQ Deca" accession="MS:1000554" /> > - that's proper XML, to my mind, as opposed to merely valid XML, and it > still leverages the power of the CV. Actually I would prefer that structure as well and asked on the list for folks to specifically outline places in the schema where this could happen: http://sourceforge.net/mailarchive/message.php?msg_name=e38f4b170708071310m76356fe5g3f81b5eff44ce2c6%40mail.gmail.com See the threads from 8/7 - 8/9 for the full discussion, but let me just put it out there that it is not too late to have these types of changes! That's what the public review process is for! I don't think we did a good enough job of communicating to folks that this type of typed CV structure was an option for schema change proposals. -angel |
From: Brian P. <bri...@in...> - 2007-10-05 00:05:46
|
Hi Matt,

You're right, to get complete automated validation from standard XML handling tools you'd want to employ restriction elements in the schema. So, yes, the schema would officially rev every time the CV officially did, which makes sense as it's a tool for checking CV conformance. And in the end, stability of the schema isn't the goal - stability of the code that deals with the data format is the goal. For the kind of leaf-level CV changes we're talking about, most parsers would *not* change since they do not in general bother with validating against restriction lists for performance reasons. As such, parsers would also function perfectly well on most data that anticipates official CV+schema updates. And, the mzML format would be more compact and much more human readable.

This external CV mapping file sounds like an artifact that could just as readily be derived on the fly by examining the is_a and part_of fields in the CV itself, yes? Have you got a URL for an example?

Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matt Chambers
Sent: Thursday, October 04, 2007 4:18 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

I think I may understand him. However, as far as I know there ARE supposed to be restriction elements for instrument names (otherwise you wouldn't have a valid accession number; although like I've already suggested, we could have a special accession number to mean 'not yet in CV' or 'CV entry pending'). With the external mapping file, they've got the following logic:

> Given our current parser state in the "spectrum description" section of a spectrum, make sure all cvParams in this section have an accession number in the CV that pertains to describing the spectrum, e.g. the accession number for "SRM Spectrum."

It can get more specific than that, of course. So the mapping file could stay the same when terms are added; it would only need to be changed when the schema's structure changed. As far as I know, with an XML schema, there is no way to create an enumeration dynamically, i.e. for a cvParam in the spectrum description section:

<xs:restriction><-- dynamically restrict to accession numbers in CV related to spectrum description --></xs:restriction>

If I understand this right, I still don't get the advantage. What do we gain by having a stable mapping file which dynamically restricts by looking up to the CV, versus a machine-generated schema which is automatically updated every time the CV changes? In both cases, you can't remove terms from the CV without breaking backward compatibility, but otherwise you should be fine. The only changes between schema versions would be changes to the <xs:restriction> enumerations that define which accession numbers can appear where.

-Matt

Brian Pratt wrote:
> Hi Lennart,
>
> I'm not sure I understand, but my guess is that what's being said here is that most CV additions are just leaves on the inheritance tree, along the lines of our example of the introduction of "Super Ion Trap Turbo", and are minimally disruptive. Such additions would be minimally disruptive to a W3C schema as well, as long as it doesn't bother with restriction elements for things like instrument names, which it really shouldn't (it's not an error to come up with a new instrument name value). Thus the addition of instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of the W3C schema, so that's nothing to worry about if we went that route.
>
> Come to think of it, it sounds a bit like that mapping file is just another dialect of schema? Maybe we're nearly there already.
>
> But I'm pretty sure I didn't understand... perhaps an example would help?
>
> Thanks,
>
> Brian
>
> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Lennart Martens
> Sent: Thursday, October 04, 2007 3:21 PM
> To: Matthew Chambers
> Cc: psi...@li...
> Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process
>
> Hi Matt,
>
> >> But what is the difference between a frequently updated mapping file which is REQUIRED to get semantic validation, and a frequently updated primary schema which is REQUIRED to get semantic validation?
>
> The fact that the mapping file most often does not need to be updated to operate correctly after CV changes, since it is based on the CV structure (term-to-term links) rather than the actual accession numbers. Indeed, for many CV param elements, the required (allowed) accession numbers for that element are not even in the cv mapping.
>
> Cheers,
>
> lnnrt.
|
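To make Brian's "derived on the fly by examining the is_a and part_of fields" suggestion concrete, here is a rough editorial sketch (not part of the original exchange) of that derivation against a local copy of the CV. The OBO line formats follow the stanzas quoted elsewhere on this page; the file name, and the choice of MS:1000031 ("model by vendor") as the example parent term, are illustrative assumptions.

```python
# Sketch: derive the set of accessions allowed under a given CV term directly
# from the OBO file's is_a / part_of links, rather than from a hand-maintained
# mapping file. New leaf terms are picked up automatically on the next CV load.
from collections import defaultdict

def parse_obo(path):
    """Return {term_id: set(parent_ids)} built from is_a and part_of lines."""
    parents = defaultdict(set)
    term_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("id: "):
                term_id = line[4:]
            elif term_id and line.startswith("is_a: "):
                parents[term_id].add(line[6:].split(" !")[0].strip())
            elif term_id and line.startswith("relationship: part_of "):
                parents[term_id].add(line.split()[2])
    return parents

def is_descendant(parents, term, ancestor):
    """True if `ancestor` is reachable from `term` via is_a/part_of links."""
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False

def allowed_terms(parents, root):
    """Every linked term in the CV that sits somewhere under `root`."""
    return {term for term in parents if is_descendant(parents, term, root)}

parents = parse_obo("psi-ms.obo")  # hypothetical local copy of the PSI-MS CV
models = allowed_terms(parents, "MS:1000031")  # "model by vendor", per the CV excerpt quoted later on this page
print("LCQ Deca accepted:", "MS:1000554" in models)
```

Whether this walk or a curated mapping file should be the source of truth is exactly what the thread is debating; the sketch only shows that the is_a/part_of structure is sufficient to drive it.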
From: Matt C. <mat...@va...> - 2007-10-04 23:20:20
|
I think I may understand him. However, as far as I know there ARE supposed to be restriction elements for instrument names (otherwise you wouldn't have a valid accession number; although like I've already suggested, we could have a special accession number to mean 'not yet in CV' or 'CV entry pending'). With the external mapping file, they've got the following logic: > Given our current parser state in the "spectrum description" section of a spectrum, make sure all cvParams in this section have an accession number in the CV that pertains to describing the spectrum, e.g. the accession number for "SRM Spectrum." It can get more specific than that, of course. So the mapping file could stay the same when terms are added, it would only need to be changed when the schema's structure changed. As far as I know, with an XML schema, there is no way to create an enumeration dynamically, i.e. for a cvParam in the spectrum description section: <xs:restriction><-- dynamically restrict to accession numbers in CV related to spectrum description --></xs:restriction> If I understand this right, I still don't get the advantage. What do we gain by having a stable mapping file which dynamically restricts by looking up to the CV, versus a machine-generated schema which is automatically updated every time the CV changes? In both cases, you can't remove terms from the CV without breaking backward compatibility, but otherwise you should be fine. The only changes between schema versions would be changes to the <xs:restriction> enumerations that define which accession numbers can appear where. -Matt Brian Pratt wrote: > Hi Lennart, > > I'm not sure I understand, but my guess is that what's being said here is > that most CV additions are just leaves on the inheritance tree, along the > lines of our example of the introduction of "Super Ion Trap Turbo", and are > minimally disruptive. Such additions would be minimally disruptive to a W3C > schema as well, as long as it doesn't bother with restriction elements for > things like instrument names, which it really shouldn't (it's not an error > to come up with a new instrument name value). Thus the addition of > instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of > the the W3C schema, so that's nothing to worry about if we went that route. > > > Come to think of it, it sounds a bit like that mapping file is just another > dialect of schema? Maybe we're nearly there already. > > But I'm pretty sure I didn't understand... perhaps an example would help? > > Thanks, > > Brian > > > > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On Behalf Of Lennart > Martens > Sent: Thursday, October 04, 2007 3:21 PM > To: Matthew Chambers > Cc: psi...@li... > Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process > > Hi Matt, > > >> But what is the different between a frequently updated mapping >> file which is REQUIRED to get semantic validation, and a frequently >> updated primary schema which is REQUIRED to get semantic validation? >> > > The fact that the mapping file most often does not need to be updated to > operate correctly after CV changes, since it is based on the CV > structure (term-to-term links) rather than the actual accession numbers. > Indeed, for many CV param elements, the required (allowed) accession > numbers for that alement are not even in the cv mapping. > > > Cheers, > > lnnrt. > |
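As an editorial aside on the trade-off Matt weighs here: "machine-generating" the <xs:restriction> enumerations from the CV is mechanically simple, as the sketch below illustrates (it is not from the thread). The type name and formatting are invented, and the allowed set would in practice come from a CV walk like the one sketched earlier on this page; here it holds only the accession the thread uses as its running example.

```python
# Sketch: emit an <xs:simpleType> whose enumeration is generated from a set of
# CV accessions, so the schema can be regenerated whenever the CV revs.
from xml.sax.saxutils import quoteattr

def enumeration_type(type_name, allowed_accessions):
    lines = [f"<xs:simpleType name={quoteattr(type_name)}>",
             '  <xs:restriction base="xs:string">']
    for accession in sorted(allowed_accessions):
        lines.append(f"    <xs:enumeration value={quoteattr(accession)}/>")
    lines += ["  </xs:restriction>", "</xs:simpleType>"]
    return "\n".join(lines)

# In a real generator this would be every CV descendant of MS:1000035
# ("spectrum type"); only the thread's example term is listed here.
spectrum_type_terms = {"MS:1000583"}  # SRM spectrum
print(enumeration_type("SpectrumTypeAccession", spectrum_type_terms))
```

The cost, as Matt notes, is that the generated schema revs with every CV release; the benefit is that nothing beyond the generator is maintained by hand.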
From: Brian P. <bri...@in...> - 2007-10-04 22:47:39
|
Hi Lennart,

I'm not sure I understand, but my guess is that what's being said here is that most CV additions are just leaves on the inheritance tree, along the lines of our example of the introduction of "Super Ion Trap Turbo", and are minimally disruptive. Such additions would be minimally disruptive to a W3C schema as well, as long as it doesn't bother with restriction elements for things like instrument names, which it really shouldn't (it's not an error to come up with a new instrument name value). Thus the addition of instrument type "Super Ion Trap Turbo" to the CV would not provoke a rev of the W3C schema, so that's nothing to worry about if we went that route.

Come to think of it, it sounds a bit like that mapping file is just another dialect of schema? Maybe we're nearly there already.

But I'm pretty sure I didn't understand... perhaps an example would help?

Thanks,

Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Lennart Martens
Sent: Thursday, October 04, 2007 3:21 PM
To: Matthew Chambers
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] mzML 0.99.0 submitted to document process

Hi Matt,

> But what is the difference between a frequently updated mapping file which is REQUIRED to get semantic validation, and a frequently updated primary schema which is REQUIRED to get semantic validation?

The fact that the mapping file most often does not need to be updated to operate correctly after CV changes, since it is based on the CV structure (term-to-term links) rather than the actual accession numbers. Indeed, for many CV param elements, the required (allowed) accession numbers for that element are not even in the cv mapping.

Cheers,

lnnrt.
|
From: Brian P. <bri...@in...> - 2007-10-04 22:27:57
|
Agreed on both counts. I'm just making the case for a format that requires as few conversion steps as possible in an analysis pipeline, since each is an opportunity for introduction of error. In some cases (input to closed source tools) another file format conversion is unavoidable, but in all others it would be best if mzML was a format that lends itself to easy and fast conversion directly to data structures by the tool (that is, easy to write and maintain parsers for). This is in response to a perceived argument along the lines of "it's ok if it's kind of hard to parse efficiently, just convert it to some special-purpose format that better suits the performance needs of the tool in question", which just strikes me as the wrong approach.

- Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Mike Coleman
Sent: Thursday, October 04, 2007 2:56 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] honey vs vinegar

On 10/4/07, Brian Pratt <bri...@in...> wrote:
> I'm not comfortable with the idea that the format is intended for repositories instead of processing. I'd think you'd want a repository to contain exactly the same artifacts that were processed lest anyone wonder later what differences may have existed in the various representations of the data.

If you're talking about mzML files vs (say) ms2 files, it makes sense to me to archive the mzML file and then specify that version X of mzML-to-ms2 was used to prepare the spectra for search. If you're talking about mzML files vs RAW files, I'd still prefer to archive the mzML files, even though they are conceptually downstream from the RAW files. Although both files are produced via magical processes (secret vendor software), at least the mzML file follows a standard and can be read and understood without further magic.

Mike
|
From: Lennart M. <len...@gm...> - 2007-10-04 22:21:07
|
Hi Matt,

> But what is the difference between a frequently updated mapping file which is REQUIRED to get semantic validation, and a frequently updated primary schema which is REQUIRED to get semantic validation?

The fact that the mapping file most often does not need to be updated to operate correctly after CV changes, since it is based on the CV structure (term-to-term links) rather than the actual accession numbers. Indeed, for many CV param elements, the required (allowed) accession numbers for that element are not even in the cv mapping.

Cheers,

lnnrt.
|
From: Mike C. <tu...@gm...> - 2007-10-04 21:56:06
|
On 10/4/07, Brian Pratt <bri...@in...> wrote: > I'm > not comfortable with the idea that the format is intended for repositories > instead of processing. I'd think you'd want a repository to contain exactly > the same artifacts that were processed lest anyone wonder later what > differences may have existed in the various representations of the data. If you're talking about mzML files vs (say) ms2 files, it makes sense to me to archive the mzML file and then specify that version X of mzML-to-ms2 was used to prepare the spectra for search. If you're talking about mzML files vs RAW files, I'd still prefer to archive the mzML files, even though they are conceptually downstream from the RAW files. Although both files are produced via magical processes (secret vendor software), at least the mzML file follows a standard and can be read and understood without further magic. Mike |
From: Mike C. <tu...@gm...> - 2007-10-04 21:40:37
|
On 10/4/07, Matthew Chambers <mat...@va...> wrote: > I am under the impression that /replacing/ the vendor-specific raw > formats was never the intent of this specification. By replacing, I > assume you mean that the people who run MS instruments would choose to > store all their data in mzML instead of the raw instrument formats (i.e. > the mzML would be archived and the raw formats deleted). It's possible that I have missed the plot. This is what *I* am hoping to get out of the spec, anyway. It appears to me that mzML files will have all of the information in them that I care about, and since they are standardized and relatively readable, in that sense they would be superior to RAW files, which are ad hoc and opaque. > > That said, it's not yet clear how things will play out beyond that. > > Will users/programmers/shops choose to keep their data in mzML format > > and develop lots of programs to deal with that format? Or will they > > choose to immediately "rip" mzML files into some other format that > > they perceive to be simpler, better, or more familiar? > > > If a software group develops support for reading mzML, developing a > writer should be a piece of cake and I see no incentive for them to > create and write their own redundant format when they already developed > a reader for mzML. To be a little more concrete, our current pipeline uses ms2 files. Our current primary search program does not accept mzML input and we do not have source code for the program, which means that we cannot adapt it to do so. Also, it appears that for our spectra, mzML files may be somewhat larger than the corresponding ms2 files. They're also not as human-readable. Plus other issues like what we're discussing today. So, there may be notable pluses and minuses to a full-scale conversion to mzML, versus a rip-to-ms2 approach, which would be cheap and simple in the short run. I'm taking a wait-and-see approach for now. > I don't think there is a lot of controversy over using a single MS data > exchange format (excepting the current cvParam one) I don't think there will be shouting matches, no. Perhaps it would be more like the IPv6 conversion that was--and is still--just around the corner. > it's when the > analysisXml standard nears completion that software groups will really > get serious headaches trying to decide what format to store their > analysis results in. :) I've been too afraid of this to look. :-) Mike |
From: Matthew C. <mat...@va...> - 2007-10-04 21:27:51
|
I am starting to agree with Brian in that it seems that some of our requirements are mutually exclusive: - we want a schema that doesn't change -> thus we cannot represent the ever-changing semantics in the schema - we want a semantic validation tool -> thus we need the tool to keep up with the ever-changing semantics somehow, be it in the schema or some external mapping file, I don't see the difference! And what is the point of the schema itself if it doesn't capture the semantics of the specification? -Matt Brian Pratt wrote: > Quite right, attribute order ought not to matter syntactically. Just a > convention suggestion. > > I was thinking that the parentAcession would be the immediate parent in the > inheritance tree so you could begin finding your way up to something you > recognize (the root of the tree might be higher than you wanted to go, and > finding your way down is even more annoying than finding your way up). > > Of course having the immediate parent's accession number is not much help if > the parent isn't in the CV, but all we're really hoping to guard against > here is failing in the case of things like the new "LCQ Deca Turbo" model > coming out, when the data looks otherwise the same as that from the "LCQ > Deca" model. There's no magic bullet for dealing with radical additions to > the syntax - I think we're really just wrangling about how to deal with new > enum values. It still kind of amazes me that this is a problem we're > solving from scratch in a world with W3C schema in it, but I'm trying to > play nice since the cvParam thing seems to have unstoppable inertia. I'd > much prefer this: > <InstrumentType name="LCQ Deca" accession="MS:1000554" /> > - that's proper XML, to my mind, as opposed to merely valid XML, and it > still leverages the power of the CV. A schema generated from and referring > to the CV just doesn't seem like a problem - there's a schema in the CV > crying to get out, in the form of the is_a and part_of data (and if there > isn't, the CV is probably broken, so it's a useful exercise either way). > > - Brian > |
From: Brian P. <bri...@in...> - 2007-10-04 21:13:37
|
Hi Angel, I don't think anyone meant to say that mzML should represent the data as the processing is being done, that's normally some in-memory representation. There's just concern that the cvParam approach makes getting the data out of the file and into data structures for processing more complex than it needs to be. I fear I may be misunderstanding your point, though? It might be read as implying, for example, that converting from mzML back to mzXML for the purposes of ASAPRatio and its elution profiling is a proper thing to do, but I don't expect that's what you meant to say. Can you clarify? Thanks, Brian _____ From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro Sent: Thursday, October 04, 2007 1:10 PM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] honey vs vinegar On 10/4/07, Brian Pratt <bri...@in...> wrote: These are interesting questions about how folks will use the format. I'm not comfortable with the idea that the format is intended for repositories instead of processing. I'd think you'd want a repository to contain exactly the same artifacts that were processed lest anyone wonder later what differences may have existed in the various representations of the data. I think we agree here but are coming from different perspectives. In my mind in order for a repository to have the most accurate representation of the data, the standard has to be purposed for data archival and flexible experimental annotation. Data processing routines would then take that format and do whatever it will for peak detection, noise reduction, base-line correction, etc. to give a final set of values (that typically go into the search algorithms). All of the intermediate steps in the processing should in theory be able to be represented by the same format. I think that mzML as it stands is able to do track the data and the processes that where applied to it, but it will certainly not be the most efficient way to represent the data *as the processing is being done*. A special purpose format for the algorithm at hand will always win in terms of engineering ease / speed / performance / interoperability (within a set of tools). This I think is at the heart of the whole discussion, and why I think cvParam is always getting hammered on the list. So while it seems that we are talking cross purposes, I really don't think we are. -angel |
From: Brian P. <bri...@in...> - 2007-10-04 21:11:34
|
Quite right, attribute order ought not to matter syntactically. Just a convention suggestion.

I was thinking that the parentAccession would be the immediate parent in the inheritance tree so you could begin finding your way up to something you recognize (the root of the tree might be higher than you wanted to go, and finding your way down is even more annoying than finding your way up). Of course having the immediate parent's accession number is not much help if the parent isn't in the CV, but all we're really hoping to guard against here is failing in the case of things like the new "LCQ Deca Turbo" model coming out, when the data looks otherwise the same as that from the "LCQ Deca" model. There's no magic bullet for dealing with radical additions to the syntax - I think we're really just wrangling about how to deal with new enum values. It still kind of amazes me that this is a problem we're solving from scratch in a world with W3C schema in it, but I'm trying to play nice since the cvParam thing seems to have unstoppable inertia. I'd much prefer this:

<InstrumentType name="LCQ Deca" accession="MS:1000554" />

- that's proper XML, to my mind, as opposed to merely valid XML, and it still leverages the power of the CV. A schema generated from and referring to the CV just doesn't seem like a problem - there's a schema in the CV crying to get out, in the form of the is_a and part_of data (and if there isn't, the CV is probably broken, so it's a useful exercise either way).

- Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Thursday, October 04, 2007 12:38 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Option A, B, or C

Brian Pratt wrote:
> To review:
> A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>
> B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>
> C) <cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum" value=""/>
>
> I'd propose option D (or C+ if you prefer):
>
> <cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583" name="SRM spectrum" />
>
> The category (I'd prefer "parent") name is redundant - the parser is going to use the accession number, and the human is going to get meaning from the name itself with the CV as a fallback. The value for "value" should be defaulted to "", it's just taking up space.
>
> Also, for eyeballing purposes it would be nice if the human readable part came first rather than last, if it's all the same to the parsers. And, I'd move the parent to the end since it's likely it won't be needed. So,
>
> <cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" parentAccession="MS:1000035"/>
>
> - Brian

I agree that ordering the attributes the way you have them might be good for convention and they should be that way in the examples, there's no reason to actually require them to be in that order, is there? Also, to add my proposal from the other post, I'll call it:

E) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/>

I feel rather strongly that the "name" of a "parameter" should not ever be interpreted as a value. A "valueName" on the other hand, can be a text description of the valueAccession which is what the parser will usually care about. Additionally, this proposal allows the "accession" attribute to consistently refer to a category, instead of sometimes referring to a category and sometimes referring to a value, which is counter-intuitive.

Another thing to discuss with either C, D, or E, is what exactly is the "category" accession going to refer to? In a previous post of yours Brian, you wrote:

> Piling on with Mike, here: So the first thing any parser must do is load up the OBO file. In practice, such a software system will need to bundle an OBO in some fashion, in the extremely likely event that the OBO used by the mzML file in question is not present. Don't forget to update your distro each time the OBO gets updated, and make sure that in the event the OBO used by the mzML file IS present, you use that instead. Then, read:
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> then ask yourself, "whazzat?", and look up:
>
> id: MS:1000554
> name: LCQ Deca
> def: "ThermoFinnigan LCQ Deca." [PSI:MS]
> is_a: MS:1000125 ! thermo finnigan
>
> which leads you to:
>
> id: MS:1000125
> name: thermo finnigan
> def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
> is_a: MS:1000483 ! thermo fisher scientific
>
> which leads you to:
>
> id: MS:1000483
> name: thermo fisher scientific
> def: "Thermo Fisher Scientific. Also known as Thermo Finnigan corporation." [PSI:MS]
> related_synonym: "Thermo Scientific" []
> is_a: MS:1000031 ! model by vendor
>
> which leads you to:
>
> id: MS:1000031
> name: model by vendor
> def: "Instrument's model name (everything but the vendor's name) ---Free text ?" [PSI:MS]
> relationship: part_of MS:1000463 ! instrument description
>
> which leads you to:
>
> id: MS:1000463
> name: instrument description
> def: "Device which performs a measurement." [PSI:MS]
> relationship: part_of MS:0000000 ! mzOntology
>
> aha! now populate the "instrument description" element in your database.

So the main category is MS:1000463, but MS:1000463 is not the parent of MS:1000554 (it is an ancestor, but more specifically it is the root). Intuitively, the category accession number should of course be the root in this case, but will that always be the case?

-Matt
|
From: Angel P. <an...@ma...> - 2007-10-04 20:09:49
|
On 10/4/07, Brian Pratt <bri...@in...> wrote:
> These are interesting questions about how folks will use the format. I'm not comfortable with the idea that the format is intended for repositories instead of processing. I'd think you'd want a repository to contain exactly the same artifacts that were processed lest anyone wonder later what differences may have existed in the various representations of the data.

I think we agree here but are coming from different perspectives. In my mind, in order for a repository to have the most accurate representation of the data, the standard has to be purposed for data archival and flexible experimental annotation. Data processing routines would then take that format and do whatever they will for peak detection, noise reduction, base-line correction, etc. to give a final set of values (that typically go into the search algorithms). All of the intermediate steps in the processing should in theory be able to be represented by the same format.

I think that mzML as it stands is able to track the data and the processes that were applied to it, but it will certainly not be the most efficient way to represent the data *as the processing is being done*. A special purpose format for the algorithm at hand will always win in terms of engineering ease / speed / performance / interoperability (within a set of tools).

This I think is at the heart of the whole discussion, and why I think cvParam is always getting hammered on the list. So while it seems that we are talking cross purposes, I really don't think we are.

-angel
|
From: Chris T. <chr...@eb...> - 2007-10-04 19:53:27
|
There is another reason for numerical accessions in classifications (that I may have missed someone else offering in the flood today), be it a CV or a DB like GenBank or whatever, which is kind of trivial but nonetheless worth keeping in mind (and regardless, let us remember that not only the PSI's CVs constitute use cases for whatever structure is agreed -- while the MS CV is under PSI control, little else is):

The reason is a simple one -- accession _numbers_ are most usually used because they are assigned like tickets for people waiting in line at the store -- whatever turns up gets the next available number from the stack basically. Using meaningful strings makes this much more of a pain as the space of 'nice' names will get used up and you can guess the rest -- names will ultimately get less intuitive (and remember a good CV can take a paragraph to _define_ a concept to avoid misinterpretation, so a word/phrase is not enough to achieve interpretability in many cases anyway); it'll be an increasing pain checking uniqueness before assigning new labels; case-sensitivity issues may even arise in some contexts perhaps (although I know you will tell me that lookup and other processing is unaffected). A nice contrast can be had by comparing DeltaMass (term = accession -- worst case scenario) to say RESID or Unimod.

Another thought occurs -- would one need to agree a naming convention for names as accessions? No white space -- underscores versus CamelHump versus camelHump etc. A world of hurt as Jesse Ventura once put it ;)

Cheers, Chris.

P.S. I know none of the above are killer arguments, but maybe strawsForTheCamelsBack?

Matthew Chambers wrote:
> Brian Pratt wrote:
>> To review:
>> A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>
>> B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>
>> C) <cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum" value=""/>
>>
>> I'd propose option D (or C+ if you prefer):
>>
>> <cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583" name="SRM spectrum" />
>>
>> The category (I'd prefer "parent") name is redundant - the parser is going to use the accession number, and the human is going to get meaning from the name itself with the CV as a fallback. The value for "value" should be defaulted to "", it's just taking up space.
>>
>> Also, for eyeballing purposes it would be nice if the human readable part came first rather than last, if it's all the same to the parsers. And, I'd move the parent to the end since it's likely it won't be needed. So,
>>
>> <cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" parentAccession="MS:1000035"/>
>>
>> - Brian
> I agree that ordering the attributes the way you have them might be good for convention and they should be that way in the examples, there's no reason to actually require them to be in that order, is there? Also, to add my proposal from the other post, I'll call it:
> E) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/>
>
> I feel rather strongly that the "name" of a "parameter" should not ever be interpreted as a value. A "valueName" on the other hand, can be a text description of the valueAccession which is what the parser will usually care about. Additionally, this proposal allows the "accession" attribute to consistently refer to a category, instead of sometimes referring to a category and sometimes referring to a value, which is counter-intuitive.
>
> Another thing to discuss with either C, D, or E, is what exactly is the "category" accession going to refer to? In a previous post of yours Brian, you wrote:
>> Piling on with Mike, here: So the first thing any parser must do is load up the OBO file. In practice, such a software system will need to bundle an OBO in some fashion, in the extremely likely event that the OBO used by the mzML file in question is not present. Don't forget to update your distro each time the OBO gets updated, and make sure that in the event the OBO used by the mzML file IS present, you use that instead. Then, read:
>>
>> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>>
>> then ask yourself, "whazzat?", and look up:
>>
>> id: MS:1000554
>> name: LCQ Deca
>> def: "ThermoFinnigan LCQ Deca." [PSI:MS]
>> is_a: MS:1000125 ! thermo finnigan
>>
>> which leads you to:
>>
>> id: MS:1000125
>> name: thermo finnigan
>> def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
>> is_a: MS:1000483 ! thermo fisher scientific
>>
>> which leads you to:
>>
>> id: MS:1000483
>> name: thermo fisher scientific
>> def: "Thermo Fisher Scientific. Also known as Thermo Finnigan corporation." [PSI:MS]
>> related_synonym: "Thermo Scientific" []
>> is_a: MS:1000031 ! model by vendor
>>
>> which leads you to:
>>
>> id: MS:1000031
>> name: model by vendor
>> def: "Instrument's model name (everything but the vendor's name) ---Free text ?" [PSI:MS]
>> relationship: part_of MS:1000463 ! instrument description
>>
>> which leads you to:
>>
>> id: MS:1000463
>> name: instrument description
>> def: "Device which performs a measurement." [PSI:MS]
>> relationship: part_of MS:0000000 ! mzOntology
>>
>> aha! now populate the "instrument description" element in your database.
>>
> So the main category is MS:1000463, but MS:1000463 is not the parent of MS:1000554 (it is an ancestor, but more specifically it is the root). Intuitively, the category accession number should of course be the root in this case, but will that always be the case?
>
> -Matt

--
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
|
From: Matthew C. <mat...@va...> - 2007-10-04 19:37:31
|
Brian Pratt wrote:
> To review:
> A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>
> B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>
> C) <cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum" value=""/>
>
> I'd propose option D (or C+ if you prefer):
>
> <cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583" name="SRM spectrum" />
>
> The category (I'd prefer "parent") name is redundant - the parser is going to use the accession number, and the human is going to get meaning from the name itself with the CV as a fallback. The value for "value" should be defaulted to "", it's just taking up space.
>
> Also, for eyeballing purposes it would be nice if the human readable part came first rather than last, if it's all the same to the parsers. And, I'd move the parent to the end since it's likely it won't be needed. So,
>
> <cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" parentAccession="MS:1000035"/>
>
> - Brian

I agree that ordering the attributes the way you have them might be good for convention and they should be that way in the examples, there's no reason to actually require them to be in that order, is there? Also, to add my proposal from the other post, I'll call it:

E) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" valueAccession="MS:1000583" valueName="SRM spectrum"/>

I feel rather strongly that the "name" of a "parameter" should not ever be interpreted as a value. A "valueName" on the other hand, can be a text description of the valueAccession which is what the parser will usually care about. Additionally, this proposal allows the "accession" attribute to consistently refer to a category, instead of sometimes referring to a category and sometimes referring to a value, which is counter-intuitive.

Another thing to discuss with either C, D, or E, is what exactly is the "category" accession going to refer to? In a previous post of yours Brian, you wrote:

> Piling on with Mike, here: So the first thing any parser must do is load up the OBO file. In practice, such a software system will need to bundle an OBO in some fashion, in the extremely likely event that the OBO used by the mzML file in question is not present. Don't forget to update your distro each time the OBO gets updated, and make sure that in the event the OBO used by the mzML file IS present, you use that instead. Then, read:
>
> <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" value=""/>
>
> then ask yourself, "whazzat?", and look up:
>
> id: MS:1000554
> name: LCQ Deca
> def: "ThermoFinnigan LCQ Deca." [PSI:MS]
> is_a: MS:1000125 ! thermo finnigan
>
> which leads you to:
>
> id: MS:1000125
> name: thermo finnigan
> def: "ThermoFinnigan from Thermo Electron Corporation" [PSI:MS]
> is_a: MS:1000483 ! thermo fisher scientific
>
> which leads you to:
>
> id: MS:1000483
> name: thermo fisher scientific
> def: "Thermo Fisher Scientific. Also known as Thermo Finnigan corporation." [PSI:MS]
> related_synonym: "Thermo Scientific" []
> is_a: MS:1000031 ! model by vendor
>
> which leads you to:
>
> id: MS:1000031
> name: model by vendor
> def: "Instrument's model name (everything but the vendor's name) ---Free text ?" [PSI:MS]
> relationship: part_of MS:1000463 ! instrument description
>
> which leads you to:
>
> id: MS:1000463
> name: instrument description
> def: "Device which performs a measurement." [PSI:MS]
> relationship: part_of MS:0000000 ! mzOntology
>
> aha! now populate the "instrument description" element in your database.

So the main category is MS:1000463, but MS:1000463 is not the parent of MS:1000554 (it is an ancestor, but more specifically it is the root). Intuitively, the category accession number should of course be the root in this case, but will that always be the case?

-Matt
|
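To make Matt's parent-versus-root question concrete, here is a short editorial sketch (not part of the original message) that automates the "which leads you to..." walk above. The OBO stanzas are abridged to the id/name/link lines quoted in this thread, and the single-parent simplification is an assumption made for illustration.

```python
# Sketch: walk a term's is_a / part_of chain upward, to show the difference
# between its immediate parent and the root category it ultimately belongs to.
OBO_EXCERPT = """
[Term]
id: MS:1000554
name: LCQ Deca
is_a: MS:1000125 ! thermo finnigan

[Term]
id: MS:1000125
name: thermo finnigan
is_a: MS:1000483 ! thermo fisher scientific

[Term]
id: MS:1000483
name: thermo fisher scientific
is_a: MS:1000031 ! model by vendor

[Term]
id: MS:1000031
name: model by vendor
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! mzOntology
"""

def first_parent(stanzas, term_id):
    """Immediate is_a/part_of parent of term_id, or None (single-parent simplification)."""
    in_term = False
    for line in stanzas.splitlines():
        if line.startswith("id: "):
            in_term = (line[4:] == term_id)
        elif in_term and line.startswith("is_a: "):
            return line[6:].split(" !")[0]
        elif in_term and line.startswith("relationship: part_of "):
            return line.split()[2]
    return None

chain, node = [], "MS:1000554"
while node is not None:
    chain.append(node)
    node = first_parent(OBO_EXCERPT, node)

print(" -> ".join(chain))
# MS:1000554 -> MS:1000125 -> MS:1000483 -> MS:1000031 -> MS:1000463 -> MS:0000000
# chain[1] is the immediate parent; chain[-2], just below the ontology root, is
# the "instrument description" category Matt is asking about.
```

If a term ever carries more than one is_a parent, the walk can reach more than one root, which is one way the closing question could fail to have a single answer.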
From: Brian P. <bri...@in...> - 2007-10-04 19:36:11
|
These are interesting questions about how folks will use the format. I'm not comfortable with the idea that the format is intended for repositories instead of processing. I'd think you'd want a repository to contain exactly the same artifacts that were processed lest anyone wonder later what differences may have existed in the various representations of the data. Seems to me the format has to be suitable for processing first and foremost or it's not likely to end up in a repository at all. - Brian -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers Sent: Thursday, October 04, 2007 12:17 PM To: Mike Coleman Cc: PSI MS Dev Subject: Re: [Psidev-ms-dev] honey vs vinegar Mike Coleman wrote: > As far as I'm concerned, the only thing mzML had to do to be a wild > success was to provide a standard replacement for the current > vendor-specific, secret RAW file formats, anything further being > gravy. This is clearly going to happen and I think those responsible > should pat themselves on the back for a job well done. > > I am under the impression that /replacing/ the vendor-specific raw formats was never the intent of this specification. By replacing, I assume you mean that the people who run MS instruments would choose to store all their data in mzML instead of the raw instrument formats (i.e. the mzML would be archived and the raw formats deleted). I do not understand how such a thing would even be possible without intense cooperation, dedication, and commitment from all of the vendors. I don't think we have that and I think it's not realistic to expect it. I am under the impression that the intent of this specification is to provide a way to exchange the most significant metadata and data of MS runs. By that intent, the current spec looks great (excepting the current cvParam controversy). > That said, it's not yet clear how things will play out beyond that. > Will users/programmers/shops choose to keep their data in mzML format > and develop lots of programs to deal with that format? Or will they > choose to immediately "rip" mzML files into some other format that > they perceive to be simpler, better, or more familiar? > If a software group develops support for reading mzML, developing a writer should be a piece of cake and I see no incentive for them to create and write their own redundant format when they already developed a reader for mzML. > Each shop will be forced to make this decision eventually, and > developers will also have to make it for each program that they write. > I think most people would prefer the mzML-everywhere alternative, but > there is no Microsoft here to force the decision, so mzML must win by > being as appealing as possible. > I don't think there is a lot of controversy over using a single MS data exchange format (excepting the current cvParam one); it's when the analysisXml standard nears completion that software groups will really get serious headaches trying to decide what format to store their analysis results in. :) > To me, this means keeping things as simple and intuitive as possible, > and keeping them as decoupled as possible from other systems and > programs. Ideally, mzML would be even "fun" to use. > > Agreed. I will add that, to me, intuitive means that values are not stored in a 'name' attribute with no explicit category context. -Matt ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? 
Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Psidev-ms-dev mailing list Psi...@li... https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
From: Mike C. <tu...@gm...> - 2007-10-04 19:34:25
|
F) <cvParam cvLabel="MS" categoryName="spectrum type" name="SRM spectrum"> ? That is, can the accession number be uniquely determined from the name? If so, could these be looked up later if needed? Mike |
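A quick editorial sketch (not from the thread) of the lookup Mike is asking about: build a name-to-accession index from the CV and refuse to guess when a name maps to more than one term. The two stanzas are reduced to id/name pairs taken from the thread's examples; whether names stay unique across the whole CV is exactly the open question.

```python
# Sketch: if cvParams carried only names, accessions could be resolved later
# from a name -> accession index, provided every name is unique in the CV.
from collections import defaultdict

OBO_EXCERPT = """
[Term]
id: MS:1000583
name: SRM spectrum

[Term]
id: MS:1000035
name: spectrum type
"""

def name_index(stanzas):
    """Map each term name to the set of accessions that claim it."""
    index = defaultdict(set)
    term_id = None
    for line in stanzas.splitlines():
        if line.startswith("id: "):
            term_id = line[4:]
        elif term_id and line.startswith("name: "):
            index[line[6:]].add(term_id)
    return index

index = name_index(OBO_EXCERPT)
ambiguous = {name: ids for name, ids in index.items() if len(ids) > 1}
print(index["SRM spectrum"], "ambiguous names:", ambiguous)
```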
From: Matthew C. <mat...@va...> - 2007-10-04 19:16:50
|
Mike Coleman wrote: > As far as I'm concerned, the only thing mzML had to do to be a wild > success was to provide a standard replacement for the current > vendor-specific, secret RAW file formats, anything further being > gravy. This is clearly going to happen and I think those responsible > should pat themselves on the back for a job well done. > > I am under the impression that /replacing/ the vendor-specific raw formats was never the intent of this specification. By replacing, I assume you mean that the people who run MS instruments would choose to store all their data in mzML instead of the raw instrument formats (i.e. the mzML would be archived and the raw formats deleted). I do not understand how such a thing would even be possible without intense cooperation, dedication, and commitment from all of the vendors. I don't think we have that and I think it's not realistic to expect it. I am under the impression that the intent of this specification is to provide a way to exchange the most significant metadata and data of MS runs. By that intent, the current spec looks great (excepting the current cvParam controversy). > That said, it's not yet clear how things will play out beyond that. > Will users/programmers/shops choose to keep their data in mzML format > and develop lots of programs to deal with that format? Or will they > choose to immediately "rip" mzML files into some other format that > they perceive to be simpler, better, or more familiar? > If a software group develops support for reading mzML, developing a writer should be a piece of cake and I see no incentive for them to create and write their own redundant format when they already developed a reader for mzML. > Each shop will be forced to make this decision eventually, and > developers will also have to make it for each program that they write. > I think most people would prefer the mzML-everywhere alternative, but > there is no Microsoft here to force the decision, so mzML must win by > being as appealing as possible. > I don't think there is a lot of controversy over using a single MS data exchange format (excepting the current cvParam one); it's when the analysisXml standard nears completion that software groups will really get serious headaches trying to decide what format to store their analysis results in. :) > To me, this means keeping things as simple and intuitive as possible, > and keeping them as decoupled as possible from other systems and > programs. Ideally, mzML would be even "fun" to use. > > Agreed. I will add that, to me, intuitive means that values are not stored in a 'name' attribute with no explicit category context. -Matt |
From: Brian P. <bri...@in...> - 2007-10-04 19:05:12
|
To review:

A) <cvParam cvLabel="MS" accession="MS:1000583" name="SRM spectrum" value=""/>
B) <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>
C) <cvParam cvLabel="MS" categoryAccession="MS:1000035" categoryName="spectrum type" accession="MS:1000583" name="SRM spectrum" value=""/>

I'd propose option D (or C+ if you prefer):

D) <cvParam cvLabel="MS" categoryAccession="MS:1000035" accession="MS:1000583" name="SRM spectrum"/>

The category (I'd prefer "parent") name is redundant - the parser is going to use the accession number, and the human is going to get meaning from the name itself, with the CV as a fallback. The "value" attribute should default to "" - an empty value is just taking up space. Also, for eyeballing purposes it would be nice if the human-readable part came first rather than last, if it's all the same to the parsers. And I'd move the parent to the end, since it's likely it won't be needed. So:

<cvParam name="SRM spectrum" cvLabel="MS" accession="MS:1000583" parentAccession="MS:1000035"/>

- Brian
|
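Brian's point that "the parser is going to use the accession number" is easy to make concrete. Here is a minimal Python sketch - the snippet string is just option D pasted in, and nothing here is taken from any actual mzML library API - showing a consumer that keys on accessions and treats the names as display text:

    import xml.etree.ElementTree as ET

    # Option D as a literal string, purely for illustration.
    snippet = ('<cvParam name="SRM spectrum" cvLabel="MS" '
               'accession="MS:1000583" parentAccession="MS:1000035"/>')

    param = ET.fromstring(snippet)
    accession = param.get("accession")         # "MS:1000583" - what software keys on
    parent = param.get("parentAccession")      # "MS:1000035" - optional category context
    display = param.get("name")                # human-readable label, CV is the fallback
    print(accession, parent, display)

Under this reading, dropping categoryName costs a machine nothing; the only question is how much the redundancy helps a human scanning the raw XML.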
From: Matthew C. <mat...@va...> - 2007-10-04 19:00:32
|
Oh, I understand now. I am not familiar with what GPM/CPAS/SBEAMS do with MS data when they parse it, but I can certainly conceive of simply reading the cvParams in as key-value pairs and storing them as text. Like I said earlier, it's adding support in signal processing software for the new terms that has the greatest cost, and very little of that cost needs to go toward supporting the new terms in the software's parser. If the cvParam takes the form of method A in the spec, though, then a manually written, CV-unaware parser could potentially require significant changes, whereas method B or C (or my modified proposal of C) would not.

-Matt

Angel Pizarro wrote:
> Y, I guess that it was not too clear, sorry about that. I did not mean
> to imply users can add terms and accession on the fly. That would be a
> userParam. cvParams need a source CV and that source CV would be the
> portal for submitting new terms.
>
> Shameless plug for PSI: all of the working groups have a CV
> development component, so if an area is important to you, please
> review the CV's and send additions / amendments to the group for review!
>
> By my reply, I only meant that parsers written for data loading into
> a repository (e.g. theGPM / CPAS / SBEAMS) have a different set of
> requirements than other tools. New terms (e.g. not in the repositories
> catalog yet) should not be show-stoppers for those types of parsers.
>
> -angel
>
> On 10/4/07, Matthew Chambers <mat...@va...> wrote:
> > I'm not sure what you're saying here. Users can programmatically (via a
> > web service, I presume) add terms to the CV without going through a
> > community approval process? If it's something else, please elaborate.
> >
> > -Matt
> >
> > Angel Pizarro wrote:
> > > WRT to my point about operational vs. repository data formats. For a
> > > repository, it is completely valid (and desirable) for the software to
> > > parse this new value and add it to the list of possible values for the
> > > ontology category.
> > >
> > > -angel
|
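The kind of CV-unaware, key-value reading Matt describes can be sketched in a few lines of Python. The snippet below is purely illustrative - the spectrum fragment and the MS:9999999 term are invented for the example, not taken from the spec or the CV:

    import xml.etree.ElementTree as ET

    # An illustrative fragment; MS:9999999 stands in for a term the reader
    # has never seen before.
    xml_text = (
        '<spectrum id="scan=1">'
        '  <cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" value="SRM spectrum"/>'
        '  <cvParam cvLabel="MS" accession="MS:9999999" name="some future term" value=""/>'
        '</spectrum>'
    )

    # Keep every cvParam as plain text key/value rows; unknown terms are
    # stored rather than rejected, which is all a repository loader needs.
    rows = [
        {"accession": p.get("accession"), "name": p.get("name"), "value": p.get("value", "")}
        for p in ET.fromstring(xml_text).iter("cvParam")
    ]
    for row in rows:
        print(row)

The point is that the parser itself never has to know what "spectrum type" means; the cost of new terms only shows up downstream, in the software that has to act on them.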
From: Angel P. <an...@ma...> - 2007-10-04 18:51:05
|
Y, I guess that it was not too clear, sorry about that. I did not mean to imply users can add terms and accession on the fly. That would be a userParam. cvParams need a source CV and that source CV would be the portal for submitting new terms.

Shameless plug for PSI: all of the working groups have a CV development component, so if an area is important to you, please review the CV's and send additions / amendments to the group for review!

By my reply, I only meant that parsers written for data loading into a repository (e.g. theGPM / CPAS / SBEAMS) have a different set of requirements than other tools. New terms (e.g. not in the repositories catalog yet) should not be show-stoppers for those types of parsers.

-angel

On 10/4/07, Matthew Chambers <mat...@va...> wrote:
> I'm not sure what you're saying here. Users can programmatically (via a
> web service, I presume) add terms to the CV without going through a
> community approval process? If it's something else, please elaborate.
>
> -Matt
>
> Angel Pizarro wrote:
> > WRT to my point about operational vs. repository data formats. For a
> > repository, it is completely valid (and desirable) for the software to
> > parse this new value and add it to the list of possible values for the
> > ontology category.
> >
> > -angel
|
From: Matthew C. <mat...@va...> - 2007-10-04 18:39:35
|
I'm not sure what you're saying here. Users can programmatically (via a web service, I presume) add terms to the CV without going through a community approval process? If it's something else, please elaborate.

-Matt

Angel Pizarro wrote:
> WRT to my point about operational vs. repository data formats. For a
> repository, it is completely valid (and desirable) for the software to
> parse this new value and add it to the list of possible values for the
> ontology category.
>
> -angel
|
From: Mike C. <tu...@gm...> - 2007-10-04 18:37:35
|
As far as I'm concerned, the only thing mzML had to do to be a wild success was to provide a standard replacement for the current vendor-specific, secret RAW file formats, anything further being gravy. This is clearly going to happen and I think those responsible should pat themselves on the back for a job well done.

That said, it's not yet clear how things will play out beyond that. Will users/programmers/shops choose to keep their data in mzML format and develop lots of programs to deal with that format? Or will they choose to immediately "rip" mzML files into some other format that they perceive to be simpler, better, or more familiar?

Each shop will be forced to make this decision eventually, and developers will also have to make it for each program that they write. I think most people would prefer the mzML-everywhere alternative, but there is no Microsoft here to force the decision, so mzML must win by being as appealing as possible.

To me, this means keeping things as simple and intuitive as possible, and keeping them as decoupled as possible from other systems and programs. Ideally, mzML would be even "fun" to use.

I know that this is a lot to ask for. I'll happily take all the gravy I can get.

Mike
|
From: Eric D. <ede...@sy...> - 2007-10-04 18:29:14
|
Hi everyone, thank you for the discussion. Please do try to keep the posts fairly respectful since we don't want to turn off others from contributing to the discussion. I won't be able to reply to all the posts here, but I am reading them with interest.

I do note that the discussion is dominated by a few. I know there is a large group of lurkers out there, reading but not saying anything. I would highly encourage those who have not yet contributed to send a short note with your thoughts, however brief. While it may be a small group of us wrestling with the details, we're very interested in what everyone else out there is thinking, including the vendors. Even regarding the issues of extensive use of cvParams and the CV and a long-term stable schema: the builders of mzML have taken cues from the community that this is important to them, despite the rapidly advancing field. If you have opinions on this, please do share them.

Most XML formats that I generate and use myself are quite strongly structured, so such heavy use of cvParams is a stretch for me. I agree that there is a significant element of risk here. I want to believe that we can make this work because we have a high-quality semantic validator easily available to the community at the time of submission for review. This is new, as far as I'm aware. If we can get everyone to use that validator responsibly, this may be a success story.

Thank you!
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: Thursday, October 04, 2007 9:57 AM
> To: Angel Pizarro
> Cc: psi...@li...
> Subject: Re: [Psidev-ms-dev] attributes vs cvParams
>
> Thanks Angel, I didn't intend for the discussion to get heated, it just
> seemed to me that Lennart didn't understand what I posted (which may be
> my fault, it's hard to know without other replies). Remember I posted
> that I agree with cvParams and appreciate the flexibility they provide.
> But there is a difference between cvParams that have meaning without the
> CV and cvParams that don't. I much prefer the latter. So neither of
> us are arguing for cvParams to go away. You must be talking to somebody
> else. :)
>
> -Matt
>
> Angel Pizarro wrote:
> > Lennart and Matt,
> >
> > While I appreciate that this is a topic of great interest to everyone
> > in the community, let's turn the heat down a bit. Let me see if I can
> > play the arbiter here:
> >
> > cvParams since their introduction have always been contentious. Given
> > the choice for design of a data format where attributes (or sub-elements
> > or inner text) could be encoded with a tight set of enumerated
> > values vs. empty slots, a developer will always choose the former.
> >
> > Why then did the mzML group choose cvParams? The answer is twofold:
> > 1) the audience, and 2) the intent of the standard.
> >
> > 1) Name one standard that has received industry support across
> > multiple vendors/tools/institutions that is tightly controlled with
> > enumerated values. Prove me wrong, but I can't think of any.
> >
> > The reason for this is that consensus building is a slow process and
> > approval of any change in a data format can take months if not years.
> > You need flexible data formats for standards. This already rules out
> > enumerated values, but you can also make the case that vendors are
> > unwilling to tie their development efforts to projects that are not
> > under their complete control (essentially motivated by risk management).
> > As a vendor, if you officially support even one release of
> > a fast-moving data format, customer expectations are such that you are
> > now expected to support all future releases of that format.
> >
> > 2) The intent of mzML is data transfer and vendor-independent storage
> > of mass spec experimental data. It is not (officially) meant to be an
> > operational format. Operational formats would put much more weight on
> > the side of enumerated values.
> >
> > So for these reasons (there are more, though) cvParams are not going
> > to go away. As for actually doing work with mzML files, Matt is
> > absolutely right, this is going to be way more difficult than working
> > with mzXML 2.x (as a developer). While OLS is a fine and dandy
> > project, it is not the end-all be-all solution to our problems. It
> > assumes network connectivity, which is a dubious assumption. Even
> > assuming very fast connectivity, the overhead of SOAP protocols is
> > waaaayyy too big to accept in your typical use of mzML files, which
> > are signal processing and searches. Please stop equating OLS with
> > mzML (or any other ML) since for most uses outside of a repository it
> > just won't work.
> >
> > -a
|
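Eric's mention of a semantic validator is worth making concrete, since it is the main safety net being offered for the heavy use of cvParams. The sketch below is only a guess at the simplest checks such a tool performs - that every accession exists in the CV and that the recorded name matches the CV entry - and the two-term dictionary is a stand-in for the real PSI-MS vocabulary, not the actual validator's API:

    import xml.etree.ElementTree as ET

    # Illustrative stand-in for the PSI-MS CV; a real tool would load the
    # full vocabulary (e.g. from psi-ms.obo) instead of hard-coding terms.
    CV = {
        "MS:1000035": "spectrum type",
        "MS:1000583": "SRM spectrum",
    }

    def check_cv_params(xml_text):
        """Yield a message for each cvParam whose accession or name
        disagrees with the controlled vocabulary."""
        for param in ET.fromstring(xml_text).iter("cvParam"):
            acc, name = param.get("accession"), param.get("name")
            if acc not in CV:
                yield "unknown accession: %s" % acc
            elif CV[acc] != name:
                yield "name %r does not match CV name %r for %s" % (name, CV[acc], acc)

    sample = ('<spectrum><cvParam cvLabel="MS" accession="MS:1000583" '
              'name="SRM spectra" value=""/></spectrum>')
    for problem in check_cv_params(sample):
        print(problem)   # flags the misspelled term name

If checks like these really are run at submission time, as Eric describes, much of the risk of free-text 'name' attributes is caught before files ever reach a repository.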