From: Kessner, D. E. <Dar...@cs...> - 2008-02-14 02:20:13
|
1) exact_synonym same as name (except for capitalization):

    [Term]
    id: MS:1000114
    name: microchannel plate detector
    def: ...
    exact_synonym: "Microchannel Plate Detector" []
    exact_synonym: "multichannel plate" []
    is_a: MS:1000026 ! detector type

2) near name collision with exact_synonym:

    [Term]
    id: MS:1000270
    name: multiple stage mass spectrometry
    def: ...
    exact_synonym: "MSn" []
    is_a: MS:1000445 ! sequential m/z separation method ?

    [Term]
    id: MS:1000580
    name: MSn spectrum
    def: ...
    exact_synonym: "Multiple-Stage Mass Spectrometry" []
    is_a: MS:1000524 ! data file content
    is_a: MS:1000559 ! spectrum type

3) some term names have ? at end -- I assume this is to flag for reconsideration

Darren

Darren Kessner
Scientific Programmer
Dar...@cs...
310-423-9538
Spielberg Family Center for Applied Proteomics
Cedars-Sinai Medical Center
http://www.sfcap.cshs.org/
|
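The first two checks in Darren's list lend themselves to automation. What follows is a minimal Python sketch, not the tooling actually used for the PSI-MS CV: the function names (`parse_terms`, `find_synonym_issues`) and the normalization rule (ignore case and hyphens) are assumptions made for illustration, and the parser only understands the tags that appear in the message.

```python
# Abbreviated copy of the three stanzas quoted in the message.
SAMPLE_OBO = '''
[Term]
id: MS:1000114
name: microchannel plate detector
exact_synonym: "Microchannel Plate Detector" []
exact_synonym: "multichannel plate" []
is_a: MS:1000026 ! detector type

[Term]
id: MS:1000270
name: multiple stage mass spectrometry
exact_synonym: "MSn" []

[Term]
id: MS:1000580
name: MSn spectrum
exact_synonym: "Multiple-Stage Mass Spectrometry" []
'''


def parse_terms(obo_text):
    """Return a list of dicts (id, name, synonyms), one per [Term] stanza."""
    terms = []
    term = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            term = {"id": None, "name": None, "synonyms": []}
            terms.append(term)
        elif line.startswith("[") and line.endswith("]"):
            term = None  # some other stanza type; ignore its tags
        elif term is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                term[tag] = value
            elif tag == "exact_synonym":
                parts = value.split('"')
                if len(parts) >= 3:  # text between the first pair of quotes
                    term["synonyms"].append(parts[1])
    return terms


def normalize(text):
    """Assumed comparison rule: ignore case, hyphens and extra spaces."""
    return " ".join(text.lower().replace("-", " ").split())


def find_synonym_issues(terms):
    """Return (redundant, collisions).

    redundant:  (term id, synonym) where the synonym matches the term's own name
    collisions: (term id, synonym, other id) where it matches another term's name
    """
    name_owner = {normalize(t["name"]): t["id"] for t in terms if t["name"]}
    redundant, collisions = [], []
    for t in terms:
        for syn in t["synonyms"]:
            key = normalize(syn)
            if t["name"] and key == normalize(t["name"]):
                redundant.append((t["id"], syn))
            elif key in name_owner and name_owner[key] != t["id"]:
                collisions.append((t["id"], syn, name_owner[key]))
    return redundant, collisions


redundant, collisions = find_synonym_issues(parse_terms(SAMPLE_OBO))
print(redundant)
print(collisions)
```

On the sample, the first list holds the MS:1000114 synonym that repeats its own name, and the second holds the MS:1000580 synonym that matches the name of MS:1000270.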
From: Lennart M. <len...@eb...> - 2008-02-14 11:23:28
|
Hi Darren,

> 1) exact_synonym same as name (except for capitalization):
>
> [Term]
> id: MS:1000114
> name: microchannel plate detector
> def: ...
> exact_synonym: "Microchannel Plate Detector" []
> exact_synonym: "multichannel plate" []
> is_a: MS:1000026 ! detector type

I believe this one is intentional, so I'm unsure whether to record it as 'to correct' (thoughts, anyone?), but the rest is certainly flagged now!

Thanks!

Cheers,

lnnrt.
|
From: Fredrik L. <Fre...@im...> - 2008-02-14 13:51:20
|
Hi All,

In the Proteios platform we're including converters from some peak list formats to mzData, and now also to mzML. Such conversion is clearly not optimal, since instrument settings etc. are lost. However, I guess there will be a need for such converters if someone wants to use their old instruments with manufacturer peak picking algorithms.

There are sample files generated from DTAs and ProteinLynx by the converters (0.99.1) at:
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML

The converters will be part of the new release of the Proteios Software Environment, but if anyone would like to try them with their files, there is a standalone package (mzMLconverters.zip) at the address above, which should work under Windows/Linux/OSX with Java 1.5 or higher.

Please note that the output files are not schematically valid, since some terms are still missing in the CV.

For the conversion of multiple DTA files to one mzML file there is a small problem, which is related to how lcq_dta generates dta files: if the charge state of the precursor cannot be determined, a spectrum can result in two DTA files which are identical apart from the precursor. There are two solutions for handling this:

1) Two spectra, with the same scanNumber but different spectrum ids (the solution used by the current converter).
2) One spectrum, two precursors. However, this will not work with the current schema, since there can only be one sourceFileRef for a spectrum.

Do you all think solution 1 is fine, or is there a better solution? Solution 2 seems to need schema changes.

Other comments are also welcome.

Thanks,

Fredrik
|
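The duplicate-DTA situation described in this message can be detected before conversion by grouping files on their scan range. The sketch below is only an illustration and is not the Proteios converter: it assumes the common `base.firstScan.lastScan.charge.dta` file naming, the helper names are hypothetical, and it compares file names only (a real converter would presumably also confirm that the peak lists are identical).

```python
def parse_dta_name(filename):
    """Split an assumed 'base.first.last.charge.dta' name into its parts."""
    stem = filename[:-4] if filename.lower().endswith(".dta") else filename
    base, first, last, charge = stem.rsplit(".", 3)
    return base, int(first), int(last), int(charge)


def group_by_scan(filenames):
    """Map (base, first scan, last scan) to the sorted precursor charges.

    A key with more than one charge corresponds to the case in the message:
    one spectrum that was written out once per candidate charge state.
    """
    groups = {}
    for name in filenames:
        base, first, last, charge = parse_dta_name(name)
        groups.setdefault((base, first, last), []).append(charge)
    return {key: sorted(charges) for key, charges in sorted(groups.items())}


files = [
    "run1.0100.0100.2.dta",
    "run1.0100.0100.3.dta",  # same scan, alternative charge state
    "run1.0101.0101.1.dta",
]
for key, charges in group_by_scan(files).items():
    print(key, charges)
```

With solution 1 each file in a group becomes its own spectrum; with solution 2 each group becomes one spectrum carrying one precursor per charge.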
From: Rune S. P. <mai...@ph...> - 2008-02-14 15:10:55
|
Isn't it required that scan numbers are unique and increasing within a run? Is it necessary for your scan numbers to be the same?

-- Rune

Fredrik Levander wrote:
> For the conversion of multiple DTA files to one mzML file there is a
> small problem which is related to how lcq_dta generates dta files: If
> the charge state of the precursor can not be determined, a spectrum can
> result in two DTA files which are identical apart from the precursor.
> There are two solutions on how to handle this:
> 1) Two spectra, with the same scanNumber but different spectrum Ids (The
> solution used by the current converter)
> 2) One spectrum, two precursors. However, this will not work with the
> current schema since there can only be one sourceFileRef for a spectrum.
> Do you all think solution 1 is fine, or is there a better solution?
> Solution 2 seems to need schema changes.
> Other comments are also welcome
>
|
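If the requirement Rune refers to does hold (unique and increasing scan numbers within a run), it amounts to the sequence being strictly increasing. A minimal sketch of such a check, with a hypothetical function name:

```python
def first_order_violation(scan_numbers):
    """Return the position of the first scan number that is not greater
    than its predecessor, or None if the sequence is strictly increasing."""
    for pos in range(1, len(scan_numbers)):
        if scan_numbers[pos] <= scan_numbers[pos - 1]:
            return pos
    return None


print(first_order_violation([98, 99, 100, 101]))   # strictly increasing
print(first_order_violation([99, 100, 100, 101]))  # repeated scan number 100
```

A repeated scan number, as produced by solution 1 in the quoted message, is reported at the position of the second copy.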
From: Matthew C. <mat...@va...> - 2008-02-14 15:28:25
|
Just saw what may be an error in either the documentation or in the schema:

<xs:complexType name="SourceFileType"> has an id attribute:
<xs:attribute name="id" type="xs:string" use="required">

<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional"> is supposed to be a URI, but it should reference the 'id', which is a string? That doesn't make sense.
<xs:documentation>This attribute can optionally reference the 'id' of the appropriate SourceFileType.</xs:documentation>

Actually, looking a little closer, a lot of the Ref types (but not all) use xs:anyURI to point to an id attribute which is a string. What is the rationale for this? Out-of-file referencing?

-Matt
|
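One way to survey the mismatch Matt describes is to list the declared type of every `id` and `...Ref` attribute in the schema. The Python sketch below uses only the standard library; the embedded schema fragment is a hypothetical stand-in, not the actual mzML XSD, and placing `sourceFileRef` on a `SpectrumType` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment for illustration only; not the real mzML schema.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="SourceFileType">
    <xs:attribute name="id" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="SpectrumType">
    <xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional"/>
  </xs:complexType>
</xs:schema>
"""


def id_and_ref_types(xsd_text):
    """Return (attribute name, declared type) for every 'id' attribute and
    every attribute whose name ends in 'Ref', in document order."""
    root = ET.fromstring(xsd_text)
    found = []
    for attr in root.iter(XS + "attribute"):
        name = attr.get("name", "")
        if name == "id" or name.endswith("Ref"):
            found.append((name, attr.get("type")))
    return found


for name, declared in id_and_ref_types(SAMPLE_XSD):
    print(name, declared)
```

Any `...Ref` attribute typed differently from the `id` it points at shows up directly in the listing.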
From: Eric D. <ede...@sy...> - 2008-02-19 06:41:11
|
Hi Matt,

I don't know the answer to this exactly, but I will say that back after the EBI workshop most ids and refs were xs:anyURI, and XMLSpy was happy to allow that "1" was an xs:anyURI. My later attempts to validate the file with Xerces, however, yielded angry errors that "1" was not an xs:anyURI, and I changed some things back to string.

So I don't know what the answer here is, but before we start making more things xs:anyURI, let's please test the Xerces validator to make sure it's okay, or that we're okay with how those attributes are filled.

An item for the to do list, I guess.

Thanks,
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: Thursday, February 14, 2008 7:28 AM
> To: Mass spectrometry standard development
> Subject: [Psidev-ms-dev] Schema error for SoftwareType
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
From: Fredrik L. <Fre...@im...> - 2008-02-14 17:55:27
|
Hi Matt and Rune,

Thanks for the comments. I agree that the important information is the scan number, since this is what you would like to look up in the raw data file. And it doesn't make much sense to have the scan repeated twice in the file, so I think we'll go for solution 2 and just keep the sourceFileRef to one of the files.

However, since we do have unique spectrum ids, there should not be any real need to stick to the unique scan number requirement, from what I got from the indexing discussion, even if it is still in the specs (?). Couldn't there be cases when data is collected in different channels where the scan numbers are the same in different channels?

Regards

Fredrik

Matthew Chambers wrote:
> Hi Fredrik,
>
> Our group has a converter that does this conversion (to mzXML or mzData
> currently, not yet mzML, but they all have the same uniqueness
> constraints on scan numbers and they all support multiple precursors at
> least in theory); we went with solution 2 because solution 1 is invalid
> for all the XML formats (i.e. it would need a schema change and that
> change isn't likely to happen, whereas multiple sourceFileRefs would be
> understandable). As I understand it, sourceFileRef is optional
> ("<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional">"),
> so if you can't or don't want to encode it correctly, just don't include
> it. Our converter doesn't even bother to include the sourceFileRefs to
> the DTAs, it's not helpful information IMO. As long as the conversion is
> done without data loss, get it over with and then have mercy on your
> filesystem by deleting the DTAs. ;)
>
> -Matt
|
From: Eric D. <ede...@sy...> - 2008-02-19 09:14:37
|
> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Lennart Martens
> Sent: Thursday, February 14, 2008 3:24 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] a few more CV name issues
>
> Hi Darren,
>
> > 1) exact_synonym same as name (except for capitalization):
> >
> > [Term]
> > id: MS:1000114
> > name: microchannel plate detector
> > def: ...
> > exact_synonym: "Microchannel Plate Detector" []
> > exact_synonym: "multichannel plate" []
> > is_a: MS:1000026 ! detector type
>
> I believe this one is intentional so I'm unsure about whether to record
> it as 'to correct' (thoughts anyone?), but the rest is certainly flagged
> now!

I don't see why this would make sense. I think it should be deleted. I just did. Anyone yell if it should be restored.

This brings up another issue though. At one point last year, we went through an effort to change all terms to lower case (except proper names like "Waters" and acronyms). But it looks like this change was not applied to exact_synonyms. It should be.

Regarding Darren's:

> id: MS:1000580
> name: MSn spectrum
> def: ...
> exact_synonym: "Multiple-Stage Mass Spectrometry" []

synonym collision, I changed it to:

exact_synonym: "multiple-stage mass spectrometry spectrum" []

Please let me know if you object.

Yes, any term with a '?' should be revisited.

Thanks,
Eric
|
From: Fredrik L. <Fre...@im...> - 2008-02-19 15:28:56
|
Hi All,

In QTOF files from Waters with mixed MS1 and MS2 data we have several parallel 'functions', with data being recorded into separate files. The scan numbers are only unique within each function. In the raw data folder we thus have several different spectra with the same scan number (but different source files).

When converting this into an mzML file it would be good to keep the original scan numbers, which are useful for traceability, but to generate unique spectrum ids. I thus propose that the requirement for unique scanNumbers within an mzML file is removed. However, spectra should not be repeated within the file, so this would NOT be applicable to the dta to mzML conversion use case.

Would such a change generate problems for the readers?
How is this solved in MassWolf?

Regards

Fredrik
|
From: Matthew C. <mat...@va...> - 2008-02-19 15:37:35
|
How do you feel about generating arbitrary unique scan numbers and then using the id attribute to preserve the original filename and scan number:

    <spectrum id="function1.1" scanNumber="1" ...>
    <spectrum id="function1.2" scanNumber="2" ...>
    ...
    <spectrum id="function2.1" scanNumber="100" ...>
    <spectrum id="function2.2" scanNumber="101" ...>
    ...

Or probably more intuitive would be to store the parallel spectra sequentially (assuming that the same scan number from each function is correlated):

    <spectrum id="function1.1" scanNumber="1" ...>
    <spectrum id="function2.1" scanNumber="2" ...>
    ...
    <spectrum id="function1.2" scanNumber="100" ...>
    <spectrum id="function2.2" scanNumber="101" ...>
    ...

It's either that or store each function in a separate mzML file, because mzML doesn't support multiple runs in the same file.

-Matt
|
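Both numbering schemes Matt proposes can be generated mechanically. The sketch below is an illustration under stated assumptions rather than any converter's actual behaviour: the input is a hypothetical mapping from function number to that function's local scan numbers, new scan numbers are assigned contiguously from 1, and the `functionN.M` id format follows the examples in the message.

```python
def number_by_function(functions):
    """First scheme: all spectra of function 1, then function 2, and so on.

    Returns (id, new scanNumber) pairs; the id keeps the original
    function number and local scan number for traceability.
    """
    numbered = []
    scan = 1
    for fn in sorted(functions):
        for local in functions[fn]:
            numbered.append((f"function{fn}.{local}", scan))
            scan += 1
    return numbered


def number_interleaved(functions):
    """Second scheme: order by local scan number first, then by function,
    assuming equal local scan numbers across functions are correlated."""
    keys = sorted(
        (local, fn) for fn, local_scans in functions.items() for local in local_scans
    )
    return [
        (f"function{fn}.{local}", scan)
        for scan, (local, fn) in enumerate(keys, start=1)
    ]


functions = {1: [1, 2], 2: [1, 2]}  # two functions, two scans each
print(number_by_function(functions))
print(number_interleaved(functions))
```

In both cases the new scanNumber is unique across the file, while the original function and scan number remain recoverable from the id.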
From: Coleman, M. <MK...@St...> - 2008-02-19 15:57:43
|
I don't understand the issues involved in this particular question, but it reminds me of this key requirement:

- There has to be a way of generating a unique key for each spectrum (i.e., unique across all spectra in the file) that will work for all mzML files.

In the example below, it looks like that key is the 2-tuple "(id, scanNumber)". (Whatever the key is, it should be specified as such in the standard.)

If the key includes any numeric fields, it needs to be specified whether or not (say) "0010" is equal to "10", whether or not "1.0" is equal to "1", and whether or not "-0" is equal to "0". Hopefully either (a) the former is simply disallowed in all of these cases or (b) all fields are to be treated as strings, rather than numbers, and comparison done on that basis.

Mike
|
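The three pairs Mike lists behave differently depending on whether a key field is compared as a string or as an integer. The sketch below makes that concrete; the helper names are hypothetical, and the regular expression is meant to approximate the lexical form of an integer-typed attribute (optional sign followed by digits), which is an assumption about how a reader would parse the value.

```python
import re

# Optional sign followed by one or more digits.
_INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")


def parse_integer(text):
    """Return the integer value of text, or None if it is not an integer literal."""
    text = text.strip()
    if _INTEGER_LITERAL.fullmatch(text) is None:
        return None
    return int(text)


def compare(a, b):
    """Return (equal as strings, equal as integers).

    The second item is None when either value is not an integer literal.
    """
    ia, ib = parse_integer(a), parse_integer(b)
    as_integers = None if ia is None or ib is None else ia == ib
    return a == b, as_integers


for pair in [("0010", "10"), ("1.0", "1"), ("-0", "0")]:
    print(pair, compare(*pair))
```

Under string comparison none of the pairs match; under integer comparison the first and third match, and "1.0" is rejected as not being an integer at all.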
From: Matthew C. <mat...@va...> - 2008-02-19 16:06:55
|
Hi Michael,

As it currently stands, both scanNumber and id are unique keys to a spectrum; they need not be combined to create a unique key. Id is a string and as such should be compared on a lexicographical basis (if that isn't stated in the spec, it should be), and scanNumber is an integer:

    <xs:attribute name="scanNumber" type="xs:int" use="required">

By the way, I think we should change that type to xs:positiveInteger so that the range is schematically limited to [1, infinity). 0 shouldn't be a valid scan number (if 0 is allowed then Michael's point about the -0 and 0 issue should be addressed, although that might be done by the XML Schema specification).

-Matt
|
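The constraints stated in Matt's message (id unique, scanNumber unique, and scanNumber at least 1 if the positiveInteger proposal were adopted) can be checked with a short script. This is a sketch under assumptions: the sample document is a simplified, namespace-free fragment rather than real mzML, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified fragment for illustration; a real mzML file has more structure.
SAMPLE = """
<spectrumList>
  <spectrum id="S1" scanNumber="1"/>
  <spectrum id="S2" scanNumber="2"/>
  <spectrum id="S3" scanNumber="2"/>
  <spectrum id="S3" scanNumber="0"/>
</spectrumList>
"""


def key_problems(xml_text):
    """Return (duplicate ids, duplicate scan numbers, scan numbers below 1)."""
    root = ET.fromstring(xml_text)
    spectra = list(root.iter("spectrum"))
    id_counts = Counter(s.get("id") for s in spectra)
    scan_counts = Counter(int(s.get("scanNumber")) for s in spectra)
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)
    duplicate_scans = sorted(k for k, n in scan_counts.items() if n > 1)
    below_one = sorted(k for k in scan_counts if k < 1)
    return duplicate_ids, duplicate_scans, below_one


print(key_problems(SAMPLE))
```

Because each attribute is checked on its own, a file passes only if either key alone identifies a spectrum, which is the property the message describes.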
From: Coleman, M. <MK...@St...> - 2008-02-19 16:26:07
|
Is there a requirement that scanNumber and id be "co-ordered" (if I sort a file's spectra by one of these keys, the other key will then necessarily also be sorted)? Is there a requirement that the spectra in all mzML files be ordered by one or both of these keys? If scanNumber is being used for ordering in one of these ways, I agree that lexicographic ordering should be specified. If not, I'm wondering whether there is any other reason for specifying the ordering. If scanNumbers were specified to be contiguous, I'd say we ought to allow 0 as a scan number, since essentially all modern programming languages use 0-based arrays. But if I understand correctly, scanNumbers need not be contiguous (and thus programmers should not assume that they can be directly used for array indexing). Are scan numbers up to at least 2**62 or so allowed, to prepare for the coming ten-billion-spectrum runs? :-) Mike > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Matthew Chambers > Sent: Tuesday, February 19, 2008 10:07 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] Unique scan numbers > > > Hi Michael, > > As it currently stands, both scanNumber and id are unique keys to a > spectrum - they need not be combined to create a unique key. Id is a > string and as such should be compared on a lexicographical basis (if > that isn't stated in the spec, it should be), and scanNumber > is an integer: > <xs:attribute name="scanNumber" type="xs:int" use="required"> > > By the way, I think we should change that type to be > xs:positiveInteger > so that the range is schematically limited to [1-infinity). 0 > shouldn't > be a valid scan number (if 0 is allowed then Michael's point > about the > -0 and 0 issue should be addressed, although that might be > done by the > XML Schema specification). 
> -Matt
>
> Coleman, Michael wrote:
> > I don't understand the issues involved in this particular question, but
> > it reminds me of this key requirement:
> >
> > - There has to be a way of generating a unique key for each spectrum
> > (i.e., unique across all spectra in the file) that will work for all
> > mzML files.
> >
> > In the example below, it looks like that key is the 2-tuple "(id,
> > scanNumber)". (Whatever the key is, it should be specified as such in
> > the standard.)
> >
> > If the key includes any numeric fields, it needs to be specified whether
> > or not (say) "0010" is equal to "10", whether or not "1.0" is equal to
> > "1", and whether or not "-0" is equal to "0". Hopefully either (a) the
> > former is simply disallowed in all of these cases or (b) all fields are
> > to be treated as strings, rather than numbers, and comparison done on
> > that basis.
> >
> > Mike
> >
> >> -----Original Message-----
> >> From: psi...@li...
> >> [mailto:psi...@li...] On
> >> Behalf Of Matthew Chambers
> >> Sent: Tuesday, February 19, 2008 9:37 AM
> >> To: Mass spectrometry standard development
> >> Subject: Re: [Psidev-ms-dev] Unique scan numbers
> >>
> >> How do you feel about generating arbitrary unique scan numbers and then
> >> using the id attribute to preserve the original filename and scan number:
> >> <spectrum id="function1.1" scanNumber="1" ...>
> >> <spectrum id="function1.2" scanNumber="2" ...>
> >> ...
> >> <spectrum id="function2.1" scanNumber="100" ...>
> >> <spectrum id="function2.2" scanNumber="101" ...>
> >> ...
> >>
> >> Or probably more intuitive would be to store the parallel spectra
> >> sequentially (assuming that the same scan number from each function is
> >> correlated):
> >> <spectrum id="function1.1" scanNumber="1" ...>
> >> <spectrum id="function2.1" scanNumber="2" ...>
> >> ...
> >> <spectrum id="function1.2" scanNumber="100" ...>
> >> <spectrum id="function2.2" scanNumber="101" ...>
> >> ...
> >>
> >> It's either that or store each function in a separate mzML file, because
> >> mzML doesn't support multiple runs in the same file.
> >>
> >> -Matt
> >>
> >> Fredrik Levander wrote:
> >>> Hi All,
> >>>
> >>> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> >>> parallel 'functions' with data being recorded into separate files. The
> >>> scan numbers are only unique within each function. In the raw data
> >>> folder we thus have several different spectra with the same scan number
> >>> (but different source files). When converting this into an mzML file it
> >>> would be good to keep the original scan numbers which are useful for
> >>> traceability, but to generate unique spectrum ids. I thus propose that
> >>> the requirement for unique scanNumbers within an mzML file is removed.
> >>> However, spectra should not be repeated within the file, so this would
> >>> NOT be applicable to the dta to mzML conversion use case.
> >>> Would such a change generate problems for the readers?
> >>> How is this solved in MassWolf?
> >>>
> >>> Regards
> >>>
> >>> Fredrik
|
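Mike's point about numeric key fields can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of any mzML tool; the two helper names are invented here. It compares the same attribute values first as raw strings (his option (b)) and then as integers, which is where "0010" vs. "10" and "-0" vs. "0" start to matter.

```python
def equal_as_strings(a: str, b: str) -> bool:
    """Option (b) from the message above: keys are compared as raw strings."""
    return a == b


def equal_as_integers(a: str, b: str) -> bool:
    """Numeric comparison; int() raises ValueError for text such as '1.0'."""
    return int(a) == int(b)


for a, b in [("0010", "10"), ("-0", "0")]:
    # Each pair differs as text but denotes the same integer.
    print(repr(a), repr(b), equal_as_strings(a, b), equal_as_integers(a, b))

try:
    equal_as_integers("1.0", "1")
except ValueError:
    # "1.0" is not an integer literal, so an integer-typed key would
    # reject it outright instead of treating it as equal to "1".
    print("'1.0' is rejected as an integer key")
```

Under string comparison every spelling is a distinct key; under integer comparison several spellings collapse to one key, which is why the comparison rule would need to be written into the standard.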
From: Matthew C. <mat...@va...> - 2008-02-19 20:54:21
|
There is no requirement that an ascending lexicographical sort would produce the same order that a sort by ascending scanNumber would (that would be far too restrictive), but there is a requirement that the spectrum elements in the file be stored in ascending order by scanNumber.

We had quite a bit of discussion in the teleconference today about removing scanNumber as a primary key and replacing it with an index attribute; such an attribute would probably have different semantics (0-based and contiguous). I think Eric is probably preparing to post a good summary of the discussion.

My proposal for positiveInteger for scanNumber was accepted but then rendered moot by the proposal to get rid of the scanNumber attribute. ;)

"positiveInteger" has no schematic upper limit, so perhaps if we switch to an index attribute we should make it "unsignedLong" (schematically defined as a 64-bit unsigned integer).

-Matt

Coleman, Michael wrote:
> Is there a requirement that scanNumber and id be "co-ordered" (if I sort
> a file's spectra by one of these keys, the other key will then
> necessarily also be sorted)?
>
> Is there a requirement that the spectra in all mzML files be ordered by
> one or both of these keys?
>
> If scanNumber is being used for ordering in one of these ways, I agree
> that lexicographic ordering should be specified. If not, I'm wondering
> whether there is any other reason for specifying the ordering.
>
> If scanNumbers were specified to be contiguous, I'd say we ought to
> allow 0 as a scan number, since essentially all modern programming
> languages use 0-based arrays. But if I understand correctly,
> scanNumbers need not be contiguous (and thus programmers should not
> assume that they can be directly used for array indexing).
>
> Are scan numbers up to at least 2**62 or so allowed, to prepare for the
> coming ten-billion-spectrum runs? :-)
>
> Mike
>
>> -----Original Message-----
>> From: psi...@li...
>> [mailto:psi...@li...] On
>> Behalf Of Matthew Chambers
>> Sent: Tuesday, February 19, 2008 10:07 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>
>> Hi Michael,
>>
>> As it currently stands, both scanNumber and id are unique keys to a
>> spectrum - they need not be combined to create a unique key. Id is a
>> string and as such should be compared on a lexicographical basis (if
>> that isn't stated in the spec, it should be), and scanNumber is an integer:
>> <xs:attribute name="scanNumber" type="xs:int" use="required">
>>
>> By the way, I think we should change that type to be xs:positiveInteger
>> so that the range is schematically limited to [1-infinity). 0 shouldn't
>> be a valid scan number (if 0 is allowed then Michael's point about the
>> -0 and 0 issue should be addressed, although that might be done by the
>> XML Schema specification).
>>
>> -Matt
|
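The ordering rule Matt describes (spectra stored in ascending scanNumber order, with id and scanNumber each unique) could be checked roughly as follows. This is only a sketch: the XML fragment is a heavily simplified, hypothetical stand-in for a real mzML file, and `check_scan_numbers` is a name invented for this example. The two constants are the upper bounds of the XML Schema types mentioned in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment; a real mzML file has far more structure.
FRAGMENT = """
<run>
  <spectrum id="function1.1" scanNumber="1"/>
  <spectrum id="function2.1" scanNumber="2"/>
  <spectrum id="function1.2" scanNumber="100"/>
  <spectrum id="function2.2" scanNumber="101"/>
</run>
"""

XS_INT_MAX = 2**31 - 1            # upper bound of xs:int
XS_UNSIGNED_LONG_MAX = 2**64 - 1  # upper bound of xs:unsignedLong


def check_scan_numbers(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    seen_ids = set()
    previous = 0  # positiveInteger starts at 1, so 0 is below any valid value
    for spectrum in ET.fromstring(xml_text).iter("spectrum"):
        spectrum_id = spectrum.get("id")
        scan_number = int(spectrum.get("scanNumber"))
        if spectrum_id in seen_ids:
            problems.append(f"duplicate id {spectrum_id!r}")
        seen_ids.add(spectrum_id)
        # Strictly ascending order also guarantees that scanNumber is unique.
        if scan_number <= previous:
            problems.append(
                f"scanNumber {scan_number} is not greater than {previous}"
            )
        previous = scan_number
    return problems


print(check_scan_numbers(FRAGMENT))
```

Note that the ids in the fragment are not in lexicographical order even though the scanNumbers ascend, which is the situation Matt says must remain legal. A value such as 2**62 from Mike's question exceeds `XS_INT_MAX` but fits within `XS_UNSIGNED_LONG_MAX`.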
From: Randy J. <rkj...@in...> - 2008-02-20 02:21:29
|
Based on today's discussion, I think the goal people had for scanNumber is better achieved using the acqNumber element. Actually, what people seemed to want was a place to put the Thermo-specific 'scan number' which allows you to go back to a raw file and see the scan in the vendor file. There is no reason why you couldn't put the scan number from Thermo as an 'index' (the acqNumber element would still allow a more accurate representation of the source of the spectrum, since it can handle the summation or averaging of spectra), and having a 'positiveInteger' for this makes great sense.

The value of using a scan number as an index comes from the problem of trying to order spectra from non-LC experiments where acquisition time is meaningless. Thermo's 'scan number' works great for this, and we need something like this for the other vendor formats too.

We should have more discussion on this, but it will probably help to have some examples - which should be available from the group soon.

Randy
|
From: Fredrik L. <Fre...@im...> - 2008-02-20 12:28:42
|
I like this proposal. This way there is only one place to look for scan numbers, and that will be in the acquisitionList. This will however mean that mzML files with unprocessed data, like the massWolf output, will have to add an acquisitionList with one acquisition element for every spectrum (scan), and that the description of acquisitionType will have to be edited to reflect that it is also usable for spectra (and not just peak lists).

It makes sense to rename the current scanNumber to "index", which is what it would be. And yes, acqNumber should be acquisitionNumber, or maybe even better just 'number' (like in the sourceFile attribute 'sourceFileName' which will change to 'name').

I've uploaded an edited peak list mzML file which has some of these changes, as an example for discussion:
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B_edited.mzML

At rows 110-126 there is also an experiment with different ways to refer to the source for scans, including external referencing using URI.

Fredrik
|
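The split discussed above (a contiguous 0-based index per spectrum, plus an acquisition list that carries the original, possibly non-unique vendor scan numbers) might be modelled like this. It is a sketch only: the class and field names (`Acquisition`, `source_file`, `number`, `find_by_vendor_scan`) are assumptions made for illustration and are not taken from the mzML schema.

```python
from dataclasses import dataclass, field


@dataclass
class Acquisition:
    source_file: str  # hypothetical: the vendor file or function the scan came from
    number: int       # original vendor scan number; need not be unique in the file


@dataclass
class Spectrum:
    index: int        # 0-based, contiguous position within the file
    id: str           # unique string identifier
    acquisitions: list = field(default_factory=list)


def find_by_vendor_scan(spectra, source_file, number):
    """Return every spectrum that was built from the given vendor scan."""
    return [
        s for s in spectra
        if any(a.source_file == source_file and a.number == number
               for a in s.acquisitions)
    ]


spectra = [
    Spectrum(0, "function1.1", [Acquisition("function1", 1)]),
    Spectrum(1, "function2.1", [Acquisition("function2", 1)]),
    # A spectrum summed from two scans keeps both acquisition records.
    Spectrum(2, "function1.2-3", [Acquisition("function1", 2),
                                  Acquisition("function1", 3)]),
]

print([s.id for s in find_by_vendor_scan(spectra, "function2", 1)])
```

With this layout the vendor scan number 1 can appear twice (once per function) without any conflict, because uniqueness is carried by `index` and `id` alone.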
From: Rune S. P. <ru...@ph...> - 2008-02-20 07:32:25
|
Fredrik Levander wrote:
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> How is this solved in MassWolf?

The information is not saved; new scan numbers are assigned. That is, you have to use the time (and, for MS^2, the precursor) information in order to locate the original scan.

--
Regards
Rune |
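The lookup Rune describes can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Scan` record, the `find_original` helper, and the tolerance values are hypothetical and are not part of MassWolf or any vendor library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scan:
    """Hypothetical minimal scan record (not a MassWolf type)."""
    number: int                            # scan number within its own file
    ms_level: int                          # 1 for MS1, 2 for MS2, ...
    retention_time: float                  # seconds
    precursor_mz: Optional[float] = None   # None for MS1 scans


def find_original(converted: Scan, raw_scans: Sequence[Scan],
                  rt_tol: float = 0.01, mz_tol: float = 0.01) -> Optional[Scan]:
    """Return the raw scan closest in time to `converted`, or None.

    Candidates must share the MS level, lie within `rt_tol` seconds and,
    when the converted scan has a precursor, within `mz_tol` of it.
    The tolerances are assumed values chosen for illustration.
    """
    best: Optional[Scan] = None
    for raw in raw_scans:
        if raw.ms_level != converted.ms_level:
            continue
        rt_diff = abs(raw.retention_time - converted.retention_time)
        if rt_diff > rt_tol:
            continue
        if converted.precursor_mz is not None:
            if raw.precursor_mz is None:
                continue
            if abs(raw.precursor_mz - converted.precursor_mz) > mz_tol:
                continue
        if best is None or rt_diff < abs(best.retention_time - converted.retention_time):
            best = raw
    return best
```

Because the renumbered scan number carries no information about the source function, the match relies entirely on time and precursor, which is why a stored native reference would be more robust.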
From: Randy J. <rkj...@in...> - 2008-02-20 11:47:24
|
This is why I thought of placing the non-unique scan number in the acquisition description section and changing the meaning of the 'scanNumber' attribute, since it cannot be used correctly with all instrument brands.

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Rune Schjellerup Philosof
Sent: Wednesday, February 20, 2008 2:32 AM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Unique scan numbers

Fredrik Levander wrote:
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> How is this solved in MassWolf?

The information is not saved. New scan numbers are assigned. That is, you have to use the time (and precursor with ms^2) information in order to locate the original scan.

--
Regards
Rune |
From: Joshua T. <jt...@sy...> - 2008-02-25 20:56:48
|
Hi Fredrik,

Catching up: massWolf simply renumbers all scans starting with "1" in the mzXML output. Like I said in a different post, we'll be adding a "native scan reference" to mzXML for each vendor software type.

Josh

Fredrik Levander wrote:
> Hi All,
>
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. In the raw data
> folder we thus have several different spectra with the same scan number
> (but different source files). When converting this into an mzML file it
> would be good to keep the original scan numbers which are useful for
> traceability, but to generate unique spectrum ids. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> However, spectra should not be repeated within the file, so this would
> NOT be applicable to the dta to mzML conversion use case.
> Would such a change generate problems for the readers?
> How is this solved in MassWolf?
>
> Regards
>
> Fredrik |
From: Matthew C. <mat...@va...> - 2008-03-03 16:25:31
|
Hi Josh,

What design are you planning for the "native scan reference" in mzXML? It seems the same issues I just posted about in response to Darren will apply to the mzXML design as well.

-Matt

Joshua Tasman wrote:
> Hi Fredrik,
>
> Catching up: massWolf simply renumbers all scans starting with "1" in the mzXML output. Like I said in a different post, we'll be adding a "native scan reference" to mzXML for each vendor software type.
>
> Josh |
From: Joshua T. <jt...@sy...> - 2008-03-03 18:39:16
|
Hi Matt,

After the discussion here last week with you and Darren, it seemed an efficient way to deal with this would be to have each scan contain a string, and the header would have some description of how to parse this.

Off the top of my head, you could have something in the header like:

mzML-ish:
in header:
  <nativeScanRefFormat> containing ordered list
    <cvParam with cv term for first axis />
    <cvParam with cv term for second axis />
    <cvParam with cv term for third axis />
  </nativeScanRefFormat>

in spectrum: a string representation like "(1st,2nd,3rd)"

mzXML-ish:
header:
  <nativeScanRefFormat Vendor="VendorX"> containing ordered list
    <axis name="cycle"/>
    <axis name="function"/>
    <axis name="scan"/>
  </nativeScanRefFormat>

  ...
  <scan
    ...
    nativeScanRef="(2,4,6)">
  </scan>

What do you think?

Josh

Matthew Chambers wrote:
> Hi Josh,
>
> What design are you planning for the "native scan reference" in mzXML?
> It seems the same issues I just posted about in response to Darren will
> apply to the mzXML design as well.
>
> -Matt |
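A reader for the scheme Josh outlines might look roughly like the Python sketch below. It assumes the header supplies an ordered list of axis names and that each scan carries a comma-separated string, optionally wrapped in parentheses. The function name `parse_native_scan_ref` is hypothetical, and nothing here is an existing mzXML or mzML API.

```python
from typing import Dict, Sequence


def parse_native_scan_ref(ref: str, axes: Sequence[str]) -> Dict[str, int]:
    """Split a native scan reference into named integer components.

    `axes` is the ordered list declared once in the file header,
    e.g. ["cycle", "function", "scan"]. Surrounding parentheses are
    accepted but not required.
    """
    body = ref.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    parts = [part.strip() for part in body.split(",")]
    if len(parts) != len(axes):
        raise ValueError(
            f"expected {len(axes)} components for axes {list(axes)}, got {len(parts)}"
        )
    # int() raises ValueError on empty or non-numeric components.
    return {axis: int(part) for axis, part in zip(axes, parts)}


# Example with the axis names used in the mzXML-ish proposal above.
components = parse_native_scan_ref("(2,4,6)", ["cycle", "function", "scan"])
```

Declaring the axes once in the header keeps each scan's reference short, at the cost of requiring every reader to interpret the header before it can resolve a reference.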
From: Matthew C. <mat...@va...> - 2008-03-03 19:52:35
|
To be honest, for the mzML approach, I would prefer a single CV term describing the format and the axes it corresponds to. I see no reason to allow formats with arbitrary axes in a controlled nativeID system. I'm happy to restrict that capability to the arbitrary id string. Perhaps there is a reason though and I'm not seeing it.

For mzXML, the axes definition block makes more sense to me. I would vote against flanking the id with parentheses though as that kind of makes them look like Cartesian coordinates. :)

-Matt |
From: Eric D. <ede...@sy...> - 2008-03-04 08:48:30
|
So it seems like the final suggestion is something like:

<nativeScanRefFormat>
  <cvParam cvLabel="MS" accession="MS:1099580" name="Masswolf format nativeScanReference"/>
</nativeScanRefFormat>

<spectrum index="18" id="S2,4,6">
  <scan nativeScanReference="2,4,6">
  </scan>
</spectrum>

<offset index="18" id="S19" nativeScanReference="2,4,6">1234</offset>

--------------

For Thermo, we would have:

<nativeScanRefFormat>
  <cvParam cvLabel="MS" accession="MS:1099581" name="Thermo format nativeScanReference"/>
</nativeScanRefFormat>

<spectrum index="18" id="S19">
  <scan nativeScanReference="19">
  </scan>
</spectrum>

<offset index="18" id="S19" nativeScanReference="19">1234</offset>

--------------

The one thing that concerns me is that there isn't much backward compatibility here. It would be nice to preserve one attribute that behaves in the same way it always did. If we made index start with 1 instead of 0, then that is probably as close to the traditional "scanNumber" as we can hope for. Would that offend you, Darren?

--------------

My summary of the discussion goes like this:

New thread on acquisitionNumbers
- Darren posts example on what this would look like
- Matt suggests that there should be no acquisitionNumber in <index>
- Darren counters that having scanNumber aka acquisitionNumber in <index> is critical
- Darren proffers:
    <offset id="S17" externalID="17">4826</offset>
  with externalID interpreted according to some other metadatum (original source file type, instrument vendor, something else...)
- Matt: Why do you need to know scan number at open time?
- Darren: the point is we *know* the scan number and need to seek to it
- Mike brings up subsetting of one mzML file to another
- Matt brings up the externalID idea
- Darren offers a way to either annotate the assumption that externalID=scan number or have an optional scanNumber attribute. Neither is liked
- Rune asks why preserving thermo scan number is important
- Darren says that he has tools that also go back to a RAW file with a given scan number. So preserving the scan number is important.
- Darren provides a dump of a huge number of obscure bits of configuration data available for each scan in Thermo RAW format. How to encode in mzML? cvParams? userParams?
- Matt likes externalID instead of scanNumber and describes some possible naming conventions
- Josh suggests that we need to encode a "native scan reference" to be able to go back to the vendor software
- Darren agrees that native unique identifier needs to be in the index and the rest in <scan>
- Josh suggests a slew of optional vendor-specific attributes in <index>
- Darren is okay with that, but also suggests using cvParams in the <index> to define the meaning of externalID in a specific context, different for each vendor
- Josh points out that some vendors have multi-part keys instead of a single one so this makes the above a lot trickier
- Darren wonders if the multi-parts information needs to be in each <offset> tag or could be global to the index?
    <index externalIDTypeAccession1="cycle" externalIDTypeAccession2="scan">
      <offset id="S19" externalID="(19,123)">4826</offset>
- Josh agrees, although suggests nativeScanID
- Darren votes for nativeID
- Josh is fine with that
- Matt seems in agreement
- so the suggestions seemed to be:
    <index>
      <externalIDTypeList count="2">
        <cvParam .../>
        <cvParam .../>
      </externalIDTypeList>
      ...
      <offset id="S19" nativeID="(19,123)">4826</offset>
      ...
    </index>

--

Unique scan numbers thread
- Josh points out that MassWolf just renumbers the scans. But there will be a "native scan reference" in mzXML
- Matt asks for details
- Josh suggests something like:
    <nativeScanRefFormat> containing ordered list
      <cvParam cvLabel="MS" accession="MS:1099580" name="scan cycle number" value=""/>
      <cvParam cvLabel="MS" accession="MS:1099581" name="scan function number" value=""/>
      <cvParam cvLabel="MS" accession="MS:1099582" name="scan number" value=""/>
    </nativeScanRefFormat>
    <spectrum>
      <scan
        ...
        nativeScanRef="(2,4,6)">
      </scan>
    </spectrum>
- Matt suggests just having a single cvParam to describe "MassWolf nativeID format"

--

Related to the above is a <scan> nativeID thread
- Darren suggests:
    <offset id="S19" nativeID="19">1234</offset>
  It would also be convenient, and consistent, to have nativeID in <scan>:
    <spectrum index="0" id="S19" nativeID="19">
      ...
    </spectrum>
- Matt suggests that <offset> should be <spectrum_offset>
- Darren says the important part of the discussion is that nativeID is in both <spectrum> and <index>
|
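To illustrate why Darren wants the native identifier in the index, here is a small Python sketch that builds a nativeID-to-byte-offset lookup from an `<index>` fragment shaped like the one summarized above. The fragment, the second `<offset>` entry, and the helper name `build_native_id_index` are assumptions made for illustration; the element and attribute names were still under discussion in this thread and are not taken from a released schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical index fragment modelled on the proposal in this thread.
INDEX_XML = """
<index>
  <offset id="S19" nativeID="19">1234</offset>
  <offset id="S20" nativeID="20">5678</offset>
</index>
"""


def build_native_id_index(index_xml: str) -> Dict[str, Tuple[str, int]]:
    """Map each nativeID to (spectrum id, byte offset).

    Raises ValueError if a nativeID occurs twice, since a seek by
    native identifier would then be ambiguous.
    """
    root = ET.fromstring(index_xml)
    lookup: Dict[str, Tuple[str, int]] = {}
    for offset in root.iter("offset"):
        native_id = offset.get("nativeID")
        if native_id is None:
            continue  # entry carries no native reference
        if native_id in lookup:
            raise ValueError(f"duplicate nativeID {native_id!r} in index")
        lookup[native_id] = (offset.get("id"), int(offset.text))
    return lookup


lookup = build_native_id_index(INDEX_XML)
spectrum_id, byte_offset = lookup["19"]
```

With such a table a tool that knows the vendor scan number can seek directly to the spectrum without scanning the whole file, which is the use case Darren describes.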
From: Rune S. P. <ru...@ph...> - 2008-03-04 10:29:28
|
Eric Deutsch wrote:
> So it seems like the final suggestion is something like:
> <spectrum index="18" id="S2,4,6" >
>   <scan nativeScanReference="2,4,6">
>   </scan>
> </spectrum>

That would be
<spectrum><spectrumDescription><scan nativeScanReference="2,4,6">

This would work for cases where each spectrum refers to a single native scan. But isn't there support for having a spectrum that is, for instance, a combination of several native scans?

[Term]
id: MS:1000570
name: spectra combination
def: "Method used to combine the mass spectra." [PSI:MS]
relationship: part_of MS:1000442 ! spectrum

What to do in this case?

--
Rune |
From: Fredrik L. <Fre...@im...> - 2008-03-04 10:47:28
|
Hi Rune,

In this case you will not have a <scan> element, or at least not the nativeScanReference, but rather an acquisitionList with acquisitions. Each of the acquisitions will have a scan reference, which is either a reference to an mzML scan, or to a vendor raw data file scan, if I get it right.

Regards

Fredrik |
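The distinction Fredrik draws between a single-scan spectrum and a combined spectrum can be modelled with the Python sketch below. All class and field names are hypothetical stand-ins for the `<scan>` and acquisitionList ideas in this thread and do not reflect a finalized mzML schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Acquisition:
    """One contributing scan of a combined spectrum (hypothetical model)."""
    spectrum_ref: Optional[str] = None     # id of a spectrum in the same mzML file
    source_file: Optional[str] = None      # vendor raw file, if external
    native_scan_ref: Optional[str] = None  # native reference within source_file


@dataclass
class Spectrum:
    id: str
    native_scan_ref: Optional[str] = None  # set only for single-scan spectra
    acquisitions: List[Acquisition] = field(default_factory=list)


def contributing_scans(spectrum: Spectrum) -> List[str]:
    """List the scans a spectrum was derived from, as labelled strings."""
    if spectrum.acquisitions:
        refs = []
        for acq in spectrum.acquisitions:
            if acq.spectrum_ref is not None:
                refs.append(f"mzML:{acq.spectrum_ref}")
            else:
                refs.append(f"{acq.source_file}:{acq.native_scan_ref}")
        return refs
    if spectrum.native_scan_ref is not None:
        return [f"native:{spectrum.native_scan_ref}"]
    return []


combined = Spectrum(
    id="S100",
    acquisitions=[
        Acquisition(spectrum_ref="S19"),
        Acquisition(source_file="run1.raw", native_scan_ref="2,4,6"),
    ],
)
```

Keeping the per-acquisition references in a list means a combined spectrum never needs a single native reference of its own, which sidesteps the question Rune raises.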