From: Fredrik L. <Fre...@im...> - 2008-03-04 11:31:19
|
I've only followed this thread briefly, since I have no intention of generating these index files, so I should probably just keep quiet. Anyway, what about enforcing a naming convention for spectrum ids so that the native scans can be extracted when a spectrum has just one scan as its source? I think this is what Matt proposes, and it seems fine to me. That way there is no need to change the main schema or the index; the spectrum id could simply be checked with a regex, and that's all.

For example, S1F1C1 would mean scan 1, function 1, cycle 1, while S1 just means scan 1. One could add a regex check for the spectrum id to the main schema, for example "S[0-9]+(F[0-9]+)?" for a scan and an optional function. I am not a regex expert, but I am sure someone could come up with a regex that is reasonably future-safe for native scan references and that still allows local variants to encode other information in the spectrum ID. This could also be used for the acquisition's external spectrum ID reference.

Regards
Fredrik
|
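A minimal Python sketch of the kind of check proposed above, assuming a hypothetical convention `S<scan>[F<function>][C<cycle>]` with an optional local suffix introduced by an underscore (the underscore separator and the `parse_spectrum_id` name are illustrative assumptions, not part of the proposal). An XML Schema `xs:pattern` facet is implicitly anchored to the whole value, so `re.fullmatch` is used here to mirror that behaviour.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: S<scan>[F<function>][C<cycle>][_<local suffix>]
# Every number may have several digits; the "_" before a local suffix is an assumption.
SPECTRUM_ID = re.compile(
    r"S(?P<scan>[0-9]+)"
    r"(?:F(?P<function>[0-9]+))?"
    r"(?:C(?P<cycle>[0-9]+))?"
    r"(?:_(?P<local>.+))?"
)


class NativeRef(NamedTuple):
    scan: int
    function: Optional[int]
    cycle: Optional[int]
    local: Optional[str]


def parse_spectrum_id(spectrum_id: str) -> Optional[NativeRef]:
    """Return the native scan reference, or None if the id does not follow the convention."""
    # fullmatch behaves like an (implicitly anchored) XML Schema pattern facet
    m = SPECTRUM_ID.fullmatch(spectrum_id)
    if m is None:
        return None

    def as_int(value: Optional[str]) -> Optional[int]:
        return int(value) if value is not None else None

    return NativeRef(int(m["scan"]), as_int(m["function"]), as_int(m["cycle"]), m["local"])


print(parse_spectrum_id("S1F1C1"))  # scan 1, function 1, cycle 1
print(parse_spectrum_id("S1"))      # scan 1 only
print(parse_spectrum_id("19"))      # None: no native scan reference
```

Allowing more than one digit per field keeps the check from breaking on function or cycle numbers above 9, while anything after the underscore is left to local use.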
From: Darren K. <dke...@ya...> - 2008-03-04 13:00:52
|
Hi all,

I believe that we (Matt, Josh, and I) were thinking of 'nativeID' as an attribute of <spectrum>, not <scan>, since that is what the <offset> refers to:

    <spectrum index="0" id="S19" nativeID="19">
    <offset id="S19" nativeID="19">1234</offset>

This nativeID can refer to a scan number (which can even be a cvParam in <scan>) or to acquisition numbers, depending on where the data array comes from.

I think Fredrik's idea of establishing a standard format with regex checking is a good one. I also think it's a bad idea for the <spectrum> 'index' attribute to be anything other than a zero-based index, and not just because it would offend my sense of computational morality ;) A 1-based index will definitely cause bugs at some point in some software, not to mention the grief it will cause the poor grad student who overlooks this fact and stays up into the night debugging some script...

I don't have any love for the traditional scan number, and 'nativeID' is enough backward compatibility for me.

Darren
|
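To make the zero-based `index` argument concrete, here is a small Python sketch (standard library only) that reads spectrum elements laid out as in the proposal above, checks that `index` runs 0, 1, 2, ... in document order, and builds a `nativeID` to index lookup. The `<spectrumList>` wrapper, the sample values, and the `index_by_native_id` helper are illustrative assumptions, not taken from a released schema.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment using the attributes proposed above (not a complete file).
SAMPLE = """
<spectrumList>
  <spectrum index="0" id="S19" nativeID="19"/>
  <spectrum index="1" id="S20" nativeID="20"/>
  <spectrum index="2" id="S22" nativeID="22"/>
</spectrumList>
"""


def index_by_native_id(xml_text: str) -> dict:
    """Map each nativeID to its zero-based index, rejecting any other numbering."""
    root = ET.fromstring(xml_text.strip())
    lookup = {}
    for position, spectrum in enumerate(root.iter("spectrum")):
        index = int(spectrum.get("index"))
        # index must equal the element's position, so it starts at 0 and has no gaps
        if index != position:
            raise ValueError(
                f"spectrum {spectrum.get('id')!r} has index {index}, expected {position}"
            )
        lookup[spectrum.get("nativeID")] = index
    return lookup


print(index_by_native_id(SAMPLE))  # {'19': 0, '20': 1, '22': 2}
```

Note that the native scan numbers (19, 20, 22) may have gaps while the index never does; a 1-based file would be rejected at its first spectrum.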
From: Matthew C. <mat...@va...> - 2008-02-14 15:24:27
|
Hi Fredrik,

Our group has a converter that does this conversion (currently to mzXML or mzData, not yet mzML, but they all have the same uniqueness constraints on scan numbers, and they all support multiple precursors, at least in theory). We went with solution 2 because solution 1 is invalid for all the XML formats: it would need a schema change, and that change isn't likely to happen, whereas allowing multiple sourceFileRefs would be understandable.

As I understand it, sourceFileRef is optional ("<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional">"), so if you can't or don't want to encode it correctly, just leave it out. Our converter doesn't even bother to include the sourceFileRefs to the DTAs; it's not helpful information IMO. As long as the conversion is done without data loss, get it over with and then have mercy on your filesystem by deleting the DTAs. ;)

-Matt

Fredrik Levander wrote:
> Hi All,
>
> In the Proteios platform we're including converters from some peak list
> formats to mzData, and now also to mzML. Such conversion is clearly not
> optimal, since instrument settings etc. are lost. However, I guess there
> will be a need for such converters if someone wants to use their old
> instruments with manufacturer peak picking algorithms.
>
> There are sample files generated from DTAs and ProteinLynx by the
> converters (0.99.1) at:
> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML
>
> The converters will be part of the new release of the Proteios Software
> Environment, but if anyone would like to try them with their files,
> there is a standalone package (mzMLconverters.zip) at the address above
> which should work under Windows/Linux/OSX with Java 1.5 or higher.
>
> Please note that the output files are not schema-valid, since
> some terms are still missing in the CV.
>
> For the conversion of multiple DTA files to one mzML file there is a
> small problem related to how lcq_dta generates DTA files: if the charge
> state of the precursor cannot be determined, a spectrum can result in
> two DTA files which are identical apart from the precursor.
> There are two ways to handle this:
> 1) Two spectra, with the same scanNumber but different spectrum ids (the
> solution used by the current converter)
> 2) One spectrum, two precursors. However, this will not work with the
> current schema, since there can only be one sourceFileRef for a spectrum.
> Do you all think solution 1 is fine, or is there a better solution?
> Solution 2 seems to need schema changes.
> Other comments are also welcome.
>
> Thanks,
>
> Fredrik
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
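The duplicate-DTA situation described above can be detected before choosing between the two solutions. A minimal Python sketch, assuming the common `basename.firstScan.lastScan.charge.dta` file naming produced by lcq_dta and a DTA body whose first line holds the singly protonated precursor mass (MH+) and the assumed charge, followed by one `m/z intensity` pair per line. The helper names and sample files are hypothetical. It groups files by scan range and checks that their peak lists are identical, i.e. that they really are one spectrum with several charge guesses.

```python
import re
from collections import defaultdict

# Assumed lcq_dta file naming: <basename>.<firstScan>.<lastScan>.<charge>.dta
DTA_NAME = re.compile(r"(?P<base>.+)\.(?P<first>\d+)\.(?P<last>\d+)\.(?P<charge>\d+)\.dta")


def parse_dta(text):
    """Return (mh_plus, charge, peaks) from DTA text.

    Assumed layout: first line "MH+ charge", then one "m/z intensity" pair per line.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    mh_plus, charge = float(rows[0][0]), int(rows[0][1])
    peaks = tuple((float(mz), float(intensity)) for mz, intensity in rows[1:])
    return mh_plus, charge, peaks


def group_by_scan(dta_files):
    """Collapse (filename, text) pairs that describe the same scan range.

    Returns {(basename, firstScan, lastScan): (candidate_charges, peaks)} and raises
    ValueError if files for one scan range carry different peak lists.
    """
    groups = defaultdict(list)
    for name, text in dta_files:
        m = DTA_NAME.fullmatch(name)
        if m is None:
            raise ValueError(f"unexpected DTA file name: {name}")
        groups[(m["base"], int(m["first"]), int(m["last"]))].append(parse_dta(text))

    merged = {}
    for key, entries in groups.items():
        if len({peaks for _, _, peaks in entries}) != 1:
            raise ValueError(f"scan range {key} has differing peak lists")
        charges = sorted(charge for _, charge, _ in entries)
        merged[key] = (charges, entries[0][2])
    return merged


# Two files lcq_dta could write for one scan whose charge was undetermined
files = [
    ("run.0100.0100.2.dta", "1001.50 2\n300.1 50\n400.2 80\n"),
    ("run.0100.0100.3.dta", "1501.75 3\n300.1 50\n400.2 80\n"),
]
print(group_by_scan(files))
```

Both files collapse into a single entry for scan 100 with candidate charges [2, 3] and one peak list, which is the "one spectrum" reading of the problem; how the candidate charges are then written is the open question in the rest of the thread.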
From: Eric D. <ede...@sy...> - 2008-02-19 06:10:21
|
Hi everyone, regarding dta to mzML conversion, here are my thoughts:

1) The current rule is that scanNumbers must be unique within a file and always increasing, although not necessarily sequential. IDs must be unique within a file. I don't think this should change for conversion from dta.

2) I would only encode the spectrum once, since, as you say, it is just one spectrum.

3) I don't even see why you need two precursors. When we converted dta to mzXML, duplicates were dropped and the actual observed precursor mass was put in the mzXML. Thus you "lose" the information that the spectrum could be charge 2 or 3. However, this information was guessed in the first place, and most software I know that extracts a spectrum with no charge information will apply some rules to decide which charges to search. So I suggest that the conversion from dta to mzML be just the reverse of mzML to dta: one spectrum per scan. If only 1 charge (dta file) is provided, encode it at the user's discretion. If more than 1 charge (dta file) is provided, encode the spectrum without any charge information. For LCQ data, it would probably be reasonable not to encode *any* charge information in the mzML file at all, because it doesn't come with any in the first place.

We will be adding the functionality for multiple precursors anyway, for the case when you have multiple peaks in your selection window, as seen e.g. in an Orbitrap. I suppose there's no reason you couldn't take advantage of that to encode both the 2+ and 3+, although I wouldn't recommend it.

Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
> Sent: Thursday, February 14, 2008 9:55 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>
> Hi Matt and Rune,
>
> Thanks for the comments. I agree that the important information is the
> scan number, since this is what you would like to look up in the raw
> data file. And it doesn't make much sense to have the scan repeated
> twice in the file, so I think we'll go for solution 2 and just keep the
> sourceFileRef to one of the files.
> However, since we do have unique spectrum ids, there should not be any
> real need to stick to the unique scan number requirement, from what I
> got from the indexing discussion, even if it is still in the specs (?).
> Couldn't there be cases when data is collected in different channels
> where the scan numbers are the same in different channels?
>
> Regards
>
> Fredrik
|
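Each DTA header stores MH+ computed for its own assumed charge, so the observed precursor m/z mentioned in point 3 has to be recovered before duplicates can be collapsed. A worked Python sketch, assuming the standard relation m/z = (MH+ + (z - 1) * m_p) / z with the proton mass m_p taken as about 1.007276 Da; the function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da, approximate proton mass


def mz_from_mh_plus(mh_plus: float, charge: int) -> float:
    """Observed precursor m/z from a DTA-style [M+H]+ value and its assumed charge."""
    return (mh_plus + (charge - 1) * PROTON_MASS) / charge


def mh_plus_from_mz(mz: float, charge: int) -> float:
    """Inverse relation: the [M+H]+ a DTA file would record for this m/z and charge."""
    return mz * charge - (charge - 1) * PROTON_MASS


# One observed precursor, written out twice under different charge guesses
observed = 501.2536
header_2 = mh_plus_from_mz(observed, 2)  # MH+ in the "2+" DTA file
header_3 = mh_plus_from_mz(observed, 3)  # MH+ in the "3+" DTA file

# Both headers map back to the same observed m/z, the value worth keeping in mzML
print(round(mz_from_mh_plus(header_2, 2), 4), round(mz_from_mh_plus(header_3, 3), 4))
```

Real DTA writers typically round MH+ to a few decimals, so m/z values recovered from different files of the same scan may agree only within that rounding.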
From: Fredrik L. <Fre...@im...> - 2008-02-19 15:05:36
|
Hi dta fans,

I agree completely with 1 and 2. For 3 (several possible charge states), there seem to be two possibilities:
a) Do not write the charge state into the mzML at all in cases where there are multiple guesses.
b) Put all the proposed values into one precursor. See lines 206-207 at:
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH071030_002.mzML?rev=26

Would anyone else prefer either a or b? At least some search engines will try both 2+ and 3+ if no charge state is given in the file, so maybe solution a is better? Or does b have advantages?

Fredrik
|
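Options (a) and (b), together with the single-guess case from Eric's point 3, fit naturally into a small converter setting. A Python sketch; the `ChargePolicy` names and the returned `(name, value)` pairs are hypothetical placeholders, not the controlled-vocabulary terms a real converter would write.

```python
from enum import Enum
from typing import List, Tuple


class ChargePolicy(Enum):
    OMIT_IF_AMBIGUOUS = "a"  # option (a): no charge when there are several guesses
    ALL_CANDIDATES = "b"     # option (b): list every proposed charge on one precursor


def precursor_charge_params(charges: List[int], policy: ChargePolicy) -> List[Tuple[str, int]]:
    """Return hypothetical (parameter name, value) pairs for one precursor."""
    unique = sorted(set(charges))
    if len(unique) == 1:
        # A single guess is written as the charge state under either policy
        return [("charge state", unique[0])]
    if policy is ChargePolicy.ALL_CANDIDATES:
        return [("possible charge state", z) for z in unique]
    # Option (a), or no charge known at all: leave it to the search engine's own rules
    return []


print(precursor_charge_params([2, 3], ChargePolicy.OMIT_IF_AMBIGUOUS))  # []
print(precursor_charge_params([2, 3], ChargePolicy.ALL_CANDIDATES))
```

Either way the spectrum itself is written once; only the precursor annotations differ. A third setting that drops charge information entirely, as Eric suggests for LCQ data, would be a one-line addition.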
From: Coleman, M. <MK...@St...> - 2008-02-19 15:27:46
|
I'm strongly in favor of (b), i.e., keeping that charge state information. If the instrument software, or some other software upstream of the search engine, has reason to believe that the charge for a particular spectrum is +2 or +3 but not +1, or +2 but not +1 or +3, or whatever, the search engine ought to be able to make use of this information.

As a practical matter, the spectrum format we currently use here (ms2, very similar to dta) encodes this information efficiently, so not having it in mzML would be at least a minor argument against converting. (We could, of course, simply duplicate the entire spectrum in this case, but this would further bloat the output and still lose some important information.)

Mike
|
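For comparison, this is roughly how an ms2 record carries several candidate charges for one scan: one `S` line followed by one `Z` line per charge guess, with the peak list stored once. A minimal Python sketch, assuming the common ms2 layout (`S firstScan lastScan precursorMz`, `Z charge MH+`, then `m/z intensity` peak lines) and skipping `H`, `I` and `D` lines; `read_ms2_record` is a hypothetical helper, not a complete ms2 reader.

```python
def read_ms2_record(text: str) -> dict:
    """Parse one ms2 spectrum record into scan, precursor m/z, candidate charges and peaks."""
    record = {"scan": None, "precursor_mz": None, "charges": [], "peaks": []}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "S":
            record["scan"] = int(fields[1])
            record["precursor_mz"] = float(fields[3])
        elif tag == "Z":
            # one Z line per charge guess: charge and the corresponding MH+
            record["charges"].append((int(fields[1]), float(fields[2])))
        elif tag in ("H", "I", "D"):
            continue  # header, info and derived lines are ignored in this sketch
        else:
            mz, intensity = fields[:2]
            record["peaks"].append((float(mz), float(intensity)))
    return record


SAMPLE_MS2 = """
S 100 100 501.2536
Z 2 1001.4999
Z 3 1501.7462
300.1 50
400.2 80
"""

record = read_ms2_record(SAMPLE_MS2)
print(record["scan"], [z for z, _ in record["charges"]])  # 100 [2, 3]
```

The charge ambiguity stays explicit without duplicating peaks, which is the property Mike wants preserved when converting to mzML.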
From: Matthew C. <mat...@va...> - 2008-02-19 15:44:27
|
Eh, I think we should leave it up to the implementor of the converter. Ideally the converter would be configurable to either keep the charge state information or discard it. In either case, the scan number would only appear as a single element.

-Matt

Coleman, Michael wrote:
> I'm strongly in favor of (b), i.e., keeping that charge state information. If the instrument software, or some other software upstream of the search engine, has reason to believe that the charge for a particular spectrum is +2 or +3 but not +1, or +2 but not +1 or +3, or whatever, the search engine ought to be able to make use of this information.
>
> As a practical matter, the spectrum format we currently use here (ms2, very similar to dta) efficiently encodes this information, so not having it in mzML would be at least a minor argument for not converting. (We could, of course, simply duplicate the entire spectrum in this case, but this would further bloat the output, and still lose some important information.)
>
> Mike
>
>> -----Original Message-----
>> From: psi...@li... [mailto:psi...@li...] On Behalf Of Fredrik Levander
>> Sent: Tuesday, February 19, 2008 9:04 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>>
>> Hi dta fans,
>>
>> I agree completely with 1 and 2. For 3 (several possible charge states), there seem to be two possibilities:
>> a) Do not write the charge state at all into the mzML in cases where there are multiple guesses.
>> b) Put all the proposed values into one precursor. See lines 206-207 at:
>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH071030_002.mzML?rev=26
>>
>> Would anyone else prefer either a or b? At least some search engines would try both 2+ and 3+ if there is no charge state given in the file, so maybe solution a is better? Or does b have advantages?
>>
>> Fredrik
>>
>> Eric Deutsch wrote:
>>> Hi everyone, regarding list dta to mzML conversion, here are my thoughts:
>>>
>>> 1) The current rule is that scanNumbers must be unique within a file and always increasing, although not necessarily sequentially. IDs must be unique within a file. I don't think this should change for conversion from dta.
>>>
>>> 2) I would only encode the spectrum once, since as you say it is just one spectrum.
>>>
>>> 3) I don't even see why you need two precursors. When we converted dta to mzXML, duplicates were dropped and the actual observed precursor mass was put in the mzXML. Thus you are "losing" the information that the spectrum could be charge 2 or 3. However, this information was guessed in the first place, and most software I know that extracts a spectrum with no charge information will apply some rules to decide on what charges to search. So, I suggest that the conversion from dta to mzML is just the reverse of mzML to dta. One spectrum per scan. If only 1 charge (dta file) is provided, encode it at the user's discretion. If more than 1 charge (dta file) is provided, encode the spectrum without any charge information. For LCQ data, it would probably be reasonable to not encode *any* charge information in the mzML file at all, because it doesn't come with any in the first place.
>>>
>>> We will be adding the functionality for multiple precursors anyway for the case when you have multiple peaks in your selection window as seen, e.g., in an orbitrap. I suppose there's no reason you couldn't take advantage of that to encode both the 2+ and 3+, although I wouldn't recommend it.
>>>
>>> Eric
>>>
>>>> -----Original Message-----
>>>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
>>>> Sent: Thursday, February 14, 2008 9:55 AM
>>>> To: Mass spectrometry standard development
>>>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>>>>
>>>> Hi Matt and Rune,
>>>>
>>>> Thanks for the comments. I agree that the important information is the scan number, since this is what you would like to look up in the raw data file. And it doesn't make much sense to have the scan repeated twice in the file, so I think we'll go for solution 2 and just keep the sourceFileRef to one of the files.
>>>> However, since we do have unique spectrum ids, from what I got from the indexing discussion there should not be any real need to stick to the unique scan number requirement, even if it is still in the specs (?). Couldn't there be cases when data is collected in different channels where the scan numbers are the same in different channels?
>>>>
>>>> Regards
>>>>
>>>> Fredrik
>>>>
>>>> Matthew Chambers wrote:
>>>>> Hi Fredrik,
>>>>>
>>>>> Our group has a converter that does this conversion (to mzXML or mzData currently, not yet mzML, but they all have the same uniqueness constraints on scan numbers and they all support multiple precursors, at least in theory); we went with solution 2 because solution 1 is invalid for all the XML formats (i.e. it would need a schema change and that change isn't likely to happen, whereas multiple sourceFileRefs would be understandable). As I understand it, sourceFileRef is optional ("<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional">"), so if you can't or don't want to encode it correctly, just don't include it. Our converter doesn't even bother to include the sourceFileRefs to the DTAs; it's not helpful information IMO. As long as the conversion is done without data loss, get it over with and then have mercy on your filesystem by deleting the DTAs. ;)
>>>>>
>>>>> -Matt
>>>>>
>>>>> Fredrik Levander wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> In the Proteios platform we're including converters from some peak list formats to mzData, and now also to mzML. Such conversion is clearly not optimal, since instrument settings etcetera are lost. However, I guess there will be a need for such converters if someone wants to use their old instruments with manufacturer peak picking algorithms.
>>>>>>
>>>>>> There are sample files generated from DTAs and ProteinLynx by the converters (0.99.1) at:
>>>>>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML
>>>>>>
>>>>>> The converters will be part of the new release of the Proteios Software Environment, but if anyone would like to try them with their files, there is a standalone package (mzMLconverters.zip) at the address above which should work under Windows/Linux/OSX with Java 1.5 or higher. Please notice that the output files are not schema-valid, since some terms are still missing in the CV.
>>>>>>
>>>>>> For the conversion of multiple DTA files to one mzML file there is a small problem, which is related to how lcq_dta generates dta files: if the charge state of the precursor cannot be determined, a spectrum can result in two DTA files which are identical apart from the precursor. There are two ways to handle this:
>>>>>> 1) Two spectra, with the same scanNumber but different spectrum ids (the solution used by the current converter)
>>>>>> 2) One spectrum, two precursors. However, this will not work with the current schema, since there can only be one sourceFileRef for a spectrum.
>>>>>>
>>>>>> Do you all think solution 1 is fine, or is there a better solution? Solution 2 seems to need schema changes. Other comments are also welcome.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Fredrik
|
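As a rough illustration of the approach discussed in this thread (one spectrum per scan, with a converter option to keep or discard ambiguous charges), the following Python sketch collapses lcq_dta output. It is not the Proteios converter or the converter Matt describes. It assumes lcq_dta-style file names of the form `<base>.<startscan>.<endscan>.<charge>.dta` and a DTA body whose first line holds the singly protonated precursor mass [M+H]+ and the charge, followed by one `m/z intensity` pair per line; `parse_dta`, `collapse_dtas` and the `keep_charges` flag are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Assumed lcq_dta file naming: <base>.<startscan>.<endscan>.<charge>.dta
DTA_NAME = re.compile(r"^(?P<base>.+)\.(?P<start>\d+)\.(?P<end>\d+)\.(?P<charge>\d+)\.dta$")
PROTON_MASS = 1.007276  # Da

@dataclass
class Spectrum:
    scan: int
    precursor_mz: float
    peaks: list
    charges: list = field(default_factory=list)  # candidate charges; empty = unknown

def parse_dta(name, text):
    """Return (scan, charge, precursor m/z, peaks) for one DTA file."""
    m = DTA_NAME.match(name)
    if m is None:
        raise ValueError(f"not an lcq_dta-style file name: {name}")
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mh, charge = float(rows[0][0]), int(rows[0][1])
    # The header holds [M+H]+; convert back to the observed m/z for this charge
    mz = (mh + (charge - 1) * PROTON_MASS) / charge
    peaks = [(float(x), float(y)) for x, y in rows[1:]]
    return int(m.group("start")), charge, mz, peaks

def collapse_dtas(files, keep_charges=True):
    """Merge (file name, contents) pairs into one Spectrum per scan number.

    With keep_charges=False, a scan that has more than one candidate charge
    is written without any charge (a single charge is always kept).
    """
    by_scan = {}
    for name, text in files:
        scan, charge, mz, peaks = parse_dta(name, text)
        if scan not in by_scan:
            # Peaks come from the first file seen; duplicates differ only in the precursor
            by_scan[scan] = Spectrum(scan, mz, peaks)
        by_scan[scan].charges.append(charge)
    spectra = []
    for scan in sorted(by_scan):  # unique, increasing scan numbers
        spec = by_scan[scan]
        spec.charges = sorted(set(spec.charges))
        if not keep_charges and len(spec.charges) > 1:
            spec.charges = []
        spectra.append(spec)
    return spectra

# Usage with made-up file contents: scan 102 was written twice (2+ and 3+)
dtas = [
    ("run1.0102.0102.2.dta", "1000.5 2\n200.1 10\n300.2 20\n"),
    ("run1.0102.0102.3.dta", "1500.25 3\n200.1 10\n300.2 20\n"),
    ("run1.0101.0101.1.dta", "800.4 1\n150.0 5\n"),
]
for spec in collapse_dtas(dtas):
    print(spec.scan, round(spec.precursor_mz, 4), spec.charges)
```

With the default `keep_charges=True` the sketch keeps every candidate charge, in the spirit of option (b); with `keep_charges=False` it writes no charge when several DTA files exist for one scan, as in option (a) and Eric's point 3. Either way each scan number appears once and in increasing order, which keeps Eric's rule 1 intact.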
From: Coleman, M. <MK...@St...> - 2008-02-19 16:00:15
|
To clarify, I think having the converter be able to keep or discard the multiple charge information is fine. What I'm against is *forcing* this information to be discarded, which is how I interpret option (a) below.

Mike

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
> Sent: Tuesday, February 19, 2008 9:44 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>
> Eh, I think we should leave it up to the implementor of the converter. Ideally the converter would be configurable to either keep the charge state information or discard it. In either case, the scan number would only appear as a single element.
>
> -Matt
>
> Coleman, Michael wrote:
> > I'm strongly in favor of (b), i.e., keeping that charge state information. [...]
> >
> >> For 3 (several possible charge states), there seem to be two possibilities:
> >> a) Do not write the charge state at all into the mzML in cases where there are multiple guesses.
> >> b) Put all the proposed values into one precursor. [...]
|
From: Eric D. <ede...@sy...> - 2008-02-26 19:09:10
|
Okay, I think it is quite reasonable to support this then. So currently we have: <ionSelection> <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/> <cvParam cvLabel="MS" accession="MS:1000041" name="charge state" value="2"/> </ionSelection> Or <ionSelection> <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/> </ionSelection> How about we support this: <ionSelection> <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/> <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="2"/> <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="3"/> </ionSelection> So this means adding a term for a possible charge state instead of a known charge state and then allowing multiple cvParams. I would say the validation rule is that only 0-1 "charge state" is allowed, or alternatively 0-N "multiple charge state" is allowed. Seem okay? Thanks, Eric > From: psi...@li... [mailto:psidev-ms-dev- > bo...@li...] On Behalf Of Coleman, Michael > Sent: Tuesday, February 19, 2008 7:59 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > > To clarify, I think having the converter be able to keep or discard the > multiple charge information is fine. What I'm against is *forcing* this > information to be discarded, which is how I interpret option (a) below. > > Mike > > > > > -----Original Message----- > > From: psi...@li... > > [mailto:psi...@li...] On > > Behalf Of Matthew Chambers > > Sent: Tuesday, February 19, 2008 9:44 AM > > To: Mass spectrometry standard development > > Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > > > > > > Eh, I think we should leave it up to the implementor of the > > converter. > > Ideally the converter would be configurable to either keep the charge > > state information or discard it. In either case, the scan > > number would > > only appear as a single element. 
> > > > -Matt > > > > > > Coleman, Michael wrote: > > > I'm strongly in favor of (b), i.e., keeping that charge state > > > information. If the instrument software, or some other software > > > upstream of the search engine has reason to believe that > > the charge for > > > a particular spectrum is +2 or +3 but not +1, or +2 but not > > +1 or +3, or > > > whatever, the search engine ought to be able to make use of this > > > information. > > > > > > As a practical matter, the spectrum format we currently use > > here (ms2, > > > very similar to dta) efficiently encodes this information, > > so not having > > > it in mzML would be at least a minor argument for not > > converting. (We > > > could, of course, simply duplicate the entire spectrum in > > this case, but > > > this would further bloat the output, and still lose some important > > > information.) > > > > > > Mike > > > > > > > > > > > > > > > > > >> -----Original Message----- > > >> From: psi...@li... > > >> [mailto:psi...@li...] On > > >> Behalf Of Fredrik Levander > > >> Sent: Tuesday, February 19, 2008 9:04 AM > > >> To: Mass spectrometry standard development > > >> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > > >> > > >> > > >> Hi dta fans, > > >> > > >> I agree completely with 1 and 2. For 3 (several possible > > >> charge states), > > >> there seems to be two possibilities: > > >> a) Do not write the chargestate at all into the mzML in > > cases where > > >> there are multiple guesses. > > >> b) Put all the proposed values into one precursor. See line > > >> 206-207 at: > > >> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH0 > > >> 71030_002.mzML?rev=26 > > >> > > >> Anyone else who would prefer either of a or b? At least > > some search > > >> engines would try both 2+ and 3+ if there is no charge > > state given in > > >> the file, so maybe solution a is better? Or does b have advantages? 
> > >> > > >> Fredrik > > >> > > >> Eric Deutsch wrote: > > >> > > >>> Hi everyone, regarding list dta to mzML conversion, here are my > > >>> thoughts: > > >>> > > >>> 1) The current rule is that scanNumbers must be unique > > >>> > > >> within a file and > > >> > > >>> always increasing, although not necessarily sequentially. > > >>> > > >> IDs must be > > >> > > >>> unique within a file. I don't think should change for > > >>> > > >> conversion from > > >> > > >>> dta. > > >>> > > >>> 2) I would only encode the spectrum once, since as you say > > >>> > > >> it is just > > >> > > >>> one spectrum. > > >>> > > >>> 3) I don't even see why you need two precursors. When we > > >>> > > >> convert dta to > > >> > > >>> mzXML, duplicates were dropped and the actual observed > > >>> > > >> precursor mass > > >> > > >>> was put in the mzXML. Thus you are "losing" the > > information that the > > >>> spectrum could be charge 2 or 3. However, this information > > >>> > > >> was guessed > > >> > > >>> in the first place, and most software I know that extracts > > >>> > > >> a spectrum > > >> > > >>> with no charge information will apply some rules to decide on what > > >>> charges to search. So, I suggest that the conversion from > > >>> > > >> dta to mzML is > > >> > > >>> just the reverse of mzML to dta. One spectrum per scan. If > > >>> > > >> only 1 charge > > >> > > >>> (dta file) is provided, encode it at the user's discretion. > > >>> > > >> If more than > > >> > > >>> 1 charge (dta file) is provided, encode the spectrum > > >>> > > >> without any charge > > >> > > >>> information. For LCQ data, it would probably be reasonable > > >>> > > >> to not encode > > >> > > >>> *any* charge information in the mzML file at all. Because > > it doesn't > > >>> come with any in the first place. 
> > >>> > > >>> We will be adding the functionality for multiple precursors > > >>> > > >> anyway for > > >> > > >>> the case when you have multiple peaks in your selection > > >>> > > >> window as seen, > > >> > > >>> e.g., in an orbitrap. I suppose there's no reason you > > couldn't take > > >>> advantage of that to encode both the 2+ and 3+ although I wouldn't > > >>> recommend it. > > >>> > > >>> Eric > > >>> > > >>> > > >>> > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: psi...@li... > > >>>> > > >>>> > > >>> [mailto:psidev-ms-dev- > > >>> > > >>> > > >>>> bo...@li...] On Behalf Of Fredrik Levander > > >>>> Sent: Thursday, February 14, 2008 9:55 AM > > >>>> To: Mass spectrometry standard development > > >>>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > > >>>> > > >>>> Hi Matt and Rune, > > >>>> > > >>>> Thanks for the comments. I agree that the important > > >>>> > > >> information is the > > >> > > >>>> scan number, since this is what you would like to look up > > >>>> > > >> in the raw > > >> > > >>>> data file. And it doesn't make much sense to have the > > scan repeated > > >>>> twice in the file, so I think we'll go for solution 2 > > and just keep > > >>>> > > >>>> > > >>> the > > >>> > > >>> > > >>>> sourceFileRef to one of the files. > > >>>> However, since we do have unique spectrum ids there should > > >>>> > > >> not be any > > >> > > >>>> real need to stick to the unique scan number requirement > > >>>> > > >> from what I > > >> > > >>>> > > >>>> > > >>> got > > >>> > > >>> > > >>>> from the indexing discussion, even if it is still in the > > specs (?). > > >>>> Couldn't there be cases when data is collected in > > >>>> > > >> different channels > > >> > > >>>> where the scan numbers are the same in different channels? 
> > >>>> > > >>>> Regards > > >>>> > > >>>> Fredrik > > >>>> > > >>>> Matthew Chambers skrev: > > >>>> > > >>>> > > >>>>> Hi Fredrik, > > >>>>> > > >>>>> Our group has a converter that does this conversion (to mzXML or > > >>>>> > > >>>>> > > >>> mzData > > >>> > > >>> > > >>>>> currently, not yet mzML, but they all have the same uniqueness > > >>>>> constraints on scan numbers and they all support multiple > > >>>>> > > >> precursors > > >> > > >>>>> > > >>>>> > > >>> at > > >>> > > >>> > > >>>>> least in theory); we went with solution 2 because solution 1 is > > >>>>> > > >>>>> > > >>> invalid > > >>> > > >>> > > >>>>> for all the XML formats (i.e. it would need a schema > > >>>>> > > >> change and that > > >> > > >>>>> change isn't likely to happen, whereas multiple > > >>>>> > > >> sourceFileRefs would > > >> > > >>>>> > > >>>>> > > >>> be > > >>> > > >>> > > >>>>> understandable). As I understand it, sourceFileRef is optional > > >>>>> ("<xs:attribute name="sourceFileRef" type="xs:anyURI" > > >>>>> > > >>>>> > > >>> use="optional">"), > > >>> > > >>> > > >>>>> so if you can't or don't want to encode it correctly, just don't > > >>>>> > > >>>>> > > >>> include > > >>> > > >>> > > >>>>> it. Our converter doesn't even bother to include the > > >>>>> > > >> sourceFileRefs > > >> > > >>>>> > > >>>>> > > >>> to > > >>> > > >>> > > >>>>> the DTAs, it's not helpful information IMO. As long as the > > >>>>> > > >>>>> > > >>> conversion is > > >>> > > >>> > > >>>>> done without data loss, get it over with and then have > > >>>>> > > >> mercy on your > > >> > > >>>>> filesystem by deleting the DTAs. ;) > > >>>>> > > >>>>> -Matt > > >>>>> > > >>>>> > > >>>>> Fredrik Levander wrote: > > >>>>> > > >>>>> > > >>>>> > > >>>>>> Hi All, > > >>>>>> > > >>>>>> In the Proteios platform we're including converters from > > >>>>>> > > >> some peak > > >> > > >>>>>> > > >>>>>> > > >>> list > > >>> > > >>> > > >>>>>> formats to mzData, and now also to mzML. 
> > >>>>>> It is clearly not optimal with such conversion since instrument settings etcetera are lost. However, I guess there will be need for such converters if someone wants to use their old instruments with manufacturer peak picking algorithms.
> > >>>>>>
> > >>>>>> There are sample files generated from DTAs and ProteinLynx by the converters (0.99.1) at:
> > >>>>>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML
> > >>>>>>
> > >>>>>> The converters will be part of the new release of the Proteios Software Environment, but if anyone would like to try them with their files, there is a standalone package (mzMLconverters.zip) at the address above which should work under Windows/Linux/OSX with Java 1.5 or higher.
> > >>>>>> Please notice that the output files are not schematically valid since some terms are still missing in the CV.
> > >>>>>>
> > >>>>>> For the conversion of multiple DTA files to one mzML file there is a small problem which is related to how lcq_dta generates dta files: If the charge state of the precursor can not be determined, a spectrum can result in two DTA files which are identical apart from the precursor.
> > >>>>>>
> > >>>>>> There are two solutions on how to handle this:
> > >>>>>> 1) Two spectra, with the same scanNumber but different spectrum Ids (The solution used by the current converter)
> > >>>>>> 2) One spectrum, two precursors. However, this will not work with the current schema since there can only be one sourceFileRef for a spectrum.
> > >>>>>>
> > >>>>>> Do you all think solution 1 is fine, or is there a better solution? Solution 2 seems to need schema changes.
> > >>>>>> Other comments are also welcome.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Fredrik
> > >>>>>>
> > >>>>>> -------------------------------------------------------------------------
> > >>>>>> This SF.net email is sponsored by: Microsoft
> > >>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
> > >>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > >>>>>> _______________________________________________
> > >>>>>> Psidev-ms-dev mailing list
> > >>>>>> Psi...@li...
> > >>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
From: Fredrik L. <Fre...@im...> - 2008-03-04 12:40:39
|
I think this is a reasonable solution. My only minor objection is that 'charge state' is to some degree 'possible charge state' in many cases, and it will be up to implementers to decide which term to use in the case where there is just one value to report. But this may even be a nice feature, since it makes it possible to flag that the charge state determination was not certain. This 'possible charge state' term seems much better than stating that a peak has two charge states, which just feels wrong (unless there are two overlapping peaks).

Just wanted to comment on this since I cannot make the telecon tonight (or this morning).

Fredrik

Eric Deutsch wrote:
> Okay, I think it is quite reasonable to support this then. So currently
> we have:
>
> <ionSelection>
>   <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/>
>   <cvParam cvLabel="MS" accession="MS:1000041" name="charge state" value="2"/>
> </ionSelection>
>
> Or
>
> <ionSelection>
>   <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/>
> </ionSelection>
>
> How about we support this:
>
> <ionSelection>
>   <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/>
>   <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="2"/>
>   <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="3"/>
> </ionSelection>
>
> So this means adding a term for a possible charge state instead of a
> known charge state and then allowing multiple cvParams.
>
> I would say the validation rule is that only 0-1 "charge state" is
> allowed, or alternatively 0-N "possible charge state" is allowed.
>
> Seem okay?
>
> Thanks,
> Eric
|
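Eric's proposed validation rule can be sketched as a small check over an <ionSelection> fragment. This is a minimal sketch, not the PSI validator: it assumes "or alternatively" means the two terms are mutually exclusive, it matches cvParams by their name attribute (the accession for 'possible charge state' was still the placeholder MS:100xxxx), it ignores XML namespaces, and check_ion_selection is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def check_ion_selection(xml_text: str) -> bool:
    """Return True if an <ionSelection> fragment follows the proposed rule.

    Assumed rule: either 0-1 "charge state" cvParams, or 0-N
    "possible charge state" cvParams, but never both kinds together.
    Terms are matched by name because the new accession is not yet assigned.
    """
    root = ET.fromstring(xml_text)
    names = [param.get("name") for param in root.iter("cvParam")]
    known = names.count("charge state")
    possible = names.count("possible charge state")
    if known > 1:
        return False  # at most one definite charge state per selected ion
    if known == 1 and possible > 0:
        return False  # assumption: mixing the two terms is not allowed
    return True


UNCERTAIN = """<ionSelection>
  <cvParam cvLabel="MS" accession="MS:1000040" name="m/z" value="445.34"/>
  <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="2"/>
  <cvParam cvLabel="MS" accession="MS:100xxxx" name="possible charge state" value="3"/>
</ionSelection>"""

print(check_ion_selection(UNCERTAIN))  # True
```

The lcq_dta case from earlier in the thread (one precursor, charge 2+ or 3+) passes, while two plain "charge state" entries on one ion would be rejected.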
From: Rune S. P. <ru...@ph...> - 2008-03-04 10:50:24
|
And in the index you would of course have to leave out the nativeScanReference attribute

--
Rune

Fredrik Levander wrote:
> Hi Rune,
>
> In this case you will not have a <scan> element, or at least not the
> nativeScanReference, but rather an acquisitionList with acquisitions.
> Each of the acquisitions will have a scan reference, which is either a
> reference to an mzML scan, or to a vendor raw data file scan, if I get
> it right.
>
> Regards
>
> Fredrik
>
> Rune Schjellerup Philosof wrote:
>> Eric Deutsch wrote:
>>> So it seems like the final suggestion is something like:
>>> <spectrum index="18" id="S2,4,6" >
>>>   <scan nativeScanReference="2,4,6">
>>>   </scan>
>>> </spectrum>
>>
>> that would be
>> <spectrum><spectrumDescription><scan nativeScanReference="2,4,6">
>>
>> This would work for cases when each spectrum refers to a single native scan.
>> But isn't there support for having a spectrum that is for instance a
>> combination of several native scans?
>>
>> [Term]
>> id: MS:1000570
>> name: spectra combination
>> def: "Method used to combine the mass spectra." [PSI:MS]
>> relationship: part_of MS:1000442 ! spectrum
>>
>> What to do in this case?
>>
>> --
>> Rune
|
From: Fredrik L. <Fre...@im...> - 2008-03-04 11:02:21
|
Either that, or keep nativeScanReference pointing to the first acquisition in the acquisitionList. I just think that it would get unnecessarily complex to reference all native scans from the index in such a file, but that's just my opinion.

Fredrik

Rune Schjellerup Philosof wrote:
> And in the index you would of course have to leave out the
> nativeScanReference attribute
>
> --
> Rune
>
> Fredrik Levander wrote:
>> Hi Rune,
>>
>> In this case you will not have a <scan> element, or at least not the
>> nativeScanReference, but rather an acquisitionList with acquisitions.
>> Each of the acquisitions will have a scan reference, which is either a
>> reference to an mzML scan, or to a vendor raw data file scan, if I get
>> it right.
>>
>> Regards
>>
>> Fredrik
>>
>> Rune Schjellerup Philosof wrote:
>>> Eric Deutsch wrote:
>>>> So it seems like the final suggestion is something like:
>>>> <spectrum index="18" id="S2,4,6" >
>>>>   <scan nativeScanReference="2,4,6">
>>>>   </scan>
>>>> </spectrum>
>>>
>>> that would be
>>> <spectrum><spectrumDescription><scan nativeScanReference="2,4,6">
>>>
>>> This would work for cases when each spectrum refers to a single native scan.
>>> But isn't there support for having a spectrum that is for instance a
>>> combination of several native scans?
>>>
>>> [Term]
>>> id: MS:1000570
>>> name: spectra combination
>>> def: "Method used to combine the mass spectra." [PSI:MS]
>>> relationship: part_of MS:1000442 ! spectrum
>>>
>>> What to do in this case?
>>>
>>> --
>>> Rune
|
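The two options for a combined spectrum's index entry (Rune: leave nativeScanReference out; Fredrik: point it at the first acquisition) can be compared side by side. A minimal sketch only: index_native_ref and its policy values are hypothetical names, and the acquisition scan references are assumed to be plain strings already read from the acquisitionList.

```python
from typing import List, Optional


def index_native_ref(acquisition_refs: List[str], policy: str = "first") -> Optional[str]:
    """Choose the nativeScanReference to write into the index for one spectrum.

    policy="omit":  leave the attribute out for combined spectra (Rune's option).
    policy="first": point at the first acquisition in the list (Fredrik's option).
    A spectrum built from a single acquisition gets that reference either way.
    """
    if policy not in ("omit", "first"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not acquisition_refs:
        return None
    if len(acquisition_refs) == 1:
        return acquisition_refs[0]
    return acquisition_refs[0] if policy == "first" else None


print(index_native_ref(["2", "4", "6"], policy="omit"))   # None
print(index_native_ref(["2", "4", "6"], policy="first"))  # 2
```

Under either policy the index carries at most one reference, so a reader that needs every contributing scan still has to open the spectrum and walk its acquisitionList; that is the trade-off Fredrik accepts to keep the index simple.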
From: Matt C. <mat...@va...> - 2008-03-04 14:20:32
|
That's an interesting edge case for the nativeID. I am inclined to suggest that the nativeID be put in the acquisition, or that spectral combination be rethought entirely. Since a combined spectrum is essentially a "meta-spectrum" and not a real spectrum (at least I would think about it that way), we could use a separate element entirely to encode combinations. The combinations themselves would not have any of the normal attributes or cvParams of a <spectrum>, just a list of references to the actual <spectrum> elements elsewhere in the file. This would be a better way in terms of avoiding (meta)data loss. It would be reasonable to allow the <metaspectrum> or <combined_spectrum> to have binaryDataArrays containing the combined data, though, since that would take the most time to regenerate. But if the spectra are combined before putting them in the file, you lose the data from the individual acquisitions.

-Matt

Rune Schjellerup Philosof wrote:
> Eric Deutsch wrote:
>> So it seems like the final suggestion is something like:
>> <spectrum index="18" id="S2,4,6" >
>>   <scan nativeScanReference="2,4,6">
>>   </scan>
>> </spectrum>
>
> that would be
> <spectrum><spectrumDescription><scan nativeScanReference="2,4,6">
>
> This would work for cases when each spectrum refers to a single native scan.
> But isn't there support for having a spectrum that is for instance a
> combination of several native scans?
>
> [Term]
> id: MS:1000570
> name: spectra combination
> def: "Method used to combine the mass spectra." [PSI:MS]
> relationship: part_of MS:1000442 ! spectrum
>
> What to do in this case?
>
> --
> Rune
|
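Matt's separate combined-spectrum element could be modelled roughly as below: the combination stores only references to real spectra, plus optional cached arrays because recomputing them is the slow part. This is a hypothetical data model (the class and field names are not mzML), and summing intensities at identical m/z values is only a naive placeholder for whatever method the 'spectra combination' term would actually record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Spectrum:
    id: str
    native_id: str
    mz: List[float]
    intensity: List[float]


@dataclass
class CombinedSpectrum:
    """A 'meta-spectrum': references to real spectra, no acquisition metadata."""
    id: str
    spectrum_refs: List[str]
    # Optional cached result of the combination.
    mz: Optional[List[float]] = None
    intensity: Optional[List[float]] = None

    def constituents(self, by_id: Dict[str, Spectrum]) -> List[Spectrum]:
        return [by_id[ref] for ref in self.spectrum_refs]

    def regenerate(self, by_id: Dict[str, Spectrum]) -> Tuple[List[float], List[float]]:
        # Placeholder combination: sum intensities of peaks with identical m/z.
        summed: Dict[float, float] = {}
        for spectrum in self.constituents(by_id):
            for peak_mz, peak_intensity in zip(spectrum.mz, spectrum.intensity):
                summed[peak_mz] = summed.get(peak_mz, 0.0) + peak_intensity
        mzs = sorted(summed)
        return mzs, [summed[m] for m in mzs]


spectra = {
    "S2": Spectrum("S2", "2", [100.0, 200.0], [1.0, 2.0]),
    "S4": Spectrum("S4", "4", [100.0, 300.0], [3.0, 4.0]),
}
combo = CombinedSpectrum("C1", ["S2", "S4"])
print(combo.regenerate(spectra))  # ([100.0, 200.0, 300.0], [4.0, 2.0, 4.0])
```

Because the constituent spectra stay in the file as ordinary entries, nothing from the individual acquisitions is lost, which is the core of Matt's argument.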
From: Fredrik L. <Fre...@im...> - 2008-03-04 15:13:29
|
As I get it, nativeID is about the same as what we currently have in acquisition number + spectrumRef + sourceFileId (or number + externalSpectrumID + sourceFileId), if the source file is a vendor raw data file. If the source file is an mzML file, the externalSpectrumRef would equal the spectrum id in that file. If the spectrum id could be parsed to get the native scan id, everything is there to get the native scan.

<acquisition number="1" externalSpectrumRef="S1F1" sourceFileRef="SF1"/>

where SF1 is a MassLynx raw data folder.

If SF1 is an mzML file and the same convention was used for the spectrum id as for the externalSpectrumRef, we could easily retrieve the native scan.

The question is what to put into the spectrum id if the spectrum (or probably peak list) is a combination of spectra. It could be the first external spectrum id in the acquisitionList plus maybe a letter to indicate combination.
I see no need to further complicate things by introducing a <metaspectrum> or similar though; the current <spectrum> works fine for both single scans and combined spectra.

Fredrik

Matt Chambers wrote:
> That's an interesting edge case for the nativeID. I am inclined to
> suggest that the nativeID be put in the acquisition, or that spectral
> combination be rethought entirely. Since a combined spectrum is
> essentially a "meta-spectrum" and not a real spectrum (at least I would
> think about it that way), we could use a separate element entirely to
> encode combinations. The combinations themselves would not have any of
> the normal attributes or cvParams of a <spectrum>, just a list of
> references to the actual <spectrum> elements elsewhere in the file. This
> would be a better way in terms of avoiding (meta)data loss. It would be
> reasonable to allow the <metaspectrum> or <combined_spectrum> to have
> binaryDataArrays containing the combined data, though, since that would
> take the most time to regenerate. But if the spectra are combined before
> putting them in the file, you lose the data from the individual
> acquisitions.
>
> -Matt
>
> Rune Schjellerup Philosof wrote:
>> Eric Deutsch wrote:
>>> So it seems like the final suggestion is something like:
>>> <spectrum index="18" id="S2,4,6" >
>>>   <scan nativeScanReference="2,4,6">
>>>   </scan>
>>> </spectrum>
>>
>> that would be
>> <spectrum><spectrumDescription><scan nativeScanReference="2,4,6">
>>
>> This would work for cases when each spectrum refers to a single native scan.
>> But isn't there support for having a spectrum that is for instance a
>> combination of several native scans?
>>
>> [Term]
>> id: MS:1000570
>> name: spectra combination
>> def: "Method used to combine the mass spectra." [PSI:MS]
>> relationship: part_of MS:1000442 ! spectrum
>>
>> What to do in this case?
>>
>> --
>> Rune
|
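Fredrik's point, that a parseable spectrum id is enough to recover the native scan, can be sketched with a regular expression. Everything here is an illustrative assumption: the S<scan>[F<function>][C<cycle>] layout follows the S1F1C1 example from earlier in the thread, and the trailing lowercase "x" marking a combined spectrum is a hypothetical choice for "a letter to indicate combination".

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical convention: S<scan>, optional F<function>, optional C<cycle>,
# optional trailing "x" for a spectrum combined from several acquisitions.
SPECTRUM_ID = re.compile(
    r"S(?P<scan>\d+)(?:F(?P<function>\d+))?(?:C(?P<cycle>\d+))?(?P<combined>x)?"
)


@dataclass(frozen=True)
class NativeRef:
    scan: int
    function: Optional[int] = None
    cycle: Optional[int] = None
    combined: bool = False


def parse_spectrum_id(spectrum_id: str) -> Optional[NativeRef]:
    """Return the native scan parts, or None if the id uses another convention."""
    m = SPECTRUM_ID.fullmatch(spectrum_id)
    if m is None:
        return None

    def group_int(name: str) -> Optional[int]:
        value = m.group(name)
        return int(value) if value is not None else None

    return NativeRef(
        scan=int(m.group("scan")),
        function=group_int("function"),
        cycle=group_int("cycle"),
        combined=m.group("combined") is not None,
    )


def combined_spectrum_id(external_refs: List[str]) -> str:
    """First external spectrum id in the acquisitionList plus a combination marker."""
    if not external_refs:
        raise ValueError("a combined spectrum needs at least one acquisition")
    return external_refs[0] + "x"


print(parse_spectrum_id("S1F1C1"))  # NativeRef(scan=1, function=1, cycle=1, combined=False)
print(combined_spectrum_id(["S1F1", "S3F1"]))  # S1F1x
```

Ids that do not fit the pattern return None rather than a guessed scan number, so a reader can fall back to the acquisition's sourceFileRef and externalSpectrumRef instead.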
From: Eric D. <ede...@sy...> - 2008-03-04 16:52:47
|
Hi everyone,

I then revise my previous pseudo XML to:

<nativeScanRefFormat>
  <cvParam cvLabel="MS" accession="MS:1099580" name="Masswolf format nativeScanReference"/>
</nativeScanRefFormat>

<spectrum index="18" id="C2F4S6" nativeID="C2F4S6">
</spectrum>

<offset index="18" id="C2F4S6" nativeID="C2F4S6">1234</offset>

For Thermo, we would have:

<nativeScanRefFormat>
  <cvParam cvLabel="MS" accession="MS:1099581" name="Thermo format nativeScanReference"/>
</nativeScanRefFormat>

<spectrum index="18" id="S19" nativeID="19">
</spectrum>

<offset index="18" id="S19" nativeID="19">1234</offset>

Are we converging on this?

-----------
Thread summary:
- Rune asks what would happen for combined spectra.
- Fredrik says that in that case there would be no <scan> element anyway, just an acquisitionList,
- and Rune adds that nativeScanReference would then be left out of the index.
- Alternatively, Fredrik offers, you could keep nativeScanReference pointing to the first scan.
- Fredrik asks how it would be to enforce a naming convention for spectrum ids, e.g. S1F1C1 would mean scan 1, function 1, cycle 1, while S1 just means scan 1.
- Darren suggests that nativeID would be an attribute of <spectrum>, not <scan>. This nativeID can refer to a scan number or to acquisition numbers.
- Darren also strongly favors a zero-based index instead of a 1-based index.
- Matt suggests a metaspectrum.
- Fredrik counters that a metaspectrum is not needed. Just:
    <acquisition number="1" externalSpectrumRef="S1F1" sourceFileRef="SF1"/>
  where SF1 is a MassLynx raw data folder. If SF1 is an mzML file and the same convention was used for the spectrum id as for the externalSpectrumRef, we could easily retrieve the native scan.
- Eric revises the pseudo XML as shown at the top of this message.

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
> Sent: Tuesday, March 04, 2008 7:12 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>
> As I get it, nativeID is about the same as what we currently have in
> acquisition number + spectrumRef + sourceFileId
> (or number + externalSpectrumID + sourceFileId), if the source file is a
> vendor raw data file. If the source file is an mzML file, the
> externalSpectrumRef would equal the spectrum id in that file. If the
> spectrum id could be parsed to get the native scan id, everything is there
> to get the native scan.
>
> <acquisition number="1" externalSpectrumRef="S1F1" sourceFileRef="SF1"/>
>
> where SF1 is a MassLynx raw data folder.
>
> If SF1 is an mzML file and the same convention was used for the spectrum
> id as for the externalSpectrumRef, we could easily retrieve the native
> scan.
>
> The question is what to put into the spectrum id if the spectrum (or
> probably peak list) is a combination of spectra. It could be the first
> external spectrum id in the acquisitionList plus maybe a letter to
> indicate combination.
> I see no need to further complicate things by introducing a
> <metaspectrum> or similar though; the current <spectrum> works fine for
> both single scans and combined spectra.
>
> Fredrik
>
> Matt Chambers wrote:
> > That's an interesting edge case for the nativeID. I am inclined to
> > suggest that the nativeID be put in the acquisition, or that spectral
> > combination be rethought entirely. Since a combined spectrum is
> > essentially a "meta-spectrum" and not a real spectrum (at least I would
> > think about it that way), we could use a separate element entirely to
> > encode combinations. The combinations themselves would not have any of
> > the normal attributes or cvParams of a <spectrum>, just a list of
> > references to the actual <spectrum> elements elsewhere in the file. This
> > would be a better way in terms of avoiding (meta)data loss. It would be
> > reasonable to allow the <metaspectrum> or <combined_spectrum> to have
> > binaryDataArrays containing the combined data, though, since that would
> > take the most time to regenerate. But if the spectra are combined before
> > putting them in the file, you lose the data from the individual
> > acquisitions.
> >
> > -Matt
> >
> > Rune Schjellerup Philosof wrote:
> >> Eric Deutsch wrote:
> >>> So it seems like the final suggestion is something like:
> >>> <spectrum index="18" id="S2,4,6" >
> >>>   <scan nativeScanReference="2,4,6">
> >>>   </scan>
> >>> </spectrum>
> >>
> >> that would be
> >> <spectrum><spectrumDescription><scan nativeScanReference="2,4,6">
> >>
> >> This would work for cases when each spectrum refers to a single native scan.
> >> But isn't there support for having a spectrum that is for instance a
> >> combination of several native scans?
> >>
> >> [Term]
> >> id: MS:1000570
> >> name: spectra combination
> >> def: "Method used to combine the mass spectra." [PSI:MS]
> >> relationship: part_of MS:1000442 ! spectrum
> >>
> >> What to do in this case?
> >>
> >> --
> >> Rune
|
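Under Eric's revised proposal a reader would first look at nativeScanRefFormat to know how to interpret nativeID, then use the index to jump to the spectrum. A minimal sketch with several assumptions: the two format names come from the pseudo XML (their accessions were placeholders), the C/F/S letters in the Masswolf-style id are read as cycle/function/scan, the index fragment is namespace-free with made-up byte offsets, and decode_native_id and load_index are hypothetical helpers. It also rejects an index that is not zero-based and contiguous, as Darren argued for.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

MASSWOLF = "Masswolf format nativeScanReference"
THERMO = "Thermo format nativeScanReference"
LETTERS = {"C": "cycle", "F": "function", "S": "scan"}  # assumed meanings


def decode_native_id(fmt: str, native_id: str) -> Dict[str, int]:
    """Interpret a nativeID according to the file's nativeScanRefFormat term."""
    if fmt == THERMO:
        return {"scan": int(native_id)}
    if fmt == MASSWOLF:
        pairs = re.findall(r"([CFS])(\d+)", native_id)
        # Reject ids containing anything besides letter+number pairs.
        if "".join(letter + digits for letter, digits in pairs) != native_id:
            raise ValueError(f"malformed Masswolf nativeID: {native_id!r}")
        return {LETTERS[letter]: int(digits) for letter, digits in pairs}
    raise ValueError(f"unsupported nativeScanRefFormat: {fmt!r}")


def load_index(index_xml: str) -> Dict[str, Tuple[int, str, int]]:
    """Map nativeID -> (index, spectrum id, byte offset) from <offset> elements."""
    by_native: Dict[str, Tuple[int, str, int]] = {}
    for expected, offset in enumerate(ET.fromstring(index_xml).iter("offset")):
        index = int(offset.get("index"))
        if index != expected:
            raise ValueError(f"index must be zero-based and contiguous, got {index}")
        by_native[offset.get("nativeID")] = (index, offset.get("id"), int(offset.text))
    return by_native


INDEX = """<index>
  <offset index="0" id="S19" nativeID="19">1234</offset>
  <offset index="1" id="S20" nativeID="20">5678</offset>
</index>"""

print(decode_native_id(MASSWOLF, "C2F4S6"))  # {'cycle': 2, 'function': 4, 'scan': 6}
print(load_index(INDEX)["20"])  # (1, 'S20', 5678)
```

Keying the lookup on nativeID rather than on index keeps the vendor-facing identifier separate from the purely positional, zero-based index.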
From: Kessner, D. E. <Dar...@cs...> - 2008-03-04 16:55:55
|
>Are we converging on this?

That looks good to me.

Darren
|