From: Fredrik L. <Fre...@im...> - 2008-02-19 15:05:36
Hi dta fans,

I agree completely with 1 and 2. For 3 (several possible charge states), there seem to be two possibilities:

a) Do not write the charge state into the mzML at all in cases where there are multiple guesses.
b) Put all the proposed values into one precursor. See lines 206-207 at:
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH071030_002.mzML?rev=26

Would anyone else prefer either a or b? At least some search engines would try both 2+ and 3+ if there is no charge state given in the file, so maybe solution a is better? Or does b have advantages?

Fredrik

Eric Deutsch wrote:
> Hi everyone, regarding list dta to mzML conversion, here are my
> thoughts:
>
> 1) The current rule is that scanNumbers must be unique within a file and
> always increasing, although not necessarily sequentially. IDs must be
> unique within a file. I don't think that should change for conversion
> from dta.
>
> 2) I would only encode the spectrum once, since as you say it is just
> one spectrum.
>
> 3) I don't even see why you need two precursors. When we convert dta to
> mzXML, duplicates were dropped and the actual observed precursor mass
> was put in the mzXML. Thus you are "losing" the information that the
> spectrum could be charge 2 or 3. However, this information was guessed
> in the first place, and most software I know that extracts a spectrum
> with no charge information will apply some rules to decide on what
> charges to search. So, I suggest that the conversion from dta to mzML is
> just the reverse of mzML to dta. One spectrum per scan. If only 1 charge
> (dta file) is provided, encode it at the user's discretion. If more than
> 1 charge (dta file) is provided, encode the spectrum without any charge
> information. For LCQ data, it would probably be reasonable to not encode
> *any* charge information in the mzML file at all. Because it doesn't
> come with any in the first place.
>
> We will be adding the functionality for multiple precursors anyway for
> the case when you have multiple peaks in your selection window as seen,
> e.g., in an orbitrap. I suppose there's no reason you couldn't take
> advantage of that to encode both the 2+ and 3+ although I wouldn't
> recommend it.
>
> Eric
>
>> -----Original Message-----
>> From: psi...@li...
>> [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
>> Sent: Thursday, February 14, 2008 9:55 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>>
>> Hi Matt and Rune,
>>
>> Thanks for the comments. I agree that the important information is the
>> scan number, since this is what you would like to look up in the raw
>> data file. And it doesn't make much sense to have the scan repeated
>> twice in the file, so I think we'll go for solution 2 and just keep the
>> sourceFileRef to one of the files.
>> However, since we do have unique spectrum ids there should not be any
>> real need to stick to the unique scan number requirement from what I got
>> from the indexing discussion, even if it is still in the specs (?).
>> Couldn't there be cases when data is collected in different channels
>> where the scan numbers are the same in different channels?
>>
>> Regards
>>
>> Fredrik
>>
>> Matthew Chambers wrote:
>>> Hi Fredrik,
>>>
>>> Our group has a converter that does this conversion (to mzXML or mzData
>>> currently, not yet mzML, but they all have the same uniqueness
>>> constraints on scan numbers and they all support multiple precursors at
>>> least in theory); we went with solution 2 because solution 1 is invalid
>>> for all the XML formats (i.e. it would need a schema change and that
>>> change isn't likely to happen, whereas multiple sourceFileRefs would be
>>> understandable).
>>> As I understand it, sourceFileRef is optional
>>> ("<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional">"),
>>> so if you can't or don't want to encode it correctly, just don't include
>>> it. Our converter doesn't even bother to include the sourceFileRefs to
>>> the DTAs, it's not helpful information IMO. As long as the conversion is
>>> done without data loss, get it over with and then have mercy on your
>>> filesystem by deleting the DTAs. ;)
>>>
>>> -Matt
>>>
>>> Fredrik Levander wrote:
>>>> Hi All,
>>>>
>>>> In the Proteios platform we're including converters from some peak list
>>>> formats to mzData, and now also to mzML. It is clearly not optimal with
>>>> such conversion since instrument settings etcetera are lost. However, I
>>>> guess there will be need for such converters if someone wants to use
>>>> their old instruments with manufacturer peak picking algorithms.
>>>>
>>>> There are sample files generated from DTAs and ProteinLynx by the
>>>> converters (0.99.1) at:
>>>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML
>>>>
>>>> The converters will be part of the new release of the Proteios Software
>>>> Environment, but if anyone would like to try them with their files,
>>>> there is a standalone package (mzMLconverters.zip) at the address above
>>>> which should work under Windows/Linux/OSX with Java 1.5 or higher.
>>>>
>>>> Please notice that the output files are not schematically valid since
>>>> some terms are still missing in the CV.
>>>>
>>>> For the conversion of multiple DTA files to one mzML file there is a
>>>> small problem which is related to how lcq_dta generates dta files: If
>>>> the charge state of the precursor can not be determined, a spectrum can
>>>> result in two DTA files which are identical apart from the precursor.
>>>> There are two solutions on how to handle this:
>>>> 1) Two spectra, with the same scanNumber but different spectrum Ids (The
>>>> solution used by the current converter)
>>>> 2) One spectrum, two precursors. However, this will not work with the
>>>> current schema since there can only be one sourceFileRef for a spectrum.
>>>> Do you all think solution 1 is fine, or is there a better solution?
>>>> Solution 2 seems to need schema changes.
>>>> Other comments are also welcome
>>>>
>>>> Thanks,
>>>>
>>>> Fredrik
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.net email is sponsored by: Microsoft
>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>> _______________________________________________
>>>> Psidev-ms-dev mailing list
>>>> Psi...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
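
The rule discussed in this thread (one spectrum per scan; keep the charge only when a single DTA file proposes one, otherwise write no charge, i.e. option a) could be sketched roughly as below. This is only an illustration and not the Proteios converter: the file name pattern `<base>.<firstScan>.<lastScan>.<charge>.dta` is assumed here, and the names `DTA_NAME`, `group_dta_by_scan`, `charge_to_encode` and `plan_spectra` are hypothetical. The sketch also assumes all files come from a single run, so scan numbers do not collide between base names.

```python
import re
from collections import defaultdict

# Assumed lcq_dta-style file name: <base>.<firstScan>.<lastScan>.<charge>.dta
DTA_NAME = re.compile(
    r"^(?P<base>.+)\.(?P<first>\d+)\.(?P<last>\d+)\.(?P<charge>\d+)\.dta$"
)


def group_dta_by_scan(filenames):
    """Map (base, first scan, last scan) to the sorted list of proposed charges."""
    groups = defaultdict(set)
    for name in filenames:
        m = DTA_NAME.match(name)
        if m is None:
            raise ValueError(f"unexpected DTA file name: {name}")
        key = (m.group("base"), int(m.group("first")), int(m.group("last")))
        groups[key].add(int(m.group("charge")))
    return {key: sorted(charges) for key, charges in groups.items()}


def charge_to_encode(charges):
    """Option a: keep the charge only if exactly one was proposed."""
    return charges[0] if len(charges) == 1 else None


def plan_spectra(filenames):
    """One entry per scan, ordered by increasing scan number."""
    groups = group_dta_by_scan(filenames)
    ordered = sorted(groups.items(), key=lambda item: item[0][1])
    return [
        {
            "scanNumber": first,
            "charge": charge_to_encode(charges),
            "candidates": charges,  # what option b would put into one precursor
        }
        for (_base, first, _last), charges in ordered
    ]


if __name__ == "__main__":
    files = ["run1.0100.0100.2.dta", "run1.0100.0100.3.dta", "run1.0105.0105.1.dta"]
    for spectrum in plan_spectra(files):
        print(spectrum)
```

Keeping the `candidates` list next to the chosen `charge` leaves the choice between option a and option b to the code that writes the mzML.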