Re: [Psidev-ms-dev] [proteowizard-developer] mzML output and the
mzML specification: problem, maybe?
From: Eric D. <ede...@sy...> - 2015-05-05 04:09:02
|
Hi Bryson,

I believe that your impression is correct. In practice, most documents use the spectrumRef. The alternate use of (sourceFileRef and externalSpectrumID) is rare, but supported.

My very vague recollection is that it is not possible in XML schema to enforce mandatory attribute (A and B) or C. So our solution is to make them all optional. Originally we had only spectrumRefs. But then someone wanted the ability to refer to spectra external to the document.

There is a semantic validator software that is supposed to flag incorrect use of this as an error, but I am not certain that it does. And it is not universally used. This is a good lesson in “any disagreeable shortcut left unblocked will be taken!”.

So is your question: is the fact that some writers like Bruker software do not write spectrumRefs (or the alternative) a problem?

Regards
Eric

*From:* Gibbons, Bryson C [mailto:bry...@pn...]
*Sent:* Monday, May 4, 2015 3:19 PM
*To:* pro...@li...
*Subject:* [proteowizard-developer] mzML output and the mzML specification: problem, maybe?

In some of my improvements to the mzRefinery filter (in particular for adjusting precursor metadata using the scan start time from the precursor itself) I ended up breaking part of its functionality: updating precursor metadata for MSn scans. I hadn’t noticed this previously because I was primarily using it on mzML files produced from Thermo instruments, but when I tried it again recently on some Bruker QqTOF data I found that precursor metadata was not being updated.

After some testing to determine why this was happening, I found that while the Thermo reader populates the precursor element's spectrumRef attribute, the Bruker reader (and apparently most of the other vendor readers as well) does not, instead leaving no attributes on the precursor element. I’m assuming that a good part of this is probably because the vendor files/dlls don’t provide a good process to determine which scan is the precursor.
The mzML 1.1.0 specification for the <precursor> element states (http://www.peptideatlas.org/tmp/mzML1.1.0.html#precursor):

*Attribute Name*      *Data Type*   *Use*      *Definition*
externalSpectrumID    xs:string     optional   For precursor spectra that are external to this document, this string MUST correspond to the 'id' attribute of a spectrum in the external document indicated by 'sourceFileRef'.
sourceFileRef         xs:IDREF      optional   For precursor spectra that are external to this document, this attribute MUST reference the 'id' attribute of a sourceFile representing that external document.
spectrumRef           xs:string     optional   For precursor spectra that are local to this document, this attribute MUST be used to reference the 'id' attribute of the spectrum corresponding to the precursor spectrum.

Though all are listed as optional attributes, the descriptions leave the impression of “you must either use externalSpectrumID and sourceFileRef, or use spectrumRef”. I don’t know if that is what it is supposed to mean, because the mzML 1.0.0 specification left the “spectrumRef” attribute optional with the definition of “Reference to the id attribute of the spectrum from which the precursor was selected.”

Is this a problem, or am I misreading the mzML specification?

Thank you,
Bryson Gibbons
|
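The either/or reading of the three attribute definitions (spectrumRef alone, or sourceFileRef together with externalSpectrumID) is a rule that, per Eric's reply, the schema leaves unenforced, so it can only be checked after parsing. A minimal Python sketch of such a check follows. It is not the PSI semantic validator; the function names, the category labels ("local", "external", "none", "inconsistent") and the sample ids are made up for illustration, and the sample document is a stripped-down fragment that is not schema-valid mzML.

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML 1.1.0 documents, in ElementTree's "{uri}tag" form.
MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def classify_precursor(attrs):
    """Classify how a <precursor> element identifies its precursor spectrum.

    attrs is a mapping of attribute names to values. The labels returned
    here are illustrative only; they are not terms from the specification.
    """
    has_local = "spectrumRef" in attrs
    has_file = "sourceFileRef" in attrs
    has_ext_id = "externalSpectrumID" in attrs

    if has_local and not (has_file or has_ext_id):
        return "local"          # precursor spectrum is in this document
    if has_file and has_ext_id and not has_local:
        return "external"       # precursor spectrum is in another document
    if not (has_local or has_file or has_ext_id):
        return "none"           # schema-valid, but no link at all
    return "inconsistent"       # e.g. externalSpectrumID without sourceFileRef


def classify_precursors(mzml_text):
    """Return (spectrum id, label) for every <precursor> in the document."""
    root = ET.fromstring(mzml_text)
    results = []
    for spectrum in root.iter(MZML_NS + "spectrum"):
        for precursor in spectrum.iter(MZML_NS + "precursor"):
            results.append((spectrum.get("id"), classify_precursor(precursor.attrib)))
    return results


# Hypothetical, heavily trimmed fragment covering all four cases.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="5">
      <spectrum id="scan=1" index="0"/>
      <spectrum id="scan=2" index="1">
        <precursorList count="1"><precursor spectrumRef="scan=1"/></precursorList>
      </spectrum>
      <spectrum id="scan=3" index="2">
        <precursorList count="1">
          <precursor sourceFileRef="sf1" externalSpectrumID="scan=7"/>
        </precursorList>
      </spectrum>
      <spectrum id="scan=4" index="3">
        <precursorList count="1"><precursor/></precursorList>
      </spectrum>
      <spectrum id="scan=5" index="4">
        <precursorList count="1"><precursor externalSpectrumID="scan=9"/></precursorList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

if __name__ == "__main__":
    for spectrum_id, label in classify_precursors(SAMPLE):
        print(spectrum_id, label)
```

Whether "none" should be reported as an error is exactly the open question in this thread, so the sketch only labels the case and leaves the policy to the caller.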
Re: [Psidev-ms-dev] [proteowizard-developer] mzML output and the
mzML specification: problem, maybe?
From: Steffen N. <sne...@ip...> - 2015-05-05 12:12:19
Attachments:
signature.asc
|
Hi,

On Mo, 2015-05-04 at 21:08 -0700, Eric Deutsch wrote:
...
> So is your question: is the fact that some writers like Bruker
> software do not write spectrumRefs (or the alternative) a problem?

we're using Bruker QqTOF as well, and the issue might lie deeper than just the writer. We have some mzML files which are MS2 *only*, so there *is no* precursor spectrum at all.

I am unsure if you can do MS2 or MSn only measurements in Orbitraps, but it might be that there are cases where spectrumRef must be left out.

Yours,
Steffen

--
IPB Halle                    AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann          http://www.IPB-Halle.DE
Weinberg 3                   http://msbi.bic-gh.de
06120 Halle                  Tel. +49 (0) 345 5582 - 1470
                                  +49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE     Fax. +49 (0) 345 5582 - 1409
|
Re: [Psidev-ms-dev] [proteowizard-developer] mzML output and the
mzML specification: problem, maybe?
From: Gibbons, B. C <bry...@pn...> - 2015-05-05 17:21:01
|
I do mean that when I use msconvert to read a non-Thermo raw binary file (in this case, a Bruker .d folder) and write mzML, the resulting mzML does not have precursor spectrumRefs. The reason I referenced "ProteoWizard readers" is that the responsibility for adding that information should be with the vendor dll interface code, or "vendor readers". The code that adds the spectrumRef to the Thermo files is part of the code that reads the spectra using the Thermo dlls.

I can definitely understand the problem when the mzML file is MS2 only, which you can create using msconvert. There are three options:

- Still have the spectrumRef there (if msconvert created it) even though the precursor spectrum is not in the mzML file. Unfortunately this could break something that assumes that if a spectrumRef is present, the referenced spectrum is in the same file.
- Use the external id, which presents the same set of issues, in addition to the external reference requirements.
- Not have the spectrumRef, as it currently is.

If you create an MS2-only mzML file from Thermo data it still contains the spectrumRef (as well as saying the file still includes MS1 spectra, which is a different problem).

Bryson
|
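The MS2-only situation discussed here can be detected mechanically: count spectra per MS level and check whether each spectrumRef resolves to a spectrum that is actually in the file. The Python sketch below is one hedged way to do that. The function name, the result keys and the sample ids are hypothetical; MS:1000511 is the PSI-MS accession for "ms level"; and the sketch assumes the ms level cvParam sits directly under each spectrum element, so files that put it in a referenceableParamGroup would need extra handling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL_ACCESSION = "MS:1000511"  # PSI-MS CV term "ms level"


def summarize_precursor_links(mzml_text):
    """Summarize MS levels and the state of local precursor references.

    Returns a dict with three illustrative keys:
      ms_levels        count of spectra per MS level
      dangling         (spectrum id, spectrumRef) pairs whose target is absent
      no_spectrum_ref  ids of spectra with a <precursor> lacking spectrumRef
    """
    root = ET.fromstring(mzml_text)
    spectra = list(root.iter(MZML_NS + "spectrum"))
    known_ids = {s.get("id") for s in spectra}

    levels = Counter()
    dangling = []
    no_spectrum_ref = []

    for spectrum in spectra:
        # Assumption: ms level is a direct child cvParam of <spectrum>.
        for cv in spectrum.findall(MZML_NS + "cvParam"):
            if cv.get("accession") == MS_LEVEL_ACCESSION:
                levels[int(cv.get("value"))] += 1

        for precursor in spectrum.iter(MZML_NS + "precursor"):
            ref = precursor.get("spectrumRef")
            if ref is None:
                no_spectrum_ref.append(spectrum.get("id"))
            elif ref not in known_ids:
                dangling.append((spectrum.get("id"), ref))

    return {
        "ms_levels": dict(levels),
        "dangling": dangling,
        "no_spectrum_ref": no_spectrum_ref,
    }


# Hypothetical MS2-only fragment (not schema-valid mzML): the first spectrum
# keeps a spectrumRef to an MS1 scan that was filtered out, the second has a
# bare <precursor/> with no attributes.
MS2_ONLY = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="2">
      <spectrum id="scan=2" index="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
        <precursorList count="1"><precursor spectrumRef="scan=1"/></precursorList>
      </spectrum>
      <spectrum id="scan=3" index="1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
        <precursorList count="1"><precursor/></precursorList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

if __name__ == "__main__":
    print(summarize_precursor_links(MS2_ONLY))
```

A reader that needs the precursor spectrum itself can use the "dangling" list to avoid assuming that a present spectrumRef always points into the same file.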
Re: [Psidev-ms-dev] [proteowizard-developer] mzML output and the
mzML specification: problem, maybe?
From: Eric D. <ede...@sy...> - 2015-05-06 04:49:06
|
Thanks, Bryson, I think I understand now.

As we've seen, the situation is complex. I think the intent when we designed it is that if the precursors are present in the file they must be referenced (although this is not enforced in XML schema because we couldn't). If there are known precursors in the raw file that are not msconverted by choice, they really should have proper external references, but I am not surprised if they don't. And yes, there are reasonable use cases where you legitimately have neither. And the schema as designed allows this.

So, I think there's not much to be done. The design is okay and flexible, but because the information is sometimes truly not available, the rule cannot be strictly enforced, and so information that is available may also be left out.

If you have very specific examples that you can provide that show that a raw file does have links between precursor and product spectra that are not being written to mzML, then we might be able to find someone motivated to fix the problem. But otherwise, I think the current state is the best we can do. I imagine that msconvert should not be guessing which is the precursor spectrum if the information is not available in the raw files.

Regards,
Eric

_______________________________________________
proteowizard-developer mailing list
pro...@li...
https://lists.sourceforge.net/lists/listinfo/proteowizard-developer
|