Re: [Psidev-ms-dev] Spectra from summed acquisitions in mzML 0.99.10

Hi Eric and others,

These are all good thoughts and proposals. I would also add an Example 4:

Example 4: (the current spectrum is a sum of two scans which had
nativeIDs func1scan19 and func1scan20; the source file is declared in the sourceFileList)

<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition nativeID="func1scan19" sourceFileRef="rawFile1" />
  <acquisition nativeID="func1scan20" sourceFileRef="rawFile1" />
</acquisitionList>

It would all be covered by the following XSD:

    <xs:complexType name="AcquisitionType">
        <xs:annotation>
            <xs:documentation>Scan or acquisition from original file used to create this peak list. A spectrumRef or an externalSpectrumID or nativeID plus sourceFileRef should be given.</xs:documentation>
        </xs:annotation>
        <xs:complexContent>
            <xs:extension base="dx:ParamGroupType">
                <xs:attribute name="number" type="xs:int" use="required">
                    <xs:annotation>
                        <xs:documentation>A number for this acquisition.</xs:documentation>
                    </xs:annotation>
                </xs:attribute>
                <xs:attribute name="spectrumRef" type="xs:IDREF" use="optional">
                    <xs:annotation>
                        <xs:documentation>This attribute must reference the 'id' attribute of the appropriate spectrum.</xs:documentation>
                    </xs:annotation>
                </xs:attribute>
                <xs:attribute name="nativeID" type="xs:string" use="optional">
                    <xs:annotation>
                        <xs:documentation>This attribute references the native spectrum identifier in a raw file.</xs:documentation>
                    </xs:annotation>
                </xs:attribute>
                <xs:attribute name="externalSpectrumID" type="xs:string" use="optional">
                    <xs:annotation>
                        <xs:documentation>This attribute must reference the 'id' attribute of the appropriate spectrum in an external mzML file.</xs:documentation>
                    </xs:annotation>
                </xs:attribute>
                <xs:attribute name="sourceFileRef" type="xs:IDREF" use="optional">
                    <xs:annotation>
                        <xs:documentation>This attribute must reference the 'id' attribute of the appropriate sourceFile.</xs:documentation>
                    </xs:annotation>
                </xs:attribute>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

The semantic validation would be to check that at least one of the 
attributes spectrumRef, externalSpectrumID (+sourceFileRef) or nativeID 
is given.
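A minimal Python sketch of that check, assuming the acquisitionList fragment is parsed on its own without the mzML namespace (in a real, namespaced mzML file the tag lookup would need the namespace). The function names are hypothetical and not part of any existing validator:

```python
import xml.etree.ElementTree as ET


def acquisition_is_referenced(acq: ET.Element) -> bool:
    """Sketch of the proposed rule: spectrumRef, nativeID, or
    externalSpectrumID together with sourceFileRef must be present."""
    attrs = acq.attrib
    if "spectrumRef" in attrs or "nativeID" in attrs:
        return True
    return "externalSpectrumID" in attrs and "sourceFileRef" in attrs


def failing_acquisitions(xml_text: str) -> list:
    """Return the 1-based positions of <acquisition> elements that break the rule."""
    root = ET.fromstring(xml_text.strip())
    return [
        pos
        for pos, acq in enumerate(root.iter("acquisition"), start=1)
        if not acquisition_is_referenced(acq)
    ]


# Example 4 from above, plus a hypothetical broken case
# (externalSpectrumID given without sourceFileRef).
EXAMPLE_4 = """
<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition nativeID="func1scan19" sourceFileRef="rawFile1" />
  <acquisition nativeID="func1scan20" sourceFileRef="rawFile1" />
</acquisitionList>
"""
BROKEN = '<acquisitionList count="1"><acquisition externalSpectrumID="S57"/></acquisitionList>'

print(failing_acquisitions(EXAMPLE_4))  # []
print(failing_acquisitions(BROKEN))     # [1]
```

Reporting positions rather than a single yes/no keeps the check useful for long acquisitionLists, where the validator would want to point at the offending entry.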

Or maybe there is even more to consider?

Fredrik

Eric Deutsch wrote:
> Hi Fredrik and everyone, thank you for thinking about these last few
> problems. It seems that there are several different ways in which one
> might want to reference the source scans for a summed scan. Based on
> what's been said so far, here are my thoughts and proposal. What do you
> think?
>
> Currently for acquisitionList we have (in XML/XSD hybrid):
> ---------------
> <acquisitionList count="2">
>   <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
>   <acquisition number="xs:int" sourceFileRef="xs:IDREF" spectrumRef="xs:IDREF">
>     <cvParam cvLabel="MS" (optional child of scan attribute)/>
>   </acquisition>
>   <acquisition number="xs:int" sourceFileRef="xs:IDREF" spectrumRef="xs:IDREF"/>
> </acquisitionList>
> ---------------
>
> Fredrik suggests spectrumRef -> xs:string
>   or externalSpectrumID="xs:string"
>
> This brought up the question of how sourceFileRef would reference the
> current file if everything were in the same file.
>
> Rune points out that we want nativeID references, like for Waters:
>   function1scan2, func1scan2, 1.2 or ....
>
> Darren suggests:
>   externalSpectrumID="URI"
>   externalNativeID="xs:string"
>
> -------------------------------------------
>
> So, I suggest something like this (in XML/XSD hybrid):
>
> <acquisitionList count="2">
>   <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
> (three possible options:)
>   <acquisition spectrumRef="xs:IDREF"/>
>   <acquisition nativeID="xs:string"/>
>   <acquisition sourceFileRef="xs:IDREF" externalSpectrumID="xs:string"/>
> </acquisitionList>
>
> ---
>
> Example 1: (the current spectrum is a sum of two scans which are also
> present in the current file as ids S57 and S58.)
>
> <acquisitionList count="2">
>   <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
>   <acquisition spectrumRef="S57"/>
>   <acquisition spectrumRef="S58"/>
> </acquisitionList>
>
> ---
>
> Example 2: (the current spectrum is a sum of two scans which had
> nativeIDs func1scan19 and func1scan20, the exact location of which
> are not specifiable)
>
> <acquisitionList count="2">
>   <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
>   <acquisition nativeID="func1scan19"/>
>   <acquisition nativeID="func1scan20"/>
> </acquisitionList>
>
> ---
>
> Example 3: (the current spectrum is a sum of two scans in an external
> file which is declared earlier in the current document; the scans are
> referenced explicitly by their IDs in that other file)
>
> <acquisitionList count="2">
>   <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
>   <acquisition sourceFileRef="mzMLsF01" externalSpectrumID="S57"/>
>   <acquisition sourceFileRef="mzMLsF01" externalSpectrumID="S58"/>
> </acquisitionList>
>
> ---
>
> Also okay is 1+2:
>
>   <acquisition spectrumRef="S57" nativeID="func1scan19"/>
>   <acquisition spectrumRef="S58" nativeID="func1scan20"/>
>
> ---
>
> Also okay is 2+3:
>
>   <acquisition nativeID="func1scan19" sourceFileRef="mzMLsF01" externalSpectrumID="S57"/>
>   <acquisition nativeID="func1scan20" sourceFileRef="mzMLsF01" externalSpectrumID="S58"/>
>
> ---
>
> Thus, all four possible attributes are optional, and we would rely on
> the semantic validator to enforce:
>
>    spectrumRef alone
> OR nativeID alone
> OR spectrumRef AND nativeID
> OR sourceFileRef AND externalSpectrumID
> OR sourceFileRef AND externalSpectrumID AND nativeID
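The five combinations listed above can be expressed as a whitelist of attribute sets. This is a minimal Python sketch with hypothetical helper names; it only looks at the four reference attributes and ignores any others (such as number):

```python
# The four reference attributes discussed above.
REF_ATTRS = ("spectrumRef", "nativeID", "sourceFileRef", "externalSpectrumID")

# Exactly the combinations listed as acceptable; anything else is rejected.
ALLOWED = {
    frozenset({"spectrumRef"}),
    frozenset({"nativeID"}),
    frozenset({"spectrumRef", "nativeID"}),
    frozenset({"sourceFileRef", "externalSpectrumID"}),
    frozenset({"sourceFileRef", "externalSpectrumID", "nativeID"}),
}


def reference_combination(attrs: dict) -> frozenset:
    """Return the subset of reference attributes present on one <acquisition>."""
    return frozenset(name for name in REF_ATTRS if name in attrs)


def is_allowed(attrs: dict) -> bool:
    return reference_combination(attrs) in ALLOWED


print(is_allowed({"number": "1", "spectrumRef": "S57"}))               # True
print(is_allowed({"spectrumRef": "S57", "externalSpectrumID": "S57"}))  # False
```

Because the check is an exact set match, it also rejects combinations not on the list, for instance nativeID together with sourceFileRef but without externalSpectrumID, which is the case Fredrik's Example 4 at the top of this thread uses.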
>
> What do you think? It is a little unpleasant to have several different
> options, but I don't see how we could practically exclude any of them.
>
>
> As a related side note, Matt also asks if we can handle the case where
> MS1 scans have been stripped out of a file, but the MS2 scans still
> need to say something useful about their precursor scan (IDREF not
> possible).
>
> I have not checked this, but we should spend some time thinking about
> that once we have solved this problem.
>
> Thanks,
> Eric
>
>
>> -----Original Message-----
>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Darren Kessner
>> Sent: Friday, May 02, 2008 9:13 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] Spectra from summed acquisitions in mzML 0.99.10
>>
>> I think it's a bad idea to have a spectrumRef to a spectrum that isn't
>> in the file.  We should be consistent in the use of internal
>> references, all of which use IDREF so that dangling references can be
>> caught during XML validation.
>>
>> External references, including those to spectra that have been removed
>> for space reasons, should be indicated specifically (e.g. with
>> externalSpectrumID or externalNativeID) so that the reader (human or
>> software) knows that they'll have to do some extra work to find the
>> referent.
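To illustrate the distinction, here is a small Python sketch that treats spectrumRef and sourceFileRef as internal references that must resolve to an id in the same document (the check a schema-validating parser performs for xs:IDREF), and lists externalSpectrumID references separately as needing extra lookup work. The element layout is a simplified, un-namespaced stand-in rather than real mzML, it assumes every id attribute is declared as xs:ID, and classify_references is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Attributes typed xs:IDREF in the proposal: they must resolve inside this document.
INTERNAL_REF_ATTRS = ("spectrumRef", "sourceFileRef")


def classify_references(doc_text: str) -> tuple:
    """Return (dangling internal references, external (sourceFileRef, id) pairs)."""
    root = ET.fromstring(doc_text.strip())
    # Assumption: any id attribute in the document is a valid IDREF target.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    dangling = []
    external = []
    for acq in root.iter("acquisition"):
        for name in INTERNAL_REF_ATTRS:
            value = acq.get(name)
            if value is not None and value not in ids:
                dangling.append(value)
        ext = acq.get("externalSpectrumID")
        if ext is not None:
            # Explicitly external: the reader must look it up in the referenced file.
            external.append((acq.get("sourceFileRef"), ext))
    return dangling, external


SAMPLE = """
<run>
  <sourceFileList count="1">
    <sourceFile id="mzMLsF01"/>
  </sourceFileList>
  <spectrum id="S57"/>
  <spectrum id="S60">
    <acquisitionList count="3">
      <acquisition spectrumRef="S57"/>
      <acquisition spectrumRef="S58"/>
      <acquisition sourceFileRef="mzMLsF01" externalSpectrumID="S59"/>
    </acquisitionList>
  </spectrum>
</run>
"""

print(classify_references(SAMPLE))  # (['S58'], [('mzMLsF01', 'S59')])
```

Here S58 is reported as a broken internal reference, while S59 is not an error at all: it is flagged as living in the file declared as mzMLsF01.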
>>
>>
>> Darren
>>
>>
>>
>> On May 2, 2008, at 6:17 AM, Matt Chambers wrote:
>>
>>> Perhaps we should follow the same logic of the spectrum element itself,
>>> where id is required but nativeID is optional. Thus, id is required for
>>> a spectrum reference and a reference to the nativeID would be optional
>>> (but recommended!). We can't use the same attribute to point to either
>>> id or nativeID because then it won't be known which is being referred
>>> to, unless I'm missing something in Fredrik's proposal.
>>>
>>> To deal with IDs that aren't in the file, I agree with Fredrik. Any time
>>> we have a reference that can be to a non-existent or external element,
>>> we can't use IDREF. We either need to switch to xlink (in which case
>>> IDREF would probably still be ok) or fall back to string. However, since
>>> I think that acquisitions should be able to reference spectra in the
>>> current file, we shouldn't change it to externalSpectrumID. We should
>>> just document that all spectrumRef and spectrumNativeID(Ref?) attributes
>>> may refer to a spectrum in the current file or in another one, and in
>>> the former case the spectrum might not actually be there (e.g. MSn
>>> spectra referencing precursor spectra that have been stripped out to
>>> conserve space).
>>>
>>> -Matt
>>>
>>>
>>> Rune Schjellerup Philosof wrote:
>>>> I think the format by which external file elements are referenced
>>>> should be defined.
>>>> For instance, should a reference to a Waters raw file be
>>>> function1scan2, func1scan2, 1.2 or ....
>>>>
>>>> --
>>>> Rune
>>>>
>>>> Fredrik Levander wrote:
>>>>
>>>>         
>>>>> Hi All,
>>>>>
>>>>> There is an issue with the current mzML schema (0.99.10) when it
>>>>> comes
>>>>> to referencing the origin of acquisitions in an acquistionList of
>>>>> summed spectra. In most cases the referenced spectra will not be
>>>>>           
> in
>   
>>>>> the current mzML file, and thus the spectrumRef cannot be of the
>>>>> type
>>>>> xs:IDREF, but should be xs:string to also cover references to
>>>>> spectrum
>>>>> IDs in other mzML files, or native IDs in vendor files.
>>>>>
>>>>> The following would cover both internal and external spectrum
>>>>> referencing:
>>>>>
>>>>>
>>>>>     <xs:complexType name="AcquisitionType">
>>>>>         <xs:annotation>
>>>>>             <xs:documentation>Scan or acquisition from original raw file used to create this peak list, as specified in sourceFile.</xs:documentation>
>>>>>         </xs:annotation>
>>>>>         <xs:complexContent>
>>>>>             <xs:extension base="dx:ParamGroupType">
>>>>>                 <xs:attribute name="number" type="xs:int" use="required">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>A number for this acquisition.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>                 <xs:attribute name="spectrumRef" type="xs:string" use="required">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>This attribute must reference the 'id' attribute of the appropriate spectrum if found within an mzML file, or the native spectrum identifier in a raw file in another format.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>                 <xs:attribute name="sourceFileRef" type="xs:IDREF" use="required">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>This attribute must reference the 'id' attribute of the appropriate sourceFile. It can also refer to the present mzML file.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>             </xs:extension>
>>>>>         </xs:complexContent>
>>>>>     </xs:complexType>
>>>>>
>>>>> However, I am not sure if there are any use cases when both the summed
>>>>> spectrum and the original spectra are found in the same file. If not,
>>>>> the spectrumRef attribute should maybe be renamed to
>>>>> 'externalSpectrumID' or something else, since 'spectrumRef' somehow
>>>>> indicates that the spectrum is in the present file.
>>>>>
>>>>> If there is a need to reference spectra within the file the
>>>>> following
>>>>> may be an alternative (I think Darren proposed something similar):
>>>>>
>>>>>     <xs:complexType name="AcquisitionType">
>>>>>         <xs:annotation>
>>>>>             <xs:documentation>Scan or acquisition from original file used to create this peak list. Either a spectrumRef or an externalSpectrumID plus sourceFileRef should be given.</xs:documentation>
>>>>>         </xs:annotation>
>>>>>         <xs:complexContent>
>>>>>             <xs:extension base="dx:ParamGroupType">
>>>>>                 <xs:attribute name="number" type="xs:int" use="required">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>A number for this acquisition.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>                 <xs:attribute name="spectrumRef" type="xs:IDREF" use="optional">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>This attribute must reference the 'id' attribute of the appropriate spectrum.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>                 <xs:attribute name="externalSpectrumID" type="xs:string" use="optional">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>This attribute must reference the 'id' attribute of the appropriate spectrum if found within an external mzML file, or the native spectrum identifier in a raw file in another format.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>                 <xs:attribute name="sourceFileRef" type="xs:IDREF" use="optional">
>>>>>                     <xs:annotation>
>>>>>                         <xs:documentation>This attribute must reference the 'id' attribute of the appropriate sourceFile. It can also refer to the present mzML file.</xs:documentation>
>>>>>                     </xs:annotation>
>>>>>                 </xs:attribute>
>>>>>             </xs:extension>
>>>>>         </xs:complexContent>
>>>>>     </xs:complexType>
>>>>>
>>>>> So, main question: are there any use cases with the original scans and
>>>>> the summed spectrum in the same file, and is there in this case a need
>>>>> to distinguish clearly between external and internal referencing
>>>>> (second schema alternative)?
>>>>>
>>>>> A minor point is also that the documentation of the spectrum attribute
>>>>> nativeID should be updated to something like:
>>>>>
>>>>> The native identifier for the spectrum, used by the acquisition
>>>>> software. If the spectrum is reconstructed from more than one
>>>>> spectrum, the native identifier of the first acquisition in time
>>>>> should be used.
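A minimal sketch of the proposed nativeID rule, assuming each source acquisition's nativeID is known together with its start time. The SourceScan type, its fields and the example times are hypothetical:

```python
from typing import NamedTuple, Sequence


class SourceScan(NamedTuple):
    native_id: str
    start_time: float  # hypothetical: acquisition start time, any consistent unit


def reconstructed_native_id(scans: Sequence[SourceScan]) -> str:
    """Return the nativeID of the acquisition that started first."""
    if not scans:
        raise ValueError("a reconstructed spectrum needs at least one source scan")
    # min() returns the first of several scans with equal start times.
    return min(scans, key=lambda s: s.start_time).native_id


scans = [SourceScan("func1scan20", 12.41), SourceScan("func1scan19", 12.38)]
print(reconstructed_native_id(scans))  # func1scan19
```

Selecting by start time rather than by list order means the result does not depend on the order in which the summed acquisitions happen to be listed.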
>>>>>
>>>>> Regards
>>>>>
>>>>> Fredrik
>>>>>
>>>>>
>>>>>           
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
>>> Don't miss this year's exciting event. There's still time to save $100.
>>> Use priority code J8TL2D2.
>>> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>>> _______________________________________________
>>> Psidev-ms-dev mailing list
>>> Psi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev