From: Eric D. <ede...@sy...> - 2008-05-02 17:56:40
Hi Fredrik and everyone,

Thank you for thinking about these last few problems. It seems that
there are several different ways in which one might want to reference
the source scans for a summed scan. Based on what's been said so far,
here are my thoughts and proposal. What do you think?

Currently for acquisitionList we have (in XML/XSD hybrid):

---------------
<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition number="xs:int" sourceFileRef="xs:IDREF" spectrumRef="xs:IDREF">
    <cvParam cvLabel="MS" (optional child of scan attribute)/>
  </acquisition>
  <acquisition number="xs:int" sourceFileRef="xs:IDREF" spectrumRef="xs:IDREF"/>
</acquisitionList>
---------------

Fredrik suggests:
  spectrumRef -> xs:string
  externalSpectrumID="xs:string"

This brought up the question of how sourceFileRef would refer to the
current file itself if everything were in the same file.

Rune points out that we want nativeID references, like for Waters:
function1scan2, func1scan2, 1.2 or ....

Darren suggests:
  externalSpectrumID="URI"
  externalNativeID="xs:string"

-------------------------------------------

So, I suggest something like this (in XML/XSD hybrid):

<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  (three possible options:)
  <acquisition spectrumRef="xs:IDREF"/>
  <acquisition nativeID="xs:string"/>
  <acquisition sourceFileRef="xs:IDREF" externalSpectrumID="xs:string"/>
  <acquisition
</acquisitionList>

---
Example 1: (the current spectrum is a sum of two scans which are also
present in the current file as ids S57 and S58.)
<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition spectrumRef="S57"/>
  <acquisition spectrumRef="S58"/>
</acquisitionList>

---
Example 2: (the current spectrum is a sum of two scans which had
nativeIDs func1scan19 and func1scan20, whose exact location cannot be
specified)

<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition nativeID="func1scan19"/>
  <acquisition nativeID="func1scan20"/>
</acquisitionList>

---
Example 3: (the current spectrum is a sum of two scans which are
explicitly referenced externally by a specific file previously defined
in the current document and with IDs in that other file)

<acquisitionList count="2">
  <cvParam cvLabel="MS" accession="MS:1000571" name="sum of spectra"/>
  <acquisition sourceFileRef="mzMLsF01" externalSpectrumID="S57"/>
  <acquisition sourceFileRef="mzMLsF01" externalSpectrumID="S58"/>
</acquisitionList>

---
Also okay is 1+2:

  <acquisition spectrumRef="S57" nativeID="func1scan19"/>
  <acquisition spectrumRef="S58" nativeID="func1scan20"/>

---
Also okay is 2+3:

  <acquisition nativeID="func1scan19" sourceFileRef="mzMLsF01" externalSpectrumID="S57"/>
  <acquisition nativeID="func1scan20" sourceFileRef="mzMLsF01" externalSpectrumID="S58"/>

---
Thus, all four possible attributes are optional, and we would rely on
the semantic validator to enforce:

  spectrumRef alone
  OR nativeID alone
  OR spectrumRef AND nativeID
  OR sourceFileRef AND externalSpectrumID
  OR sourceFileRef AND externalSpectrumID AND nativeID

What do you think? It is a little unpleasant to have several different
options, but I don't see how we could practically exclude any of them.

As a related side note, Matt also asks if we can handle the case where
MS1 scans have been stripped out of a file, but the MS2 scans still
need to say something useful about their precursor scan (IDREF not
possible).
I have not checked this, but we should spend some time thinking about
that once we have solved this problem.

Thanks,
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Darren Kessner
> Sent: Friday, May 02, 2008 9:13 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Spectra from summed acquisitions in mzML 0.99.10
>
> I think it's a bad idea to have a spectrumRef to a spectrum that isn't
> in the file. We should be consistent in the use of internal
> references, all of which use IDREF so that dangling references can be
> caught during XML validation.
>
> External references, including those to spectra that have been removed
> for space reasons, should be indicated specifically (e.g. with
> externalSpectrumID or externalNativeID) so that the reader (human or
> software) knows that they'll have to do some extra work to find the
> referent.
>
> Darren
>
> On May 2, 2008, at 6:17 AM, Matt Chambers wrote:
>
> > Perhaps we should follow the same logic of the spectrum element itself,
> > where id is required but nativeID is optional. Thus, id is required for
> > a spectrum reference and a reference to the nativeID would be optional
> > (but recommended!). We can't use the same attribute to point to either
> > id or nativeID because then it won't be known which is being referred
> > to, unless I'm missing something in Fredrik's proposal.
> >
> > To deal with IDs that aren't in the file, I agree with Fredrik. Any time
> > we have a reference that can be to a non-existent or external element,
> > we can't use IDREF. We either need to switch to xlink (in which case
> > IDREF would probably still be ok) or fall back to string. However, since
> > I think that acquisitions should be able to reference spectra in the
> > current file, we shouldn't change it to externalSpectrumID. We should
> > just document that all spectrumRef and spectrumNativeID(Ref?)
> > attributes may refer to a spectrum in the current file or in another
> > one, and in the former case the spectrum might not actually be there
> > (i.e. MSn spectra referencing precursor spectra that have been stripped
> > out to conserve space).
> >
> > -Matt
> >
> > Rune Schjellerup Philosof wrote:
> >> I think the format by which external file elements are referenced
> >> should be defined.
> >> For instance, a reference to a Waters raw file, should that be
> >> function1scan2, func1scan2, 1.2 or ....
> >>
> >> --
> >> Rune
> >>
> >> Fredrik Levander wrote:
> >>
> >>> Hi All,
> >>>
> >>> There is an issue with the current mzML schema (0.99.10) when it comes
> >>> to referencing the origin of acquisitions in an acquisitionList of
> >>> summed spectra. In most cases the referenced spectra will not be in
> >>> the current mzML file, and thus the spectrumRef cannot be of the type
> >>> xs:IDREF, but should be xs:string to also cover references to spectrum
> >>> IDs in other mzML files, or native IDs in vendor files.
> >>>
> >>> The following would cover both internal and external spectrum
> >>> referencing:
> >>>
> >>> <xs:complexType name="AcquisitionType">
> >>>   <xs:annotation>
> >>>     <xs:documentation>Scan or acquisition from original raw file used
> >>>     to create this peak list, as specified in sourceFile.</xs:documentation>
> >>>   </xs:annotation>
> >>>   <xs:complexContent>
> >>>     <xs:extension base="dx:ParamGroupType">
> >>>       <xs:attribute name="number" type="xs:int" use="required">
> >>>         <xs:annotation>
> >>>           <xs:documentation>A number for this acquisition.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>       <xs:attribute name="spectrumRef" type="xs:string" use="required">
> >>>         <xs:annotation>
> >>>           <xs:documentation>This attribute must reference the 'id'
> >>>           attribute of the appropriate spectrum if found within an mzML
> >>>           file, or the native spectrum identifier in a raw file in
> >>>           another format.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>       <xs:attribute name="sourceFileRef" type="xs:IDREF" use="required">
> >>>         <xs:annotation>
> >>>           <xs:documentation>This attribute must reference the 'id'
> >>>           attribute of the appropriate sourceFile. It can also refer to
> >>>           the present mzML file.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>     </xs:extension>
> >>>   </xs:complexContent>
> >>> </xs:complexType>
> >>>
> >>> However, I am not sure if there are any use cases when both the summed
> >>> spectrum and the original spectra are found in the same file. If not,
> >>> the spectrumRef attribute should maybe be renamed to
> >>> 'externalSpectrumID' or something else, since 'spectrumRef' somehow
> >>> indicates that the spectrum is in the present file.
> >>>
> >>> If there is a need to reference spectra within the file the following
> >>> may be an alternative (I think Darren proposed something similar):
> >>>
> >>> <xs:complexType name="AcquisitionType">
> >>>   <xs:annotation>
> >>>     <xs:documentation>Scan or acquisition from original file used to
> >>>     create this peak list. Either a spectrumRef or an externalSpectrumID
> >>>     plus sourceFileRef should be given.</xs:documentation>
> >>>   </xs:annotation>
> >>>   <xs:complexContent>
> >>>     <xs:extension base="dx:ParamGroupType">
> >>>       <xs:attribute name="number" type="xs:int" use="required">
> >>>         <xs:annotation>
> >>>           <xs:documentation>A number for this acquisition.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>       <xs:attribute name="spectrumRef" type="xs:IDREF" use="optional">
> >>>         <xs:annotation>
> >>>           <xs:documentation>This attribute must reference the 'id'
> >>>           attribute of the appropriate spectrum.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>       <xs:attribute name="externalSpectrumID" type="xs:string" use="optional">
> >>>         <xs:annotation>
> >>>           <xs:documentation>This attribute must reference the 'id'
> >>>           attribute of the appropriate spectrum if found within an
> >>>           external mzML file, or the native spectrum identifier in a raw
> >>>           file in another format.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>       <xs:attribute name="sourceFileRef" type="xs:IDREF" use="optional">
> >>>         <xs:annotation>
> >>>           <xs:documentation>This attribute must reference the 'id'
> >>>           attribute of the appropriate sourceFile.
> >>>           It can also refer to the present mzML file.</xs:documentation>
> >>>         </xs:annotation>
> >>>       </xs:attribute>
> >>>     </xs:extension>
> >>>   </xs:complexContent>
> >>> </xs:complexType>
> >>>
> >>> So, main question: are there any use cases with the original scans and
> >>> the summed spectrum in the same file, and is there in this case a need
> >>> to distinguish clearly between external and internal referencing
> >>> (second schema alternative)?
> >>>
> >>> A minor point is also that the documentation of the spectrum attribute
> >>> nativeID should be updated to something like:
> >>>
> >>> The native identifier for the spectrum, used by the acquisition
> >>> software. If the spectrum is reconstructed from more than one
> >>> spectrum, the native identifier of the first acquisition in time
> >>> should be used.
> >>>
> >>> Regards
> >>>
> >>> Fredrik
> >>>
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> > Don't miss this year's exciting event. There's still time to save $100.
> > Use priority code J8TL2D2.
> > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> > _______________________________________________
> > Psidev-ms-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
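
Note on the attribute rule proposed at the top of this message: if all four reference attributes (spectrumRef, nativeID, sourceFileRef, externalSpectrumID) are optional in the schema, the five allowed combinations have to be checked by the semantic validator. The following is only a rough Python sketch of what such a check might look like, not the actual PSI validator. The function name invalid_acquisitions and the ALLOWED table are hypothetical, and the sketch assumes un-namespaced, well-formed <acquisition> elements as in the examples above (real mzML files use an XML namespace).

```python
import xml.etree.ElementTree as ET

# The four reference attributes discussed in the proposal.
REFERENCE_ATTRS = {"spectrumRef", "nativeID", "sourceFileRef", "externalSpectrumID"}

# The five combinations the proposal lists as acceptable.
ALLOWED = {
    frozenset({"spectrumRef"}),
    frozenset({"nativeID"}),
    frozenset({"spectrumRef", "nativeID"}),
    frozenset({"sourceFileRef", "externalSpectrumID"}),
    frozenset({"sourceFileRef", "externalSpectrumID", "nativeID"}),
}


def invalid_acquisitions(acquisition_list_xml):
    """Return (index, attributes) for each <acquisition> whose reference
    attributes do not form one of the allowed combinations."""
    root = ET.fromstring(acquisition_list_xml)
    bad = []
    for index, acq in enumerate(root.iter("acquisition")):
        # Ignore unrelated attributes; only the reference attributes matter here.
        present = frozenset(acq.attrib) & REFERENCE_ATTRS
        if present not in ALLOWED:
            bad.append((index, sorted(present)))
    return bad
```

Called on an acquisitionList in which one acquisition carries sourceFileRef without externalSpectrumID, the function reports that element and accepts the rest. It only checks which attributes are present; whether a spectrumRef or sourceFileRef actually resolves is still left to ordinary IDREF validation.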