From: Jones, A. <And...@li...> - 2008-08-01 13:57:38
Hi Matt,

> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability.

On the call, it was discussed that some implementers may wish to report additional scores associated with particular peaks, e.g. why a particular peak was identified as a particular ion (for example, based on the abundance of an adjacent peak). I think this is a niche use case, but this proposal would allow it to be done. Also see the other values in comment 5 of issue 28, e.g. product ion m/z error.

> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
>
> <Fragmentation>
>   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
>            values="1 2 3"/>
>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
>            values="4 5 6"/>
> </Fragmentation>

We can definitely do it this way in the XSD. The only minor advantage of doing it the other way is that the format (xsd:list) can be verified by an XML parser rather than relying on the validator software; either is fine, though.

I agree this is fine for a viewer, in that it tells you the expected ion types that were detected, but it doesn't tell you which observed peaks were matched to them. If multiple peaks fall in the same range near an expected peak, you've lost information.

> Even then, it's about 10 times less compact than the formatted
> attribute proposal, which would only work by explicitly denying the
> storage of extra measurements.
> For reference, using the same conditions for my approximate calculations
> above, MyriMatch's output by the formatted attribute method would have
> about 1.5 MB of fragmentation info.
>
> fragmentEvidence="y1 y2 y3 b4 b5 b6"

Although initially I was in favour of this approach, it suffers from the problem that we have to decide now (in the schema documentation) on all ion types and their definitions. I'm not even sure there is universal agreement on what constitutes each ion type; see David's comment:

> "What about internal fragments, immonium ions, side chain cleavages?"

I don't even know what an immonium ion is, so I don't want to have to sign off on a perfect list of all ion types in the analysisXML documentation! By using ontology terms we can leave flexibility in there so that implementers can report whatever ion types they like. IMO, being as compact as possible is not really a big deal?

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matt Chambers
> Sent: 01 August 2008 14:24
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>
> Hi all,
>
> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability. Some rough calculation tells me that if I were to
> write this format from MyriMatch with 10k spectra with 5 results each
> and an average of 2 y ions and 2 b ions matched, that would be about
> 16 MB of fragmentation data (leaving out the "extra" measurements). That
> is a lot better than where we were before. But I think we can compact it
> some more. IIRC, other places in the schema have elements that
> essentially subclass cvParam, is that right?
> It would compact things to make IonType such a subclass, with the intention
> that the accession attribute points to an ion CV term and an extra
> attribute corresponds to the FragArrayIndex.
>
> The current proposal:
>
> <Fragmentation>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion"/>
>     <FragArrayIndex values="1 2 3"/>
>   </IonType>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
>     <FragArrayIndex values="4 5 6"/>
>   </IonType>
> </Fragmentation>
>
> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
>
> <Fragmentation>
>   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
>            values="1 2 3"/>
>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
>            values="4 5 6"/>
> </Fragmentation>
>
> This method would not interfere with the capability of having extra
> measurements, and it provides a roughly 30% more compact way of annotating
> an ion series.
>
> Even then, it's about 10 times less compact than the formatted
> attribute proposal, which would only work by explicitly denying the
> storage of extra measurements. For reference, using the same conditions
> for my approximate calculations above, MyriMatch's output by the
> formatted attribute method would have about 1.5 MB of fragmentation info.
>
> fragmentEvidence="y1 y2 y3 b4 b5 b6"
>
> -Matt
>
> Jones, Andy wrote:
> >> If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the
> >> Y-H2O ions for this peptide identification), then the attribute
> >> value="3" on the cvParam element should be removed - or have I
> >> misunderstood how this works?
> >
> > Correct, my mistake.
> > The example says we have found y3-H2O, y8-H2O and
> > y10-H2O; the cvParam should not have had the value.
> >
> > <Fragmentation>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
> >     <FragArrayIndex values="3 8 10"/>
> >     <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
> >     <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
> >     <!-- and so on for other measures as defined in the FragmentationTable -->
> >   </IonType>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
> >     <FragArrayIndex values="2 12 14"/>
> >     <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
> >     <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
> >     <!-- and so on for other measures as defined in the FragmentationTable -->
> >   </IonType>
> > </Fragmentation>
> >
> >> Please excuse me for stating the obvious, but... there is no reason
> >> why the pointers m1, m2, m3, m4 could not be more human-readable,
> >> e.g. changed in this example to mz, inten, mz_error and ret_error
> >> (to help implementors understand the mechanism).
> >
> > Good suggestion.
> >
> > Cheers
> > Andy
> >
> >> -----Original Message-----
> >> From: phi...@go... [mailto:phi...@go...] On Behalf Of Phil Jones @ EBI
> >> Sent: 01 August 2008 11:23
> >> To: Jones, Andy; psi...@li...
> >> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
> >>
> >> Hi Andy,
> >>
> >> This looks really good - both flexible and compact.
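Phil's "flexible and compact" verdict rests on how little a consumer has to do with the array encoding: split each xsd:list on whitespace and line the arrays up by position. The Python sketch below expands Andy's corrected example (y ion -H2O and b ion, with the value attributes removed) into one record per ion. It is an illustration under stated assumptions, not part of any agreed schema: it assumes the fragment carries no XML namespace and uses exactly the element and attribute names from the thread, and `MEASURE_NAMES` and `decode_fragmentation` are hypothetical names. `MEASURE_NAMES` stands in for a lookup into the FragmentationTable, using the human-readable keys Phil suggested.

```python
import xml.etree.ElementTree as ET

# Andy's corrected example from this thread (assumed to be namespace-free).
FRAGMENTATION_XML = """
<Fragmentation>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
    <FragArrayIndex values="3 8 10"/>
    <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
    <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
  </IonType>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
    <FragArrayIndex values="2 12 14"/>
    <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
    <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
  </IonType>
</Fragmentation>
"""

# Hypothetical stand-in for the FragmentationTable: Measure id -> readable key.
MEASURE_NAMES = {"m1": "mz", "m2": "intensity"}


def decode_fragmentation(xml_text, measure_names):
    """Expand each IonType into one dict per ion by zipping its arrays by position."""
    root = ET.fromstring(xml_text.strip())
    ions = []
    for ion_type in root.findall("IonType"):
        ion_name = ion_type.find("cvParam").get("name")
        # An xsd:list is whitespace-separated, so str.split() recovers the items.
        indices = [int(v) for v in ion_type.find("FragArrayIndex").get("values").split()]
        columns = {}
        for frag_array in ion_type.findall("FragArray"):
            ref = frag_array.get("Measure_ref")
            values = [float(v) for v in frag_array.get("values").split()]
            # Every FragArray needs one entry per index, or the positions misalign.
            if len(values) != len(indices):
                raise ValueError(f"{ion_name}/{ref}: {len(values)} values for {len(indices)} ions")
            columns[measure_names.get(ref, ref)] = values
        for pos, index in enumerate(indices):
            ion = {"type": ion_name, "index": index}
            for key, values in columns.items():
                ion[key] = values[pos]
            ions.append(ion)
    return ions


for ion in decode_fragmentation(FRAGMENTATION_XML, MEASURE_NAMES):
    print(ion)
```

This yields six records, y3, y8 and y10 (-H2O) followed by b2, b12 and b14, each carrying its m/z and intensity. The length check is the one piece of validation an XML parser cannot do for you here: xsd:list guarantees each item is well-typed, but not that parallel arrays have matching lengths.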
> >>
> >> Just to clarify - in your example:
> >>
> >> <IonType>
> >>   <cvParam cvLabel="Waters" accession="PLGS:00035"
> >>            name="y ion -H2O" value="3"/>
> >>   <FragArrayIndex values="3 8 10"/>
> >>   <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
> >>   <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
> >>   <!-- and so on for other measures as defined in the FragmentationTable -->
> >> </IonType>
> >>
> >> If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the
> >> Y-H2O ions for this peptide identification), then the attribute
> >> value="3" on the cvParam element should be removed - or have I
> >> misunderstood how this works?
> >>
> >> Please excuse me for stating the obvious, but... there is no reason
> >> why the pointers m1, m2, m3, m4 could not be more human-readable,
> >> e.g. changed in this example to mz, inten, mz_error and ret_error
> >> (to help implementors understand the mechanism).
> >>
> >> best regards,
> >>
> >> Phil.
> >>
> >> 2008/8/1 Jones, Andy <And...@li...>:
> >>
> >>> Hi all,
> >>>
> >>> Here's a proposal for fragmentation ions, as discussed on the call, that's
> >>> halfway between using cvParams for all values and using an array-based
> >>> encoding. I think it's pretty flexible and concise.
> >>>
> >>> First, set up a FragmentationTable for the entire list of spectra, which
> >>> declares the kinds of measures you're going to report lower down:
> >>>
> >>> <SpectrumIdentificationList id="MASCOT_results">
> >>>   <FragmentationTable>
> >>>     <Measures>
> >>>       <Measure id="m1">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
> >>>       </Measure>
> >>>       <Measure id="m2">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
> >>>       </Measure>
> >>>       <Measure id="m3">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
> >>>       </Measure>
> >>>       <Measure id="m4">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
> >>>       </Measure>
> >>>     </Measures>
> >>>   </FragmentationTable>
> >>>
> >>> Then, for each SpectrumIdentificationItem, you reference back to these Measures:
> >>>
> >>> <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
> >>>   <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false"/>
> >>>   ...
> >>>
> >>>   <Fragmentation>
> >>>     <IonType>
> >>>       <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
> >>>       <FragArrayIndex values="3 8 10"/>
> >>>       <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
> >>>       <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
> >>>       <!-- and so on for other measures as defined in the FragmentationTable -->
> >>>     </IonType>
> >>>     <IonType>
> >>>       <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>>       <FragArrayIndex values="2 12 14"/>
> >>>       <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
> >>>       <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
> >>>       <!-- and so on for other measures as defined in the FragmentationTable -->
> >>>     </IonType>
> >>>   </Fragmentation>
> >>>
> >>> Each array contains space-separated values (i.e. an xsd:list). The
> >>> FragArrayIndex tells you which ions you've found; e.g. for the second
> >>> IonType we have b2, b12 and b14, whose m/z and intensity values are in the
> >>> m1 and m2 arrays. This will save a lot of space if there are many ions of
> >>> the same type in each array, and I think it is fairly easy to read as well.
> >>> Slightly more space could be saved by defining the ion types in the
> >>> FragmentationTable, but not much once you've added a reference back up to it.
> >>>
> >>> Cheers
> >>> Andy
> >>>
> >>>> -----Original Message-----
> >>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matthew Chambers
> >>>> Sent: 18 July 2008 16:00
> >>>> To: psi...@li...
> >>>> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)
> >>>>
> >>>> I also agree that anything beyond an array is far too verbose.
> >>>> To answer this question, I think we need to decide the scope of the
> >>>> problem. What do we want fragment ion information to represent? I think
> >>>> analysis software is too diverse to use it for anything more than basic
> >>>> annotation, but basic annotation is important. If there are ways people
> >>>> want it to be usable beyond that, speak up. :)
> >>>>
> >>>> For basic annotation, all I think is needed is the fragment type, series
> >>>> number, charge state, and possibly any modification like a neutral loss
> >>>> or radical. The array can be an attribute or a text node. We can use a
> >>>> grammar for each term, where each term represents an ion and terms are
> >>>> space-delimited. The grammar might look like:
> >>>>
> >>>>   <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]
> >>>>
> >>>> We could make the charge part mandatory or, if it were optional, assume a
> >>>> +1 charge (or possibly allow the charge to be based on the polarity of
> >>>> the source scan?). I assume there is a standard chemical formula format
> >>>> that can be represented compactly in ASCII text, but I don't know it.
> >>>> An example to show how compact it could be:
> >>>>
> >>>>   fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
> >>>>
> >>>> For basic annotation, the masses are not necessary, I think. Expected
> >>>> mass can be recomputed if all the label metadata is complete and
> >>>> regular, and the observed mass is unimportant for annotation (IMO).
> >>>>
> >>>> -Matt
> >>>>
> >>>> David Creasy wrote:
> >>>>
> >>>>> Hi Phil,
> >>>>>
> >>>>> Just to be sure I've not misunderstood... from below, each fragment ion
> >>>>> takes approx. 500 bytes. Let's assume a conservative average of 20
> >>>>> fragment matches per spectrum and a modest search with 100k spectra.
> >>>>> Assuming that we just report fragment matches for the top match for each
> >>>>> spectrum, this would result in a file that is 500 x 20 x 100,000 = 1 GB.
> >>>>> If we reported fragment matches for the top 10 matches for each
> >>>>> spectrum, this would be 10 GB. Is this reasonable and acceptable?
> >>>>>
> >>>>> David
> >>>>>
> >>>>> Phil Jones @ EBI wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Regarding Issue 28
> >>>>>> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support
> >>>>>> reporting of fragment ions"
> >>>>>>
> >>>>>> As a suggestion of how this might be tackled:
> >>>>>>
> >>>>>> The latest development version of the PRIDE database includes a very
> >>>>>> simple mechanism for recording fragment ion information, illustrated
> >>>>>> below. (Please note - made-up data.)
> >>>>>>
> >>>>>> In this example, CV terms are used to define the type of ion and
> >>>>>> related information / annotation. Note that this is even simpler than
> >>>>>> the suggestion made by Andy above - no attempt is made here to indicate
> >>>>>> which residue has been called for each fragment ion; it is just listing
> >>>>>> the ions.
> >>>>>>
> >>>>>> Also note that while the PeptideItem references the mass spectrum
> >>>>>> (which is reported in detail in the associated mzData file), the
> >>>>>> individual fragment ions just report the m/z value and do not attempt
> >>>>>> to make any kind of hard reference to the spectrum.
> >>>>>>
> >>>>>> As you can see, this has been developed in collaboration with Waters,
> >>>>>> with output from the ProteinLynx Global Server. (Actual values /
> >>>>>> sequence have been changed.)
> >>>>>>
> >>>>>> One possible change would be to make the m/z value an attribute of the
> >>>>>> FragmentIon element, as this value will be mandatory and required to
> >>>>>> relate the fragment ion to the correct peak on the mass spectrum. The
> >>>>>> CV used for the annotation would also need to be part of the PI CV ??
> >>>>>>
> >>>>>> Note that in the existing model there are other terms available to
> >>>>>> allow any kind of fragment ion to be described (not just b and y ions).
> >>>>>>
> >>>>>> In the context of analysisXML, the <FragmentIon/> elements would be
> >>>>>> children of a <SpectrumIdentificationResultItem/>.
> >>>>>>
> >>>>>> best regards,
> >>>>>>
> >>>>>> Phil.
> >>>>>>
> >>>>>> <PeptideItem>
> >>>>>>   <Sequence>LFQQSQWTREVFSNSCK</Sequence>
> >>>>>>   <Start>435</Start>
> >>>>>>   <End>460</End>
> >>>>>>   <SpectrumReference>123</SpectrumReference>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <additional>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value=""/>
> >>>>>>   </additional>
> >>>>>> </PeptideItem>
> >>>>>>
> >>>>>> --
> >>>>>> Phil Jones
> >>>>>> Senior Software Engineer
> >>>>>> PRIDE Project Team
> >>>>>> PANDA Group, EMBL-EBI
> >>>>>> Wellcome Trust Genome Campus
> >>>>>> Hinxton, Cambridge, CB10 1SD
> >>>>>> UK.
> >>>>>>
> >>>>>> Work phone: +44 1223 492662 (NEW NUMBER)
> >>>>>> Skype: philip-jones
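Matt's 18 July grammar for the compact fragmentIons attribute (quoted above) is regular enough to check with a single regular expression, and his 1 August fragmentEvidence="y1 y2 y3 b4 b5 b6" example is a subset of it. The Python sketch below is one possible reading of that grammar, not a format anyone agreed on. It makes three assumptions: the series number is a positive integer no larger than a caller-supplied peptide length; the neutral-loss "formula" is a run of element symbols with optional counts (e.g. H2O, NH3), since the thread leaves the formula syntax open; and an omitted charge means +1, one of the two options Matt mentions. `ION_TERM`, `FragmentIon` and `parse_fragment_ions` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# One term of the proposed grammar:
#   <a|b|c|x|y|z><series number>[<+|-><formula>][,<+|-><charge>]
# Assumption: a formula is element symbols with optional counts, e.g. H2O or NH3.
ION_TERM = re.compile(
    r"(?P<series>[abcxyz])"
    r"(?P<number>[1-9][0-9]*)"
    r"(?P<delta>[+-](?:[A-Z][a-z]?[0-9]*)+)?"
    r"(?:,(?P<charge>[+-][1-9][0-9]*))?"
)


class FragmentIon(NamedTuple):
    series: str
    number: int
    delta: Optional[str]  # neutral loss/gain such as "-H2O", or None
    charge: int


def parse_fragment_ions(text: str, peptide_length: int) -> List[FragmentIon]:
    """Parse a space-delimited fragmentIons value into FragmentIon tuples."""
    ions = []
    for term in text.split():
        m = ION_TERM.fullmatch(term)
        if m is None:
            raise ValueError(f"not a valid ion term: {term!r}")
        number = int(m["number"])
        if number > peptide_length:
            raise ValueError(f"{term!r}: series number exceeds peptide length {peptide_length}")
        # Assumption: an omitted charge means +1.
        charge = int(m["charge"]) if m["charge"] else 1
        ions.append(FragmentIon(m["series"], number, m["delta"], charge))
    return ions


# Matt's example string; peptide_length=17 is a hypothetical value.
for ion in parse_fragment_ions("b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2", peptide_length=17):
    print(ion)
```

The sketch also shows the trade-off Andy raises in his reply: the ion vocabulary is hard-wired into the `[abcxyz]` character class, so supporting internal fragments or immonium ions would mean changing the grammar itself, whereas the CV-term-based IonType leaves that open. The parsed tuples likewise carry no observed m/z, so nothing ties an ion to a specific peak.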