From: Jones, A. <And...@li...> - 2008-08-01 10:03:49
Hi all,

Here's a proposal for fragmentation ions, as discussed on the call, that's
halfway between using cvParams for all values and using an array-based
encoding. I think it's pretty flexible and concise.

First up, set up a FragmentationTable for the entire list of spectra, which
says the kinds of measures you're going to report lower down:

<SpectrumIdentificationList id="MASCOT_results">
  <FragmentationTable>
    <Measures>
      <Measure id="m1">
        <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
      </Measure>
      <Measure id="m2">
        <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
      </Measure>
      <Measure id="m3">
        <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
      </Measure>
      <Measure id="m4">
        <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
      </Measure>
    </Measures>
  </FragmentationTable>

Then for each SpectrumIdentificationItem, you reference back to these
Measures:

<SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
  <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false" />
  ...
  <Fragmentation>
    <IonType>
      <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
      <FragArrayIndex values="3 8 10"/>
      <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
      <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
      <!-- and so on for other measures as defined in the FragmentationTable -->
    </IonType>
    <IonType>
      <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
      <FragArrayIndex values="2 12 14"/>
      <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
      <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
      <!-- and so on for other measures as defined in the FragmentationTable -->
    </IonType>
  </Fragmentation>

Each array contains space-separated values (i.e. an xsd:list).
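
To make the index-plus-arrays layout concrete, here is a minimal Python
sketch of how a reader could expand one IonType into per-ion rows. It
assumes that the n-th entry of FragArrayIndex lines up with the n-th entry
of every FragArray in the same IonType, and that all measure values are
numeric. The function name expand_ion_type and the row layout are made up
for illustration and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The second IonType from the example above, copied as a standalone snippet.
ION_TYPE_XML = """
<IonType>
  <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
  <FragArrayIndex values="2 12 14"/>
  <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
  <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
</IonType>
"""


def expand_ion_type(ion_type):
    """Turn one IonType element into a list of per-ion dicts (hypothetical helper)."""
    name = ion_type.find("cvParam").get("name")
    # An xsd:list is whitespace-separated, so str.split() is enough here.
    indices = [int(v) for v in ion_type.find("FragArrayIndex").get("values").split()]
    rows = [{"ion": name, "index": i} for i in indices]
    for array in ion_type.findall("FragArray"):
        values = [float(v) for v in array.get("values").split()]
        # Assumption: every array must be as long as the index list.
        if len(values) != len(indices):
            raise ValueError("FragArray length does not match FragArrayIndex")
        for row, value in zip(rows, values):
            row[array.get("Measure_ref")] = value
    return rows


rows = expand_ion_type(ET.fromstring(ION_TYPE_XML))
for row in rows:
    print(row)
```

For the b ion example this yields three rows, for ions 2, 12 and 14, each
carrying its m1 and m2 values keyed by the Measure id.
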
The FragArrayIndex tells you which ions you've found, i.e. for the second
IonType we have b2, b12 and b14, which have their m/z and intensity values
in the m1 and m2 arrays. This will save a lot of space if there are many
ions of the same type in each array, and I think it is fairly easy to read
as well. Slightly more space could be saved by defining the ion types in the
FragmentationTable, but not much really once you've added a reference back
up to it.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: 18 July 2008 16:00
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)
>
> I also agree that anything beyond an array is far too verbose. To answer
> this question, I think we need to decide the scope of the problem. What
> do we want fragment ion information to represent? I think analysis
> software is too diverse to use it for anything more than basic
> annotation, but basic annotation is important. If there are ways people
> want it to be usable beyond that, speak up. :)
>
> For basic annotation, all I think is needed is the fragment type, series
> number, charge state, and possibly any modification like a neutral loss
> or radical. The array can be an attribute or text node. We can use a
> grammar for each term, where each term represents an ion and terms are
> space delimited. The grammar might look like:
> <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]
> We could make the charge part mandatory or, if it was optional, assume a
> +1 charge (or possibly allow the charge to be based on the polarity of
> the source scan?). I assume there is a standard chemical formula format
> that can be represented compactly in ASCII text, but I don't know it.
> An example to show how compact it could be:
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> For basic annotation, the masses are not necessary I think. Expected
> mass can be recomputed if all the label metadata is complete and
> regular, and the observed mass is unimportant for annotation (IMO).
>
> -Matt
>
>
> David Creasy wrote:
> > Hi Phil,
> >
> > Just to be sure I've not misunderstood... from below, each fragment ion
> > takes approx 500 bytes. Let's assume a conservative average of 20
> > fragment matches per spectrum and a modest search with 100k spectra.
> > Assuming that we just report fragment matches for the top match for each
> > spectrum, this would result in a file that is 500 x 20 x 100,000 = 1 GB.
> > If we reported fragment matches for the top 10 matches for each
> > spectrum, this would be 10 GB. Is this reasonable and acceptable?
> >
> > David
> >
> >
> > Phil Jones @ EBI wrote:
> >
> >> Hi,
> >>
> >> Regarding Issue 28
> >> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support
> >> reporting of fragment ions"
> >>
> >> As a suggestion of how this might be tackled:
> >>
> >> The latest development version of the PRIDE database includes a very
> >> simple mechanism for recording fragment ion information, illustrated
> >> below. (Please note - made up data.)
> >>
> >> In this example, CV terms are used to define the type of ion and
> >> related information / annotation. Note that this is even simpler than
> >> the suggestion made by Andy above - no attempt is made here to
> >> indicate which residue has been called for each fragment ion - it is
> >> just listing the ions.
> >>
> >> Also note that while the PeptideItem is referencing the mass spectrum
> >> (which is reported in detail in the associated mzData file), the
> >> individual fragment ions are just reporting the m/z value and not
> >> attempting to make any kind of hard reference to the spectrum.
> >>
> >> As you can see, this has been developed in collaboration with Waters,
> >> with output from the ProteinLynx Global Server. (Actual values /
> >> sequence have been changed).
> >>
> >> One possible change would be to make the m/z value an attribute of the
> >> FragmentIon element, as this value will be mandatory and required to
> >> relate the fragment ion to the correct peak on the mass spectrum. The
> >> CV used for the annotation would also need to be part of the PI CV ??
> >>
> >> Note that in the existing model, there are other terms available, to
> >> allow any kind of fragment ion to be described (not just B and Y ions)
> >>
> >> In the context of analysisXML, the <FragmentIon/> elements would be
> >> children of a <SpectrumIdentificationResultItem/>
> >>
> >> best regards,
> >>
> >> Phil.
> >>
> >> <PeptideItem>
> >>   <Sequence>LFQQSQWTREVFSNSCK</Sequence>
> >>   <Start>435</Start>
> >>   <End>460</End>
> >>   <SpectrumReference>123</SpectrumReference>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
> >>   </FragmentIon>
> >>   <additional>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
> >>   </additional>
> >> </PeptideItem>
> >>
> >>
> >> --
> >> Phil Jones
> >> Senior Software Engineer
> >> PRIDE Project Team
> >> PANDA Group, EMBL-EBI
> >> Wellcome Trust Genome Campus
> >> Hinxton, Cambridge, CB10 1SD
> >> UK.
> >>
> >> Work phone: +44 1223 492662 (NEW NUMBER)
> >> Skype: philip-jones
> >>
> >> -------------------------------------------------------------------------
> >> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> >> Build the coolest Linux based applications with Moblin SDK & win great prizes
> >> Grand prize is a trip for two to an Open Source event anywhere in the world
> >> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> >> _______________________________________________
> >> Psidev-pi-dev mailing list
> >> Psi...@li...
> >> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
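
The compact term notation from Matt's quoted message can be checked with a
small parser. The following is only a sketch in Python under stated
assumptions: the charge part is read as a comma followed by a signed
integer (as in the quoted example term y7,+2), a missing charge defaults to
+1 (one of the options Matt mentions), and the formula is accepted as any
run of letters and digits starting with a letter, since the thread does not
settle on a formula format. The names FragmentIon and parse_fragment_ions
are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# One term: series letter, series number, optional signed formula,
# optional ",<signed charge>". The formula pattern is an assumption.
_TERM = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<number>[1-9]\d*)"
    r"(?P<delta>[+-][A-Za-z][A-Za-z0-9]*)?"
    r"(?:,(?P<charge>[+-]\d+))?$"
)


class FragmentIon(NamedTuple):
    series: str           # one of a, b, c, x, y, z
    number: int           # position in the series, 1..peptide_length
    delta: Optional[str]  # e.g. "-H2O", or None when absent
    charge: int           # signed charge state


def parse_fragment_ions(
    text: str, peptide_length: int, default_charge: int = 1
) -> List[FragmentIon]:
    """Parse a space-delimited list of ion terms (hypothetical helper)."""
    ions = []
    for term in text.split():
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"not a valid ion term: {term!r}")
        number = int(match.group("number"))
        if number > peptide_length:
            raise ValueError(f"series number out of range in {term!r}")
        charge_text = match.group("charge")
        charge = int(charge_text) if charge_text else default_charge
        ions.append(
            FragmentIon(match.group("series"), number, match.group("delta"), charge)
        )
    return ions


# The example string from the quoted message; peptide_length=12 is made up.
example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
ions = parse_fragment_ions(example, peptide_length=12)
for ion in ions:
    print(ion)
```

Whether the charge should stay optional is exactly the open question in the
quoted message; making it mandatory would only mean dropping the default
and the optional group around the charge part.
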