From: Jones, A. <And...@li...> - 2008-07-21 12:26:09
|
Hi all,

> An example to show how compact it could be:
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"

I have a couple of queries about this proposal...

Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses?

We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess i.e. there is an observed peak within the tolerance window matching the expected mass but this doesn't help when there are multiple peaks within the window - I don't think we could correctly assume it would always be the most abundant peak...? In other words, we still have information loss.

Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere? I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version1.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: 18 July 2008 16:00
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)
>
> I also agree that anything beyond an array is far too verbose. To answer
> this question, I think we need to decide the scope of the problem. What
> do we want fragment ion information to represent? I think analysis
> software is too diverse to use it for anything more than basic
> annotation, but basic annotation is important. If there are ways people
> want it to be usable beyond that, speak up.
> :)
>
> For basic annotation, all I think is needed is the fragment type, series
> number, charge state, and possibly any modification like a neutral loss
> or radical. The array can be an attribute or text node. We can use a
> grammar for each term, where each term represents an ion and terms are
> space delimited. The grammar might look like:
> <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]
> We could make the charge part mandatory or if it was optional, assume a
> +1 charge (or possibly allow the charge to be based on the polarity of
> the source scan?). I assume there is a standard chemical formula format
> that can be represented compactly in ASCII text, but I don't know it.
> An example to show how compact it could be:
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> For basic annotation, the masses are not necessary I think. Expected
> mass can be recomputed if all the label metadata is complete and
> regular, and the observed mass is unimportant for annotation (IMO).
>
> -Matt
>
>
> David Creasy wrote:
> > Hi Phil,
> >
> > Just to be sure I've not misunderstood... from below, each fragment ion
> > takes approx 500 bytes. Let's assume a conservative average of 20
> > fragment matches per spectrum and a modest search with 100k spectra.
> > Assuming that we just report fragment matches for the top match for each
> > spectrum, this would result in a file that is 500 x 20 x 100,000 bytes = 1 GB.
> > If we reported fragment matches for the top 10 matches for each
> > spectrum, this would be 10 GB. Is this reasonable and acceptable?
> >
> > David
> >
> > Phil Jones @ EBI wrote:
> >
> >> Hi,
> >>
> >> Regarding Issue 28
> >> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support
> >> reporting of fragment ions"
> >>
> >> As a suggestion of how this might be tackled:
> >>
> >> The latest development version of the PRIDE database includes a very
> >> simple mechanism for recording fragment ion information, illustrated
> >> below. (Please note - made up data.)
> >>
> >> In this example, CV terms are used to define the type of ion and
> >> related information / annotation. Note that this is even simpler than
> >> the suggestion made by Andy above - no attempt is made here to
> >> indicate which residue has been called for each fragment ion - it is
> >> just listing the ions.
> >>
> >> Also note that while the PeptideItem is referencing the mass spectrum
> >> (which is reported in detail in the associated mzData file), the
> >> individual fragment ions are just reporting the m/z value and not
> >> attempting to make any kind of hard reference to the spectrum.
> >>
> >> As you can see, this has been developed in collaboration with Waters,
> >> with output from the ProteinLynx Global Server. (Actual values /
> >> sequence have been changed).
> >>
> >> One possible change would be to make the m/z value an attribute of the
> >> FragmentIon element, as this value will be mandatory and required to
> >> relate the fragment ion to the correct peak on the mass spectrum. The
> >> CV used for the annotation would also need to be part of the PI CV ??
> >>
> >> Note that in the existing model, there are other terms available, to
> >> allow any kind of fragment ion to be described (not just B and Y ions)
> >>
> >> In the context of analysisXML, the <FragmentIon/> elements would be
> >> children of a <SpectrumIdentificationResultItem/>
> >>
> >> best regards,
> >>
> >> Phil.
> >>
> >> <PeptideItem>
> >>   <Sequence>LFQQSQWTREVFSNSCK</Sequence>
> >>   <Start>435</Start>
> >>   <End>460</End>
> >>   <SpectrumReference>123</SpectrumReference>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
> >>   </FragmentIon>
> >>   <additional>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value=""/>
> >>   </additional>
> >> </PeptideItem>
> >>
> >>
> >> --
> >> Phil Jones
> >> Senior Software Engineer
> >> PRIDE Project Team
> >> PANDA Group, EMBL-EBI
> >> Wellcome Trust Genome Campus
> >> Hinxton, Cambridge, CB10 1SD
> >> UK.
> >>
> >> Work phone: +44 1223 492662 (NEW NUMBER)
> >> Skype: philip-jones
> >>
> >> _______________________________________________
> >> Psidev-pi-dev mailing list
> >> Psi...@li...
> >> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
|
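
For readers following the thread, here is a minimal sketch of how the compact fragmentIons grammar proposed by Matt might be parsed. It is an illustration only, not part of any analysisXML draft. Several things are assumptions: the names TERM_RE, FragmentIon and parse_fragment_ions are hypothetical, the formula is taken to be element symbols with optional counts (e.g. H2O, NH3), a missing charge defaults to +1 as one of the options Matt mentions, and the peptide length of 10 in the usage line is made up.

```python
import re
from typing import List, NamedTuple, Optional

# One space-delimited term of the proposed grammar:
#   <a|b|c|x|y|z><series number>[<+|-><formula>][,<+|-><charge>]
# Assumption: a formula is element symbols with optional counts, e.g. "H2O".
TERM_RE = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<number>[1-9][0-9]*)"
    r"(?:(?P<sign>[+-])(?P<formula>[A-Z][A-Za-z0-9]*))?"
    r"(?:,(?P<charge>[+-][1-9][0-9]*))?$"
)


class FragmentIon(NamedTuple):
    series: str                  # ion series letter: a, b, c, x, y or z
    number: int                  # position in the series, 1..peptide_length
    modification: Optional[str]  # e.g. "-H2O" for a neutral loss, or None
    charge: int                  # signed charge state


def parse_fragment_ions(text: str, peptide_length: int,
                        default_charge: int = 1) -> List[FragmentIon]:
    """Parse a space-delimited fragmentIons string into FragmentIon tuples."""
    ions = []
    for term in text.split():
        match = TERM_RE.match(term)
        if match is None:
            raise ValueError(f"term does not match the grammar: {term!r}")
        number = int(match["number"])
        if number > peptide_length:
            raise ValueError(f"series number out of range in {term!r}")
        # Keep the sign with the formula so "-H2O" and "+H2O" stay distinct.
        modification = None
        if match["formula"]:
            modification = match["sign"] + match["formula"]
        # Assumption: an omitted charge means the default (+1).
        charge = int(match["charge"]) if match["charge"] else default_charge
        ions.append(FragmentIon(match["series"], number, modification, charge))
    return ions


if __name__ == "__main__":
    example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
    for ion in parse_fragment_ions(example, peptide_length=10):
        print(ion)
```

Note that this only recovers the ion labels. As Andy points out, it says nothing about which observed peak each label was matched to, so the information loss he describes is still there.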