From: Phil J. @ E. <pj...@eb...> - 2008-07-18 10:44:24
Hi,

Regarding Issue 28 <http://code.google.com/p/psi-pi/issues/detail?id=28> "support reporting of fragment ions"

As a suggestion of how this might be tackled:

The latest development version of the PRIDE database includes a very simple mechanism for recording fragment ion information, illustrated below. (Please note: made-up data.)

In this example, CV terms are used to define the type of ion and related information / annotation. Note that this is even simpler than the suggestion made by Andy above: no attempt is made here to indicate which residue has been called for each fragment ion, it just lists the ions.

Also note that while the PeptideItem references the mass spectrum (which is reported in detail in the associated mzData file), the individual fragment ions just report the m/z value and do not attempt to make any kind of hard reference to the spectrum.

As you can see, this has been developed in collaboration with Waters, with output from the ProteinLynx Global Server. (Actual values / sequence have been changed.)

One possible change would be to make the m/z value an attribute of the FragmentIon element, as this value will be mandatory and is required to relate the fragment ion to the correct peak on the mass spectrum. The CV used for the annotation would also need to be part of the PI CV ??

Note that in the existing model there are other terms available, to allow any kind of fragment ion to be described (not just B and Y ions).

In the context of analysisXML, the <FragmentIon/> elements would be children of a <SpectrumIdentificationResultItem/>.

best regards,

Phil.

<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <Start>435</Start>
  <End>460</End>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
  </FragmentIon>
  <additional>
    <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
    <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
    <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
    <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
    <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
    <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
    <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
  </additional>
</PeptideItem>

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.

Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones
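As a rough illustration of how a consumer might read the PeptideItem example in the message above, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name read_fragment_ions and the choice to key each value by the cvParam name are assumptions made for this sketch only; they are not part of PRIDE or analysisXML. The data is the made-up data from the message, trimmed to two ions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the (made-up) example from the message.
PEPTIDE_ITEM = """
<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
  </FragmentIon>
</PeptideItem>
"""


def read_fragment_ions(xml_text):
    """Hypothetical helper: one {cvParam name: value} dict per FragmentIon."""
    root = ET.fromstring(xml_text)
    ions = []
    for ion in root.findall("FragmentIon"):
        # Values stay as strings; the caller decides how to convert them.
        ions.append({p.get("name"): p.get("value") for p in ion.findall("cvParam")})
    return ions


ions = read_fragment_ions(PEPTIDE_ITEM)
for ion in ions:
    print(ion["product ion m/z"], ion)
```

Note that the m/z value has to be looked up by its CV term name here; if it were an attribute of FragmentIon, as suggested in the message, that lookup would become a single attribute read.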
From: David C. <dc...@ma...> - 2008-07-18 11:24:44
Hi Phil,

Just to be sure I've not misunderstood... from your example, each fragment ion takes approx. 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1 GB. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10 GB. Is this reasonable and acceptable?

David

> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
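The estimate in David's message can be reproduced with a few lines of Python. The figures are the ones he states (500 bytes per fragment ion, 20 fragment matches per spectrum, 100k spectra); the function name estimated_bytes is just a label for this sketch, and decimal gigabytes (10^9 bytes) are assumed.

```python
BYTES_PER_FRAGMENT_ION = 500   # rough size of one <FragmentIon> element
FRAGMENTS_PER_SPECTRUM = 20    # "conservative average" from the message
SPECTRA = 100_000              # "modest search"


def estimated_bytes(matches_per_spectrum):
    """File size contributed by fragment ions alone, in bytes."""
    return (BYTES_PER_FRAGMENT_ION * FRAGMENTS_PER_SPECTRUM
            * SPECTRA * matches_per_spectrum)


for top_n in (1, 10):
    # Decimal gigabytes (1 GB = 10**9 bytes) are assumed here.
    print(f"top {top_n} match(es): {estimated_bytes(top_n) / 1e9:.0f} GB")
```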
From: Pierre-Alain B. <pie...@is...> - 2008-07-18 12:49:30
Hi Phil,

in my opinion too, this is really too verbose. It is typically a place where arrays can be used efficiently. In principle, the approach I showed with the Phenyx example can probably be better encoded in single-dimension or even multi-dimension arrays (just like mzXML does for m/z-intensity pairs).

Just my thoughts

Pierre-Alain
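To give a feel for the array idea, here is a minimal sketch that packs the (m/z, intensity) pairs from Phil's made-up example into one base64 string of big-endian 32-bit floats, in the style of mzXML peak lists. The function names and the decision to interleave m/z and intensity are assumptions for illustration; nothing here is a proposed analysisXML encoding.

```python
import base64
import struct

# (m/z, intensity) for the four fragment ions in Phil's made-up example.
PAIRS = [(379.2215, 1382.0), (534.2811, 1242.0),
         (394.1813, 1917.0), (367.1669, 345.0)]


def encode_pairs(pairs):
    """Interleave the pairs, pack as big-endian float32, base64-encode."""
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_pairs(text):
    """Inverse of encode_pairs; values come back at float32 precision."""
    raw = base64.b64decode(text)
    flat = struct.unpack(f">{len(raw) // 4}f", raw)
    return list(zip(flat[0::2], flat[1::2]))


encoded = encode_pairs(PAIRS)
decoded = decode_pairs(encoded)
print(len(encoded), "characters for", len(PAIRS), "ions:", encoded)
```

The trade-off is that the values are no longer human-readable, and 32-bit floats only carry about seven significant digits, so a 64-bit variant may be preferable for accurate m/z values.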
From: Matthew C. <mat...@va...> - 2008-07-18 14:59:48
I also agree that anything beyond an array is far too verbose. To answer this question, I think we need to decide the scope of the problem. What do we want fragment ion information to represent? I think analysis software is too diverse to use it for anything more than basic annotation, but basic annotation is important. If there are ways people want it to be usable beyond that, speak up. :)

For basic annotation, all I think is needed is the fragment type, series number, charge state, and possibly any modification like a neutral loss or radical. The array can be an attribute or text node. We can use a grammar for each term, where each term represents an ion and terms are space delimited. The grammar might look like:

  <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]

We could make the charge part mandatory or, if it were optional, assume a +1 charge (or possibly allow the charge to be based on the polarity of the source scan?). I assume there is a standard chemical formula format that can be represented compactly in ASCII text, but I don't know it.

An example to show how compact it could be:

  fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"

For basic annotation, the masses are not necessary, I think. Expected mass can be recomputed if all the label metadata is complete and regular, and the observed mass is unimportant for annotation (IMO).

-Matt
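The grammar proposed in Matt's message is regular, so a single regular expression is enough to parse it. The sketch below is one possible reading of it, with several assumptions: the formula must start with an uppercase letter (so that a bare "+2" is never mistaken for a formula), a missing charge defaults to +1 as suggested in the message, and the names ION_TERM and parse_fragment_ions are invented for this example.

```python
import re

ION_TERM = re.compile(
    r"^(?P<series>[abcxyz])"                  # fragment type
    r"(?P<number>[1-9]\d*)"                   # series number, 1-based
    r"(?P<delta>[+-][A-Z][A-Za-z0-9]*)?"      # optional loss/gain, e.g. -H2O
    r"(?:,(?P<charge>[+-]\d+))?$"             # optional charge, e.g. ,+2
)


def parse_fragment_ions(text, peptide_length=None):
    """Parse a space-delimited ion list into (series, number, delta, charge)."""
    ions = []
    for term in text.split():
        match = ION_TERM.match(term)
        if match is None:
            raise ValueError(f"not a valid ion term: {term!r}")
        number = int(match.group("number"))
        if peptide_length is not None and number > peptide_length:
            raise ValueError(f"series number out of range: {term!r}")
        # Assumption: an omitted charge means +1.
        charge = int(match.group("charge")) if match.group("charge") else 1
        ions.append((match.group("series"), number, match.group("delta"), charge))
    return ions


parsed = parse_fragment_ions("b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2")
for ion in parsed:
    print(ion)
```

Passing peptide_length enforces the "between 1 and peptide_length" part of the grammar; without it only the lower bound is checked.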
From: Jones, A. <And...@li...> - 2008-07-21 12:26:09
Hi all, > An example to show how compact it could be: > fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" I have a couple of queries about this proposal... Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses? We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess i.e. there is an observed peak within the tolerance window matching the expected mass but this doesn't help when there are multiple peaks within the window - I don't think we could correctly assume it would always be the most abundant peak...? In other words, we still have information loss. Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere? I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version1. Cheers Andy > -----Original Message----- > From: psi...@li... [mailto:psidev-pi-dev- > bo...@li...] On Behalf Of Matthew Chambers > Sent: 18 July 2008 16:00 > To: psi...@li... > Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently > handled in PRIDE (Issue 28) > > I also agree that anything beyond an array is far too verbose. To answer > this question, I think we need to decide the scope of the problem. What > do we want fragment ion information to represent? I think analysis > software is too diverse to use it for anything more than basic > annotation, but basic annotation is important. If there are ways people > want it to be usable beyond that, speak up. 
:) > > For basic annotation, all I think is needed is the fragment type, series > number, charge state, and possibly any modification like a neutral loss > or radical. The array can be an attribute or text node. We can use a > grammar for each term, where each term represents an ion and terms are > space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1 > and peptide_length>[<+|-><formula>][,(<+|-><charge>] > We could make the charge part mandatory or if it was optional, assume a > +1 charge (or possibly allow the charge to be based on the polarity of > the source scan?). I assume there is a standard chemical formula format > that can be represented compactly in ASCII text, but I don't know it. > An example to show how compact it could be: > fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" > > For basic annotation, the masses are not necessary I think. Expected > mass can be recomputed if all the label metadata is complete and > regular, and the observed mass is unimportant for annotation (IMO). > > -Matt > > > David Creasy wrote: > > Hi Phil, > > > > Just to be sure I've not misunderstood... from below, each fragment ion > > takes approx 500 bytes. Lets assume a conservative average of 20 > > fragment matches per spectrum and a modest search with 100k spectra. > > Assuming that we just report fragment matches for the top match for each > > spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. > > If we reported fragment matches for the the top 10 matches for each > > spectrum, this would be 10Gb. Is this reasonable and acceptable? 
> > > > David > > > > > > > > Phil Jones @ EBI wrote: > > > >> Hi, > >> > >> Regarding Issue 28 > >> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support > >> reporting of fragment ions" > >> > >> As a suggestion of how this might be tackled: > >> > >> The latest development version of the PRIDE database includes a very > >> simple mechanism > >> for recording fragment ion information, illustrated below. (Please > >> note - made up data.) > >> > >> In this example, CV terms are used to define the type of ion and > >> related information > >> / annotation. Note that this is even more simple that the suggestion > >> made by Andy > >> above - no attempt is made here to indicate which residue has been > >> called for each > >> fragment ion - it is just listing the ions. > >> > >> Also note that while the PeptideItem is referencing the mass spectrum (which is > >> reported in detail in the associated mzData file), the individual > >> fragment ions are > >> just reporting the m/z value and not attempting to make any kind of > >> hard reference to > >> the spectrum. > >> > >> As you can see, this has been developed in collaboration with Waters, > >> with output > >> from the ProteinLynx Global Server. (Actual values / sequence have > >> been changed). > >> > >> One possible change would be to make the m/z value an attribute of the > >> FragmentIon element, as this value will be mandatory and required to > >> relate the fragment ion to the correct peak on the mass spectrum. The > >> CV used for the annotation would also need to be part of the PI CV ?? > >> > >> Note that in the existing model, there are other terms available, to > >> allow any kind of fragment ion to be described (not just B and Y ions) > >> > >> In the context of analysisXML, the <FragmentIon/> elements would be > >> children of a <SpectrumIdentificationResultItem/> > >> > >> best regards, > >> > >> Phil. 
> >> > >> <PeptideItem> > >> <Sequence>LFQQSQWTREVFSNSCK</Sequence> > >> <Start>435</Start> > >> <End>460</End> > >> <SpectrumReference>123</SpectrumReference> > >> <FragmentIon> > >> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > value="3"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> m/z" value="379.2215"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> intensity" value="1382.0"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > >> error" value="-7.1543"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> retention time error" value="0.0207"/> > >> </FragmentIon> > >> <FragmentIon> > >> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > value="4"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> m/z" value="534.2811"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> intensity" value="1242.0"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > >> error" value="-8.2315"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> retention time error" value="0.0029"/> > >> </FragmentIon> > >> <FragmentIon> > >> <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" > value="3"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> m/z" value="394.1813"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> intensity" value="1917.0"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > >> error" value="-14.7098"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> retention time error" value="-0.0013"/> > >> </FragmentIon> > >> <FragmentIon> > >> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" > value="3"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> m/z" 
value="367.1669"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> intensity" value="345.0"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > >> error" value="-18.767"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> retention time error" value="0.0025"/> > >> </FragmentIon> > >> <additional> > >> <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" > >> value="1971.9194"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor > >> intensity" value="181349.0"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error > >> in ppm" value="0.8043"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor > >> retention time in minutes" value="57.3537"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion > >> mass RMS error" value="14.5969"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion > >> retention time RMS error" value="0.0093"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted > >> average charge state" value="2.2"/> > >> <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" > >> value="" /> > >> </additional> > >> </PeptideItem> > >> > >> > >> -- > >> Phil Jones > >> Senior Software Engineer > >> PRIDE Project Team > >> PANDA Group, EMBL-EBI > >> Wellcome Trust Genome Campus > >> Hinxton, Cambridge, CB10 1SD > >> UK. 
> >> > >> Work phone: +44 1223 492662 (NEW NUMBER) > >> Skype: philip-jones
From: Matt C. <mat...@va...> - 2008-07-21 13:21:12
|
Hi Andy,

As we have both said, it's important to determine the use cases for this information. :) The only reasonable use case that doesn't take up oodles of disk space is simply knowing the ion types that were predicted.

Unless I planned to reproduce the search engine's comparison exactly, I don't see the point in knowing the exact mass(es) that the search engine expected and the observed ion(s) that it matched to. And if I plan to reproduce the score, that probably means I have access to the search engine's algorithm, so I'd just regenerate the comparison.

As for mapping to the observed ion(s), I think it's not relevant for the purposes of basic annotation. For clarity of presentation, viewers usually show the ion as either a logical point in the spectrum independent of the data itself, or they map it to the most abundant peak in the window. These approaches can be combined by changing the annotation when the user zooms in.

So yes, in this approach we have information loss. But I think it's better than not having the information at all (and depending on a vendor-supplied and version-dependent script to regenerate it) and certainly better than choking on 10gb analysis files. ;)

-Matt

Jones, Andy wrote:
> Hi all,
>
>> An example to show how compact it could be:
>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> I have a couple of queries about this proposal...
>
> Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses?
>
> We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess i.e. there is an observed peak within the tolerance window matching the expected mass but this doesn't help when there are multiple peaks within the window - I don't think we could correctly assume it would always be the most abundant peak...?
>
> In other words, we still have information loss. Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere?
>
> I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version1.
>
> Cheers
> Andy
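The "most abundant peak in the window" rule mentioned in this message can be sketched roughly as below. This is only an illustration of the idea, not the method used by any particular viewer or search engine; the function name `most_abundant_peak`, the tolerance, and the peak list are all made up for the example.

```python
from bisect import bisect_left, bisect_right


def most_abundant_peak(mz, intensity, expected_mz, tol):
    """Return the index of the most intense peak within expected_mz +/- tol.

    mz must be sorted ascending and intensity must be parallel to it.
    Returns None when no peak falls inside the window.
    """
    lo = bisect_left(mz, expected_mz - tol)
    hi = bisect_right(mz, expected_mz + tol)
    if lo == hi:
        return None
    return max(range(lo, hi), key=lambda i: intensity[i])


# Hypothetical peak list: two peaks fall inside the window around 379.22.
mz = [367.17, 379.20, 379.35, 394.18]
intensity = [345.0, 120.0, 1382.0, 1917.0]

picked = most_abundant_peak(mz, intensity, expected_mz=379.22, tol=0.2)
# For comparison, the peak closest in m/z to the expected value.
closest = min(range(len(mz)), key=lambda i: abs(mz[i] - 379.22))
```

With these made-up values the most abundant peak in the window and the peak closest in m/z are different peaks, which is the ambiguity raised in the quoted message: a label alone does not say which observed peak the search engine actually matched.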
From: Jones, A. <And...@li...> - 2008-07-21 14:08:25
|
Hi Matt,

> As we have both said, it's important to determine the use cases for this
> information. :) The only reasonable use case that doesn't take up oodles
> of disk space is simply knowing the ion types that were predicted.

Another alternative would be to have parallel arrays, similar to mzML, with fragment ions as you suggested in one and observed masses in the other (perhaps represented in base64 binary...) - I'm not necessarily suggesting this is a good idea!

> As for mapping to the observed ion(s), I think it's not relevant for the
> purposes of basic annotation. For clarity of presentation, viewers
> usually show the ion as either a logical point in the spectrum
> independent of the data itself, or they map it to the most abundant peak
> in the window. These approaches can be combined by changing the
> annotation when the user zooms in.

Agreed, I can see the use case for viewers. Are there any others...?

The problem I have at the moment is that we're a long way from having this standard grammar specified in a formal way which could be verified. One option to consider is defining an auxiliary (non-XML) file which could be transferred in parallel - this way we can keep it outside the formal analysisXML standard, in which we try out something similar to your proposal and see if we can get the main search engines to output something consistent. If successful, roll it into analysisXML v2...?

Andy
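The parallel-array idea floated in this message (ion labels in one array, observed masses in another, possibly base64-encoded) might look something like the sketch below. The packing used here (little-endian 64-bit floats) and the helper names `encode_floats` and `decode_floats` are assumptions for illustration only; nothing in the thread fixes an encoding, and the m/z values are the made-up ones from the PRIDE example.

```python
import base64
import struct


def encode_floats(values):
    """Pack floats as little-endian 64-bit doubles, then base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_floats(text):
    """Reverse of encode_floats: base64-decode, then unpack the doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Array 1: ion labels, space delimited. Array 2: observed m/z, same order.
labels = "b3 b4 y3".split()
observed_mz = [379.2215, 534.2811, 394.1813]

encoded = encode_floats(observed_mz)
decoded = decode_floats(encoded)

# The two arrays are tied together only by position.
pairs = list(zip(labels, decoded))
```

Three doubles take 24 bytes, which base64 turns into 32 characters, so for short arrays the binary form is not obviously smaller than plain decimal text; the saving, if any, would show up on longer arrays or higher-precision values.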
From: Matthew C. <mat...@va...> - 2008-07-21 14:41:20
|
By standard grammar are you referring to the little format I came up with?

<a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]

It seems pretty easy to verify to me - easier than some of the other features in mzML and analysisXML. :) The hardest part is the <formula> and verifying that the ion series # is between 1 and peptide_length. Those may have to be semantic rather than syntactic verification steps. Making the charge part mandatory would simplify the format. I think the auxiliary file would rarely be written and/or copied along with the original file, so it wouldn't do much good. If it's a concern, the <formula> part could wait until a later version.

-Matt
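As a rough check of how verifiable the label grammar is, here is a minimal sketch of a parser for it. It assumes the optional charge defaults to +1 when omitted (one of the options suggested earlier in the thread), treats the formula as an opaque token starting with an uppercase letter (no chemical validation), and uses a peptide length of 17 borrowed from the made-up PRIDE sequence. The names `ION_TERM`, `FragmentIon` and `parse_ions` are invented for this example.

```python
import re
from typing import NamedTuple, Optional

# One term: series letter, series number, optional +/- formula,
# optional ",+/-charge". The formula is only checked syntactically.
ION_TERM = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<number>[1-9][0-9]*)"
    r"(?P<delta>[+-][A-Z][A-Za-z0-9]*)?"
    r"(?:,(?P<charge>[+-][1-9][0-9]*))?$"
)


class FragmentIon(NamedTuple):
    series: str
    number: int
    delta: Optional[str]  # e.g. "-H2O", or None when absent
    charge: int


def parse_ions(text, peptide_length):
    """Parse a space-delimited list of ion labels into FragmentIon tuples."""
    ions = []
    for term in text.split():
        match = ION_TERM.match(term)
        if match is None:
            raise ValueError("not a valid ion label: %r" % term)
        number = int(match.group("number"))
        # Semantic check: the series number must fit the peptide.
        if number > peptide_length:
            raise ValueError("series number out of range: %r" % term)
        charge = match.group("charge")
        ions.append(FragmentIon(
            series=match.group("series"),
            number=number,
            delta=match.group("delta"),
            charge=int(charge) if charge else 1,  # assumed default of +1
        ))
    return ions


ions = parse_ions(
    "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2",
    peptide_length=17,
)
```

The syntactic part fits in one regular expression, while the range check on the series number needs the peptide length from elsewhere in the file, which matches the point above that some verification steps are semantic rather than syntactic.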
From: David C. <dc...@ma...> - 2008-07-21 14:55:00
|
What about internal fragments, immonium ions, side chain cleavages?
Or would we just ignore these...

David

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
From: Matthew C. <mat...@va...> - 2008-07-21 15:27:22
|
Hi David,

I have heard of those fragment types, but I don't know enough about them to propose a grammar. From what I know about immonium ions they would be simple enough, but the other two do seem too ugly to represent with a single label. I think we can safely ignore these, at least until a later version.

-Matt
From: Jones, A. <And...@li...> - 2008-08-01 10:03:49
|
Hi all,

Here's a proposal for fragmentation ions as discussed on the call that's halfway between using cvParams for all values and using an array based encoding. I think it's pretty flexible and concise.

First up, set up a FragmentationTable for the entire list of the spectra, which says the kinds of measures you're going to report lower down:

<SpectrumIdentificationList id="MASCOT_results">
  <FragmentationTable>
    <Measures>
      <Measure id="m1">
        <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
      </Measure>
      <Measure id="m2">
        <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
      </Measure>
      <Measure id="m3">
        <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
      </Measure>
      <Measure id="m4">
        <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
      </Measure>
    </Measures>
  </FragmentationTable>

Then for each SpectrumIdentificationItem, you reference back to these Measures:

<SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
  <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false" />
  ...
  <Fragmentation>
    <IonType>
      <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
      <FragArrayIndex values="3 8 10"/>
      <FragArray Measure_ref="m1" values="379.2215 457.1234 540.234"/>
      <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
      <!-- and so on for other measures as defined in the FragmentationTable -->
    </IonType>
    <IonType>
      <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
      <FragArrayIndex values="2 12 14"/>
      <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
      <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
      <!-- and so on for other measures as defined in the FragmentationTable -->
    </IonType>
  </Fragmentation>

Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex tells you which ions you've found, i.e. for the second IonType we have b2, b12 and b14, which have the m/z and intensity values in the m1 and m2 arrays. This will save a lot of space if there are many ions of the same type in each array and I think it is fairly easy to read as well. Slightly more space could be saved by defining the ion types in the FragmentationTable but not much really once you've added a reference back up to it.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-
> bo...@li...] On Behalf Of Matthew Chambers
> Sent: 18 July 2008 16:00
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently
> handled in PRIDE (Issue 28)
>
> I also agree that anything beyond an array is far too verbose. To answer
> this question, I think we need to decide the scope of the problem. What
> do we want fragment ion information to represent? I think analysis
> software is too diverse to use it for anything more than basic
> annotation, but basic annotation is important. If there are ways people
> want it to be usable beyond that, speak up. :)
>
> For basic annotation, all I think is needed is the fragment type, series
> number, charge state, and possibly any modification like a neutral loss
> or radical. The array can be an attribute or text node. We can use a
> grammar for each term, where each term represents an ion and terms are
> space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1
> and peptide_length>[<+|-><formula>][,<+|-><charge>]
> We could make the charge part mandatory or if it was optional, assume a
> +1 charge (or possibly allow the charge to be based on the polarity of
> the source scan?). I assume there is a standard chemical formula format
> that can be represented compactly in ASCII text, but I don't know it.
> An example to show how compact it could be:
>
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> For basic annotation, the masses are not necessary I think. Expected mass can be recomputed if all the label metadata is complete and regular, and the observed mass is unimportant for annotation (IMO).
>
> -Matt
>
> David Creasy wrote:
> > Hi Phil,
> >
> > Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1 GB. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10 GB. Is this reasonable and acceptable?
> >
> > David
|
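To make Matt's compact grammar concrete, here is a minimal Python sketch of a parser for the space-delimited fragmentIons string. It is only an illustration under stated assumptions: the names FragmentIon and parse_fragment_ions are hypothetical, the neutral loss or gain is kept as an opaque string such as "-H2O" because the thread does not settle on a chemical formula format, and a charge of +1 is assumed when the charge part is omitted (one of the options Matt mentions).

```python
import re
from typing import List, NamedTuple, Optional


class FragmentIon(NamedTuple):
    """One annotated fragment ion (hypothetical structure, not part of any schema)."""
    series: str           # one of a, b, c, x, y, z
    number: int           # series number, between 1 and the peptide length
    delta: Optional[str]  # signed formula such as "-H2O", or None
    charge: int           # signed charge; assumed +1 when not given


# <a|b|c|x|y|z><number>[<+|-><formula>][,<+|-><charge>]
# Assumption: a formula starts with a letter, so "+2" is never read as a formula.
_TERM = re.compile(
    r"(?P<series>[abcxyz])"
    r"(?P<number>[1-9][0-9]*)"
    r"(?P<delta>[+-][A-Za-z][A-Za-z0-9]*)?"
    r"(?:,(?P<charge>[+-][1-9][0-9]*))?"
)


def parse_fragment_ions(text: str, peptide_length: int) -> List[FragmentIon]:
    """Parse a space-delimited list of ion terms into FragmentIon tuples."""
    ions = []
    for term in text.split():
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"not a valid fragment ion term: {term!r}")
        number = int(match.group("number"))
        if number > peptide_length:
            raise ValueError(f"series number out of range in {term!r}")
        charge_text = match.group("charge")
        charge = int(charge_text) if charge_text else 1
        ions.append(
            FragmentIon(match.group("series"), number, match.group("delta"), charge)
        )
    return ions


if __name__ == "__main__":
    # Matt's example string; 17 is the length of the example peptide sequence.
    example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
    for ion in parse_fragment_ions(example, peptide_length=17):
        print(ion)
```

Keeping the formula as plain text sidesteps the open question of which formula notation to standardise on; a stricter implementation would validate it against whatever format is eventually chosen.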
From: Phil J. @ E. <pj...@eb...> - 2008-08-01 10:22:52
|
Hi Andy,

This looks really good - both flexible and compact.

Just to clarify - in your example:

<IonType>
  <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
  <FragArrayIndex values = "3 8 10"/>
  <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
  <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
  <!-- and so on for other measures as defined in the FragmentationTable -->
</IonType>

If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the Y-H2O ions for this peptide identification), then the attribute value="3" on the cvParam element should be removed - or have I misunderstood how this works?

Please excuse me for stating the obvious, but... there is no reason why the pointers m1, m2, m3, m4 could not be more human readable, e.g. changed in this example to mz, inten, mz_error and ret_error (to help implementors understand the mechanism).

best regards,

Phil.

2008/8/1 Jones, Andy <And...@li...>:
> Hi all,
>
> Here's a proposal for fragmentation ions as discussed on the call that's halfway between using cvParams for all values and using an array based encoding. I think it's pretty flexible and concise.
>
> First up, set up a FragmentationTable for the entire list of the spectra, which says the kinds of measures you're going to report lower down:
>
> <SpectrumIdentificationList id="MASCOT_results">
>   <FragmentationTable>
>     <Measures>
>       <Measure id = "m1">
>         <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
>       </Measure>
>       <Measure id = "m2">
>         <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
>       </Measure>
>       <Measure id = "m3">
>         <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
>       </Measure>
>       <Measure id = "m4">
>         <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
>       </Measure>
>     </Measures>
>   </FragmentationTable>
>
> Then for each SpectrumIdentificationItem, you reference back to these Measures:
>
>   <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
>     <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false" />
>
>     ...
>
>     <Fragmentation>
>       <IonType>
>         <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
>         <FragArrayIndex values = "3 8 10"/>
>         <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
>         <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
>         <!-- and so on for other measures as defined in the FragmentationTable -->
>       </IonType>
>       <IonType>
>         <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
>         <FragArrayIndex values = "2 12 14"/>
>         <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
>         <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
>         <!-- and so on for other measures as defined in the FragmentationTable -->
>       </IonType>
>     </Fragmentation>
>
> Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex tells you which ions you've found, i.e. for the second IonType we have b2, b12 and b14, which have their m/z and intensity values in the m1 and m2 arrays. This will save a lot of space if there are many ions of the same type in each array, and I think it is fairly easy to read as well. Slightly more space could be saved by defining the ion types in the FragmentationTable, but not much really once you've added a reference back up to it.
>
> Cheers
> Andy

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.

Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones
|
From: Jones, A. <And...@li...> - 2008-08-01 10:26:05
|
> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the > Y-H20 ions for this peptide identification) then the attribute > value="3" on the cvParam element should be removed - or have I > misunderstood how this works? Correct, my mistake. The example says we have found y3-H2O y8-H2O and y10-H2O, the cvParam should not have had the value <Fragmentation> <IonType> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/> <FragArrayIndex values = "3 8 10"/> <FragArray Measure_ref = "m1" values = "379.2215 457.12345 540.234"/> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> <!-- and so on for other measures as defined in the FragmentationTable --> </IonType> <IonType> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/> <FragArrayIndex values = "2 12 14"/> <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/> <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> <!-- and so on for other measures as defined in the FragmentationTable --> </IonType> </Fragmentation> > Please excuse me for stating the obvious, but... there is no reason > why the pointers m1, m2, m3, m4 could not be more human readable, so > changed in this example to mz, inten, mz_error, ret_error for example. > (To help implementors understand the mechanism). Good suggestion. Cheers Andy > -----Original Message----- > From: phi...@go... [mailto:phi...@go...] On > Behalf Of Phil Jones @ EBI > Sent: 01 August 2008 11:23 > To: Jones, Andy; psi...@li... > Subject: Re: [Psidev-pi-dev] Fragmentation Ions > > Hi Andy, > > This looks really good - both flexible and compact. 
> > Just to clarify - in your example: > > <IonType> > <cvParam cvLabel="Waters" accession="PLGS:00035" > name="y ion -H2O" value="3"/> > <FragArrayIndex values = "3 8 10"/> > <FragArray Measure_ref = "m1" values = "379.2215 > 457.1234 540.234"/> > <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > <!-- and so on for other measures as defined in the > FragmentationTable --> > </IonType> > > If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the > Y-H20 ions for this peptide identification) then the attribute > value="3" on the cvParam element should be removed - or have I > misunderstood how this works? > > Please excuse me for stating the obvious, but... there is no reason > why the pointers m1, m2, m3, m4 could not be more human readable, so > changed in this example to mz, inten, mz_error, ret_error for example. > (To help implementors understand the mechanism). > > best regards, > > Phil. > > > > 2008/8/1 Jones, Andy <And...@li...>: > > Hi all, > > > > Here's a proposal for fragmentation ions as discussed on the call that's halfway > between using cvParams for all values and using an array based encoding. I think > it's pretty flexible and concise. 
> > > > > > First up, setup a FragmentationTable for the entire list of the spectra, which says > the kinds of measures you're going to report lower down: > > > > > > <SpectrumIdentificationList id="MASCOT_results"> > > <FragmentationTable> > > <Measures> > > <Measure id = "m1"> > > <cvParam cvLabel="Waters" accession="PLGS:00024" > name="product ion m/z"/> > > </Measure> > > <Measure id = "m2"> > > <cvParam cvLabel="Waters" accession="PLGS:00025" > name="product ion intensity"/> > > </Measure> > > <Measure id = "m3"> > > <cvParam cvLabel="Waters" accession="PLGS:00026" > name="product ion m/z error"/> > > </Measure> > > <Measure id = "m4"> > > <cvParam cvLabel="Waters" accession="PLGS:00027" > name="product ion retention time error"/> > > </Measure> > > </Measures> > > </FragmentationTable> > > > > Then for each SpectrumIdentificationItem, you reference back to these > Measures > > > > <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" > chargeState="1"> > > <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" > isDecoy="false" /> > > > > ... > > > > <Fragmentation> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - > H2O" value="3"/> > > <FragArrayIndex values = "3 8 10"/> > > <FragArray Measure_ref = "m1" values = "379.2215 457.1234 > 540.234"/> > > <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > value="4"/> > > <FragArrayIndex values = "2 12 14"/> > > <FragArray Measure_ref = "m1" values = "560.153 859.111 > 945.653"/> > > <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > > > </Fragmentation> > > > > > > Each array contains space separated values (i.e. an xsd:list). 
The FragArrayIndex > tells you which ions you've found i.e. for the second IonType we have b2 b12 and > b14 which have the m/z and intensity values in the m1 and m2 arrays. This will > save a lot of space if there are many ions of the same type in each array and I > think it is fairly easy to read as well. Slightly more space could be saved by > defining the ion types in the FragmentationTable but not much really once you've > added a reference back up to it. > > > > Cheers > > Andy > > > > > > > > > > > > > > > > > >> -----Original Message----- > >> From: psi...@li... [mailto:psidev-pi-dev- > >> bo...@li...] On Behalf Of Matthew Chambers > >> Sent: 18 July 2008 16:00 > >> To: psi...@li... > >> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently > >> handled in PRIDE (Issue 28) > >> > >> I also agree that anything beyond an array is far too verbose. To answer > >> this question, I think we need to decide the scope of the problem. What > >> do we want fragment ion information to represent? I think analysis > >> software is too diverse to use it for anything more than basic > >> annotation, but basic annotation is important. If there are ways people > >> want it to be usable beyond that, speak up. :) > >> > >> For basic annotation, all I think is needed is the fragment type, series > >> number, charge state, and possibly any modification like a neutral loss > >> or radical. The array can be an attribute or text node. We can use a > >> grammar for each term, where each term represents an ion and terms are > >> space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1 > >> and peptide_length>[<+|-><formula>][,(<+|-><charge>] > >> We could make the charge part mandatory or if it was optional, assume a > >> +1 charge (or possibly allow the charge to be based on the polarity of > >> the source scan?). I assume there is a standard chemical formula format > >> that can be represented compactly in ASCII text, but I don't know it. 
> >> An example to show how compact it could be: > >> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" > >> > >> For basic annotation, the masses are not necessary I think. Expected > >> mass can be recomputed if all the label metadata is complete and > >> regular, and the observed mass is unimportant for annotation (IMO). > >> > >> -Matt > >> > >> > >> David Creasy wrote: > >> > Hi Phil, > >> > > >> > Just to be sure I've not misunderstood... from below, each fragment ion > >> > takes approx 500 bytes. Lets assume a conservative average of 20 > >> > fragment matches per spectrum and a modest search with 100k spectra. > >> > Assuming that we just report fragment matches for the top match for each > >> > spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. > >> > If we reported fragment matches for the the top 10 matches for each > >> > spectrum, this would be 10Gb. Is this reasonable and acceptable? > >> > > >> > David > >> > > >> > > >> > > >> > Phil Jones @ EBI wrote: > >> > > >> >> Hi, > >> >> > >> >> Regarding Issue 28 > >> >> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support > >> >> reporting of fragment ions" > >> >> > >> >> As a suggestion of how this might be tackled: > >> >> > >> >> The latest development version of the PRIDE database includes a very > >> >> simple mechanism > >> >> for recording fragment ion information, illustrated below. (Please > >> >> note - made up data.) > >> >> > >> >> In this example, CV terms are used to define the type of ion and > >> >> related information > >> >> / annotation. Note that this is even more simple that the suggestion > >> >> made by Andy > >> >> above - no attempt is made here to indicate which residue has been > >> >> called for each > >> >> fragment ion - it is just listing the ions. 
> >> >> > >> >> Also note that while the PeptideItem is referencing the mass spectrum > (which is > >> >> reported in detail in the associated mzData file), the individual > >> >> fragment ions are > >> >> just reporting the m/z value and not attempting to make any kind of > >> >> hard reference to > >> >> the spectrum. > >> >> > >> >> As you can see, this has been developed in collaboration with Waters, > >> >> with output > >> >> from the ProteinLynx Global Server. (Actual values / sequence have > >> >> been changed). > >> >> > >> >> One possible change would be to make the m/z value an attribute of the > >> >> FragmentIon element, as this value will be mandatory and required to > >> >> relate the fragment ion to the correct peak on the mass spectrum. The > >> >> CV used for the annotation would also need to be part of the PI CV ?? > >> >> > >> >> Note that in the existing model, there are other terms available, to > >> >> allow any kind of fragment ion to be described (not just B and Y ions) > >> >> > >> >> In the context of analysisXML, the <FragmentIon/> elements would be > >> >> children of a <SpectrumIdentificationResultItem/> > >> >> > >> >> best regards, > >> >> > >> >> Phil. 
> >> >> > >> >> <PeptideItem> > >> >> <Sequence>LFQQSQWTREVFSNSCK</Sequence> > >> >> <Start>435</Start> > >> >> <End>460</End> > >> >> <SpectrumReference>123</SpectrumReference> > >> >> <FragmentIon> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > >> value="3"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> >> m/z" value="379.2215"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> >> intensity" value="1382.0"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > m/z > >> >> error" value="-7.1543"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> >> retention time error" value="0.0207"/> > >> >> </FragmentIon> > >> >> <FragmentIon> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > >> value="4"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> >> m/z" value="534.2811"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> >> intensity" value="1242.0"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > m/z > >> >> error" value="-8.2315"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> >> retention time error" value="0.0029"/> > >> >> </FragmentIon> > >> >> <FragmentIon> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" > >> value="3"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> >> m/z" value="394.1813"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> >> intensity" value="1917.0"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > m/z > >> >> error" value="-14.7098"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> >> retention time error" value="-0.0013"/> > >> >> </FragmentIon> > >> >> <FragmentIon> > >> >> <cvParam cvLabel="Waters" 
accession="PLGS:00035" name="y ion -H2O" > >> value="3"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >> >> m/z" value="367.1669"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >> >> intensity" value="345.0"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > m/z > >> >> error" value="-18.767"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >> >> retention time error" value="0.0025"/> > >> >> </FragmentIon> > >> >> <additional> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor > mass" > >> >> value="1971.9194"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor > >> >> intensity" value="181349.0"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor > error > >> >> in ppm" value="0.8043"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor > >> >> retention time in minutes" value="57.3537"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion > >> >> mass RMS error" value="14.5969"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion > >> >> retention time RMS error" value="0.0093"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted > >> >> average charge state" value="2.2"/> > >> >> <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one > match" > >> >> value="" /> > >> >> </additional> > >> >> </PeptideItem> > >> >> > >> >> > >> >> -- > >> >> Phil Jones > >> >> Senior Software Engineer > >> >> PRIDE Project Team > >> >> PANDA Group, EMBL-EBI > >> >> Wellcome Trust Genome Campus > >> >> Hinxton, Cambridge, CB10 1SD > >> >> UK. 
> >> >> > >> >> Work phone: +44 1223 492662 (NEW NUMBER) > >> >> Skype: philip-jones > >> >> > >> >> _______________________________________________ > >> >> Psidev-pi-dev mailing list > >> >> Psi...@li... > >> >> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > >> >> > > > > -- > Phil Jones > Senior Software Engineer > PRIDE Project Team > PANDA Group, EMBL-EBI > Wellcome Trust Genome Campus > Hinxton, Cambridge, CB10 1SD > UK.
> > Work phone: +44 1223 492662 (NEW NUMBER) > Skype: philip-jones |
From: Matt C. <mat...@va...> - 2008-08-01 13:24:20
|
Hi all,

I'm (still) not sure what the use case is for all the "extra" measurements, which seem to me to be redundant with the label, but if reporting those is the decision of the implementor, I'm happy with having that capability. A rough calculation tells me that if I were to write this format from MyriMatch with 10k spectra, 5 results each, and an average of 2 y ions and 2 b ions matched, that would be about 16 MB of fragmentation data (leaving out the "extra" measurements). That is a lot better than where we were before, but I think we can compact it some more. IIRC, other places in the schema have elements that essentially subclass cvParam, is that right? It would compact things to make IonType such a subclass, with the intention that the accession attribute points to an ion CV term and an extra attribute corresponds to the FragArrayIndex.

The current proposal:

> <Fragmentation>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion"/>
>     <FragArrayIndex values="1 2 3"/>
>   </IonType>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
>     <FragArrayIndex values="4 5 6"/>
>   </IonType>
> </Fragmentation>

My modified proposal (with some extra compactness possible by opting to leave out the extra measurements):

> <Fragmentation>
>   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion" values="1 2 3"/>
>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
> </Fragmentation>

This method would not interfere with the capability of having extra measurements, and it provides a roughly 30% more compact way of annotating an ion series.

Even then, it is about 10 times less compact than the formatted attribute proposal, although that proposal would only work by explicitly denying the storage of extra measurements. For reference, using the same conditions as in my approximate calculation above, MyriMatch's output by the formatted attribute method would have about 1.5 MB of fragmentation info.

> fragmentEvidence="y1 y2 y3 b4 b5 b6"

-Matt

Jones, Andy wrote:
>> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the
>> Y-H20 ions for this peptide identification) then the attribute
>> value="3" on the cvParam element should be removed - or have I
>> misunderstood how this works?
>>
>
> Correct, my mistake. The example says we have found y3-H2O y8-H2O and y10-H2O, the cvParam should not have had the value
>
> <Fragmentation>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
>     <FragArrayIndex values = "3 8 10"/>
>     <FragArray Measure_ref = "m1" values = "379.2215 457.12345 540.234"/>
>     <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
>     <!-- and so on for other measures as defined in the FragmentationTable -->
>   </IonType>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
>     <FragArrayIndex values = "2 12 14"/>
>     <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
>     <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
>     <!-- and so on for other measures as defined in the FragmentationTable -->
>   </IonType>
>
> </Fragmentation>
>
>> Please excuse me for stating the obvious, but... there is no reason
>> why the pointers m1, m2, m3, m4 could not be more human readable, so
>> changed in this example to mz, inten, mz_error, ret_error for example.
>> (To help implementors understand the mechanism).
>>
>
> Good suggestion.
>
> Cheers
> Andy
>
>> -----Original Message-----
>> From: phi...@go... [mailto:phi...@go...] On
>> Behalf Of Phil Jones @ EBI
>> Sent: 01 August 2008 11:23
>> To: Jones, Andy; psi...@li...
>> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>>
>> Hi Andy,
>>
>> This looks really good - both flexible and compact.
>> >> Just to clarify - in your example: >> >> <IonType> >> <cvParam cvLabel="Waters" accession="PLGS:00035" >> name="y ion -H2O" value="3"/> >> <FragArrayIndex values = "3 8 10"/> >> <FragArray Measure_ref = "m1" values = "379.2215 >> 457.1234 540.234"/> >> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> >> <!-- and so on for other measures as defined in the >> FragmentationTable --> >> </IonType> >> >> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the >> Y-H20 ions for this peptide identification) then the attribute >> value="3" on the cvParam element should be removed - or have I >> misunderstood how this works? >> >> Please excuse me for stating the obvious, but... there is no reason >> why the pointers m1, m2, m3, m4 could not be more human readable, so >> changed in this example to mz, inten, mz_error, ret_error for example. >> (To help implementors understand the mechanism). >> >> best regards, >> >> Phil. >> >> >> >> 2008/8/1 Jones, Andy <And...@li...>: >> >>> Hi all, >>> >>> Here's a proposal for fragmentation ions as discussed on the call that's halfway >>> >> between using cvParams for all values and using an array based encoding. I think >> it's pretty flexible and concise. 
>> >>> First up, setup a FragmentationTable for the entire list of the spectra, which says >>> >> the kinds of measures you're going to report lower down: >> >>> <SpectrumIdentificationList id="MASCOT_results"> >>> <FragmentationTable> >>> <Measures> >>> <Measure id = "m1"> >>> <cvParam cvLabel="Waters" accession="PLGS:00024" >>> >> name="product ion m/z"/> >> >>> </Measure> >>> <Measure id = "m2"> >>> <cvParam cvLabel="Waters" accession="PLGS:00025" >>> >> name="product ion intensity"/> >> >>> </Measure> >>> <Measure id = "m3"> >>> <cvParam cvLabel="Waters" accession="PLGS:00026" >>> >> name="product ion m/z error"/> >> >>> </Measure> >>> <Measure id = "m4"> >>> <cvParam cvLabel="Waters" accession="PLGS:00027" >>> >> name="product ion retention time error"/> >> >>> </Measure> >>> </Measures> >>> </FragmentationTable> >>> >>> Then for each SpectrumIdentificationItem, you reference back to these >>> >> Measures >> >>> <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" >>> >> chargeState="1"> >> >>> <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" >>> >> isDecoy="false" /> >> >>> ... 
>>> >>> <Fragmentation> >>> <IonType> >>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - >>> >> H2O" value="3"/> >> >>> <FragArrayIndex values = "3 8 10"/> >>> <FragArray Measure_ref = "m1" values = "379.2215 457.1234 >>> >> 540.234"/> >> >>> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> >>> <!-- and so on for other measures as defined in the >>> >> FragmentationTable --> >> >>> </IonType> >>> <IonType> >>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>> >> value="4"/> >> >>> <FragArrayIndex values = "2 12 14"/> >>> <FragArray Measure_ref = "m1" values = "560.153 859.111 >>> >> 945.653"/> >> >>> <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> >>> <!-- and so on for other measures as defined in the >>> >> FragmentationTable --> >> >>> </IonType> >>> >>> </Fragmentation> >>> >>> >>> Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex >>> >> tells you which ions you've found i.e. for the second IonType we have b2 b12 and >> b14 which have the m/z and intensity values in the m1 and m2 arrays. This will >> save a lot of space if there are many ions of the same type in each array and I >> think it is fairly easy to read as well. Slightly more space could be saved by >> defining the ion types in the FragmentationTable but not much really once you've >> added a reference back up to it. >> >>> Cheers >>> Andy >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: psi...@li... [mailto:psidev-pi-dev- >>>> bo...@li...] On Behalf Of Matthew Chambers >>>> Sent: 18 July 2008 16:00 >>>> To: psi...@li... >>>> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently >>>> handled in PRIDE (Issue 28) >>>> >>>> I also agree that anything beyond an array is far too verbose. To answer >>>> this question, I think we need to decide the scope of the problem. What >>>> do we want fragment ion information to represent? 
I think analysis >>>> software is too diverse to use it for anything more than basic >>>> annotation, but basic annotation is important. If there are ways people >>>> want it to be usable beyond that, speak up. :) >>>> >>>> For basic annotation, all I think is needed is the fragment type, series >>>> number, charge state, and possibly any modification like a neutral loss >>>> or radical. The array can be an attribute or text node. We can use a >>>> grammar for each term, where each term represents an ion and terms are >>>> space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1 >>>> and peptide_length>[<+|-><formula>][,(<+|-><charge>] >>>> We could make the charge part mandatory or if it was optional, assume a >>>> +1 charge (or possibly allow the charge to be based on the polarity of >>>> the source scan?). I assume there is a standard chemical formula format >>>> that can be represented compactly in ASCII text, but I don't know it. >>>> An example to show how compact it could be: >>>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" >>>> >>>> For basic annotation, the masses are not necessary I think. Expected >>>> mass can be recomputed if all the label metadata is complete and >>>> regular, and the observed mass is unimportant for annotation (IMO). >>>> >>>> -Matt >>>> >>>> >>>> David Creasy wrote: >>>> >>>>> Hi Phil, >>>>> >>>>> Just to be sure I've not misunderstood... from below, each fragment ion >>>>> takes approx 500 bytes. Lets assume a conservative average of 20 >>>>> fragment matches per spectrum and a modest search with 100k spectra. >>>>> Assuming that we just report fragment matches for the top match for each >>>>> spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. >>>>> If we reported fragment matches for the the top 10 matches for each >>>>> spectrum, this would be 10Gb. Is this reasonable and acceptable? 
>>>>> >>>>> David >>>>> >>>>> >>>>> >>>>> Phil Jones @ EBI wrote: >>>>> >>>>> >>>>>> Hi, >>>>>> >>>>>> Regarding Issue 28 >>>>>> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support >>>>>> reporting of fragment ions" >>>>>> >>>>>> As a suggestion of how this might be tackled: >>>>>> >>>>>> The latest development version of the PRIDE database includes a very >>>>>> simple mechanism >>>>>> for recording fragment ion information, illustrated below. (Please >>>>>> note - made up data.) >>>>>> >>>>>> In this example, CV terms are used to define the type of ion and >>>>>> related information >>>>>> / annotation. Note that this is even more simple that the suggestion >>>>>> made by Andy >>>>>> above - no attempt is made here to indicate which residue has been >>>>>> called for each >>>>>> fragment ion - it is just listing the ions. >>>>>> >>>>>> Also note that while the PeptideItem is referencing the mass spectrum >>>>>> >> (which is >> >>>>>> reported in detail in the associated mzData file), the individual >>>>>> fragment ions are >>>>>> just reporting the m/z value and not attempting to make any kind of >>>>>> hard reference to >>>>>> the spectrum. >>>>>> >>>>>> As you can see, this has been developed in collaboration with Waters, >>>>>> with output >>>>>> from the ProteinLynx Global Server. (Actual values / sequence have >>>>>> been changed). >>>>>> >>>>>> One possible change would be to make the m/z value an attribute of the >>>>>> FragmentIon element, as this value will be mandatory and required to >>>>>> relate the fragment ion to the correct peak on the mass spectrum. The >>>>>> CV used for the annotation would also need to be part of the PI CV ?? 
>>>>>> >>>>>> Note that in the existing model, there are other terms available, to >>>>>> allow any kind of fragment ion to be described (not just B and Y ions) >>>>>> >>>>>> In the context of analysisXML, the <FragmentIon/> elements would be >>>>>> children of a <SpectrumIdentificationResultItem/> >>>>>> >>>>>> best regards, >>>>>> >>>>>> Phil. >>>>>> >>>>>> <PeptideItem> >>>>>> <Sequence>LFQQSQWTREVFSNSCK</Sequence> >>>>>> <Start>435</Start> >>>>>> <End>460</End> >>>>>> <SpectrumReference>123</SpectrumReference> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="379.2215"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="1382.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-7.1543"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0207"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>>>>> >>>> value="4"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="534.2811"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="1242.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-8.2315"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0029"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="394.1813"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" 
name="product ion >>>>>> intensity" value="1917.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-14.7098"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="-0.0013"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="367.1669"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="345.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-18.767"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0025"/> >>>>>> </FragmentIon> >>>>>> <additional> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor >>>>>> >> mass" >> >>>>>> value="1971.9194"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor >>>>>> intensity" value="181349.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor >>>>>> >> error >> >>>>>> in ppm" value="0.8043"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor >>>>>> retention time in minutes" value="57.3537"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion >>>>>> mass RMS error" value="14.5969"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion >>>>>> retention time RMS error" value="0.0093"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted >>>>>> average charge state" value="2.2"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one >>>>>> >> match" >> >>>>>> value="" /> >>>>>> </additional> >>>>>> </PeptideItem> >>>>>> >>>>>> >>>>>> -- >>>>>> Phil Jones >>>>>> Senior Software Engineer 
>>>>>> PRIDE Project Team >>>>>> PANDA Group, EMBL-EBI >>>>>> Wellcome Trust Genome Campus >>>>>> Hinxton, Cambridge, CB10 1SD >>>>>> UK. >>>>>> >>>>>> Work phone: +44 1223 492662 (NEW NUMBER) >>>>>> Skype: philip-jones >>>>>> |
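For readers following the array-based encoding quoted in this thread, here is a minimal Python sketch of how a consumer might expand one <Fragmentation> block into per-ion records. It is not part of any posted proposal or of the analysisXML schema: the function name expand_fragmentation and the record layout are assumptions made for illustration, while the element and attribute names (IonType, cvParam, FragArrayIndex, FragArray, Measure_ref, values) are copied from the corrected example quoted above.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the corrected example quoted in the thread.
FRAGMENTATION_XML = """
<Fragmentation>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
    <FragArrayIndex values="3 8 10"/>
    <FragArray Measure_ref="m1" values="379.2215 457.12345 540.234"/>
    <FragArray Measure_ref="m2" values="1382.0 2055.5 340.0"/>
  </IonType>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
    <FragArrayIndex values="2 12 14"/>
    <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
    <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
  </IonType>
</Fragmentation>
"""


def expand_fragmentation(xml_text):
    """Return one dict per matched ion (hypothetical record layout)."""
    root = ET.fromstring(xml_text.strip())
    ions = []
    for ion_type in root.findall("IonType"):
        name = ion_type.find("cvParam").get("name")
        # Space-separated list of ion series numbers, e.g. "3 8 10".
        indices = [int(v) for v in ion_type.find("FragArrayIndex").get("values").split()]
        # Each FragArray is parallel to the index list, keyed by its Measure_ref.
        measures = {}
        for array in ion_type.findall("FragArray"):
            values = [float(v) for v in array.get("values").split()]
            if len(values) != len(indices):
                raise ValueError("FragArray length does not match FragArrayIndex")
            measures[array.get("Measure_ref")] = values
        for position, index in enumerate(indices):
            record = {"ion": name, "index": index}
            for ref, values in measures.items():
                record[ref] = values[position]
            ions.append(record)
    return ions


if __name__ == "__main__":
    for ion in expand_fragmentation(FRAGMENTATION_XML):
        print(ion)
```

The length check reflects the reading that every FragArray must line up position by position with FragArrayIndex; an XML parser validating xsd:list alone would not enforce that, so a consumer would probably need to check it itself.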
From: Jones, A. <And...@li...> - 2008-08-01 13:57:38
|
Hi Matt,

> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability.

On the call, it was discussed that some implementers may wish to report additional scores associated with particular peaks, e.g. why a particular peak was identified as a particular ion (for example, related to the abundance of an adjacent peak). I think this is a niche use case, but this proposal would allow it to be done. Also see the other values in comment 5 of issue 28, e.g. product ion m/z error.

> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
> > <Fragmentation>
> >   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion" values="1 2 3"/>
> >   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
> > </Fragmentation>

We can definitely do it this way in the XSD. The only minor advantage of doing it the other way is that the format (xsd:list) can be verified by an XML parser rather than relying on the validator software; either is fine, though.

> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
> > <Fragmentation>
> >   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion" values="1 2 3"/>
> >   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
> > </Fragmentation>

I agree this is fine for a viewer, in that it tells you the expected ion types that were detected, but it doesn't tell you which observed ion types were matched to them. If multiple peaks fall in the same range near an expected peak, you've lost information.

> Although even then, it's about 10 times less compact than the formatted
> attribute proposal, but which would only work by explicitly denying the
> storage of extra measurements. For reference, using the same conditions
> for my approximate calculations above, MyriMatch's output by the
> formatted attribute method would have about 1.5mb of fragmentation info.
> > fragmentEvidence="y1 y2 y3 b4 b5 b6"

Although initially I was in favour of this approach, it suffers from the problem that we have to decide now (in the schema documentation) on all ion types and definitions. I'm not even sure if there is universal agreement on what constitutes each ion type; see the comment from David:

> "What about internal fragments, immonium ions, side chain cleavages?"

I don't even know what an immonium ion is, so I don't want to have to sign off on a perfect list of all ion types in the analysisXML documentation! By using ontology terms we can leave flexibility in there so that implementers can report whatever ion types they like. IMO being as compact as possible is not really a big deal?

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-
> bo...@li...] On Behalf Of Matt Chambers
> Sent: 01 August 2008 14:24
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>
> Hi all,
>
> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability. Some rough calculation tells me that if I was to
> write this format from MyriMatch with 10k spectra with 5 results each
> and an average of 2 y ions and 2 b ions matched, that would be about
> 16mb of fragmentation data (leaving out the "extra" measurements). That
> is a lot better than where we were before. But I think we can compact it
> some more. IIRC, other places in the schema have elements that
> essentially subclass cvParam, is that right?
It would compact things to > make IonType such a subclass with the intention that the accession > attribute point to an ion CV term and an extra attribute would > correspond with the FragArrayIndex. > > The current proposal: > > <Fragmentation> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion"/> > > <FragArrayIndex values="1 2 3"/> > > </IonType> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/> > > <FragArrayIndex values="4 5 6"/> > > </IonType> > > </Fragmentation> > > My modified proposal (with some extra compactness possible by opting to > leave out the extra measurements): > > <Fragmentation> > > <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion" > > values="1 2 3"/> > > <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" > > values="4 5 6"/> > > </Fragmentation> > > This method would not interfere with the capability of having extra > measurements, and it provides roughly 30% more compact way of annotating > an ion series. > > Although even then, it's about 10 times less compact than the formatted > attribute proposal, but which would only work by explicitly denying the > storage of extra measurements. For reference, using the same conditions > for my approximate calculations above, MyriMatch's output by the > formatted attribute method would have about 1.5mb of fragmentation info. > > fragmentEvidence="y1 y2 y3 b4 b5 b6" > > -Matt > > > > Jones, Andy wrote: > >> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the > >> Y-H20 ions for this peptide identification) then the attribute > >> value="3" on the cvParam element should be removed - or have I > >> misunderstood how this works? > >> > > > > Correct, my mistake. 
The example says we have found y3-H2O y8-H2O and > y10-H2O, the cvParam should not have had the value > > > > > > <Fragmentation> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - > H2O"/> > > <FragArrayIndex values = "3 8 10"/> > > <FragArray Measure_ref = "m1" values = "379.2215 457.12345 > 540.234"/> > > <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/> > > <FragArrayIndex values = "2 12 14"/> > > <FragArray Measure_ref = "m1" values = "560.153 859.111 > 945.653"/> > > <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > > > </Fragmentation> > > > > > > > >> Please excuse me for stating the obvious, but... there is no reason > >> why the pointers m1, m2, m3, m4 could not be more human readable, so > >> changed in this example to mz, inten, mz_error, ret_error for example. > >> (To help implementors understand the mechanism). > >> > > > > Good suggestion. > > > > Cheers > > Andy > > > > > > > > > >> -----Original Message----- > >> From: phi...@go... [mailto:phi...@go...] > On > >> Behalf Of Phil Jones @ EBI > >> Sent: 01 August 2008 11:23 > >> To: Jones, Andy; psi...@li... > >> Subject: Re: [Psidev-pi-dev] Fragmentation Ions > >> > >> Hi Andy, > >> > >> This looks really good - both flexible and compact. 
> >> > >> Just to clarify - in your example: > >> > >> <IonType> > >> <cvParam cvLabel="Waters" accession="PLGS:00035" > >> name="y ion -H2O" value="3"/> > >> <FragArrayIndex values = "3 8 10"/> > >> <FragArray Measure_ref = "m1" values = "379.2215 > >> 457.1234 540.234"/> > >> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > >> <!-- and so on for other measures as defined in the > >> FragmentationTable --> > >> </IonType> > >> > >> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the > >> Y-H20 ions for this peptide identification) then the attribute > >> value="3" on the cvParam element should be removed - or have I > >> misunderstood how this works? > >> > >> Please excuse me for stating the obvious, but... there is no reason > >> why the pointers m1, m2, m3, m4 could not be more human readable, so > >> changed in this example to mz, inten, mz_error, ret_error for example. > >> (To help implementors understand the mechanism). > >> > >> best regards, > >> > >> Phil. > >> > >> > >> > >> 2008/8/1 Jones, Andy <And...@li...>: > >> > >>> Hi all, > >>> > >>> Here's a proposal for fragmentation ions as discussed on the call that's > halfway > >>> > >> between using cvParams for all values and using an array based encoding. I > think > >> it's pretty flexible and concise. 
> >> > >>> First up, setup a FragmentationTable for the entire list of the spectra, which > says > >>> > >> the kinds of measures you're going to report lower down: > >> > >>> <SpectrumIdentificationList id="MASCOT_results"> > >>> <FragmentationTable> > >>> <Measures> > >>> <Measure id = "m1"> > >>> <cvParam cvLabel="Waters" accession="PLGS:00024" > >>> > >> name="product ion m/z"/> > >> > >>> </Measure> > >>> <Measure id = "m2"> > >>> <cvParam cvLabel="Waters" accession="PLGS:00025" > >>> > >> name="product ion intensity"/> > >> > >>> </Measure> > >>> <Measure id = "m3"> > >>> <cvParam cvLabel="Waters" accession="PLGS:00026" > >>> > >> name="product ion m/z error"/> > >> > >>> </Measure> > >>> <Measure id = "m4"> > >>> <cvParam cvLabel="Waters" accession="PLGS:00027" > >>> > >> name="product ion retention time error"/> > >> > >>> </Measure> > >>> </Measures> > >>> </FragmentationTable> > >>> > >>> Then for each SpectrumIdentificationItem, you reference back to these > >>> > >> Measures > >> > >>> <SpectrumIdentificationItem id="SEQ_spec1_pep1" > Peptide_ref="prot1_pep1" > >>> > >> chargeState="1"> > >> > >>> <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" > end="79" > >>> > >> isDecoy="false" /> > >> > >>> ... 
> >>> > >>> <Fragmentation> > >>> <IonType> > >>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - > >>> > >> H2O" value="3"/> > >> > >>> <FragArrayIndex values = "3 8 10"/> > >>> <FragArray Measure_ref = "m1" values = "379.2215 457.1234 > >>> > >> 540.234"/> > >> > >>> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > >>> <!-- and so on for other measures as defined in the > >>> > >> FragmentationTable --> > >> > >>> </IonType> > >>> <IonType> > >>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > >>> > >> value="4"/> > >> > >>> <FragArrayIndex values = "2 12 14"/> > >>> <FragArray Measure_ref = "m1" values = "560.153 859.111 > >>> > >> 945.653"/> > >> > >>> <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> > >>> <!-- and so on for other measures as defined in the > >>> > >> FragmentationTable --> > >> > >>> </IonType> > >>> > >>> </Fragmentation> > >>> > >>> > >>> Each array contains space separated values (i.e. an xsd:list). The > FragArrayIndex > >>> > >> tells you which ions you've found i.e. for the second IonType we have b2 b12 > and > >> b14 which have the m/z and intensity values in the m1 and m2 arrays. This will > >> save a lot of space if there are many ions of the same type in each array and I > >> think it is fairly easy to read as well. Slightly more space could be saved by > >> defining the ion types in the FragmentationTable but not much really once > you've > >> added a reference back up to it. > >> > >>> Cheers > >>> Andy > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: psi...@li... [mailto:psidev-pi-dev- > >>>> bo...@li...] On Behalf Of Matthew Chambers > >>>> Sent: 18 July 2008 16:00 > >>>> To: psi...@li... > >>>> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is > currently > >>>> handled in PRIDE (Issue 28) > >>>> > >>>> I also agree that anything beyond an array is far too verbose. 
To answer > >>>> this question, I think we need to decide the scope of the problem. What > >>>> do we want fragment ion information to represent? I think analysis > >>>> software is too diverse to use it for anything more than basic > >>>> annotation, but basic annotation is important. If there are ways people > >>>> want it to be usable beyond that, speak up. :) > >>>> > >>>> For basic annotation, all I think is needed is the fragment type, series > >>>> number, charge state, and possibly any modification like a neutral loss > >>>> or radical. The array can be an attribute or text node. We can use a > >>>> grammar for each term, where each term represents an ion and terms are > >>>> space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1 > >>>> and peptide_length>[<+|-><formula>][,(<+|-><charge>] > >>>> We could make the charge part mandatory or if it was optional, assume a > >>>> +1 charge (or possibly allow the charge to be based on the polarity of > >>>> the source scan?). I assume there is a standard chemical formula format > >>>> that can be represented compactly in ASCII text, but I don't know it. > >>>> An example to show how compact it could be: > >>>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" > >>>> > >>>> For basic annotation, the masses are not necessary I think. Expected > >>>> mass can be recomputed if all the label metadata is complete and > >>>> regular, and the observed mass is unimportant for annotation (IMO). > >>>> > >>>> -Matt > >>>> > >>>> > >>>> David Creasy wrote: > >>>> > >>>>> Hi Phil, > >>>>> > >>>>> Just to be sure I've not misunderstood... from below, each fragment ion > >>>>> takes approx 500 bytes. Lets assume a conservative average of 20 > >>>>> fragment matches per spectrum and a modest search with 100k spectra. > >>>>> Assuming that we just report fragment matches for the top match for each > >>>>> spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. 
> >>>>> If we reported fragment matches for the top 10 matches for each
> >>>>> spectrum, this would be 10Gb. Is this reasonable and acceptable?
> >>>>>
> >>>>> David |
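The compact label grammar quoted in this message (series letter, series number, optional signed formula, optional signed charge) can be sketched as a small parser. This is only an illustration of the idea and not part of any analysisXML proposal: the names `parse_fragment_ions` and `FragmentLabel`, the regular expression, the simplified formula pattern and the +1 default charge are all assumptions made for the example.

```python
import re
from typing import List, NamedTuple, Optional

# One space-delimited term: <a|b|c|x|y|z><series number>[<+|-><formula>][,<+|-><charge>]
# The formula pattern is a deliberately simple assumption (letters and digits only).
TERM = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<index>[1-9]\d*)"
    r"(?:(?P<sign>[+-])(?P<formula>[A-Za-z][A-Za-z0-9]*))?"
    r"(?:,(?P<charge>[+-]\d+))?$"
)


class FragmentLabel(NamedTuple):
    series: str                  # ion series letter, e.g. "b" or "y"
    index: int                   # position in the series, 1..peptide_length
    modification: Optional[str]  # signed formula such as "-H2O", or None
    charge: int                  # signed charge state


def parse_fragment_ions(text: str, peptide_length: int,
                        default_charge: int = 1) -> List[FragmentLabel]:
    """Parse a space-delimited fragment label string into structured labels."""
    labels = []
    for term in text.split():
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"not a valid fragment label: {term!r}")
        index = int(match["index"])
        if index > peptide_length:
            raise ValueError(f"series number out of range in {term!r}")
        modification = None
        if match["formula"]:
            modification = match["sign"] + match["formula"]
        # Assumption: a missing charge means +1, as suggested in the message.
        charge = int(match["charge"]) if match["charge"] else default_charge
        labels.append(FragmentLabel(match["series"], index, modification, charge))
    return labels


if __name__ == "__main__":
    example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
    for label in parse_fragment_ions(example, peptide_length=17):
        print(label)
```

Because the charge is introduced by a comma, a term such as `b7-H2O,+2` stays unambiguous: the signed formula and the signed charge cannot be confused with each other.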
From: Matthew C. <mat...@va...> - 2008-08-01 15:21:16
|
Hi Andy,

Jones, Andy wrote:
> Hi Matt,
>
>> I'm (still) not sure what the use case is for all the "extra"
>> measurements that seem to me to be redundant with the label, but if
>> reporting those is the decision of the implementor, I'm happy with
>> having that capability.
>>
> On the call, it was discussed that some implementers may wish to report additional scores associated with particular peaks e.g. why a particular peak was identified as a particular ion (for example related to the abundance of an adjacent peak). I think this is a niche use case but this proposal would allow it to be done. Also see the other values in comment 5 of issue 28 e.g. product ion m/z error
>
I can see the use of being able to report additional scores on a per
peak basis, although I don't personally know how to use that information. :)

>> My modified proposal (with some extra compactness possible by opting to
>> leave out the extra measurements):
>>
>>> <Fragmentation>
>>>   <IonType cvLabel="Waters" accession="PLGS:00031" name="y ion" values="1 2 3"/>
>>>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
>>> </Fragmentation>
>>>
> We can definitely do it this way in the XSD. The only minor advantage of doing it the other way is that the format (xsd:list) can be verified by an XML parser rather than relying on the validator software, either is fine though.
>
Well, notice I use "values" instead of "value" (but that wasn't
intentional, hehe). We could either use the value attribute we get from
subclassing, or add a new attribute that is more specific (like
"seriesIndexList" or something). Even if we reuse CVParam's value, I
would expect there to be some way of overriding the type of the
inherited attribute? In any case, xsd:list isn't enough to semantically
validate the series, because a list like "100 99999 2939439" is a valid
xsd:list but obviously semantically crazy. :)

>> Although even then, it's about 10 times less compact than the formatted
>> attribute proposal, but which would only work by explicitly denying the
>> storage of extra measurements. For reference, using the same conditions
>> for my approximate calculations above, MyriMatch's output by the
>> formatted attribute method would have about 1.5 MB of fragmentation info.
>>
>>> fragmentEvidence="y1 y2 y3 b4 b5 b6"
>>>
>
> Although initially I was in favour of this approach, it suffers from the problem that we have to decide now (in the schema documentation) on all ion types and definitions. I'm not even sure if there is universal agreement on what constitutes each ion type, see comment from David:
>
>> "What about internal fragments, immonium ions, side chain cleavages?"
>>
>
> I don't even know what an immonium ion is so I don't want to have to sign off on a perfect list of all ion types in the analysisXML documentation! By using ontology terms we can leave flexibility in there so that implementers can report whatever ion types they like. IMO being as compact as possible is not really a big deal?
>
Yes, those ion types are troublesome to represent in a label. How will
that be done in the CV? In other words, the same question applies to the
cvParam approach. :) But your point about having to have universal
agreement up front is a good one. Yet, if we DON'T have agreement up
front about ion types, will that mean we start getting requests for
obscure, vendor-specific ion types as CV terms? Or will we see
userParams used there instead? It's not clear to me exactly how the CV
approach solves the up front agreement issue either (at least, not
without creating its own issues).

-Matt |
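The point that xsd:list alone cannot catch a list like "100 99999 2939439" can be illustrated with a small semantic check that a validator might run on top of schema validation. This is a sketch under assumptions, not a proposed validator: the function name `check_series_indexes` is hypothetical, the attribute is called `values` as in the example in this message, and the "y ion" accession is the one used in Phil's original listing.

```python
import xml.etree.ElementTree as ET

# Example fragment in the shape proposed in the message above.
FRAGMENTATION_XML = """
<Fragmentation>
  <IonType cvLabel="Waters" accession="PLGS:00031" name="y ion" values="1 2 3"/>
  <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
</Fragmentation>
"""


def check_series_indexes(xml_text, peptide_length):
    """Return a list of problems found; an empty list means the lists look sane."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    for ion_type in root.iter("IonType"):
        name = ion_type.get("name", "?")
        seen = set()
        for token in ion_type.get("values", "").split():
            # xsd:list of integers would already reject this case.
            if not (token.isascii() and token.isdigit()):
                problems.append(f"{name}: {token!r} is not a positive integer")
                continue
            index = int(token)
            # This is the semantic part that a plain xsd:list cannot express.
            if not 1 <= index <= peptide_length:
                problems.append(
                    f"{name}: series index {index} outside 1..{peptide_length}")
            elif index in seen:
                problems.append(f"{name}: series index {index} listed twice")
            seen.add(index)
    return problems


if __name__ == "__main__":
    # The peptide in Phil's example has 17 residues.
    print(check_series_indexes(FRAGMENTATION_XML, peptide_length=17))
    crazy = FRAGMENTATION_XML.replace('values="4 5 6"',
                                      'values="100 99999 2939439"')
    for problem in check_series_indexes(crazy, peptide_length=17):
        print(problem)
```

The check needs the peptide length, which is why it cannot live in the schema itself: the upper bound differs for every identified peptide.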
From: Jones, A. <And...@li...> - 2008-08-01 15:30:50
|
> Well, notice I use "values" instead of "value" (but that wasn't
> intentional, hehe). We could either use the value attribute we get from
> subclassing, or add a new attribute that is more specific (like
> "seriesIndexList" or something).

Yes, I think either of these options would be fine. I would marginally favour adding, say, seriesIndex="" so that we can add a list datatype and specific documentation for the attribute in the XSD.

> > I don't even know what an immonium ion is so I don't want to have to sign off on
> > a perfect list of all ion types in the analysisXML documentation! By using ontology
> > terms we can leave flexibility in there so that implementers can report whatever ion
> > types they like. IMO being as compact as possible is not really a big deal?
> >
> Yes, those ion types are troublesome to represent in a label. How will
> that be done in the CV? In other words, the same question applies to the
> cvParam approach. :) But your point about having to have universal
> agreement up front is a good one. Yet, if we DON'T have agreement up
> front about ion types, will that mean we start getting requests for
> obscure, vendor-specific ion types as CV terms? Or will we see
> userParams used there instead? It's not clear to me exactly how the CV
> approach solves the up front agreement issue either (at least, not
> without creating its own issues).

I think the PSI CV should contain the main ion types (b / y, neutral losses etc.) with definitions. For the rest, I was thinking obscure vendor-specific terms were the way to go, i.e. not my problem :-) I just want to focus on getting the XSD over the line into version 1 state. CV terms can evolve as and when they are needed, but once the spec doc is fixed at version 1 we don't want to have to touch it again!

Cheers
Andy

> -----Original Message-----
> From: Matthew Chambers [mailto:mat...@va...]
> Sent: 01 August 2008 16:21
> To: Jones, Andy
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions |
From: Eugene K. <Eug...@lu...> - 2008-07-22 13:13:07
|
Hi David, Pierre-Alain and Phil,

I am not sure if this discussion has continued elsewhere. Anyway, here are a few thoughts.

1) How is this handled (or not) in pepXML (ISB search analysis xml file)? We can discuss this with Jimmy Eng if required.

2) In many cases the first answer is not the correct one (but it could be in the top 10). So if you do not support all top ten per spectrum then it's pointless. Several algorithms (X!Tandem, for example) only display the top hit with associated fragment ion information. I could look at OMSSA and let you know what it does.

3) Phil: Are Waters proposing that MS^E ("MS to the e") experiments are supported within this framework? How big are the XML files? (I agree that this is all-encompassing, but is it practical, as David and Pierre-Alain have alluded to?)

4) Perhaps the information used by the algorithm in reaching its score should be supported, as per the Mascot dat file (this would be good practice anyway), because it indicates some transparency on behalf of the algorithm vendor.

5) Something that would be useful (not directly related to analysisXML) is how to calculate the mass of a peptide using monoisotopic and average masses. IUPAC provides this, but it would be good if everyone settled on the same exact masses for the elements (and modifications, of course). A script could easily compute the correct fragment matches (within prescribed tolerance) based on the information in analysisXML. A problem of course is deciding which m/z ion is which fragment ion if they overlap (default is accept all?). What about the charge state of m/z ions? Currently most algorithms only go up to +2.

Just my thoughts. Look forward to discussing further.

regards,
Eugene

________________________________
From: Pierre-Alain Binz [mailto:pie...@is...]
Sent: Fri 18/07/2008 10:49 PM
To: David Creasy
Cc: Phil Jones @ EBI; psi...@li...; Eugene Kapp
Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)

Hi Phil,

in my opinion also, really too verbose. Typically a place where arrays can be used efficiently. In principle, the way I had shown with the Phenyx example can probably be better encoded in single-dimension or even multi-dimension arrays (just like mzXML for m/z-I pairs).

Just my thoughts
Pierre-Alain

David Creasy wrote:

Hi Phil,

Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10Gb. Is this reasonable and acceptable?

David |
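Eugene's point 5, that a script could recompute fragment matches within a prescribed tolerance, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not an agreed method: the function names `fragment_mz` and `match_fragments` are hypothetical, only unmodified b and y ions are handled, the residue masses are commonly tabulated monoisotopic values rounded to five decimals (exactly the kind of table the message suggests everyone should agree on), and overlapping assignments are all accepted.

```python
# Monoisotopic residue masses in Da (commonly tabulated values, rounded;
# assumption: unmodified residues, including unmodified cysteine).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_mz(sequence, series, index, charge=1):
    """Theoretical m/z of an unmodified b or y ion with a positive charge."""
    if not 1 <= index <= len(sequence):
        raise ValueError("series index outside the peptide")
    if series == "b":
        # b ions contain the first `index` residues (N-terminal side).
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[:index])
    elif series == "y":
        # y ions contain the last `index` residues plus one water.
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[-index:]) + WATER
    else:
        raise ValueError("only b and y series are handled in this sketch")
    return (neutral + charge * PROTON) / charge


def match_fragments(sequence, observed_mz, tolerance=0.5, charge=1):
    """Return (label, theoretical m/z, observed m/z) for peaks within tolerance.

    Overlapping assignments are all kept ("accept all").
    """
    matches = []
    for series in "by":
        for index in range(1, len(sequence)):
            theoretical = fragment_mz(sequence, series, index, charge)
            for mz in observed_mz:
                if abs(mz - theoretical) <= tolerance:
                    matches.append((f"{series}{index}", theoretical, mz))
    return matches


if __name__ == "__main__":
    # Made-up peak list, for illustration only.
    for label, theo, obs in match_fragments("GASK", [58.03, 147.11, 300.0]):
        print(label, round(theo, 4), obs)
```

The tolerance here is an absolute m/z window; a ppm-based window would be an equally reasonable choice and depends on the instrument.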