From: Eugene K. <Eug...@lu...> - 2008-07-22 13:13:07
|
Hi David, Pierre-Alain and Phil,

I am not sure whether this discussion has continued elsewhere. Anyway, here are a few thoughts.

1) How is this handled (or not) in pepXML (the ISB search analysis XML file)? We can discuss this with Jimmy Eng if required.

2) In many cases the first answer is not the correct one (but it could be in the top 10). So if you do not support all top ten per spectrum then it's pointless. Several algorithms (X!Tandem, for example) only display the top hit with associated fragment ion information. I could look at OMSSA and let you know what it does.

3) Phil: Are Waters proposing that MS^E ("MS to the e") experiments are supported within this framework? How big are the XML files? (I agree that this is all-encompassing, but is it practical, as David and Pierre-Alain have alluded to?)

4) Perhaps the information used by the algorithm in reaching its score should be supported, as per the Mascot dat file. This would be good practice anyway, because it indicates some transparency on behalf of the algorithm vendor.

5) Something that would be useful (not directly related to analysisXML) is how to calculate the mass of a peptide using monoisotopic and average masses. IUPAC provides this, but it would be good if everyone settled on the same exact masses for the elements (and modifications, of course). A script could easily compute the correct fragment matches (within a prescribed tolerance) based on the information in analysisXML. A problem, of course, is deciding which m/z ion is which fragment ion if they overlap (default is accept all?). What about the charge state of m/z ions? Currently most algorithms only go up to +2?

Just my thoughts. Look forward to discussing further.

regards,
Eugene

________________________________
From: Pierre-Alain Binz [mailto:pie...@is...]
Sent: Fri 18/07/2008 10:49 PM
To: David Creasy
Cc: Phil Jones @ EBI; psi...@li...; Eugene Kapp
Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)

Hi Phil,

In my opinion too, this is really too verbose. This is typically a place where arrays can be used efficiently. In principle, the approach I showed with the Phenyx example can probably be better encoded in single-dimension or even multi-dimensional arrays (just like mzXML for m/z-intensity pairs).

Just my thoughts,
Pierre-Alain

David Creasy wrote:

Hi Phil,

Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1 GB. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10 GB.

Is this reasonable and acceptable?

David

Phil Jones @ EBI wrote:

Hi,

Regarding Issue 28 <http://code.google.com/p/psi-pi/issues/detail?id=28> "support reporting of fragment ions"

As a suggestion of how this might be tackled: the latest development version of the PRIDE database includes a very simple mechanism for recording fragment ion information, illustrated below. (Please note - made up data.)

In this example, CV terms are used to define the type of ion and related information / annotation. Note that this is even simpler than the suggestion made by Andy above - no attempt is made here to indicate which residue has been called for each fragment ion - it is just listing the ions.
Also note that while the PeptideItem is referencing the mass spectrum (which is reported in detail in the associated mzData file), the individual fragment ions are just reporting the m/z value and not attempting to make any kind of hard reference to the spectrum.

As you can see, this has been developed in collaboration with Waters, with output from the ProteinLynx Global Server. (Actual values / sequence have been changed.)

One possible change would be to make the m/z value an attribute of the FragmentIon element, as this value will be mandatory and required to relate the fragment ion to the correct peak on the mass spectrum. Would the CV used for the annotation also need to be part of the PI CV?

Note that in the existing model, there are other terms available, to allow any kind of fragment ion to be described (not just B and Y ions).

In the context of analysisXML, the <FragmentIon/> elements would be children of a <SpectrumIdentificationResultItem/>.

best regards,

Phil.

<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <Start>435</Start>
  <End>460</End>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
  </FragmentIon>
  <additional>
    <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
    <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
    <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
    <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
    <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
    <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
    <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
  </additional>
</PeptideItem>

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group,
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.
Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones

_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

|
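
As a rough check on the size figures discussed in this thread, the sketch below reproduces David's arithmetic and sets it beside the array-style encoding that Pierre-Alain suggests. The per-ion byte count, fragment count and spectrum count are David's stated assumptions, not measurements. The helper names (verbose_size_bytes, pack_array, unpack_array) and the choice of base64-encoded little-endian 64-bit floats are hypothetical, picked only for illustration; they are not part of analysisXML, PRIDE or mzXML as specified. The m/z and intensity values are the made-up ones from Phil's example.

```python
import base64
import struct

# Figures quoted from David's message (his assumptions, not measurements).
BYTES_PER_FRAGMENT_ION = 500
FRAGMENTS_PER_SPECTRUM = 20
SPECTRA = 100_000


def verbose_size_bytes(matches_per_spectrum: int) -> int:
    """Estimated file size when every fragment ion is a verbose XML element."""
    return (BYTES_PER_FRAGMENT_ION * FRAGMENTS_PER_SPECTRUM
            * SPECTRA * matches_per_spectrum)


def pack_array(values):
    """Hypothetical encoding: little-endian 64-bit floats as base64 text."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def unpack_array(text):
    """Inverse of pack_array: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# m/z and intensity values from the four FragmentIon elements in the example.
mz = [379.2215, 534.2811, 394.1813, 367.1669]
intensity = [1382.0, 1242.0, 1917.0, 345.0]

encoded_mz = pack_array(mz)
encoded_intensity = pack_array(intensity)

# Characters needed per ion when only m/z and intensity are packed.
packed_chars_per_ion = (len(encoded_mz) + len(encoded_intensity)) / len(mz)

print(verbose_size_bytes(1))    # top match only: 1000000000 bytes (about 1 GB)
print(verbose_size_bytes(10))   # top 10 matches: 10000000000 bytes (about 10 GB)
print(packed_chars_per_ion)     # characters per ion in the packed form
print(unpack_array(encoded_mz) == mz)  # the encoding round-trips exactly
```

The packed arrays here carry only m/z and intensity. Ion type, series index and the error values would need their own arrays or attributes, so the real saving would be smaller than this comparison suggests.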