Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)

Hi David, Pierre-Alain and Phil,

I am not sure whether this discussion has continued elsewhere. Anyway, here are a few thoughts.

1) How is this handled (or not) in pepXML (the ISB search analysis XML file)? We can discuss this with Jimmy Eng if required.

2) In many cases the first answer is not the correct one (but it could be in the top 10). So if you do not support all top 10 matches per spectrum, then it's pointless. Several algorithms (X!Tandem, for example) only display the top hit with associated fragment ion information. I could look at OMSSA and let you know what it does.

3) Phil: Are Waters proposing that MS^E (MS to the E) experiments are supported within this framework? How big are the XML files? I agree that this is all-encompassing, but is it practical, as David and Pierre-Alain have alluded to?

4) Perhaps the information used by the algorithm in reaching its score should be supported, as in the Mascot .dat file (this would be good practice anyway), because it indicates some transparency on behalf of the algorithm vendor.

5) Something that would be useful (not directly related to analysisXML) is how to calculate the mass of a peptide using monoisotopic and average masses. IUPAC provides this, but it would be good if everyone settled on the same exact masses for the elements (and modifications, of course). A script could easily compute the correct fragment matches (within a prescribed tolerance) based on the information in analysisXML. A problem, of course, is deciding which m/z ion is which fragment ion if they overlap (default is accept all?). What about the charge state of m/z ions? Currently most algorithms only go up to +2.
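
As a rough illustration of point 5, here is a minimal Python sketch of such a script. It assumes commonly used monoisotopic residue masses (an agreed table is exactly what is being asked for, so treat these values as one possible choice), unmodified peptides, and only plain b and y ions. The function names fragment_mz and match_fragments are hypothetical and not part of analysisXML or any search engine. Overlapping assignments are all accepted, as suggested above.

```python
# Monoisotopic residue masses in Da (assumed table, unmodified residues only).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def fragment_mz(sequence, ion_type, index, charge=1):
    """Theoretical m/z of the b or y ion with the given index and charge."""
    if ion_type == "b":
        neutral = sum(MONO[r] for r in sequence[:index])
    elif ion_type == "y":
        neutral = sum(MONO[r] for r in sequence[-index:]) + WATER
    else:
        raise ValueError("only b and y ions are handled in this sketch")
    return (neutral + charge * PROTON) / charge


def match_fragments(sequence, observed_mz, tolerance=0.5, max_charge=2):
    """Return every (observed, type, index, charge, theoretical) within tolerance.

    All overlapping assignments are kept ("accept all").
    """
    matches = []
    for mz in observed_mz:
        for ion_type in ("b", "y"):
            for index in range(1, len(sequence)):
                for charge in range(1, max_charge + 1):
                    theoretical = fragment_mz(sequence, ion_type, index, charge)
                    if abs(theoretical - mz) <= tolerance:
                        matches.append((mz, ion_type, index, charge, theoretical))
    return matches
```

Calling match_fragments("LFQQSQWTREVFSNSCK", peaks, tolerance=0.02) with a list of observed peak m/z values would list every b/y assignment up to charge +2. The tolerance here is absolute (in m/z units); a ppm tolerance would need a small change.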

Just my thoughts. Look forward to discussing further.

regards,
Eugene

________________________________

From: Pierre-Alain Binz [mailto:pie...@is...]
Sent: Fri 18/07/2008 10:49 PM
To: David Creasy
Cc: Phil Jones @ EBI; psi...@li...; Eugene Kapp
Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)

Hi Phil, in my opinion also, this is really too verbose.
Typically a place where arrays can be used efficiently.
In principle, the approach I showed with the Phenyx example can probably be better encoded in single-dimension or even multi-dimensional arrays (just like mzXML for m/z-intensity pairs).
Just my thoughts
Pierre-Alain
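
To make the array idea concrete, here is a small Python sketch that serialises the same fragment ions twice: once as one cvParam per value (the style quoted further down this thread) and once as parallel arrays per ion series. The element and attribute names FragmentArrays, Array, ionType and measure are hypothetical and are not taken from any analysisXML or Phenyx schema. The numbers are the made-up values from Phil's example.

```python
import xml.etree.ElementTree as ET

# (series, index, m/z, intensity, m/z error): made-up values from the thread.
ions = [
    ("b", 3, 379.2215, 1382.0, -7.1543),
    ("b", 4, 534.2811, 1242.0, -8.2315),
    ("y", 3, 394.1813, 1917.0, -14.7098),
]

# Accessions as used in the quoted PRIDE example.
SERIES_ACCESSION = {"b": "PLGS:00032", "y": "PLGS:00031"}


def verbose(ions):
    """One <FragmentIon> per ion, one <cvParam> per value."""
    root = ET.Element("PeptideItem")
    for series, index, mz, intensity, error in ions:
        frag = ET.SubElement(root, "FragmentIon")
        params = (
            (SERIES_ACCESSION[series], f"{series} ion", index),
            ("PLGS:00024", "product ion m/z", mz),
            ("PLGS:00025", "product ion intensity", intensity),
            ("PLGS:00026", "product ion m/z error", error),
        )
        for accession, name, value in params:
            ET.SubElement(frag, "cvParam", cvLabel="Waters",
                          accession=accession, name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


def as_arrays(ions):
    """One group per ion series, one space-separated array per measure."""
    root = ET.Element("PeptideItem")
    by_series = {}
    for series, index, mz, intensity, error in ions:
        by_series.setdefault(series, []).append((index, mz, intensity, error))
    for series, rows in by_series.items():
        group = ET.SubElement(root, "FragmentArrays", ionType=series)
        measures = ("index", "mz", "intensity", "mzError")
        for measure, column in zip(measures, zip(*rows)):
            array = ET.SubElement(group, "Array", measure=measure)
            array.text = " ".join(str(v) for v in column)
    return ET.tostring(root, encoding="unicode")


print(len(verbose(ions)), "characters as cvParams")
print(len(as_arrays(ions)), "characters as arrays")
```

The saving grows with the number of ions per series, because the per-value markup is paid once per array rather than once per ion. The trade-off is that each value no longer carries its own CV annotation.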

David Creasy wrote: 

	Hi Phil,

	Just to be sure I've not misunderstood... from below, each fragment ion 
	takes approx. 500 bytes. Let's assume a conservative average of 20 
	fragment matches per spectrum and a modest search with 100k spectra. 
	Assuming that we just report fragment matches for the top match for each 
	spectrum, this would result in a file that is 500 x 20 x 100,000 bytes = 1 GB. 
	If we reported fragment matches for the top 10 matches for each 
	spectrum, this would be 10 GB. Is this reasonable and acceptable?
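
The arithmetic in this estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the figures stated in the message (500 bytes per fragment ion, 20 matches per spectrum, 100,000 spectra); the function name estimated_bytes is hypothetical.

```python
def estimated_bytes(bytes_per_ion=500, ions_per_spectrum=20,
                    spectra=100_000, matches_per_spectrum=1):
    """Rough size of the fragment ion section of a result file, in bytes."""
    return bytes_per_ion * ions_per_spectrum * spectra * matches_per_spectrum


top_match_only = estimated_bytes()
top_ten = estimated_bytes(matches_per_spectrum=10)

# 1 GB is taken here as 10**9 bytes.
print(f"top match only: {top_match_only / 1e9:.0f} GB")
print(f"top 10 matches: {top_ten / 1e9:.0f} GB")
```

Changing bytes_per_ion shows how much a more compact encoding would help: at 50 bytes per ion the same search would need roughly a tenth of the space.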

	David

	Phil Jones @ EBI wrote:

		Hi,

		Regarding Issue 28
		<http://code.google.com/p/psi-pi/issues/detail?id=28> "support
		reporting of fragment ions"

		As a suggestion of how this might be tackled:

		The latest development version of the PRIDE database includes a very
		simple mechanism for recording fragment ion information, illustrated
		below. (Please note - made up data.)

		In this example, CV terms are used to define the type of ion and
		related information / annotation. Note that this is even simpler than
		the suggestion made by Andy above - no attempt is made here to
		indicate which residue has been called for each fragment ion - it is
		just listing the ions.

		Also note that while the PeptideItem is referencing the mass spectrum
		(which is reported in detail in the associated mzData file), the
		individual fragment ions are just reporting the m/z value and not
		attempting to make any kind of hard reference to the spectrum.

		As you can see, this has been developed in collaboration with Waters,
		with output from the ProteinLynx Global Server. (Actual values /
		sequence have been changed).

		One possible change would be to make the m/z value an attribute of the
		FragmentIon element, as this value will be mandatory and required to
		relate the fragment ion to the correct peak on the mass spectrum.  The
		CV used for the annotation would also need to be part of the PI CV ??

		Note that in the existing model, there are other terms available, to
		allow any kind of fragment ion to be described (not just B and Y ions)

		In the context of analysisXML, the <FragmentIon/> elements would be
		children of a <SpectrumIdentificationResultItem/>

		best regards,

		Phil.

		<PeptideItem>
		  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
		  <Start>435</Start>
		  <End>460</End>
		  <SpectrumReference>123</SpectrumReference>
		  <FragmentIon>
		    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
		  </FragmentIon>
		  <FragmentIon>
		    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
		  </FragmentIon>
		  <FragmentIon>
		    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
		  </FragmentIon>
		  <FragmentIon>
		    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
		  </FragmentIon>
		  <additional>
		    <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
		    <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value=""/>
		  </additional>
		</PeptideItem>

		--
		Phil Jones
		Senior Software Engineer
		PRIDE Project Team
		PANDA Group, EMBL-EBI
		Wellcome Trust Genome Campus
		Hinxton, Cambridge, CB10 1SD
		UK.

		Work phone: +44 1223 492662 (NEW NUMBER)
		Skype: philip-jones

