From: David C. <dc...@ma...> - 2008-07-28 23:06:40
|
Hi everyone,

There will be an AnalysisXML working group conference call on Thursday at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=31&month=7&year=2008&hour=16&min=0&sec=0&p1=136

Agenda:
1. Decide / vote on http://code.google.com/p/psi-pi/issues/detail?id=30
2. Decide / vote on http://code.google.com/p/psi-pi/issues/detail?id=28
3. Agree on what else is required before submitting to the steering group for review

Dial in details:
+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898
From: David C. <dc...@ma...> - 2008-07-23 16:26:14
|
Hello,

I've added some comments/suggestions for specifying an enzyme to:
http://code.google.com/p/psi-pi/issues/detail?id=30

If anything needs clarification, please add further comments to the issue. Otherwise, we'll probably need to have a vote at the next telecon on whether to use #6, #7, #9 or #10 (or yet another suggestion).

Thanks,
David
From: David C. <dc...@ma...> - 2008-07-23 14:47:13
|
Hi everyone,

Sorry for the late notice, but there won't be an AnalysisXML working group conference call this week. There will be one next Thursday, 31st July at 4:00pm as usual:
http://www.timeanddate.com/worldclock/fixedtime.html?day=31&month=7&year=2008&hour=16&min=0&sec=0&p1=136

I'll send out an agenda next week.

David
From: Eugene K. <Eug...@lu...> - 2008-07-22 13:13:07
|
Hi David, Pierre-Alain and Phil,

I am not sure if this discussion has continued elsewhere? Anyway, here are a few thoughts.

1) How is this handled (or not) in pepXML (the ISB search analysis XML file)? We can discuss this with Jimmy Eng if required.

2) In many cases the first answer is not the correct one (but it could be in the top 10). So if you do not support all top ten per spectrum then it's pointless. Several algorithms (X!Tandem, for example) only display the top hit with associated fragment ion information. I could look at OMSSA and let you know what it does.

3) Phil: Are Waters proposing that MS^E experiments are supported within this framework? How big are the XML files? (I agree that this is all-encompassing, but is it practical, as David and Pierre-Alain have alluded to?)

4) Perhaps the information used by the algorithm in reaching its score should be supported, as per the Mascot dat file. This would be good practice anyway, because it indicates some transparency on behalf of the algorithm vendor.

5) Something that would be useful (not directly related to analysisXML) is how to calculate the mass of a peptide using monoisotopic and average masses. IUPAC provides this, but it would be good if everyone settled on the same exact masses for the elements (and modifications, of course). A script could easily compute the correct fragment matches (within a prescribed tolerance) based on the information in analysisXML. A problem, of course, is deciding which m/z ion is which fragment ion if they overlap (default is accept all?). What about the charge state of m/z ions? Currently most algorithms only go up to +2.

Just my thoughts. Look forward to discussing further.

regards,
Eugene

________________________________
From: Pierre-Alain Binz [mailto:pie...@is...]
Sent: Fri 18/07/2008 10:49 PM
To: David Creasy
Cc: Phil Jones @ EBI; psi...@li...; Eugene Kapp
Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)

Hi Phil,

In my opinion also, really too verbose. Typically a place where arrays can be used efficiently. In principle, the way I had shown with the Phenyx example can probably be better encoded in single-dimension or even multi-dimensional arrays (just like mzXML for m/z-intensity pairs).

Just my thoughts
Pierre-Alain

David Creasy wrote:

Hi Phil,

Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 bytes = 1 GB. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10 GB.

Is this reasonable and acceptable?

David

Phil Jones @ EBI wrote:

Hi,

Regarding Issue 28 <http://code.google.com/p/psi-pi/issues/detail?id=28> "support reporting of fragment ions"

As a suggestion of how this might be tackled:

The latest development version of the PRIDE database includes a very simple mechanism for recording fragment ion information, illustrated below. (Please note - made up data.)

In this example, CV terms are used to define the type of ion and related information / annotation. Note that this is even simpler than the suggestion made by Andy above - no attempt is made here to indicate which residue has been called for each fragment ion - it is just listing the ions.

Also note that while the PeptideItem is referencing the mass spectrum (which is reported in detail in the associated mzData file), the individual fragment ions are just reporting the m/z value and not attempting to make any kind of hard reference to the spectrum.

As you can see, this has been developed in collaboration with Waters, with output from the ProteinLynx Global Server. (Actual values / sequence have been changed.)

One possible change would be to make the m/z value an attribute of the FragmentIon element, as this value will be mandatory and required to relate the fragment ion to the correct peak on the mass spectrum.

The CV used for the annotation would also need to be part of the PI CV ??

Note that in the existing model, there are other terms available, to allow any kind of fragment ion to be described (not just B and Y ions).

In the context of analysisXML, the <FragmentIon/> elements would be children of a <SpectrumIdentificationResultItem/>

best regards,

Phil.

<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <Start>435</Start>
  <End>460</End>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
  </FragmentIon>
  <additional>
    <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
    <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
    <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
    <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
    <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
    <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
    <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
  </additional>
</PeptideItem>

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.
Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

This communication is intended only for the named recipient and may contain information that is confidential, legally privileged or subject to copyright; the Ludwig Institute for Cancer Research does not waive any rights if you have received this communication in error. The views expressed in this communication are those of the sender and do not necessarily reflect the views of the Ludwig Institute for Cancer Research.
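David's back-of-envelope size estimate in the thread above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 500 bytes, 20 fragments and 100k spectra come from his message, while the function name and the 8-bytes-per-ion figure for a compact text label are assumptions added here for comparison.

```python
def estimate_bytes(bytes_per_fragment, fragments_per_spectrum, spectra,
                   matches_per_spectrum=1):
    """Rough file-size estimate for reporting fragment ion matches."""
    return (bytes_per_fragment * fragments_per_spectrum
            * spectra * matches_per_spectrum)


# Figures quoted in the thread: ~500 bytes per <FragmentIon>, 20 matches
# per spectrum, 100k spectra.
top1 = estimate_bytes(500, 20, 100_000)
top10 = estimate_bytes(500, 20, 100_000, matches_per_spectrum=10)

# Hypothetical comparison: a short text label per ion, assumed here to
# average 8 bytes (this number is not from the thread).
compact_top1 = estimate_bytes(8, 20, 100_000)

for label, size in [("verbose, top 1", top1),
                    ("verbose, top 10", top10),
                    ("compact, top 1", compact_top1)]:
    print(f"{label}: {size / 1e9:.3f} GB")
```

The estimate ignores XML overhead outside the fragment ion elements, so it is a lower bound on the verbose case rather than a prediction of real file sizes.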
From: Matthew C. <mat...@va...> - 2008-07-21 15:27:22
|
Hi David,

I have heard of those fragment types, but I don't know enough about them to propose a grammar. From what I know about immonium ions they would be simple enough, but the other two do seem too ugly to represent with a single label. I think we can safely ignore these, at least until a later version.

-Matt

David Creasy wrote:
> What about internal fragments, immonium ions, side chain cleavages?
> Or would we just ignore these...
> David
From: David C. <dc...@ma...> - 2008-07-21 14:55:00
|
What about internal fragments, immonium ions, side chain cleavages?
Or would we just ignore these...

David

Matthew Chambers wrote:
> By standard grammar are you referring to the little format I came up with?
>
> <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]
>
> It seems pretty easy to verify to me - easier than some of the other
> features in mzML and analysisXML. :) The hardest part is the <formula>
> and verifying that the ion series # is between 1 and peptide_length.
> Those may have to be semantic rather than syntactic verification steps.
> Making the charge part mandatory would simplify the format. I think the
> auxiliary file would rarely be written and/or copied along with the
> original file, so it wouldn't do much good. If it's a concern, the
> <formula> part could wait until a later version.
>
> -Matt
From: Matthew C. <mat...@va...> - 2008-07-21 14:41:20
|
By standard grammar are you referring to the little format I came up with?

<a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]

It seems pretty easy to verify to me - easier than some of the other features in mzML and analysisXML. :) The hardest part is the <formula> and verifying that the ion series # is between 1 and peptide_length. Those may have to be semantic rather than syntactic verification steps. Making the charge part mandatory would simplify the format. I think the auxiliary file would rarely be written and/or copied along with the original file, so it wouldn't do much good. If it's a concern, the <formula> part could wait until a later version.

-Matt

Jones, Andy wrote:
> Hi Matt,
>
>> As for mapping to the observed ion(s), I think it's not relevant for the
>> purposes of basic annotation. For clarity of presentation, viewers
>> usually show the ion as either a logical point in the spectrum
>> independent of the data itself, or they map it to the most abundant peak
>> in the window. These approaches can be combined by changing the
>> annotation when the user zooms in.
>
> Agreed, I can see the use case for viewers. Are there any others...?
>
> The problem I have at the moment is that we're a long way from having
> this standard grammar specified in a formal way which could be verified.
> One option to consider is defining an auxiliary (non-XML) file which
> could be transferred in parallel - this way we can keep it outside the
> formal analysisXML standard, in which we try out something similar to
> your proposal and see if we can get the main search engines to output
> something consistent. If successful, roll it into analysisXML v2...?
>
> Andy
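To make the verification question concrete, here is a minimal sketch of a parser for the compact label format discussed in this message, written as a Python regular expression. It is one possible reading of the grammar, not an agreed specification: the formula pattern (a capital letter followed by letters and digits), the default charge of +1 when none is given, and all names in the code are assumptions. The syntactic check is done by the regex; the "between 1 and peptide_length" rule is checked separately as a semantic step, as the message suggests.

```python
import re
from typing import List, NamedTuple, Optional

# One label: series letter, series number, optional +/- formula, optional ",charge".
ION_RE = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<index>[1-9]\d*)"
    r"(?:(?P<sign>[+-])(?P<formula>[A-Z][A-Za-z0-9]*))?"
    r"(?:,(?P<charge>[+-]\d+))?$"
)


class FragmentIon(NamedTuple):
    series: str
    index: int
    delta: Optional[str]  # neutral gain/loss such as "-H2O", or None
    charge: int


def parse_fragment_ions(text: str, peptide_length: int,
                        default_charge: int = 1) -> List[FragmentIon]:
    """Parse a space-separated list of ion labels such as 'b3 y7,+2 b7-H2O'."""
    ions = []
    for token in text.split():
        match = ION_RE.match(token)
        if match is None:
            raise ValueError(f"not a valid ion label: {token!r}")
        index = int(match.group("index"))
        # Semantic check: the series number must fit the peptide.
        if index > peptide_length:
            raise ValueError(f"series number out of range: {token!r}")
        formula = match.group("formula")
        delta = match.group("sign") + formula if formula else None
        charge_text = match.group("charge")
        charge = int(charge_text) if charge_text else default_charge
        ions.append(FragmentIon(match.group("series"), index, delta, charge))
    return ions


# The example string from earlier in the thread; a peptide length of 8 is assumed.
ions = parse_fragment_ions(
    "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2", peptide_length=8)
for ion in ions:
    print(ion)
```

Validating that the formula is chemically meaningful is deliberately left out; the regex only checks its shape.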
From: Jones, A. <And...@li...> - 2008-07-21 14:08:25
|
Hi Matt,

> As we have both said, it's important to determine the use cases for this
> information. :) The only reasonable use case that doesn't take up oodles
> of disk space is simply knowing the ion types that were predicted.

Another alternative would be to have parallel arrays, similar to mzML, with fragment ions as you suggested in one and observed masses in the other (perhaps represented in base64 binary...) - I'm not necessarily suggesting this is a good idea!

> As for mapping to the observed ion(s), I think it's not relevant for the
> purposes of basic annotation. For clarity of presentation, viewers
> usually show the ion as either a logical point in the spectrum
> independent of the data itself, or they map it to the most abundant peak
> in the window. These approaches can be combined by changing the
> annotation when the user zooms in.

Agreed, I can see the use case for viewers. Are there any others...?

The problem I have at the moment is that we're a long way from having this standard grammar specified in a formal way which could be verified. One option to consider is defining an auxiliary (non-XML) file which could be transferred in parallel. That would keep it outside the formal analysisXML standard while we try out something similar to your proposal and see if we can get the main search engines to output something consistent. If successful, roll it into analysisXML v2...?

Andy
|
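To make the parallel-array idea from Andy's reply concrete, here is a minimal sketch in Python. It assumes little-endian 64-bit floats encoded as base64 for the observed masses, loosely in the spirit of mzML binary arrays; the function names and the sample values (taken from the made-up PRIDE example elsewhere in this thread) are illustrative only and are not part of any analysisXML draft.

```python
import base64
import struct


def encode_doubles(values):
    """Pack floats as little-endian 64-bit doubles, then base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text):
    """Reverse encode_doubles: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Array 1: ion labels in the compact text form proposed in the thread.
ion_labels = ["b3", "b4", "y3"]
# Array 2: observed m/z values, same order and same length as the labels.
observed_mz = [379.2215, 534.2811, 394.1813]

encoded = encode_doubles(observed_mz)

# A reader pairs the two arrays back up by position.
pairs = list(zip(ion_labels, decode_doubles(encoded)))
```

The arrays only stay meaningful if both are written in the same order, which is one more thing a formal specification would have to state.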
From: Matt C. <mat...@va...> - 2008-07-21 13:21:12
|
Hi Andy,

As we have both said, it's important to determine the use cases for this information. :) The only reasonable use case that doesn't take up oodles of disk space is simply knowing the ion types that were predicted.

Unless I planned to reproduce the search engine's comparison exactly, I don't see the point in knowing the exact mass(es) that the search engine expected and the observed ion(s) that it matched to. And if I plan to reproduce the score, that probably means I have access to the search engine's algorithm, so I'd just regenerate the comparison.

As for mapping to the observed ion(s), I think it's not relevant for the purposes of basic annotation. For clarity of presentation, viewers usually show the ion as either a logical point in the spectrum independent of the data itself, or they map it to the most abundant peak in the window. These approaches can be combined by changing the annotation when the user zooms in.

So yes, in this approach we have information loss. But I think it's better than not having the information at all (and depending on a vendor-supplied and version-dependent script to regenerate it) and certainly better than choking on 10 GB analysis files. ;)

-Matt
|
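The viewer behaviour Matt describes (map the annotation to the most abundant peak in the tolerance window) can be sketched as follows. This is only an illustration, assuming peaks are `(m/z, intensity)` tuples and an absolute tolerance in m/z units; the function names and the peak list are hypothetical. The example also shows the ambiguity Andy raised: the most abundant peak in the window need not be the one closest to the expected mass.

```python
def peaks_in_window(peaks, expected_mz, tolerance):
    """Return all (mz, intensity) peaks within +/- tolerance of expected_mz."""
    return [p for p in peaks if abs(p[0] - expected_mz) <= tolerance]


def most_abundant_peak(peaks, expected_mz, tolerance):
    """Pick the highest-intensity peak in the window, or None if it is empty."""
    candidates = peaks_in_window(peaks, expected_mz, tolerance)
    return max(candidates, key=lambda p: p[1]) if candidates else None


def closest_peak(peaks, expected_mz, tolerance):
    """Pick the peak nearest in m/z to the expected value, or None."""
    candidates = peaks_in_window(peaks, expected_mz, tolerance)
    return min(candidates, key=lambda p: abs(p[0] - expected_mz)) if candidates else None


# Hypothetical spectrum with three peaks inside one tolerance window.
spectrum = [(389.10, 120.0), (389.22, 80.0), (389.40, 300.0), (500.00, 50.0)]

by_intensity = most_abundant_peak(spectrum, expected_mz=389.218, tolerance=0.2)
by_distance = closest_peak(spectrum, expected_mz=389.218, tolerance=0.2)
# The two strategies disagree here, so the label alone cannot tell a reader
# which peak the search engine actually matched.
```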
From: Jones, A. <And...@li...> - 2008-07-21 12:26:09
|
Hi all,

> An example to show how compact it could be:
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"

I have a couple of queries about this proposal...

Given a peptide sequence, we would be able to work out what were the expected masses of these fragments, assuming a standard method of calculating the masses of the b and y ions (and losses) - do all search engines use the same equation to calculate ion masses?

We wouldn't really know which peaks in the source spectrum corresponded with which ion. For many of the peaks we would be able to make a fair guess i.e. there is an observed peak within the tolerance window matching the expected mass but this doesn't help when there are multiple peaks within the window - I don't think we could correctly assume it would always be the most abundant peak...?

In other words, we still have information loss. Perhaps one way forward would be for us to list the use cases that fragment ions must be reported for - do we have a list of use cases anywhere?

I think getting this right will be a long process, so we have to make sure that we have a strong enough use case if we really want to get this into analysisXML version 1.

Cheers
Andy
|
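As a reference point for Andy's question about recomputing expected masses, here is a sketch of one common textbook convention for b and y ions: monoisotopic residue masses, an added water for y ions, and one proton per charge. It is an assumption that a given search engine follows exactly this convention; engines may use average masses, different rounding, or modified residues (cysteine is unmodified here). The function name and the rounded mass table are illustrative.

```python
# Monoisotopic residue masses in Da (rounded to 5 decimals), unmodified residues.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def fragment_mz(sequence, series, number, charge=1, neutral_loss=0.0):
    """Expected m/z of a b or y ion; neutral_loss is a mass to subtract (e.g. WATER)."""
    if not 1 <= number <= len(sequence):
        raise ValueError("series number must be between 1 and the peptide length")
    if series == "b":
        mass = sum(MONO[r] for r in sequence[:number])
    elif series == "y":
        mass = sum(MONO[r] for r in sequence[-number:]) + WATER
    else:
        raise ValueError("only b and y ions are handled in this sketch")
    return (mass - neutral_loss + charge * PROTON) / charge


peptide = "LFQQSQWTREVFSNSCK"  # sequence from the made-up PRIDE example
b3 = fragment_mz(peptide, "b", 3)                         # residues L, F, Q
y3 = fragment_mz(peptide, "y", 3)                         # residues S, C, K
b7_h2o_2 = fragment_mz(peptide, "b", 7, charge=2, neutral_loss=WATER)
```

If two tools disagree on any of these constants or conventions, labels such as `b7-H2O,+2` will map to slightly different expected masses, which is the interoperability risk raised above.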
From: Matthew C. <mat...@va...> - 2008-07-18 14:59:48
|
I also agree that anything beyond an array is far too verbose. To answer this question, I think we need to decide the scope of the problem. What do we want fragment ion information to represent? I think analysis software is too diverse to use it for anything more than basic annotation, but basic annotation is important. If there are ways people want it to be usable beyond that, speak up. :)

For basic annotation, all I think is needed is the fragment type, series number, charge state, and possibly any modification like a neutral loss or radical. The array can be an attribute or text node. We can use a grammar for each term, where each term represents an ion and terms are space delimited. The grammar might look like:

<a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]

We could make the charge part mandatory or, if it was optional, assume a +1 charge (or possibly allow the charge to be based on the polarity of the source scan?). I assume there is a standard chemical formula format that can be represented compactly in ASCII text, but I don't know it.

An example to show how compact it could be:

fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"

For basic annotation, the masses are not necessary I think. Expected mass can be recomputed if all the label metadata is complete and regular, and the observed mass is unimportant for annotation (IMO).

-Matt
|
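The term grammar Matt proposes is small enough to check with a regular expression. The sketch below is one possible reading of it, not a specification: it assumes the formula part starts with an element letter, that a missing charge means +1, and that the charge is introduced by a comma as in the `y7,+2` example. The names `FragmentIon`, `parse_term` and `parse_fragment_ions` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# <a|b|c|x|y|z><number>[<+|-><formula>][,<+|-><charge>]
TERM_RE = re.compile(
    r"^(?P<series>[abcxyz])"
    r"(?P<number>\d+)"
    r"(?P<delta>[+-][A-Z][A-Za-z0-9]*)?"
    r"(?:,(?P<charge>[+-]\d+))?$"
)


@dataclass(frozen=True)
class FragmentIon:
    series: str            # one of a, b, c, x, y, z
    number: int            # position in the series, 1..peptide_length
    delta: Optional[str]   # e.g. "-H2O" for a neutral loss, or None
    charge: int            # signed charge, +1 assumed when omitted


def parse_term(term: str, peptide_length: Optional[int] = None) -> FragmentIon:
    """Parse a single term such as 'b7-H2O,+2'."""
    m = TERM_RE.match(term)
    if m is None:
        raise ValueError("not a valid fragment ion term: %r" % term)
    number = int(m.group("number"))
    if number < 1 or (peptide_length is not None and number > peptide_length):
        raise ValueError("series number out of range in %r" % term)
    charge = int(m.group("charge")) if m.group("charge") else 1
    if charge == 0:
        raise ValueError("charge must be non-zero in %r" % term)
    return FragmentIon(m.group("series"), number, m.group("delta"), charge)


def parse_fragment_ions(value: str, peptide_length: Optional[int] = None) -> List[FragmentIon]:
    """Parse a space-delimited fragmentIons attribute value."""
    return [parse_term(t, peptide_length) for t in value.split()]


ions = parse_fragment_ions("b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2")
```

Passing `peptide_length` enables the "between 1 and peptide_length" check from the grammar; without it only the lower bound is enforced.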
From: Pierre-Alain B. <pie...@is...> - 2008-07-18 12:49:30
|
Hi Phil,

In my opinion too, this is really too verbose. It is typically a place where arrays can be used efficiently. In principle, the approach I showed with the Phenyx example can probably be better encoded in single-dimension or even multi-dimension arrays (just like mzXML for m/z-intensity pairs).

Just my thoughts

Pierre-Alain
|
From: David C. <dc...@ma...> - 2008-07-18 11:24:44
|
Hi Phil, Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Lets assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. If we reported fragment matches for the the top 10 matches for each spectrum, this would be 10Gb. Is this reasonable and acceptable? David Phil Jones @ EBI wrote: > Hi, > > Regarding Issue 28 > <http://code.google.com/p/psi-pi/issues/detail?id=28> "support > reporting of fragment ions" > > As a suggestion of how this might be tackled: > > The latest development version of the PRIDE database includes a very > simple mechanism > for recording fragment ion information, illustrated below. (Please > note - made up data.) > > In this example, CV terms are used to define the type of ion and > related information > / annotation. Note that this is even more simple that the suggestion > made by Andy > above - no attempt is made here to indicate which residue has been > called for each > fragment ion - it is just listing the ions. > > Also note that while the PeptideItem is referencing the mass spectrum (which is > reported in detail in the associated mzData file), the individual > fragment ions are > just reporting the m/z value and not attempting to make any kind of > hard reference to > the spectrum. > > As you can see, this has been developed in collaboration with Waters, > with output > from the ProteinLynx Global Server. (Actual values / sequence have > been changed). > > One possible change would be to make the m/z value an attribute of the > FragmentIon element, as this value will be mandatory and required to > relate the fragment ion to the correct peak on the mass spectrum. The > CV used for the annotation would also need to be part of the PI CV ?? 
> > Note that in the existing model, there are other terms available, to > allow any kind of fragment ion to be described (not just B and Y ions) > > In the context of analysisXML, the <FragmentIon/> elements would be > children of a <SpectrumIdentificationResultItem/> > > best regards, > > Phil. > > <PeptideItem> > <Sequence>LFQQSQWTREVFSNSCK</Sequence> > <Start>435</Start> > <End>460</End> > <SpectrumReference>123</SpectrumReference> > <FragmentIon> > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/> > <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > m/z" value="379.2215"/> > <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > intensity" value="1382.0"/> > <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > error" value="-7.1543"/> > <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > retention time error" value="0.0207"/> > </FragmentIon> > <FragmentIon> > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/> > <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > m/z" value="534.2811"/> > <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > intensity" value="1242.0"/> > <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > error" value="-8.2315"/> > <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > retention time error" value="0.0029"/> > </FragmentIon> > <FragmentIon> > <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/> > <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > m/z" value="394.1813"/> > <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > intensity" value="1917.0"/> > <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > error" value="-14.7098"/> > <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > retention time error" value="-0.0013"/> > </FragmentIon> > <FragmentIon> > 
<cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/> > <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > m/z" value="367.1669"/> > <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > intensity" value="345.0"/> > <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z > error" value="-18.767"/> > <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > retention time error" value="0.0025"/> > </FragmentIon> > <additional> > <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" > value="1971.9194"/> > <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor > intensity" value="181349.0"/> > <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error > in ppm" value="0.8043"/> > <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor > retention time in minutes" value="57.3537"/> > <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion > mass RMS error" value="14.5969"/> > <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion > retention time RMS error" value="0.0093"/> > <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted > average charge state" value="2.2"/> > <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" > value="" /> > </additional> > </PeptideItem> > > > -- > Phil Jones > Senior Software Engineer > PRIDE Project Team > PANDA Group, EMBL-EBI > Wellcome Trust Genome Campus > Hinxton, Cambridge, CB10 1SD > UK. 
> > Work phone: +44 1223 492662 (NEW NUMBER) > Skype: philip-jones > > _______________________________________________ > Psidev-pi-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: Phil J. @ E. <pj...@eb...> - 2008-07-18 10:44:24
|
Hi,

Regarding Issue 28 <http://code.google.com/p/psi-pi/issues/detail?id=28> "support reporting of fragment ions"

As a suggestion of how this might be tackled: the latest development version of the PRIDE database includes a very simple mechanism for recording fragment ion information, illustrated below. (Please note - made up data.)

In this example, CV terms are used to define the type of ion and related information / annotation. Note that this is even simpler than the suggestion made by Andy above - no attempt is made here to indicate which residue has been called for each fragment ion - it is just listing the ions.

Also note that while the PeptideItem references the mass spectrum (which is reported in detail in the associated mzData file), the individual fragment ions only report the m/z value and do not attempt to make any kind of hard reference to the spectrum.

As you can see, this has been developed in collaboration with Waters, with output from the ProteinLynx Global Server. (Actual values / sequence have been changed.)

One possible change would be to make the m/z value an attribute of the FragmentIon element, as this value will be mandatory and is required to relate the fragment ion to the correct peak on the mass spectrum. The CV used for the annotation would also need to be part of the PI CV ??

Note that in the existing model, there are other terms available to allow any kind of fragment ion to be described (not just B and Y ions).

In the context of analysisXML, the <FragmentIon/> elements would be children of a <SpectrumIdentificationResultItem/>

best regards,

Phil.
<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <Start>435</Start>
  <End>460</End>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
    <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
  </FragmentIon>
  <additional>
    <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
    <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
    <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
    <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
    <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
    <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
    <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
    <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
  </additional>
</PeptideItem>

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.
Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones |
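For readers who want to try the structure in Phil's message, here is a minimal Python sketch of how a consumer might read the FragmentIon elements and treat the m/z value as mandatory. It uses only the standard-library `xml.etree.ElementTree` module. The embedded XML is an abbreviated copy of the made-up example; the function name `read_fragment_ions` and the choice to raise an error when m/z is missing are assumptions for illustration, not part of PRIDE or analysisXML.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the made-up PeptideItem from the message (two ions only).
PEPTIDE_ITEM = """
<PeptideItem>
  <Sequence>LFQQSQWTREVFSNSCK</Sequence>
  <SpectrumReference>123</SpectrumReference>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
  </FragmentIon>
  <FragmentIon>
    <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
    <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
    <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
  </FragmentIon>
</PeptideItem>
"""

MZ_ACCESSION = "PLGS:00024"  # "product ion m/z" in the example above


def read_fragment_ions(xml_text):
    """Return one dict per FragmentIon: its m/z plus all cvParams by name."""
    root = ET.fromstring(xml_text)
    ions = []
    for frag in root.findall("FragmentIon"):
        cv_params = frag.findall("cvParam")
        params = {p.get("name"): p.get("value") for p in cv_params}
        mz_values = [p.get("value") for p in cv_params
                     if p.get("accession") == MZ_ACCESSION]
        if not mz_values:
            # Hypothetical policy: without m/z the ion cannot be related
            # to a peak in the spectrum, so reject it.
            raise ValueError("FragmentIon without a product ion m/z")
        ions.append({"mz": float(mz_values[0]), "params": params})
    return ions


ions = read_fragment_ions(PEPTIDE_ITEM)
for ion in ions:
    print(ion["mz"], ion["params"])
```

Because every consumer has to dig the m/z out of the cvParam list in this way, promoting it to an attribute of FragmentIon, as suggested in the message, would let a schema enforce its presence.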
From: Angel P. <an...@ma...> - 2008-07-17 13:20:58
|
Had a few fires to put out this week and didn't get a chance to look into the PTM issue. Also, still putting out fires so if there is a conf call today I won't be at it. -angel |
From: Martin E. <mar...@ru...> - 2008-07-16 17:02:05
|
Hi all!

To fix issue 15 I worked on the OBO file, grouped some terms, and moved those that are now part of the schema under the term "now_in_schema" (which can be deleted in the final CV version).

In the repository, the three old Excel spreadsheets were moved to the sub-folder "old". The spreadsheet "search_engine_outputs.xls" (without date) is a TODO list for some final steps (marked in red).

I suggest discussing problems with specific CV terms by creating new issues, so I have closed issue 15.

Concurrent editing of the OBO file can cause problems if terms are "Destroy"ed (because the OBO editor fills gaps in the accession numbers), so please only use "Delete" (this moves terms to 'obsolete').

Bye
Martin |
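The risk Martin describes can be illustrated with a toy model in Python. This is only a sketch of the behaviour stated in the message (gaps in accession numbers get filled); it is not the OBO editor's actual code, and the class `ToyCv`, its methods, and the term names are hypothetical.

```python
class ToyCv:
    """Toy model of accession allocation that fills gaps (an assumption
    based on the message, not the OBO editor's real implementation)."""

    def __init__(self, prefix="PSI-PI"):
        self.prefix = prefix
        self.terms = {}  # accession number -> (name, is_obsolete)

    def add(self, name):
        # Allocate the lowest number that is not in use.
        number = 1
        while number in self.terms:
            number += 1
        self.terms[number] = (name, False)
        return f"{self.prefix}:{number:06d}"

    def destroy(self, number):
        # "Destroy": the term vanishes and its number becomes a gap.
        del self.terms[number]

    def delete(self, number):
        # "Delete": the term is kept as obsolete, so its number stays taken.
        name, _ = self.terms[number]
        self.terms[number] = (name, True)


cv = ToyCv()
first = cv.add("term A")
second = cv.add("term B")

cv.destroy(1)
reused = cv.add("term C")       # fills the gap left by the destroyed term A

cv.delete(2)
fresh = cv.add("term D")        # obsolete term B still blocks its number

print(first, reused, fresh)
```

In this model a file that still refers to the accession of the destroyed "term A" would silently resolve to "term C", which is why marking terms obsolete is the safer choice when several people edit the file.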
From: Martin E. <mar...@ru...> - 2008-07-16 15:14:23
|
Hi everyone,

There will be an AnalysisXML working group conference call tomorrow (Thursday) at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=17&month=7&year=2008&hour=16&min=0&sec=0&p1=136

Minutes from the last meeting: http://www.psidev.info/index.php?q=node/355
Open Issues: http://code.google.com/p/psi-pi/issues/list

Agenda:
1. Review of minutes from last meeting.
2. Any feedback from MPC and OMSSA example instance documents
3. Status of CV and obo file
4. Issues, e.g. fragment ions, ...

Dial in details:
+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

--
Dr. Martin Eisenacher
Bioinformatik
Medizinisches Proteom-Center (MPC)
Ruhr-Universität Bochum
Tel.: +49 / 234 / 32 - 29288
Fax: +49 / 234 / 32 - 14554
http://www.medizinisches-proteom-center.de/ |
From: Martin E. <mar...@ru...> - 2008-07-10 14:50:42
|
Okay, that seems to work for our use case (although not having peptide FDR and translation table). I've put your explanation on the wiki.

Bye
Martin |
From: Martin E. <mar...@ru...> - 2008-07-10 14:26:19
|
Hi Jenny, hi Andy, hi all!

> <ModName unitName="MOD::00425" unitAccession="??" value="Oxidation (M)" />
> <MassValus unitName="" unitAccession="" value="" />
> <SpecificityRule....
> I am not sure why we are not able to use cvParams or what I was supposed
> to put here?

These elements are intended to be derived from simpler FuGE types, to avoid:

<ModName>
  <cvParam ... />
</ModName>
<MassValue>
  <cvParam ... />
</MassValue>

But indeed ModName seems to be wrongly derived from PropertyValue, because one cannot specify the accession attribute. Andy, it should be changed to derive from CvParamType, shouldn't it?

Correct (hopefully):

<ModName accession="MOD::00425" value="Oxidation (M)" />
<MassValue value="15.9" unitName="Da" unitAccession="MOD::005XY"/>

Bye
Martin

> Thanks,
> Jenny
>
> PS. I am afraid Julian and I will not be able to make the call later today. |
From: Jones, A. <And...@li...> - 2008-07-10 14:17:06
|
Hi Martin,

This alteration came about because we realised that this provided a good solution to two problems: representing reverse database hits and translated sequences. The false discovery rate might need to be reported for peptide idents only, in which case you need to know which peptide sequences came from which proteins – previously this mapping was only provided in the Protein evidence. Similarly, for translated sequence searches, there may not be any Protein hypotheses, yet the mapping back to positions within the original sequence and the translation frame must be reported.

Hope this makes sense, hopefully we included something in the minutes about this. Looks like I'm not going to make the call today (and on holiday next week...) so can someone else look after the schema updates?

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Martin Eisenacher
Sent: 10 July 2008 14:15
To: psi...@li...
Subject: [Psidev-pi-dev] PeptideHypothesis and PeptideEvidence

Dear PSI-PI workers!

I'm confused about the new PeptideHypothesis element and the new location of the PeptideEvidence elements. Is it for the case where the same peptide (sequence) is part of several proteins? But then this information is only relevant if both proteins are reported as ProteinDetection results (as AnalysisXML is only for reporting "final" results and not to allow information extraction). Then the PeptideEvidence elements are better placed under ProteinDetectionHypothesis (as agreed to after weeks of discussion ;-) )

If there is a convincing argument I missed, please state it here and I can put it into the wiki doc.

Many Thanks!
Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: Friday, June 27, 2008 5:36 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

Hi all,

I've updated the schema in SVN with the following main changes:
- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
  o In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,
Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel's response below might not have made it round the list yet. I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, for cvParam = "decoy_string" value = "Rev". This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢: You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?
Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy). Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.

I agree that length may be useful – is this just an integer value with no unit?

Yes, I think so.

I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

, and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again). The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues. Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

– unless someone can convince me otherwise?

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. Why do we want name? Is this for, say, a description line? (Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML.

How about the following:

<DBSequence identifier = "" name = "" isDecoy = "true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required?

I'll post a new version of the schema with other changes WRT to PeptideEvidence shortly,
Cheers
Andy

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
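Andy's argument at the top of this message, that peptide-level reporting needs the peptide-to-protein mapping on the identification itself, can be sketched in Python. This is a minimal illustration, assuming decoy entries carry the "Rev_" accession prefix seen in the DBSequence example. The `PeptideEvidence` dataclass, its fields, the `classify` helper, and all positions and frame values are hypothetical and are not taken from the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PeptideEvidence:
    """Hypothetical stand-in for one peptide-to-sequence mapping."""
    accession: str               # database accession of the matched sequence
    start: int                   # made-up position within that sequence
    end: int
    frame: Optional[int] = None  # translation frame, for translated searches


def classify(evidence: List[PeptideEvidence], decoy_prefix: str = "Rev_") -> str:
    """Label a peptide hit as 'target', 'decoy' or 'mixed' from its mappings."""
    if not evidence:
        raise ValueError("a peptide hit needs at least one PeptideEvidence")
    flags = {ev.accession.startswith(decoy_prefix) for ev in evidence}
    if flags == {True}:
        return "decoy"
    if flags == {False}:
        return "target"
    return "mixed"


decoy_hit = [PeptideEvidence("Rev_IPI00013808.1", 10, 26)]
target_hit = [PeptideEvidence("IPI00013808.1", 435, 451, frame=1)]
shared_hit = decoy_hit + target_hit

print(classify(decoy_hit), classify(target_hit), classify(shared_hit))
```

No protein-level result is consulted here, which is the case Andy raises for peptide-only FDR reporting and for translated searches without Protein hypotheses.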
From: Martin E. <mar...@ru...> - 2008-07-10 13:28:08
|
In the MPC use case I stated the decoy regexp as a runtime parameter of the ProteinDetection. <pf:cvParam accession="PSI-PI:000653" name="decoy accession regexp" cvRef="PSI-PI" value="^SHD"/> and the false discovery rate estimation in the protein results: <pf:userParam name=Rank" value="3"/> <pf:userParam name="local FDR in sorted list above" value="33.33" unitName="percent"/> I find that elegant, because a search engine normally doesnt need to know, whether it is a reverse database. At the time of interpreting the results that gets relevant. Von: psi...@li... [mailto:psi...@li...] Im Auftrag von Jones, Andy Gesendet: Friday, June 27, 2008 5:24 PM An: Angel Pizarro Cc: psi...@li... Betreff: Re: [Psidev-pi-dev] FW: Representing Sequences I think Angels response below might not have made it round the list yet. I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = decoy_string value = Rev. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence. From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro Sent: 27 June 2008 15:59 To: Jones, Andy Cc: psi...@li... Subject: Re: [Psidev-pi-dev] FW: Representing Sequences my 2¢ : You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a suclass of the conceptual molecule element? Second, and this is is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? 
On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy) Do we want to go there? On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote: So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)? From: Jones, Andy Sent: 27 June 2008 14:54 To: 'David Creasy' Subject: RE: [Psidev-pi-dev] Representing Sequences id and name are standard for all elements that inherit from FuGE identifiable this is perhaps a separate discussion as to whether the optional name attribute should be there. I agree that length may be useful is this just an integer value with no unit? Yes, I think so. I'm less sure about pI and mass since mass at least can be calculated very simply Only if you have the sequence... (we have residue masses in the file). , and pI values (in my opinion) are pretty inaccurate and fairly meaningless Scandalous! (I happen to agree, but now some people will never speak to either of us ever again). The main problem with mass and pI is that these are 'irrelevant' if the sequence is nuleic acid rather than residues. Why not just allow CV there? We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene ID, and lots of wonderful other things? unless someone can convince me otherwise? Cheers Andy From: David Creasy [mailto:dc...@ma...] Sent: 27 June 2008 14:51 To: Jones, Andy Cc: psi...@li... Subject: Re: [Psidev-pi-dev] Representing Sequences Hi Andy, length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. Why do we want name? Is this for, say, a description line? (Also, identifier -> id?) 
David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

    <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

    <DBSequence identifier="" name="" isDecoy="true">
        <seq>MCTMG...</seq>
        <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
    </DBSequence>

Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT PeptideEvidence shortly.

Cheers
Andy

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd.
is registered in England and Wales
Company number 3533898

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
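The decoy-regexp idea at the top of this message can be sketched roughly as follows. This is only an illustration, not part of the AnalysisXML schema or of any search engine: the accessions are made up, the function names are hypothetical, and the FDR definition (decoys divided by all entries at or above the given rank) is an assumption chosen because it reproduces the 33.33 percent at rank 3 quoted above. Other FDR definitions exist and would give different numbers.

```python
import re

# Value of the "decoy accession regexp" cvParam quoted in the message.
DECOY_PATTERN = re.compile(r"^SHD")


def is_decoy(accession: str) -> bool:
    """Return True if the accession matches the decoy regexp."""
    return DECOY_PATTERN.search(accession) is not None


def local_fdr_percent(ranked_accessions: list[str], rank: int) -> float:
    """Percentage of decoy entries among the top `rank` entries.

    Assumed definition: decoys / all entries at or above `rank`.
    """
    if rank < 1 or rank > len(ranked_accessions):
        raise ValueError("rank must be between 1 and the list length")
    top = ranked_accessions[:rank]
    decoys = sum(1 for accession in top if is_decoy(accession))
    return 100.0 * decoys / rank


# Hypothetical protein list, best hit first; the third entry is a decoy.
ranked = ["IPI00000001.1", "IPI00000002.1", "SHD_IPI00000003.1"]

print(round(local_fdr_percent(ranked, 3), 2))  # 33.33
```

The point of the design is that the sequence entries themselves carry no isDecoy flag: whether an entry is a decoy is derived from its accession when the results are interpreted.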
From: Martin E. <mar...@ru...> - 2008-07-10 13:14:38
|
Dear PSI-PI workers!

I'm confused about the new PeptideHypothesis element and the new location of the PeptideEvidence elements. Is it for the case where the same peptide (sequence) is part of several proteins? But then this information is only relevant if both proteins are reported as ProteinDetection results (as AnalysisXML is only for reporting final results and not to allow information extraction). In that case the PeptideEvidence elements are better placed under ProteinDetectionHypothesis (as agreed after weeks of discussion ;-) ).

If there is a convincing argument I missed, please state it here and I can put it into the wiki doc.

Many thanks!
Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: Friday, June 27, 2008 5:36 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

Hi all,

I've updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
    o In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,

Cheers
Andy |
From: Jennifer S. <jen...@ma...> - 2008-07-10 09:26:53
|
Hi,

I have added two OMSSA use case examples to the SVN repository (http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working9July/).

I just have a small comment concerning modifications. I used cvParams when assigning a particular modification to a peptide. However, when adding the modifications that were used in the search, I am not able to use cvParams; instead it has this structure:

    <ModificationParams>
        <SearchModification fixedMod="false">
            <ModName unitName="MOD::00425" unitAccession="??" value="Oxidation (M)" />
            <MassValus unitName="" unitAccession="" value="" />
            <SpecificityRule....

I am not sure why we are not able to use cvParams, or what I was supposed to put here.

Thanks,
Jenny

PS. I am afraid Julian and I will not be able to make the call later today. |
From: David C. <dc...@ma...> - 2008-07-09 18:36:38
|
Hi everyone,

There will be an AnalysisXML working group conference call tomorrow (Thursday) at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=10&month=7&year=2008&hour=16&min=0&sec=0&p1=136

Minutes from the last meeting:
http://psidev.info/index.php?q=node/353

Latest Mascot instance document at:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working9July/F001350.xml

Agenda:
1. Review of minutes from last meeting.
2. Any feedback from review of MPC example instance document
3. Example OMSSA document
4. Plan for getting faster progress...

Dial in details:
+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
From: David C. <dc...@ma...> - 2008-07-09 17:11:19
|
Hi Andy,

Thanks - I've tested it and it all seems to work fine.

David

Jones, Andy wrote:
> Hi Marc,
>
> Thanks for this, I've updated a new schema in the SVN,
>
> I also updated the KeyRefs for id rather than identifier (although I only did a global replace on identifier --> id so I didn't check that this works properly)
>
> Cheers
> Andy
>
>> -----Original Message-----
>> From: Marc Sturm [mailto:st...@in...]
>> Sent: 08 July 2008 13:53
>> To: Jones, Andy
>> Subject: Re: [Psidev-pi-dev] FW: pre and post
>>
>> Hi Andy,
>>
>> this should do:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>            elementFormDefault="qualified" attributeFormDefault="unqualified">
>>   <xs:element name="Dummy">
>>     <xs:complexType>
>>       <xs:attribute name="pre" use="required">
>>         <xs:simpleType>
>>           <xs:restriction base="xs:string">
>>             <xs:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
>>           </xs:restriction>
>>         </xs:simpleType>
>>       </xs:attribute>
>>     </xs:complexType>
>>   </xs:element>
>> </xs:schema>
>>
>> Best,
>> Marc

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
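For anyone who wants to check values against Marc's pattern outside an XML validator, here is a minimal sketch. It is not code from the thread, and the function name is hypothetical. XSD pattern facets always match the whole attribute value, so the closest Python equivalent is `re.fullmatch` rather than `re.search`.

```python
import re

# Same character class as the xs:pattern in Marc's schema: exactly one
# upper-case letter, "?" or "-". The trailing hyphen inside the class is literal.
PRE_PATTERN = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}")


def is_valid_pre(value: str) -> bool:
    """Return True if `value` would satisfy the pattern facet on `pre`."""
    # fullmatch mirrors the implicit anchoring of XSD patterns.
    return PRE_PATTERN.fullmatch(value) is not None


for candidate in ["K", "-", "?", "k", "KR", ""]:
    print(repr(candidate), is_valid_pre(candidate))
```

Only the first three candidates are accepted; lower-case letters, multi-character strings and the empty string are rejected.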