From: Martin E. <mar...@ru...> - 2008-05-15 11:11:45

Hi!

Yes, I agree: reasonable!

BTW: to human eyes it looks well arranged to have ALL search engine parameters as CV params one after another, e.g. in PRIDE XML:

<StepDescription>
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000071" name="Search Engine setting" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000046" name="Mascot" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000073" name="Variable modification setting" value="Oxidation (M)" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000073" name="Variable modification setting" value="PSI-MOD;MOD:00412" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000073" name="Variable modification setting" value="PSI-MOD;MOD:00768" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000073" name="Variable modification setting" value="PSI-MOD;MOD:00935" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000074" name="Maximum Missed Cleavages Setting" value="1" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000076" name="Mass value type setting monoisotopic" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000078" name="Peptide mass tolerance setting" value="50.0" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000083" name="mass error type setting ppm" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000088" name="Protonated setting MH+" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000158" name="MS search" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000160" name="Enzyme" value="Trypsin (*KR)" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000161" name="Fragment mass tolerance setting" value="50.0" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000083" name="mass error type setting ppm" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000162" name="Allowed missed cleavages" value="1" />
  <cvParam cvLabel="PRIDE" accession="PRIDE:0000163" name="Instrument type" value="MALDI-TOF-TOF" />
</StepDescription>

If in the future we want to force some parameters, we could create mandatory elements with CVparam sub-elements, like:

<Tolerance> <!-- minOccurs=1 maxOccurs=1 -->
  <pf:cvParam accession="PSI-PI:0076D4" name="parent tol" cvRef="PSI-PI" value="1.0" unitAccession="uA:1" unitName="Da"/>
</Tolerance>

That merges the advantages of forcing elements with those of the FuGElight CVparams (with units).

Bye
Martin

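To make the idea of a mandatory element wrapping a single CV parameter with units a little more concrete, here is a minimal Python sketch of how a consumer might read such a <Tolerance> element. It is only an illustration under stated assumptions: the namespace URI bound to the "pf" prefix and the function name read_tolerance are hypothetical, and the checks merely mimic in code what minOccurs=1 / maxOccurs=1 would enforce in a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "pf" prefix; the thread does not give one.
PF_NS = "http://example.org/pf"

SNIPPET = f"""
<Tolerance xmlns:pf="{PF_NS}">
  <pf:cvParam accession="PSI-PI:0076D4" name="parent tol" cvRef="PSI-PI"
              value="1.0" unitAccession="uA:1" unitName="Da"/>
</Tolerance>
"""


def read_tolerance(xml_text):
    """Return (value, unit name) from a <Tolerance> element.

    Requires exactly one cvParam child and insists that it carries a unit,
    which is what a mandatory element with one cvParam would guarantee.
    """
    root = ET.fromstring(xml_text)
    params = root.findall(f"{{{PF_NS}}}cvParam")
    if len(params) != 1:
        raise ValueError(f"expected exactly one cvParam, found {len(params)}")
    param = params[0]
    if "unitAccession" not in param.attrib:
        raise ValueError("tolerance cvParam has no unit")
    return float(param.get("value")), param.get("unitName")


value, unit = read_tolerance(SNIPPET)
print(value, unit)
```

The point of the sketch is that the element name (Tolerance) tells the reader where to look, while the CV parameter still carries the term and the unit.
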
From: Pierre-Alain B. <pie...@is...> - 2008-05-15 08:43:01

Hi all,

I support some pragmatism to help us release smoothly. I mean, it's fine to have a reasonable number of attributes, but let's try not to spend too much time arguing. I'd propose that if one tool has a definition of a term that differs from the others, we go for CV for that term.

Version 1 needs to gather feedback from real use cases and might suffer from some weaknesses (loosely constrained limitations). Experience with mzData (released, implemented, but then strongly modified in mzML) has shown that a proof-of-principle version does not hurt so much.

Let's go for the update of the Excel list and fill it with example values, to make sure we have the same (or different) interpretation of the terms.

Reasonable?

Pierre-Alain

From: Sean L S. <Sey...@ap...> - 2008-05-14 16:15:12

Hi all,

Which is the safe approach? CV, right? You can always integrate more solidly into the schema when experience shows it's useful (version 2).

If there's any question, I would just default to whatever is the conservative/safe thing to do. For example, even on the tolerance point, we don't have tolerance settings and internally don't use them the same way everyone else does, so that can't be required by schema.

I'm just trying to make the point that I don't think it hurts in the first version to put an awful lot of stuff in as CV, does it?

Sean

From: David C. <dc...@ma...> - 2008-05-14 16:09:53

Hi everyone,

There will be an AnalysisXML working group conference call tomorrow at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=15&month=5&year=2008&hour=16&min=0&sec=0&p1=136

Agenda:

1. Any volunteers from people going to the HUPO Congress in Amsterdam to give a short presentation on AnalysisXML at the PSI plenary?

2. Review of the Excel spreadsheet to decide what needs to be attributes and what needs to be CV. The spreadsheet and .obo file are available here:
   http://code.google.com/p/psi-pi/source/browse/trunk/cv/search_engine_outputs_2007Apr24.xls
   http://code.google.com/p/psi-pi/source/browse/trunk/cv/psi-pi.obo

3. Working through the current issues list at:
   http://code.google.com/p/psi-pi/issues/list

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)

access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898

From: David C. <dc...@ma...> - 2008-05-14 14:41:40

Thanks, Luisa, that's quite clear.
(I've now sent this message to the list for general input/thoughts.)

From a previous message, Luisa said:
> In general I think a number of those terms should be XML attributes in
> the schema (like 'sample id', or 'date / time search performed' or
> 'modification position'), whereas CV are fine and should be the
> reference for descriptive information like 'database filtering' or
> 'search engines scores'.

For the record, one thing that we agreed about in Lyon was that if 2 or more search engines supported a particular parameter, then we'd like this as a 'node' in the schema.

So, for example, with an MS-MS tolerance, should this be CV? This tolerance is something that all search engines require (or maybe one day they estimate it, but we still want to know what value is used). So it's something that is 'required' and is not 'descriptive'.
But a tolerance is no use without units, so this can't be a simple attribute.
Similarly, in Luisa's example above: date / time search performed possibly needs a time zone, which would also be CV?

How should we model this? Suggestions from anyone please - or just tell me what's already been decided elsewhere and we'll follow that.

Another example: mass values can be calculated as 'monoisotopic' or 'average'. This again is required for all search engines and again seems like it could be CV to me. Or should we use an xsd:enumeration for something like this, where there are only 2 possibilities?

David

Luisa Montecchi wrote:
> Hi Phil,
>
> through the mapping you can limit the unit one can use in a given
> *schema location* (XPath), whereas you will need to create a so-called
> 'object rule' in the validator to verify that for each *CVparam term*,
> only an appropriate subset of units is associated with it.
>
> In other words, the mapping allows the discrimination of the various
> CVparam elements in the schema by their XPath and permits restricting the
> subset of CVparam terms and/or unit terms that can be used in each location.
>
> Dependencies between CVparam terms and unit terms cannot be encoded in
> the mapping, but can be checked via the validator tool.
>
> I hope this is clear,
>
> Best regards,
>
> Luisa
>
> Phil Jones @ EBI wrote:
>> Hi Luisa,
>>
>> Can you confirm for me please - does the mapping file include the
>> ability to mandate the presence of particular units for specific CV
>> term usage in an XML file? (I am thinking now about mzML files, that
>> include the unit ontology accession and term in the CvParam entry - we
>> wish to use the same XML structure in analysisXML).
>>
>> Best regards,
>>
>> Phil.

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898

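Luisa's distinction between what the mapping file can restrict (terms and units per schema location) and what needs a validator 'object rule' (which units go with which term) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XPath, the term and unit identifiers, the two rule tables and the function name check_cv_param are all hypothetical placeholders, not the real mapping-file format or validator API.

```python
# Hypothetical rule tables; none of these identifiers come from a real CV.

# Mapping-file level: which terms and which units may appear at a location.
ALLOWED_AT_LOCATION = {
    "/AnalysisXML/SearchSettings/Tolerance/cvParam": {
        "terms": {"TERM:parent_tol", "TERM:fragment_tol"},
        "units": {"UNIT:dalton", "UNIT:ppm"},
    },
}

# Object-rule level: which units each individual term may carry.
UNITS_FOR_TERM = {
    "TERM:parent_tol": {"UNIT:dalton", "UNIT:ppm"},
    "TERM:fragment_tol": {"UNIT:dalton"},
}


def check_cv_param(xpath, term, unit):
    """Return a list of problems for one cvParam; an empty list means it passes."""
    location = ALLOWED_AT_LOCATION.get(xpath)
    if location is None:
        return [f"no mapping rule for {xpath}"]
    errors = []
    # Checks the mapping can express: per-location term and unit subsets.
    if term not in location["terms"]:
        errors.append(f"term {term} not allowed at {xpath}")
    if unit not in location["units"]:
        errors.append(f"unit {unit} not allowed at {xpath}")
    # Check the mapping cannot express: the term-to-unit dependency.
    allowed_units = UNITS_FOR_TERM.get(term)
    if allowed_units is not None and unit not in allowed_units:
        errors.append(f"unit {unit} not valid for term {term}")
    return errors


path = "/AnalysisXML/SearchSettings/Tolerance/cvParam"
print(check_cv_param(path, "TERM:parent_tol", "UNIT:ppm"))
print(check_cv_param(path, "TERM:fragment_tol", "UNIT:ppm"))
```

In the second call the term and the unit are each allowed at that location, so the per-location checks pass; only the term-to-unit rule reports a problem, which is the case Luisa says has to be handled by the validator rather than the mapping.
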
From: David C. <dc...@ma...> - 2008-05-07 17:27:51

Hi everyone,

There will be an AnalysisXML working group conference call tomorrow at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=8&month=5&year=2008&hour=16&min=0&sec=0&p1=136

The aim is to work through the current issues list:
http://code.google.com/p/psi-pi/issues/list

and to review the latest schema available at:
http://code.google.com/p/psi-pi/source/browse/trunk/schema/AnalysisXML_working7May.xsd

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)

access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898

From: Martin E. <mar...@ru...> - 2008-05-02 19:18:15

Hi psi-pi-devs!

(See http://code.google.com/p/psi-pi/issues/detail?id=8)

Below, four possible layouts of a SpectrumIdentificationResultSet are given (use the attached files to review them in XMLSpy). To my knowledge, in the "elements with body" solution we cannot control the type ("double" is "xs:decimal" in xsd).

Bye
Martin

<!-- attributes: -->
<SpectrumIdentificationResultSet identifier="Peptides1">
  <SpectrumIdentificationResult identifier="ident_pep_1_1">
    <SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9">
      <HypothesisValueProperty value="62.72" valuePropertyTerm_ref="DP:1:mascot_ions_score"/>
    </SpectrumIdentificationHypothesis>
    <SpectrumElement spectrumID="S1" spectraDataInputRef_ref="file://F:/spectra_file_1.mzML"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationResultSet>

<!-- elements WITHOUT XML body text (as commonly done at the moment): -->
<SpectrumIdentificationResultSet identifier="Peptides1">
  <SpectrumIdentificationResult identifier="ident_pep_1_1">
    <SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1">
      <calculatedMassToCharge value="670.86261"/>
      <chargeState value="2"/>
      <experimentalMassToCharge value="671.9"/>
      <HypothesisValueProperty value="62.72" valuePropertyTerm_ref="DP:1:mascot_ions_score"/>
    </SpectrumIdentificationHypothesis>
    <SpectrumElement spectrumID="S1" spectraDataInputRef_ref="file://F:/spectra_file_1.mzML"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationResultSet>

<!-- elements WITH XML body text: -->
<SpectrumIdentificationResultSet identifier="Peptides1">
  <SpectrumIdentificationResult identifier="ident_pep_1_1">
    <SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1">
      <calculatedMassToChargeALT>670.86261</calculatedMassToChargeALT>
      <chargeStateALT>2</chargeStateALT>
      <experimentalMassToChargeALT>671.9</experimentalMassToChargeALT>
      <HypothesisValueProperty value="62.72" valuePropertyTerm_ref="DP:1:mascot_ions_score"/>
    </SpectrumIdentificationHypothesis>
    <SpectrumElement spectrumID="S1" spectraDataInputRef_ref="file://F:/spectra_file_1.mzML"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationResultSet>

<!-- CV terms (NOT line-in yet): -->
<SpectrumIdentificationResultSet identifier="Peptides1">
  <SpectrumIdentificationResult identifier="ident_pep_1_1">
    <SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1">
      <HypothesisValueProperty value="670.86261" valuePropertyTerm_ref="internal_ref:calculatedMassToCharge"/> <!-- plus OntologyCollection entry -->
      <HypothesisValueProperty value="2" valuePropertyTerm_ref="internal_ref:chargeState"/> <!-- plus OntologyCollection entry -->
      <HypothesisValueProperty value="671.9" valuePropertyTerm_ref="internal_ref:experimentalMassToCharge"/> <!-- plus OntologyCollection entry -->
      <HypothesisValueProperty value="62.72" valuePropertyTerm_ref="DP:1:mascot_ions_score"/>
    </SpectrumIdentificationHypothesis>
    <SpectrumElement spectrumID="S1" spectraDataInputRef_ref="file://F:/spectra_file_1.mzML"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationResultSet>

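For readers weighing these layouts, the following Python sketch reads the same three values from the "attributes" layout and from the "CV terms" layout, using cut-down copies of the hypothesis element from the examples above. It is only a rough illustration: the function names are hypothetical, the "internal_ref:" prefix handling is an assumption about how such references would be resolved, and the byte counts it prints apply to these two small fragments only.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_LAYOUT = """
<SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1"
    calculatedMassToCharge="670.86261" chargeState="2"
    experimentalMassToCharge="671.9"/>
"""

CV_LAYOUT = """
<SpectrumIdentificationHypothesis identifier="SIH1" peptide_ref="1_1">
  <HypothesisValueProperty value="670.86261"
      valuePropertyTerm_ref="internal_ref:calculatedMassToCharge"/>
  <HypothesisValueProperty value="2"
      valuePropertyTerm_ref="internal_ref:chargeState"/>
  <HypothesisValueProperty value="671.9"
      valuePropertyTerm_ref="internal_ref:experimentalMassToCharge"/>
</SpectrumIdentificationHypothesis>
"""


def from_attributes(xml_text):
    """Read the three values straight from named attributes."""
    node = ET.fromstring(xml_text)
    return {
        "calculatedMassToCharge": float(node.get("calculatedMassToCharge")),
        "chargeState": int(node.get("chargeState")),
        "experimentalMassToCharge": float(node.get("experimentalMassToCharge")),
    }


def from_cv_terms(xml_text):
    """Collect value properties by term reference, then pick out the same three."""
    node = ET.fromstring(xml_text)
    raw = {}
    for prop in node.findall("HypothesisValueProperty"):
        # Assumption: the text after the first ':' names the term.
        term = prop.get("valuePropertyTerm_ref").split(":", 1)[1]
        raw[term] = prop.get("value")
    return {
        "calculatedMassToCharge": float(raw["calculatedMassToCharge"]),
        "chargeState": int(raw["chargeState"]),
        "experimentalMassToCharge": float(raw["experimentalMassToCharge"]),
    }


print(from_attributes(ATTRIBUTE_LAYOUT) == from_cv_terms(CV_LAYOUT))
print(len(ATTRIBUTE_LAYOUT.encode()), len(CV_LAYOUT.encode()))
```

Both readers recover identical values; the difference is that the CV layout needs a lookup by term reference (and, in a real file, the matching OntologyCollection entries) before the values can be used.
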
From: Simon H. <sim...@ma...> - 2008-05-02 14:44:08

Apologies, I was planning to make today's call but now can't. Jenny is also out of the office, but I hope that Julian Selley from Manchester will join in - he was in Toledo and is following the threads.

Hope we have a schema agreed soon - good luck

-Simon-

_______________________________________________________________
Dr. Simon Hubbard, Reader in Bioinformatics
Faculty of Life Sciences, The University of Manchester,
Michael Smith Building, Manchester M13 9PT
mailto:Sim...@ma...
http://www.ls.manchester.ac.uk/people/profile/index.asp?id=2524
TEL: +44 (0)161 306 8930
FAX: +44 (0)161 275 5082

From: Martin E. <mar...@ru...> - 2008-05-02 11:38:59

Btw: The "attributes <-> CV" discussion is closely related to the "elements <-> attributes" discussion (see http://code.google.com/p/psi-pi/issues/detail?id=8). I have listed all attributes except "id(entifier)" or "*ref" there; we are talking about 18 attributes in the last version of the schema (Sept 2007).

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On Behalf Of Jones, Andy
Sent: Thursday, May 01, 2008 1:31 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo

>But start, end, post and pre would now be CV?
>btw, Luisa recommends that we don't make too many things like this CV...
>Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode.
>Please persuade me otherwise!
>(btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)

I would favour keeping things as attributes where there is a common understanding across all search engines of what these mean, and they will regularly/always be required.

"start, end, post and pre" - these all look like good candidates for being attributes.

"calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9"" - I would say the same for these; every additional thing in CV bloats the instance documents and makes more work for implementers.

Cheers
Andy

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On Behalf Of David Creasy
Sent: 30 April 2008 18:12
To: Sean L Seymour
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo

Hi Sean,

Thanks very much - must have taken quite a while and is very useful.

One thing that may not be obvious to others is where the <SpectrumIdentificationResultSet> comes from. I believe that this was just a 'rename' of PolypeptideResultSet made by the sub group that you were in at Toledo.

As we've usefully discussed, finding a way to communicate effectively is an issue. So, to make 100% sure I've understood I'll talk back to you in XML :)

This is a cut down of an example for an ms-ms search of a single spectrum with peptide results and protein inferencing. The protein inferencing (impossibly - 'cos just one peptide!) has a couple of similar proteins in the first group, and one in the second group.

<pf:DataCollection>
  <AnalyteDetectionResultSet type="MS_MS_peptide_matches">
    <AnalyteDetectionResult>
      <IdentificationResult>
        <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
        <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="62" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table">
          <!-- A poorer match to same spectrum as "pep_match_x1" -->
          <pf:cvParam accession="PI:99999" name="score" value="12" />
        </IdentificationHypothesis>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
  <AnalyteDetectionResultSet type="Protein_inferencing">
    <AnalyteDetectionResult id="protein_group_1">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_x">
          <pf:cvParam startpos="23" />
          <pf:cvParam endpos="29" />
        </SomeTagTBD>
        <IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
    <AnalyteDetectionResult id="protein_group_2">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_y">
          <pf:cvParam startpos="123" />
          <pf:cvParam endpos="129" />
        </SomeTagTBD>
        <IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="1" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
</pf:DataCollection>

Please correct where I haven't understood.

Before, we had in peptide ID:

<PolypeptideResultItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" polypeptideReference_ref="xxx">

New proposal is that calculatedMassToCharge, chargeState and experimentalMassToCharge are all just CV?

Likewise, for protein inferencing, we had:

<_resultItems>
  <RelationResultItem identifier="" start="160" end="171" polypeptideReference_ref="1_1" post="K" pre="I">
  </RelationResultItem>
  <RelationResultItem identifier="" start="57" end="71" polypeptideReference_ref="3_1" post="K" pre="R">
  </RelationResultItem>

But start, end, post and pre would now be CV?

btw, Luisa recommends that we don't make too many things like this CV...

Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode.

Please persuade me otherwise!

(btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)

One minor comment:
Slide 6: ..., but the results are always about the result from the user's perspective - "What did I find and/or measure?", rather than "How did I account for all of the spectra?"
- Many users do want to try and account for all their spectra because they believe that they are missing something useful.

David

Sean L Seymour wrote:

Hi all,

After the wrap up Friday afternoon, the few remaining people in the PI group had a short meeting where we discussed a potential generalization to the results portion of the schema.
The big question that came out of this was whether or not we should keep the result description for the ID of peptides from MS/MS spectra as it was by midday Friday, or whether it made sense to restructure this so that it followed the more general structure for results that we would use for many other things, including protein inference from peptide IDs. I agreed to outline the various use cases and try to lay out the issues. I had hoped to send this out by Monday, but it's taken a lot longer than planned. Apologies for being a day late, but I hope you'll see that a lot of thought went into this. There are two documents. Please look at "AnalysisXML Results Design Question.ppt" first. This lays out the specific schema change question we face. One of the biggest concerns about this proposed change was that it was not immediately obvious to any of us last Friday whether this was a substantial restructuring or essentially a renaming process. As you'll see in the slide showing the alignment, I now believe that the change is largely a renaming process and not a large change. The only real change is the insertion of one additional level, but I can image a way around doing this. In fact, I think that the reason for inserting this level is not specific to the question of the schema change, rather it's simply making up for something that was missing in the original model. There needs to be a way of having things that are attributes of the overall identification rather than an individual identification hypothesis - for example, the probability that at least one of the identification hypotheses (hits/matches) is correct for the spectrum. Assuming we agree that this is true, I think there is zero difference in the schema other than using more generic names, and my opinion is that we should really make this change. 
The second document, "AnalysisXML Results Use Cases.ppt" tries to capture a lot of more specific use cases that demonstrate why the proposed schema change may be the right thing to do. I've done this using 'pseudo instance documents' which are explained in the slides. I hope this is a useful communication mechanism, and may have some use for documentation as well. If no one finds them useful, no big deal - I was just trying to find a way to communicate clearly. Please excuse inaccuracies in the details of some of the use cases. I was trying to assess whether or not the constant AnalysisResult frame was robust to a large number of variations. I think you'll see that it is, and it's really not clear to my why we should have a special case of element names for the ID of peptides from MS/MS spectra. The only good reason I can see for it is that it's what we already had drawn up in the schema. Please feel free to add, modify, or correct any of this as you see fit! Sean _____ ---------------------------------------- --------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673; 13503038;p?http://java.sun.com/javaone _____ ________________________________________ _______ Psidev-pi-dev mailing list Psi...@li... https://lists.sourceforge.net/lists/list info/psidev-pi-dev -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: Martin E. <mar...@ru...> - 2008-05-02 11:31:06
|
Hi all,

thanks, Sean, for the very complete slides!

Some of the consequences of that solution (without assessing or valuing them):
- The schema documentation will not be self-contained, but as general as the solution itself
- Attributes have to move to CV terms
- Semantic validation is more laborious; we have to infer the ResultType from the AnalysisRef to validate the correctness

In my personal opinion I want to have "special" result sections (syntax validation possible!) at least for the "SpectrumIdentification", "ProteinDetection" and "QualityAssessment" analysis types, but I can imagine having only "SpectrumIdentificationResultSet" (with your or Alexandre's changes) and your solution as "generic results", containing all others.

General: I will add an issue to the googlecode project page with this discussion.

Further: Closely related to this discussion is the question of which "special" analysis types we need (see http://code.google.com/p/psi-pi/issues/detail?id=10). If we make both sections (analyses and results) "generic", we need a special "AnalysisType" attribute (or CV) to be able to validate semantically or to parse meaningful information.

Bye
Martin

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On Behalf Of David Creasy
Sent: Friday, May 02, 2008 12:04 PM
To: Pierre-Alain Binz
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo

Hi Pierre-Alain,

Pierre-Alain Binz wrote:
> Hi all,
> allow me to join the discussion (again).

Yes please!

> Simon, I agree on the start - end in principle. Let me refer to the extended fasta format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences in separate entries, the start and end would not change. If they split, the accession code will change and the start and end refer to two entries, as if they were originating from different genes for instance.
>
> Sean, nice exercise. Probably viable for ID.
> Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label-free use cases as well as multiple SILAC runs, for instance). The same is true when concatenating ID results (How do you report a Scaffold output?).
> I have difficulty including the quant in the id result section. I see issues in reporting global results on a quant analysis (global normalisation functions and outcomes, for instance) and as you already make an "exception" to the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retrieved from MS/MS spectra? Just use cases for you to consider.

Gentle reminder that we have agreed: "Defer quantitation to v2. Should fit into existing framework. Attempt to guarantee back compatible. Work in parallel to produce a proposal."

> tiny comments:
> - All elements you name xxxxSet in AnalysisML are xxxxList in mzML. Would you mind using the same semantics for consistency purposes?

Sounds reasonable to me

> - agree to put all calculatedMassToCharge, chargeState and experimentalMassToCharge into CV (looks similar to mzML also then). In mzML, we have a lot of terms in CV, and these would fall into it as well.

I'd rather that these were all attributes...

David |
From: David C. <dc...@ma...> - 2008-05-02 10:03:44
|
Hi Pierre-Alain,

Pierre-Alain Binz wrote:
> Hi all,
> allow me to join the discussion (again).

Yes please!

> Simon, I agree on the start - end in principle. Let me refer to the extended fasta format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences in separate entries, the start and end would not change. If they split, the accession code will change and the start and end refer to two entries, as if they were originating from different genes for instance.
>
> Sean, nice exercise. Probably viable for ID.
> Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label-free use cases as well as multiple SILAC runs, for instance). The same is true when concatenating ID results (How do you report a Scaffold output?).
> I have difficulty including the quant in the id result section. I see issues in reporting global results on a quant analysis (global normalisation functions and outcomes, for instance) and as you already make an "exception" to the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retrieved from MS/MS spectra? Just use cases for you to consider.

Gentle reminder that we have agreed: "Defer quantitation to v2. Should fit into existing framework. Attempt to guarantee back compatible. Work in parallel to produce a proposal."

> tiny comments:
> - All elements you name xxxxSet in AnalysisML are xxxxList in mzML. Would you mind using the same semantics for consistency purposes?

Sounds reasonable to me

> - agree to put all calculatedMassToCharge, chargeState and experimentalMassToCharge into CV (looks similar to mzML also then). In mzML, we have a lot of terms in CV, and these would fall into it as well.

I'd rather that these were all attributes...

David |
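Editorial sketch: the file-size concern raised in this thread (attributes versus cvParam children) can be checked roughly with a few lines of Python. This is only an illustration under stated assumptions: the values come from the PolypeptideResultItem example quoted in the thread, "PI:99999" is the thread's placeholder accession rather than a real CV term, the `pf:` namespace prefix is dropped for simplicity, and the helper names `as_attributes` and `as_cv_params` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values copied from the PolypeptideResultItem example quoted in this thread.
FIELDS = {
    "calculatedMassToCharge": "670.86261",
    "chargeState": "2",
    "experimentalMassToCharge": "671.9",
}


def as_attributes(fields):
    """Encode every field as a plain XML attribute (the earlier layout)."""
    item = ET.Element("PolypeptideResultItem", {"identifier": "1_1"})
    for name, value in fields.items():
        item.set(name, value)
    return item


def as_cv_params(fields):
    """Encode every field as a child cvParam element (the proposed layout).

    "PI:99999" is the placeholder accession used in the thread,
    not a real CV term.
    """
    item = ET.Element("PolypeptideResultItem", {"identifier": "1_1"})
    for name, value in fields.items():
        ET.SubElement(
            item,
            "cvParam",
            {"accession": "PI:99999", "name": name, "value": value},
        )
    return item


attr_bytes = ET.tostring(as_attributes(FIELDS))
cv_bytes = ET.tostring(as_cv_params(FIELDS))

# Print the serialized size of each encoding for a single result item.
print("attributes:", len(attr_bytes), "bytes")
print("cvParams:  ", len(cv_bytes), "bytes")
```

The per-item difference is small in absolute terms, but it is paid once per peptide match, which is the scale at which the "file sizes may well explode" worry applies. The sketch says nothing about compressed size or about validation cost, both of which would need separate measurement.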
From: David C. <dc...@ma...> - 2008-05-02 09:45:49
|
Hi Simon, Simon Hubbard wrote: > David's XML speak is very useful, at least for me, to help understand > the model and associated issues. Strictly, should the "ref" attribute > in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x". > (as below) to refer back to the earlier <IdentificationHypothesis > id="pep_match_x1" ref="peptide1_in_molecule_table"> ? Yes - it is a typo > > <AnalyteDetectionResultSet type=Protein_inferencing> > <AnalyteDetectionResult id="protein_group_1"> > <IdentificationResult> > <SomeTagTBD id="PP" ref="pep_match_x1"> > <pf:cvParam startpos = 23> > <pf:cvParam endpos = 29> > <SomeTagTBD /> > > Also, if we have the cvParams for protein groups > such as "startpos" and "endpos" (as shown above) there could > be problems since they are protein (and not protein group) > specific. For example, a protein group contains two versions of > a protein, one with and one without the signal peptide. So any > matching peptide (outside of the signal peptide) will have > different starts in the two isoforms, but WILL match both > proteins (and hence the group). As far as protein inference goes, > one can't tell the two proteins apart and hence a protein group > is important. Is this an issue (ie. where we place cvParams, > if at all)? Yes, you are correct. The <SomeTagTBD> sections should be inside the <IdentificationHypothesis> David > > -Simon- > > David Creasy wrote: >> Hi Sean, >> >> Thanks very much - must have taken quite a while and is very useful. One >> thing that may not be obvious to others is where the the >> <SpectrumIdentificationResultSet> comes from. I believe that this was >> just a 'rename' of PolypeptideResultSet made by the sub group that you >> were in at Toledo. >> >> As we've usefully discussed, finding a way to communicate effectively is >> an issue. 
So, to make 100% sure I've understood I'll talk back to you in >> XML :) >> >> This is a cut down of an example for an ms-ms search of a single >> spectrum with peptide results and protein inferencing. The protein >> inferencing (impossibly - 'cos just one peptide!) has a couple of >> similar proteins in the first group, and one in the second group. >> >> <pf:DataCollection> >> <AnalyteDetectionResultSet type=MS_MS_peptide_matches> >> <AnalyteDetectionResult> >> <IdentificationResult> >> <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/> >> <IdentificationHypothesis id="pep_match_x1" >> ref="peptide1_in_molecule_table"> >> <pf:cvParam accession="PI:99999" name="score" value="62" /> >> </IdentificationHypothesis> >> <IdentificationHypothesis id="pep_match_x2" >> ref="peptide2_in_molecule_table"> >> <!-- A poorer match to same spectrum as "pep_match_x1" !> >> <pf:cvParam accession="PI:99999" name="score" value="12" /> >> </IdentificationHypothesis> >> </IdentificationResult> >> </AnalyteDetectionResult> >> </AnalyteDetectionResultSet> >> >> <AnalyteDetectionResultSet type=Protein_inferencing> >> <AnalyteDetectionResult id="protein_group_1"> >> <IdentificationResult> >> <SomeTagTBD id="PP" ref="pep_match_x"> >> <pf:cvParam startpos = 23> >> <pf:cvParam endpos = 29> >> <SomeTagTBD /> >> <IdentificationHypothesis id="TRYP_PIG" >> ref="protein1_in_molecule_table"> >> <pf:cvParam accession="PI:99999" name="score" value="162" /> >> </IdentificationHypothesis> >> <IdentificationHypothesis id="TRYP_BOV" >> ref="protein2_in_molecule_table"> >> <pf:cvParam accession="PI:99999" name="score" value="162" /> >> </IdentificationHypothesis> >> </IdentificationResult> >> <IdentificationResult> # nothing doing here ? 
[SJH] >> </IdentificationResult> # >> </AnalyteDetectionResult> >> </AnalyteDetectionResultSet> >> <AnalyteDetectionResult id="protein_group_2"> >> <IdentificationResult> >> <SomeTagTBD id="PP" ref="pep_match_y"> >> <pf:cvParam startpos = 123> >> <pf:cvParam endpos = 129> >> <SomeTagTBD /> >> <IdentificationHypothesis id="DODGY" >> ref="protein99_in_molecule_table"> >> <pf:cvParam accession="PI:99999" name="score" value="1" /> >> </IdentificationHypothesis> >> </IdentificationResult> >> <IdentificationResult> >> </IdentificationResult> >> </AnalyteDetectionResult> >> </AnalyteDetectionResultSet> >> </pf:DataCollection> >> >> Please correct where I haven't understood. >> >> Before, we had in peptide ID: >> <PolypeptideResultItem identifier="1_1" >> calculatedMassToCharge="670.86261" chargeState="2" >> experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"> >> New proposal is that calculatedMassToCharge, chargeState and >> experimentalMassToCharge are all just CV? >> >> Likewise, for protein inferencing, we had: >> <_resultItems> >> <RelationResultItem identifier="" start="160" end="171" >> polypeptideReference_ref="1_1" post="K" pre="I"> >> </RelationResultItem> >> <RelationResultItem identifier="" start="57" end="71" >> polypeptideReference_ref="3_1" post="K" pre="R"> >> </RelationResultItem> >> >> But start, end, post and pre would now be CV? >> btw, Luisa recommends that we don't make too many things like this CV... >> Having been enthusiastic about the change, I think I'm now going off it >> - partly because with all the extra CV, file sizes may well explode. >> Please persuade me otherwise! >> (btw, I've 'read but ignored' the quantitation suggestions based on >> decisions in Toledo.) 
>> >> >> One minor comment: >> >> Slide 6: ..., but the results are always about the result from the >> user’s perspective – “What did I find and/or measure?”, rather than “How >> did I account for all of the spectra?” >> - Many users do want to try and account for all their spectra because >> they believe that they are missing something useful. >> >> >> David >> >> Sean L Seymour wrote: >>> Hi all, >>> >>> After the wrap up Friday afternoon, the few remaining people in the PI >>> group had a short meeting where we discussed a potential >>> generalization to the results portion of the schema. The big question >>> that came out of this was whether or not we should keep the result >>> description for the ID of peptides from MS/MS spectra as it was by >>> midday Friday, or whether it made sense to restructure this so that it >>> followed the more general structure for results that we would use for >>> many other things, including protein inference from peptide IDs. I >>> agreed to outline the various use cases and try to lay out the issues. >>> I had hoped to send this out by Monday, but it's taken a lot longer >>> than planned. Apologies for being a day late, but I hope you'll see >>> that a lot of thought went into this. >>> >>> There are two documents. Please look at "AnalysisXML Results Design >>> Question.ppt" first. This lays out the specific schema change question >>> we face. One of the biggest concerns about this proposed change was >>> that it was not immediately obvious to any of us last Friday whether >>> this was a substantial restructuring or essentially a renaming >>> process. As you'll see in the slide showing the alignment, I now >>> believe that the change is largely a renaming process and not a large >>> change. The only real change is the insertion of one additional level, >>> but I can image a way around doing this. 
In fact, I think that the >>> reason for inserting this level is not specific to the question of the >>> schema change, rather it's simply making up for something that was >>> missing in the original model. There needs to be a way of having >>> things that are attributes of the overall identification rather than >>> an individual identification hypothesis - for example, the probability >>> that at least one of the identification hypotheses (hits/matches) is >>> correct for the spectrum. Assuming we agree that this is true, I think >>> there is zero difference in the schema other than using more generic >>> names, and my opinion is that we should really make this change. >>> >>> The second document, "AnalysisXML Results Use Cases.ppt" tries to >>> capture a lot of more specific use cases that demonstrate why the >>> proposed schema change may be the right thing to do. I've done this >>> using 'pseudo instance documents' which are explained in the slides. I >>> hope this is a useful communication mechanism, and may have some use >>> for documentation as well. If no one finds them useful, no big deal - >>> I was just trying to find a way to communicate clearly. Please excuse >>> inaccuracies in the details of some of the use cases. I was trying to >>> assess whether or not the constant AnalysisResult frame was robust to >>> a large number of variations. I think you'll see that it is, and it's >>> really not clear to my why we should have a special case of element >>> names for the ID of peptides from MS/MS spectra. The only good reason >>> I can see for it is that it's what we already had drawn up in the schema. >>> >>> Please feel free to add, modify, or correct any of this as you see fit! 
>>> >>> Sean >>> >>> >>> ------------------------------------------------------------------------ >>> >>> ------------------------------------------------------------------------- >>> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference >>> Don't miss this year's exciting event. There's still time to save $100. >>> Use priority code J8TL2D2. >>> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >> -- >> David Creasy >> Matrix Science >> 64 Baker Street >> London W1U 7GB, UK >> Tel: +44 (0)20 7486 1050 >> Fax: +44 (0)20 7224 1344 >> >> dc...@ma... >> http://www.matrixscience.com >> >> Matrix Science Ltd. is registered in England and Wales >> Company number 3533898 >> >> >> ------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------- >> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference >> Don't miss this year's exciting event. There's still time to save $100. >> Use priority code J8TL2D2. >> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Psidev-pi-dev mailing list >> Psi...@li... >> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > |
From: Jones, A. <And...@li...> - 2008-05-02 09:39:08
|
>in the IdentificationResult element, you described pf:cvParam ... Is this a namespace (also at the root of the DataCollection element) to refer to a vendor-specific CVparam or do you intent to use userParams? I do not get this. In mzML, if I'm right, we can refer to more than one CVs, that are recognised via different prefixes (PI:99999 vs MS:10000x). And there is the possibility to define userParams. For consistency in analysisXML, I’ve put CVParam into the inherited “FuGE light” schema. “pf:” is the proposed namespace for the FuGE light schema. I am waiting for a final decision on userParams and cvParam groups from the mzML working group since it sounded like there were still a few issues to resolve. I can add the current mzML CV/user param part to the schema whenever it’s required, Cheers Andy From: psi...@li... [mailto:psi...@li...] On Behalf Of Pierre-Alain Binz Sent: 02 May 2008 10:16 To: sim...@ma... Cc: psi...@li... Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo Hi all, let allow me to join the discussion (again). Simon, agree on the start - end on the principle. Let me refer to the extended fasta format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences in separate entries, the start and end would not change. If they split, the accession code will change and the start and end refer to two entries, as if they were originating from different genes for instance. Sean, nice exercise. Probably viable for ID. Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label free usecases as well as multiple silac runs for instance). Same is true when concatenating ID results (How do you report a Scaffold output?). 
I have difficulties to include the quant in the id result section. I see issues to report global results on a quant analysis (global normalisation functions and outcomes, for instance) and as you already make an "exception" to the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retzrieved from MS/MS spectra? Just use cases for you to consider. tiny comments: - All elements you name xxxxSet in AnalysisML are xxxxList in mzML. Would you mind using the same semantic for consistency purpose? - agree to put all calculatedMassToCharge, chargeState and experimentalMassToCharge into CV (looks similar to mzML also then). In mzML, we have a lot of terms in CV, and these would fall into as well. And a question: to David's xml answer in the IdentificationResult element, you described pf:cvParam ... Is this a namespace (also at the root of the DataCollection element) to refer to a vendor-specific CVparam or do you intent to use userParams? I do not get this. In mzML, if I'm right, we can refer to more than one CVs, that are recognised via different prefixes (PI:99999 vs MS:10000x). And there is the possibility to define userParams. Pierre-Alain Simon Hubbard wrote: David's XML speak is very useful, at least for me, to help understand the model and associated issues. Strictly, should the "ref" attribute in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x". (as below) to refer back to the earlier <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"> ? <AnalyteDetectionResultSet type=Protein_inferencing> <AnalyteDetectionResult id="protein_group_1"> <IdentificationResult> <SomeTagTBD id="PP" ref="pep_match_x1"> <pf:cvParam startpos = 23> <pf:cvParam endpos = 29> <SomeTagTBD /> Also, if we have the cvParams for protein groups such as "startpos" and "endpos" (as shown above) there could be problems since they are protein (and not protein group) specific. 
For example, a protein group contains two versions of a protein, one with and one without the signal peptide. So any matching peptide (outside of the signal peptide) will have different starts in the two isoforms, but WILL match both proteins (and hence the group). As far as protein inference goes, one can't tell the two proteins apart and hence a protein group is important. Is this an issue (ie. where we place cvParams, if at all)? -Simon- David Creasy wrote: Hi Sean, Thanks very much - must have taken quite a while and is very useful. One thing that may not be obvious to others is where the the <SpectrumIdentificationResultSet> comes from. I believe that this was just a 'rename' of PolypeptideResultSet made by the sub group that you were in at Toledo. As we've usefully discussed, finding a way to communicate effectively is an issue. So, to make 100% sure I've understood I'll talk back to you in XML :) This is a cut down of an example for an ms-ms search of a single spectrum with peptide results and protein inferencing. The protein inferencing (impossibly - 'cos just one peptide!) has a couple of similar proteins in the first group, and one in the second group. 
<pf:DataCollection> <AnalyteDetectionResultSet type=MS_MS_peptide_matches> <AnalyteDetectionResult> <IdentificationResult> <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/> <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"> <pf:cvParam accession="PI:99999" name="score" value="62" /> </IdentificationHypothesis> <IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table"> <!-- A poorer match to same spectrum as "pep_match_x1" !> <pf:cvParam accession="PI:99999" name="score" value="12" /> </IdentificationHypothesis> </IdentificationResult> </AnalyteDetectionResult> </AnalyteDetectionResultSet> <AnalyteDetectionResultSet type=Protein_inferencing> <AnalyteDetectionResult id="protein_group_1"> <IdentificationResult> <SomeTagTBD id="PP" ref="pep_match_x"> <pf:cvParam startpos = 23> <pf:cvParam endpos = 29> <SomeTagTBD /> <IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table"> <pf:cvParam accession="PI:99999" name="score" value="162" /> </IdentificationHypothesis> <IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table"> <pf:cvParam accession="PI:99999" name="score" value="162" /> </IdentificationHypothesis> </IdentificationResult> <IdentificationResult> # nothing doing here ? [SJH] </IdentificationResult> # </AnalyteDetectionResult> </AnalyteDetectionResultSet> <AnalyteDetectionResult id="protein_group_2"> <IdentificationResult> <SomeTagTBD id="PP" ref="pep_match_y"> <pf:cvParam startpos = 123> <pf:cvParam endpos = 129> <SomeTagTBD /> <IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table"> <pf:cvParam accession="PI:99999" name="score" value="1" /> </IdentificationHypothesis> </IdentificationResult> <IdentificationResult> </IdentificationResult> </AnalyteDetectionResult> </AnalyteDetectionResultSet> </pf:DataCollection> Please correct where I haven't understood. 
Before, we had in peptide ID: <PolypeptideResultItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"> New proposal is that calculatedMassToCharge, chargeState and experimentalMassToCharge are all just CV? Likewise, for protein inferencing, we had: <_resultItems> <RelationResultItem identifier="" start="160" end="171" polypeptideReference_ref="1_1" post="K" pre="I"> </RelationResultItem> <RelationResultItem identifier="" start="57" end="71" polypeptideReference_ref="3_1" post="K" pre="R"> </RelationResultItem> But start, end, post and pre would now be CV? btw, Luisa recommends that we don't make too many things like this CV... Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode. Please persuade me otherwise! (btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.) One minor comment: Slide 6: ..., but the results are always about the result from the user’s perspective – “What did I find and/or measure?”, rather than “How did I account for all of the spectra?” - Many users do want to try and account for all their spectra because they believe that they are missing something useful. David Sean L Seymour wrote: Hi all, After the wrap up Friday afternoon, the few remaining people in the PI group had a short meeting where we discussed a potential generalization to the results portion of the schema. The big question that came out of this was whether or not we should keep the result description for the ID of peptides from MS/MS spectra as it was by midday Friday, or whether it made sense to restructure this so that it followed the more general structure for results that we would use for many other things, including protein inference from peptide IDs. I agreed to outline the various use cases and try to lay out the issues. 
I had hoped to send this out by Monday, but it's taken a lot longer than planned. Apologies for being a day late, but I hope you'll see that a lot of thought went into this. There are two documents. Please look at "AnalysisXML Results Design Question.ppt" first. This lays out the specific schema change question we face. One of the biggest concerns about this proposed change was that it was not immediately obvious to any of us last Friday whether this was a substantial restructuring or essentially a renaming process. As you'll see in the slide showing the alignment, I now believe that the change is largely a renaming process and not a large change. The only real change is the insertion of one additional level, but I can image a way around doing this. In fact, I think that the reason for inserting this level is not specific to the question of the schema change, rather it's simply making up for something that was missing in the original model. There needs to be a way of having things that are attributes of the overall identification rather than an individual identification hypothesis - for example, the probability that at least one of the identification hypotheses (hits/matches) is correct for the spectrum. Assuming we agree that this is true, I think there is zero difference in the schema other than using more generic names, and my opinion is that we should really make this change. The second document, "AnalysisXML Results Use Cases.ppt" tries to capture a lot of more specific use cases that demonstrate why the proposed schema change may be the right thing to do. I've done this using 'pseudo instance documents' which are explained in the slides. I hope this is a useful communication mechanism, and may have some use for documentation as well. If no one finds them useful, no big deal - I was just trying to find a way to communicate clearly. Please excuse inaccuracies in the details of some of the use cases. 
I was trying to assess whether or not the constant AnalysisResult frame was robust to a large number of variations. I think you'll see that it is, and it's really not clear to me why we should have a special case of element names for the ID of peptides from MS/MS spectra. The only good reason I can see for it is that it's what we already had drawn up in the schema. Please feel free to add, modify, or correct any of this as you see fit! Sean _______________________________________________ Psidev-pi-dev mailing list Psi...@li... https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: Pierre-Alain B. <pie...@is...> - 2008-05-02 09:18:16
|
Hi all,
allow me to join the discussion (again).
Simon, I agree in principle on the start/end issue. Let me refer to the extended FASTA format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences into separate entries, the start and end would not change. If they do split, the accession code will change and the start and end refer to two entries, as if they originated from different genes, for instance.
Sean, nice exercise. Probably viable for ID. Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label-free use cases as well as multiple SILAC runs, for instance.) The same is true when concatenating ID results (how do you report a Scaffold output?). I have difficulty including the quant in the ID result section. I see issues in reporting global results of a quant analysis (global normalisation functions and outcomes, for instance), and as you already make an "exception" for the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retrieved from MS/MS spectra? Just use cases for you to consider.
Tiny comments:
- All elements you name xxxxSet in AnalysisML are xxxxList in mzML. Would you mind using the same naming convention for consistency?
- I agree to put calculatedMassToCharge, chargeState and experimentalMassToCharge into CV (it then looks similar to mzML too). In mzML, we have a lot of terms in CV, and these would fall into it as well.
And a question: in David's XML answer, in the IdentificationResult element, you described pf:cvParam ... Is this a namespace (also on the root DataCollection element) to refer to a vendor-specific cvParam, or do you intend to use userParams? I do not get this.
In mzML, if I'm right, we can refer to more than one CVs, that are recognised via different prefixes (PI:99999 vs MS:10000x). And there is the possibility to define userParams. Pierre-Alain |
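A minimal sketch of the prefix idea Pierre-Alain describes for mzML, where an accession such as PI:99999 or MS:10000x is routed to a vocabulary by its prefix. The registry contents, the `CvTerm` type and the `resolve_accession` function are hypothetical names introduced only for illustration; they are not part of mzML or of any schema discussed in this thread.

```python
from typing import NamedTuple

# Hypothetical registry: prefix -> free-text description of the vocabulary.
# "PI" and "MS" are the prefixes mentioned in the thread; the descriptions
# are placeholders, not official CV names.
CV_REGISTRY = {
    "MS": "mass spectrometry vocabulary (placeholder description)",
    "PI": "proteomics informatics vocabulary (placeholder description)",
}


class CvTerm(NamedTuple):
    prefix: str    # part before the colon, selects the vocabulary
    local_id: str  # part after the colon, identifies the term
    cv: str        # description looked up in the registry


def resolve_accession(accession: str) -> CvTerm:
    """Split 'PREFIX:ID' and look the prefix up in the registry."""
    prefix, sep, local_id = accession.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:ID accession: {accession!r}")
    if prefix not in CV_REGISTRY:
        # An unknown prefix is where a userParam-style fallback could apply.
        raise KeyError(f"unknown CV prefix: {prefix!r}")
    return CvTerm(prefix, local_id, CV_REGISTRY[prefix])


if __name__ == "__main__":
    print(resolve_accession("PI:99999"))
```

Terms with an unregistered prefix raise an error here; a real reader might instead treat them as user-defined parameters, which is the userParam question raised above.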
From: Simon H. <sim...@ma...> - 2008-05-01 16:35:53
|
David's XML speak is very useful, at least for me, to help understand the model and associated issues. Strictly, should the "ref" attribute in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x". (as below) to refer back to the earlier <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"> ? <AnalyteDetectionResultSet type=Protein_inferencing> <AnalyteDetectionResult id="protein_group_1"> <IdentificationResult> <SomeTagTBD id="PP" ref="pep_match_x1"> <pf:cvParam startpos = 23> <pf:cvParam endpos = 29> <SomeTagTBD /> Also, if we have the cvParams for protein groups such as "startpos" and "endpos" (as shown above) there could be problems since they are protein (and not protein group) specific. For example, a protein group contains two versions of a protein, one with and one without the signal peptide. So any matching peptide (outside of the signal peptide) will have different starts in the two isoforms, but WILL match both proteins (and hence the group). As far as protein inference goes, one can't tell the two proteins apart and hence a protein group is important. Is this an issue (ie. where we place cvParams, if at all)? -Simon- -- _______________________________________________________________ Dr. Simon Hubbard, Reader in Bioinformatics Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Manchester M13 9PT mailto:Sim...@ma... http://www.ls.manchester.ac.uk/people/profile/index.asp?id=2524 TEL: +44 (0)161 306 8930 FAX: +44 (0)161 275 5082 |
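Simon's signal-peptide point can be illustrated with a small sketch: the same peptide matches both forms of a protein but at different positions, so start/end belong to a (peptide, protein) pair and not to the protein group. The sequences below are invented for illustration and `peptide_span` is a hypothetical helper, not something defined by the schema.

```python
from typing import Optional, Tuple


def peptide_span(protein_seq: str, peptide: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based inclusive (start, end) of the first match, or None."""
    index = protein_seq.find(peptide)
    if index < 0:
        return None
    return index + 1, index + len(peptide)


# Made-up sequences: a 10-residue signal peptide followed by the mature chain.
SIGNAL_PEPTIDE = "MKLVFLALLA"
MATURE_CHAIN = "GSPEPTIDEKAAR"

# One protein group holding both versions of the protein.
PROTEIN_GROUP = {
    "with_signal_peptide": SIGNAL_PEPTIDE + MATURE_CHAIN,
    "without_signal_peptide": MATURE_CHAIN,
}

PEPTIDE = "PEPTIDEK"

# Positions are recorded per protein, not once for the whole group.
spans = {name: peptide_span(seq, PEPTIDE) for name, seq in PROTEIN_GROUP.items()}

if __name__ == "__main__":
    for name, span in spans.items():
        print(name, span)
```

Under these assumptions the peptide matches both entries, yet the spans differ by the length of the signal peptide, which is why a single startpos/endpos pair on the group would be ambiguous.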
From: Jones, A. <And...@li...> - 2008-05-01 12:40:13
|
>But start, end, post and pre would now be CV?
>btw, Luisa recommends that we don't make too many things like this CV...
>Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode.
>Please persuade me otherwise!
>(btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)
I would favour keeping things as attributes where there is a common understanding across all search engines of what they mean, and where they will regularly/always be required. “start, end, post and pre” – these all look like good candidates for being attributes. “calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9"” – I would say the same for these. Every additional thing in CV bloats the instance documents and makes more work for implementers.
Cheers
Andy
From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy Sent: 30 April 2008 18:12 To: Sean L Seymour Cc: psi...@li... Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo
Hi Sean, Thanks very much - must have taken quite a while and is very useful. One thing that may not be obvious to others is where the <SpectrumIdentificationResultSet> comes from. I believe that this was just a 'rename' of PolypeptideResultSet made by the sub group that you were in at Toledo. As we've usefully discussed, finding a way to communicate effectively is an issue. So, to make 100% sure I've understood I'll talk back to you in XML :) This is a cut down of an example for an ms-ms search of a single spectrum with peptide results and protein inferencing. The protein inferencing (impossibly - 'cos just one peptide!) has a couple of similar proteins in the first group, and one in the second group.
<pf:DataCollection>
  <AnalyteDetectionResultSet type=MS_MS_peptide_matches>
    <AnalyteDetectionResult>
      <IdentificationResult>
        <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
        <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="62" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table">
          <!-- A poorer match to same spectrum as "pep_match_x1" !>
          <pf:cvParam accession="PI:99999" name="score" value="12" />
        </IdentificationHypothesis>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
  <AnalyteDetectionResultSet type=Protein_inferencing>
    <AnalyteDetectionResult id="protein_group_1">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_x">
          <pf:cvParam startpos = 23>
          <pf:cvParam endpos = 29>
        <SomeTagTBD />
        <IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
    <AnalyteDetectionResult id="protein_group_2">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_y">
          <pf:cvParam startpos = 123>
          <pf:cvParam endpos = 129>
        <SomeTagTBD />
        <IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="1" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
</pf:DataCollection>
Please correct where I haven't understood.
Before, we had in peptide ID: <PolypeptideResultItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"> New proposal is that calculatedMassToCharge, chargeState and experimentalMassToCharge are all just CV? Likewise, for protein inferencing, we had: <_resultItems> <RelationResultItem identifier="" start="160" end="171" polypeptideReference_ref="1_1" post="K" pre="I"> </RelationResultItem> <RelationResultItem identifier="" start="57" end="71" polypeptideReference_ref="3_1" post="K" pre="R"> </RelationResultItem> But start, end, post and pre would now be CV? btw, Luisa recommends that we don't make too many things like this CV... Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode. Please persuade me otherwise! (btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.) One minor comment: Slide 6: ..., but the results are always about the result from the user’s perspective – “What did I find and/or measure?”, rather than “How did I account for all of the spectra?” - Many users do want to try and account for all their spectra because they believe that they are missing something useful. David Sean L Seymour wrote: Hi all, After the wrap up Friday afternoon, the few remaining people in the PI group had a short meeting where we discussed a potential generalization to the results portion of the schema. The big question that came out of this was whether or not we should keep the result description for the ID of peptides from MS/MS spectra as it was by midday Friday, or whether it made sense to restructure this so that it followed the more general structure for results that we would use for many other things, including protein inference from peptide IDs. I agreed to outline the various use cases and try to lay out the issues. 
I had hoped to send this out by Monday, but it's taken a lot longer than planned. Apologies for being a day late, but I hope you'll see that a lot of thought went into this. There are two documents. Please look at "AnalysisXML Results Design Question.ppt" first. This lays out the specific schema change question we face. One of the biggest concerns about this proposed change was that it was not immediately obvious to any of us last Friday whether this was a substantial restructuring or essentially a renaming process. As you'll see in the slide showing the alignment, I now believe that the change is largely a renaming process and not a large change. The only real change is the insertion of one additional level, but I can imagine a way around doing this. In fact, I think that the reason for inserting this level is not specific to the question of the schema change; rather, it's simply making up for something that was missing in the original model. There needs to be a way of having things that are attributes of the overall identification rather than an individual identification hypothesis - for example, the probability that at least one of the identification hypotheses (hits/matches) is correct for the spectrum. Assuming we agree that this is true, I think there is zero difference in the schema other than using more generic names, and my opinion is that we should really make this change. The second document, "AnalysisXML Results Use Cases.ppt", tries to capture a lot of more specific use cases that demonstrate why the proposed schema change may be the right thing to do. I've done this using 'pseudo instance documents' which are explained in the slides. I hope this is a useful communication mechanism, and may have some use for documentation as well. If no one finds them useful, no big deal - I was just trying to find a way to communicate clearly. Please excuse inaccuracies in the details of some of the use cases. 
I was trying to assess whether or not the constant AnalysisResult frame was robust to a large number of variations. I think you'll see that it is, and it's really not clear to my why we should have a special case of element names for the ID of peptides from MS/MS spectra. The only good reason I can see for it is that it's what we already had drawn up in the schema. Please feel free to add, modify, or correct any of this as you see fit! Sean ________________________________ ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ________________________________ _______________________________________________ Psidev-pi-dev mailing list Psi...@li... https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: David C. <dc...@ma...> - 2008-04-30 17:11:53
|
Hi Sean,

Thanks very much - must have taken quite a while and is very useful. One thing that may not be obvious to others is where the <SpectrumIdentificationResultSet> comes from. I believe that this was just a 'rename' of PolypeptideResultSet made by the sub group that you were in at Toledo.

As we've usefully discussed, finding a way to communicate effectively is an issue. So, to make 100% sure I've understood I'll talk back to you in XML :)

This is a cut down of an example for an ms-ms search of a single spectrum with peptide results and protein inferencing. The protein inferencing (impossibly - 'cos just one peptide!) has a couple of similar proteins in the first group, and one in the second group.

<pf:DataCollection>
  <AnalyteDetectionResultSet type="MS_MS_peptide_matches">
    <AnalyteDetectionResult>
      <IdentificationResult>
        <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
        <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="62" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table">
          <!-- A poorer match to same spectrum as "pep_match_x1" -->
          <pf:cvParam accession="PI:99999" name="score" value="12" />
        </IdentificationHypothesis>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>

  <AnalyteDetectionResultSet type="Protein_inferencing">
    <AnalyteDetectionResult id="protein_group_1">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_x">
          <pf:cvParam name="startpos" value="23" />
          <pf:cvParam name="endpos" value="29" />
        </SomeTagTBD>
        <IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
        <IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="162" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
    <AnalyteDetectionResult id="protein_group_2">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_y">
          <pf:cvParam name="startpos" value="123" />
          <pf:cvParam name="endpos" value="129" />
        </SomeTagTBD>
        <IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table">
          <pf:cvParam accession="PI:99999" name="score" value="1" />
        </IdentificationHypothesis>
      </IdentificationResult>
      <IdentificationResult>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
</pf:DataCollection>

Please correct where I haven't understood.

Before, we had in peptide ID:
<PolypeptideResultItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" polypeptideReference_ref="xxx">
New proposal is that calculatedMassToCharge, chargeState and experimentalMassToCharge are all just CV?

Likewise, for protein inferencing, we had:
<_resultItems>
  <RelationResultItem identifier="" start="160" end="171" polypeptideReference_ref="1_1" post="K" pre="I">
  </RelationResultItem>
  <RelationResultItem identifier="" start="57" end="71" polypeptideReference_ref="3_1" post="K" pre="R">
  </RelationResultItem>

But start, end, post and pre would now be CV?
btw, Luisa recommends that we don't make too many things like this CV...
Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode. Please persuade me otherwise!
(btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)

One minor comment:

Slide 6: ..., but the results are always about the result from the user’s perspective – “What did I find and/or measure?”, rather than “How did I account for all of the spectra?”
- Many users do want to try and account for all their spectra because they believe that they are missing something useful.

David
|
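The file-size worry raised in this message can be made concrete with a small experiment. The following is only a rough sketch in Python, not part of any AnalysisXML tooling: it serialises the three values from the quoted PolypeptideResultItem once as plain attributes and once as child cvParam elements, then compares the byte counts. The element layout of the cvParam variant and the reuse of the placeholder accession PI:99999 are assumptions taken from the pseudo-XML in the thread, not from a released schema.

```python
import xml.etree.ElementTree as ET

# Values copied from the PolypeptideResultItem quoted in the message.
FIELDS = {
    "calculatedMassToCharge": "670.86261",
    "chargeState": "2",
    "experimentalMassToCharge": "671.9",
}


def as_attributes() -> ET.Element:
    """Existing style: every value is an attribute on the result item."""
    return ET.Element("PolypeptideResultItem", {"identifier": "1_1", **FIELDS})


def as_cv_params() -> ET.Element:
    """Proposed style (assumed layout): every value is a child cvParam."""
    item = ET.Element("IdentificationHypothesis", {"id": "1_1"})
    for name, value in FIELDS.items():
        # PI:99999 is the placeholder accession used in the thread, not a real term.
        ET.SubElement(
            item, "cvParam", {"accession": "PI:99999", "name": name, "value": value}
        )
    return item


def encoded_size(element: ET.Element) -> int:
    """Number of UTF-8 bytes in the serialised element."""
    return len(ET.tostring(element, encoding="unicode").encode("utf-8"))


attr_size = encoded_size(as_attributes())
cv_size = encoded_size(as_cv_params())
print(f"attributes: {attr_size} bytes, cvParams: {cv_size} bytes")
```

For a single result item the cvParam form is noticeably larger, because each value carries its own element name, accession and name attribute. Whether that matters in practice depends on how many items a real file holds and on compression, which this sketch does not measure.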
From: David C. <dc...@ma...> - 2008-04-30 15:05:24
|
Hi everyone, There will be an AnalysisXML working group conference call this Friday at: http://www.timeanddate.com/worldclock/fixedtime.html?day=2&month=5&year=2008&hour=16&min=0&sec=0&p1=136 The aim is to work through the current issues list: http://code.google.com/p/psi-pi/issues/list Please feel free to add comments to the list (or additional items to the list) before Friday. Hopefully we can then target another release of the schema. + Germany: 08001012079 + Switzerland: 0800000860 + UK: 08081095644 + USA: 1-866-314-3683 + Generic international: +44 2083222500 (UK number) access code: 297427 -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: David C. <dc...@ma...> - 2008-04-29 18:19:59
|
Dear All,

We had a successful and enjoyable meeting in Toledo last week with about 20 people attending the AnalysisXML sessions. The slides and some notes from the meeting are here: http://www.psidev.info/index.php?q=node/105#meetings

As you'll see from the notes, we still have some work to do and a 'few' remaining issues. The issue list is being held at http://code.google.com/p/psi-pi/issues/list and I'd encourage you to make comments for each issue.

To get the latest schema, cv etc., see: http://www.psidev.info/index.php?q=node/105 and scroll down to 'Obtaining the Current Development...'

Time plan:
28 Apr 2008   Phil                        Update web site. (Nearly done, but ongoing)
29 Apr 2008   David                       Put all issues on issue list. Done
29 Apr 2008   Andy                        Decision about UML or conversion to xsd. Done
02 May 2008   David                       Update and tidy use cases.
02 May 2008   Andy                        All outstanding issues -> schema
14 May 2008   Sean                        Instance docs for Paragon
              David                       Instance docs for Mascot
              Alex                        Instance docs for Phenyx
              Martin                      Instance docs for Sequest
              Andreas                     Instance docs for X!Tandem
14 May 2008   Andreas                     Parser for AnalysisXML <-> OpenMS
31 May 2008   Zsuzsanna (+ Randy, Luisa)  CV
31 May 2008   Andy                        First draft of documentation
?? ??? 2008   EBI                         Validator
20 Jun 2008   All                         Submission to PSI document process.
Sep 2008                                  Release 1.0

If you'd like to volunteer to help, please send an email to the list or to me.

Thanks,
David |
From: Sean L S. <Sey...@ap...> - 2008-04-29 15:43:54
|
I agree with Martin. I would add that peptides are often identified many times due to redundant acquisition as well. There should be value at the protein level as well. You have to be able to include the full protein sequence, and I wouldn't want to do that more than once. That said, it may be reasonable to have use cases where the definition of which protein accession was identified is done by a pointer to an external resource via URI. Thus, use of the molecule table could be optional, but I really think you need it.

Sean

"Martin Eisenacher" <mar...@ru...> Sent by: psi...@li... 04/29/2008 08:14 AM To <psi...@li...> cc Subject Re: [Psidev-pi-dev] molecule table thoughts

Meanwhile I think it saves us a lot, if a peptide (with modifications) is found in more than one spectrum. In the SpectrumSearchResultItem there is the link between spectrum and peptide (with mods) TOGETHER with a score. Additionally we have to report the peptide sequence (with mods) only once, although it may be found with more than one charge state.

Bye
Martin

From: psi...@li... [mailto:psi...@li...] On behalf of Angel Pizarro Sent: Sunday, April 27, 2008 4:46 PM To: psi...@li... Subject: [Psidev-pi-dev] molecule table thoughts

More and more I don't think our molecule lookup table is gaining us anything. Specifically, last time at the EBI, Martin and I were discussing modifications and how best and proper to represent them and we came to the conclusion that the modification information should live directly next to the scoring information. Also taking into account that a protein determination step is an analysis, we can see that representing the protein groups can also be defined within the result set containing the scoring information. So the only thing that the molecule table buys us is an easy look-up for information without context and possibly (although not proven) file size savings. The cost is that the format MUST reference the lookup table, adding complexity to encoding and reading the actual results. I don't think the convenience is worth the cost. Can we move the identified compound information to be collocated with the results information?

-angel |
From: Martin E. <mar...@ru...> - 2008-04-29 14:50:36
|
Meanwhile I think it saves us a lot, if a peptide (with modifications) is found in more than one spectrum. In the SpectrumSearchResultItem there is the link between spectrum and peptide (with mods) TOGETHER with a score. Additionally we have to report the peptide sequence (with mods) only once, although it may be found with more than one charge state.

Bye
Martin

From: psi...@li... [mailto:psi...@li...] On behalf of Angel Pizarro Sent: Sunday, April 27, 2008 4:46 PM To: psi...@li... Subject: [Psidev-pi-dev] molecule table thoughts

More and more I don't think our molecule lookup table is gaining us anything. Specifically, last time at the EBI, Martin and I were discussing modifications and how best and proper to represent them and we came to the conclusion that the modification information should live directly next to the scoring information. Also taking into account that a protein determination step is an analysis, we can see that representing the protein groups can also be defined within the result set containing the scoring information. So the only thing that the molecule table buys us is an easy look-up for information without context and possibly (although not proven) file size savings. The cost is that the format MUST reference the lookup table, adding complexity to encoding and reading the actual results. I don't think the convenience is worth the cost. Can we move the identified compound information to be collocated with the results information?

-angel |
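The trade-off argued in this thread (a molecule table referenced by each match, versus peptide details collocated with each score) can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the class names, the peptide LMSDGK, its modification and the three spectra are all hypothetical and do not come from the schema or from any real data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str
    modifications: tuple  # hypothetical (name, position) pairs


@dataclass(frozen=True)
class Match:
    spectrum_id: str
    charge: int
    score: float
    peptide_ref: str  # key into the molecule table


# Hypothetical data: one modified peptide matched by three spectra,
# at two different charge states.
molecule_table = {"pep1": Peptide("LMSDGK", (("Oxidation", 2),))}
matches = [
    Match("spec_9", 2, 62.0, "pep1"),
    Match("spec_14", 3, 48.5, "pep1"),
    Match("spec_27", 2, 55.1, "pep1"),
]


def resolve(match: Match, table: dict) -> Peptide:
    """Follow the reference, as every reader of the table-based format must."""
    return table[match.peptide_ref]


def inline(all_matches: list, table: dict) -> list:
    """Collocated alternative: copy the peptide details into every match."""
    return [
        {
            "spectrum_id": m.spectrum_id,
            "charge": m.charge,
            "score": m.score,
            "sequence": resolve(m, table).sequence,
            "modifications": resolve(m, table).modifications,
        }
        for m in all_matches
    ]


inlined = inline(matches, molecule_table)
sequences_stored_with_table = len(molecule_table)
sequences_stored_inline = len(inlined)
print(sequences_stored_with_table, sequences_stored_inline)
```

With the table, the sequence and its modifications are stored once and each match pays the cost of an extra lookup; inlined, readers need no lookup but the same peptide is repeated once per match. The sketch shows both costs side by side without settling which matters more for real files.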
From: David C. <dc...@ma...> - 2008-04-29 09:27:22
|
Hi Luisa,

Thanks very much - looks good. I've put the .obo file in http://code.google.com/p/psi-pi/source/browse/trunk/cv/psi-pi.obo

Some comments below:

Luisa Montecchi wrote:
> Dear All,
>
> here is a draft obo file, loadable with the obo editor (you can download the editor at https://sourceforge.net/project/showfiles.php?group_id=36855&package_id=192411)
>
> A few questions:
> Do you want to report all the information available in the spreadsheet as comments (like 'required by MIAPE' 'MCP' + all columns after the term)

Maybe. An issue that I don't know how to resolve is that ideally the validator will need to check, for example, that if the search engine is Sequest, then a sequest:xcorr value should be specified. If the search engine is X!Tandem, and a sequest:xcorr value is specified, this would be an error. Any ideas for how best to keep track of all this so that we don't lose or forget things? We have the spreadsheet, the mapping file and the .obo file. So maybe we need to keep the spreadsheet up to date as well. Perhaps Zsuzsanna, you and Randy can agree on an approach?

> Is the PI namespace fine with you?

Yes.

> I put all terms lower case, as it is good practice in CVs, is it fine?

Yes, Luisa, this looks OK.

> Would you authorize me to change a few names, an ugly one being for example 'database(s) searched min=0 for denovo?'. Ideally I would also like to put all terms singular.

Yes please.

> In general I think a number of those terms should be XML attributes in the schema (like 'sample id', or 'date / time search performed' or 'modification position'), whereas CV are fine and should be the reference for descriptive information like 'database filtering' or 'search engines scores'.

Yes please. Could you make a list and let us know? I've added a tracker item: http://code.google.com/p/psi-pi/issues/detail?id=15 So, please add to the list there.

> We can plan a phone conference if you want to browse the file together, I can organise one next week at EBI, or you can call me at any time this Tuesday, Wednesday

We will need to go through the file at some point. Best to liaise also with Zsuzsanna and Randy.

> (not Thursday, it is May first in the continental world).

But you are in the UK! Does this mean you take the 5th as holiday too ;)

Thanks very much,
David

> Cheers,
>
> Luisa

-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
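The validator rule described in this message (Sequest results should carry a sequest:xcorr value, while the same term in an X!Tandem result is an error) could be captured as a small rule table. The Python sketch below is one hypothetical way to do it and is not the PSI validator's actual mechanism: the dictionaries, the "xtandem:" prefix and the function name are assumptions made for illustration; only the sequest:xcorr example comes from the message.

```python
# Hypothetical rule tables. Only the sequest:xcorr requirement is taken from
# the message; the "xtandem:" prefix is an invented placeholder.
REQUIRED_SCORES = {
    "Sequest": {"sequest:xcorr"},
    "X!Tandem": set(),
}
ENGINE_PREFIX = {
    "Sequest": "sequest:",
    "X!Tandem": "xtandem:",
}


def validate_scores(engine: str, reported: set) -> list:
    """Return a list of error messages for the score terms of one search."""
    errors = []
    # Rule 1: every term required for this engine must be present.
    for term in sorted(REQUIRED_SCORES.get(engine, set()) - reported):
        errors.append(f"{engine}: missing required term {term}")
    # Rule 2: a term that belongs to a different engine is an error.
    for term in sorted(reported):
        for other, prefix in ENGINE_PREFIX.items():
            if other != engine and term.startswith(prefix):
                errors.append(f"{engine}: term {term} belongs to {other}")
    return errors


print(validate_scores("Sequest", {"sequest:xcorr"}))
print(validate_scores("X!Tandem", {"sequest:xcorr"}))
print(validate_scores("Sequest", set()))
```

Keeping the rules in one table like this is one possible answer to the bookkeeping question in the message: the spreadsheet, mapping file and .obo file could in principle all be checked against, or generated from, a single source of such rules.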
From: Angel P. <an...@ma...> - 2008-04-27 14:45:35
|
More and more I don't think our molecule lookup table is gaining us anything. Specifically, last time at the EBI, Martin and I were discussing modifications and how best and proper to represent them and we came to the conclusion that the modification information should live directly next to the scoring information. Also taking into account that a protein determination step is an analysis, we can see that representing the protein groups can also be defined within the result set containing the scoring information. So the only thing that the molecule table buys us is an easy look-up for information without context and possibly (although not proven) file size savings. The cost is that the format MUST reference the lookup table, adding complexity to encoding and reading the actual results. I don't think the convenience is worth the cost. Can we move the identified compound information to be collocated with the results information? -angel |
From: Juan P. A. <jp...@pr...> - 2008-03-27 18:28:00
|
Dear Friends, Sorry if you have already registered and you get multiple posts. The HUPO-PSI Spring Meeting 2008 is almost here, register now! (DEADLINE FOR REGISTRATION APRIL, 1ST) Come and join us April 23-25th, 2008, at Hotel Beatriz in Toledo, Spain for the 2008 Spring Meeting of the HUPO Proteomics Standards Initiative. A unique opportunity to participate in the development of current standards in Proteomics. The PSI working groups are targeting to discuss and deliver a number of standards, including (not exhaustively) - Adaptation of MI XML format to new data types - PSICQUIC definition - mzML stable format (merging mzData and mzXML) - Controlled Vocabularies for a number of PSI modules and the PSI Beta Validator framework presentation, - MIAPE documentation for still remaining modules - and much more Detailed information and registration form can be found at http://www.psidev.info/index.php?q=node/305 Looking forward to seeing you in Toledo, From the meeting organization committee and the PSI Steering Group Juan Pablo Albar -- Dr. Juan Pablo Albar ProteoRed, Coordinator Centro Nacional de Biotecnología/CSIC UAM Campus Cantoblanco Darwin, 3 Madrid E-28049 Spain http://www.proteored.org/ Telf (+34) 91585-4668 Fax (+34) 91585-4506 |
From: Juan P. A. <jp...@pr...> - 2008-03-12 19:05:39
|
Dear Friends, Sorry if you get multiple posts, and please distribute to people who might be interested. The HUPO-PSI Spring Meeting 2008 is quickly approaching, register now! Come and join us April 23-25th, 2008, at Hotel Beatriz Toledo in Toledo, Spain for the Spring Meeting 2008 of the HUPO Proteomics Standards Initiative. A unique opportunity to participate in the development of current standards in Proteomics. The PSI working groups are targeting to discuss and deliver a number of standards, including (not exhaustively) - Adaptation of MI XML format to new data types - PSICQUIC definition - mzML stable format (merging mzData and mzXML) - Controlled Vocabularies for a number of PSI modules and the PSI Beta Validator framework presentation, - MIAPE documentation for still remaining modules - and much more Detailed information and registration form can be found at http://www.psidev.info/index.php?q=node/305 Looking forward to seeing you in Toledo, From the meeting organization committee and the PSI Steering Group Juan Pablo Albar -- Dr. Juan Pablo Albar ProteoRed, Coordinator Centro Nacional de Biotecnología/CSIC UAM Campus Cantoblanco Darwin, 3 Madrid E-28049 Spain http://www.proteored.org/ Telf (+34) 91585-4668 Fax (+34) 91585-4506 |