From: David C. <dc...@ma...> - 2008-04-30 17:11:53
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi Sean,<br> <br> Thanks very much - must have taken quite a while and is very useful. One thing that may not be obvious to others is where the &lt;SpectrumIdentificationResultSet&gt; comes from. I believe that this was just a 'rename' of PolypeptideResultSet made by the sub group that you were in at Toledo.<br> <br> As we've usefully discussed, finding a way to communicate effectively is an issue. So, to make 100% sure I've understood, I'll talk back to you in XML :)<br> <br> This is a cut-down example of an MS/MS search of a single spectrum with peptide results and protein inferencing. The protein inferencing (impossibly - 'cos just one peptide!) has a couple of similar proteins in the first group, and one in the second group.<br> <br>
<pre>
&lt;pf:DataCollection&gt;
  &lt;AnalyteDetectionResultSet type="MS_MS_peptide_matches"&gt;
    &lt;AnalyteDetectionResult&gt;
      &lt;IdentificationResult&gt;
        &lt;SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/&gt;
        &lt;IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"&gt;
          &lt;pf:cvParam accession="PI:99999" name="score" value="62" /&gt;
        &lt;/IdentificationHypothesis&gt;
        &lt;IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table"&gt;
          &lt;!-- A poorer match to same spectrum as "pep_match_x1" --&gt;
          &lt;pf:cvParam accession="PI:99999" name="score" value="12" /&gt;
        &lt;/IdentificationHypothesis&gt;
      &lt;/IdentificationResult&gt;
    &lt;/AnalyteDetectionResult&gt;
  &lt;/AnalyteDetectionResultSet&gt;

  &lt;AnalyteDetectionResultSet type="Protein_inferencing"&gt;
    &lt;AnalyteDetectionResult id="protein_group_1"&gt;
      &lt;IdentificationResult&gt;
        &lt;SomeTagTBD id="PP" ref="pep_match_x"&gt;
          &lt;pf:cvParam name="startpos" value="23" /&gt;
          &lt;pf:cvParam name="endpos" value="29" /&gt;
        &lt;/SomeTagTBD&gt;
        &lt;IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table"&gt;
          &lt;pf:cvParam accession="PI:99999" name="score" value="162" /&gt;
        &lt;/IdentificationHypothesis&gt;
        &lt;IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table"&gt;
          &lt;pf:cvParam accession="PI:99999" name="score" value="162" /&gt;
        &lt;/IdentificationHypothesis&gt;
      &lt;/IdentificationResult&gt;
      &lt;IdentificationResult&gt;
      &lt;/IdentificationResult&gt;
    &lt;/AnalyteDetectionResult&gt;
    &lt;AnalyteDetectionResult id="protein_group_2"&gt;
      &lt;IdentificationResult&gt;
        &lt;SomeTagTBD id="PP" ref="pep_match_y"&gt;
          &lt;pf:cvParam name="startpos" value="123" /&gt;
          &lt;pf:cvParam name="endpos" value="129" /&gt;
        &lt;/SomeTagTBD&gt;
        &lt;IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table"&gt;
          &lt;pf:cvParam accession="PI:99999" name="score" value="1" /&gt;
        &lt;/IdentificationHypothesis&gt;
      &lt;/IdentificationResult&gt;
      &lt;IdentificationResult&gt;
      &lt;/IdentificationResult&gt;
    &lt;/AnalyteDetectionResult&gt;
  &lt;/AnalyteDetectionResultSet&gt;
&lt;/pf:DataCollection&gt;
</pre>
Please correct where I haven't understood.<br> <br> Before, we had in peptide ID:<br>
<pre>
&lt;PolypeptideResultItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2"
    experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"&gt;
</pre>
The new proposal is that calculatedMassToCharge, chargeState and experimentalMassToCharge are all just CV?<br> <br> Likewise, for protein inferencing, we had:<br>
<pre>
&lt;_resultItems&gt;
  &lt;RelationResultItem identifier="" start="160" end="171" polypeptideReference_ref="1_1" post="K" pre="I"&gt;
  &lt;/RelationResultItem&gt;
  &lt;RelationResultItem identifier="" start="57" end="71" polypeptideReference_ref="3_1" post="K" pre="R"&gt;
  &lt;/RelationResultItem&gt;
&lt;/_resultItems&gt;
</pre>
But start, end, post and pre would now be CV?<br> btw, Luisa recommends that we don't make too many things like this CV...<br> Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode. Please persuade me otherwise!<br> (btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)<br> <br> <br> One minor comment:<br> <br> Slide 6: ..., but the results are always about the result from the user’s perspective – “What did I find and/or measure?”, rather than “How did I account for all of the spectra?” <br> - Many users do want to try and account for all their spectra because they believe that they are missing something useful.<br> <br> <br> David<br> <br> Sean L Seymour wrote: <blockquote cite="mid:OFC...@ap..." type="cite"><br> <font face="sans-serif" size="2">Hi all,</font> <br> <br> <font face="sans-serif" size="2">After the wrap-up on Friday afternoon, the few remaining people in the PI group had a short meeting where we discussed a potential generalization to the results portion of the schema. The big question that came out of this was whether or not we should keep the result description for the ID of peptides from MS/MS spectra as it was by midday Friday, or whether it made sense to restructure this so that it followed the more general structure for results that we would use for many other things, including protein inference from peptide IDs. I agreed to outline the various use cases and try to lay out the issues. I had hoped to send this out by Monday, but it's taken a lot longer than planned. Apologies for being a day late, but I hope you'll see that a lot of thought went into this.</font> <br> <br> <font face="sans-serif" size="2">There are two documents. Please look at "AnalysisXML Results Design Question.ppt" first. This lays out the specific schema change question we face.
One of the biggest concerns about this proposed change was that it was not immediately obvious to any of us last Friday whether this was a substantial restructuring or essentially a renaming process. As you'll see in the slide showing the alignment, I now believe that the change is largely a renaming process and not a large change. The only real change is the insertion of one additional level, but I can imagine a way around doing this. In fact, I think that the reason for inserting this level is not specific to the question of the schema change; rather, it's simply making up for something that was missing in the original model. There needs to be a way of having things that are attributes of the overall identification rather than of an individual identification hypothesis - for example, the probability that at least one of the identification hypotheses (hits/matches) is correct for the spectrum. Assuming we agree that this is true, I think there is zero difference in the schema other than using more generic names, and my opinion is that we should really make this change.</font> <br> <br> <font face="sans-serif" size="2">The second document, "AnalysisXML Results Use Cases.ppt", tries to capture a lot of more specific use cases that demonstrate why the proposed schema change may be the right thing to do. I've done this using 'pseudo instance documents', which are explained in the slides. I hope this is a useful communication mechanism, and it may have some use for documentation as well. If no one finds them useful, no big deal - I was just trying to find a way to communicate clearly. Please excuse inaccuracies in the details of some of the use cases. I was trying to assess whether or not the constant AnalysisResult frame was robust to a large number of variations. I think you'll see that it is, and it's really not clear to me why we should have a special case of element names for the ID of peptides from MS/MS spectra.
The only good reason I can see for it is that it's what we already had drawn up in the schema. </font> <br> <br> <font face="sans-serif" size="2">Please feel free to add, modify, or correct any of this as you see fit!</font> <br> <br> <font face="sans-serif" size="2">Sean</font> <br> <br> <br> <pre wrap=""> <hr size="4" width="90%"> ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. <a class="moz-txt-link-freetext" href="http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone">http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone</a></pre> <pre wrap=""> <hr size="4" width="90%"> _______________________________________________ Psidev-pi-dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Psi...@li...">Psi...@li...</a> <a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a> </pre> </blockquote> <br> <pre class="moz-signature" cols="72">-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 <a class="moz-txt-link-abbreviated" href="mailto:dc...@ma...">dc...@ma...</a> <a class="moz-txt-link-freetext" href="http://www.matrixscience.com">http://www.matrixscience.com</a> Matrix Science Ltd. is registered in England and Wales Company number 3533898</pre> </body> </html> |
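David's pseudo-instance can be checked mechanically for the kind of slip he asks to have corrected: every ref in the Protein_inferencing set should point at an IdentificationHypothesis id in the MS_MS_peptide_matches set. The minimal sketch below does this with Python's standard xml.etree.ElementTree on a trimmed copy of the example. The pf namespace URI and the helper name dangling_peptide_refs are hypothetical, and SomeTagTBD is simply David's placeholder element name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of David's pseudo-instance (scores and positions dropped).
# The pf namespace URI is a hypothetical placeholder: the thread never gives
# one, but ElementTree rejects undeclared prefixes.
INSTANCE = """
<pf:DataCollection xmlns:pf="urn:example:placeholder-pf">
  <AnalyteDetectionResultSet type="MS_MS_peptide_matches">
    <AnalyteDetectionResult>
      <IdentificationResult>
        <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
        <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"/>
        <IdentificationHypothesis id="pep_match_x2" ref="peptide2_in_molecule_table"/>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
  <AnalyteDetectionResultSet type="Protein_inferencing">
    <AnalyteDetectionResult id="protein_group_1">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_x"/>
        <IdentificationHypothesis id="TRYP_PIG" ref="protein1_in_molecule_table"/>
        <IdentificationHypothesis id="TRYP_BOV" ref="protein2_in_molecule_table"/>
      </IdentificationResult>
    </AnalyteDetectionResult>
    <AnalyteDetectionResult id="protein_group_2">
      <IdentificationResult>
        <SomeTagTBD id="PP" ref="pep_match_y"/>
        <IdentificationHypothesis id="DODGY" ref="protein99_in_molecule_table"/>
      </IdentificationResult>
    </AnalyteDetectionResult>
  </AnalyteDetectionResultSet>
</pf:DataCollection>
"""


def dangling_peptide_refs(xml_text):
    """Hypothetical helper: list refs on SomeTagTBD elements in the
    Protein_inferencing set that match no IdentificationHypothesis id in the
    MS_MS_peptide_matches set."""
    root = ET.fromstring(xml_text.strip())
    peptide_ids = set()
    refs = []
    # Result sets are direct, un-namespaced children of pf:DataCollection.
    for result_set in root.findall("AnalyteDetectionResultSet"):
        kind = result_set.get("type")
        if kind == "MS_MS_peptide_matches":
            peptide_ids.update(h.get("id") for h in result_set.iter("IdentificationHypothesis"))
        elif kind == "Protein_inferencing":
            refs.extend(t.get("ref") for t in result_set.iter("SomeTagTBD"))
    return [r for r in refs if r not in peptide_ids]


print(dangling_peptide_refs(INSTANCE))  # ['pep_match_x', 'pep_match_y']
```

On this example it flags pep_match_x (no hypothesis has that id; Simon's reply in this thread also picks this up) and pep_match_y, which the cut-down example never defines.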
From: Jones, A. <And...@li...> - 2008-05-01 12:40:13
|
>But start, end, post and pre would now be CV?
>btw, Luisa recommends that we don't make too many things like this CV...
>Having been enthusiastic about the change, I think I'm now going off it - partly because with all the extra CV, file sizes may well explode.
>Please persuade me otherwise!
>(btw, I've 'read but ignored' the quantitation suggestions based on decisions in Toledo.)

I would favour keeping things as attributes where there is a common understanding across all search engines of what they mean, and where they will regularly/always be required.

“start, end, post and pre” – these all look like good candidates for being attributes.

“calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9"” – I would say the same for these; every additional thing in CV bloats the instance documents and makes more work for implementers.

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy
Sent: 30 April 2008 18:12
To: Sean L Seymour
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo
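Andy's point that every extra CV term bloats instance documents (and David's worry that file sizes may explode) can be put in rough numbers. The sketch below serializes the PolypeptideResultItem values quoted in this thread once as attributes and once as child cvParam elements, keeping the element name the same so only the encoding differs. It assumes the thread's placeholder accession "PI:99999" (not a real CV term) and drops the pf: prefix to stay namespace-free; it is an illustration, not a measurement of any real file.

```python
import xml.etree.ElementTree as ET

# Values from the PolypeptideResultItem example quoted in this thread.
VALUES = {
    "calculatedMassToCharge": "670.86261",
    "chargeState": "2",
    "experimentalMassToCharge": "671.9",
}


def as_attributes(values):
    """Current style: each value is an attribute on the result item."""
    item = ET.Element("PolypeptideResultItem", identifier="1_1")
    for name, value in values.items():
        item.set(name, value)
    return ET.tostring(item, encoding="unicode")


def as_cv_params(values):
    """Proposed style: each value becomes a child cvParam element.
    "PI:99999" is the thread's placeholder accession, not a real CV term."""
    item = ET.Element("PolypeptideResultItem", identifier="1_1")
    for name, value in values.items():
        ET.SubElement(item, "cvParam", accession="PI:99999", name=name, value=value)
    return ET.tostring(item, encoding="unicode")


attr_xml = as_attributes(VALUES)
cv_xml = as_cv_params(VALUES)
# Extra characters per result item; multiply by the number of peptide
# matches in a file to estimate the overall growth.
overhead_per_item = len(cv_xml) - len(attr_xml)
print(attr_xml)
print(cv_xml)
print(overhead_per_item)
```

This counts raw, unindented characters only; it says nothing about compressed size or about the extra parsing work Andy mentions for implementers.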
From: Martin E. <mar...@ru...> - 2008-05-02 11:38:59
|
Btw: The "attributes <-> CV" discussion is closely related to the "elements <-> attributes" discussion (see http://code.google.com/p/psi-pi/issues/detail?id=8). I have listed all attributes except "id(entifier)" or "*ref" there; we are talking about 18 attributes in the last version of the schema (Sept 2007).

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On Behalf Of Jones, Andy
Sent: Thursday, May 01, 2008 1:31 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo
From: Simon H. <sim...@ma...> - 2008-05-01 16:35:53
|
David's XML speak is very useful, at least for me, to help understand the model and associated issues. Strictly, should the "ref" attribute in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x". (as below) to refer back to the earlier <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table"> ? <AnalyteDetectionResultSet type=Protein_inferencing> <AnalyteDetectionResult id="protein_group_1"> <IdentificationResult> <SomeTagTBD id="PP" ref="pep_match_x1"> <pf:cvParam startpos = 23> <pf:cvParam endpos = 29> <SomeTagTBD /> Also, if we have the cvParams for protein groups such as "startpos" and "endpos" (as shown above) there could be problems since they are protein (and not protein group) specific. For example, a protein group contains two versions of a protein, one with and one without the signal peptide. So any matching peptide (outside of the signal peptide) will have different starts in the two isoforms, but WILL match both proteins (and hence the group). As far as protein inference goes, one can't tell the two proteins apart and hence a protein group is important. Is this an issue (ie. where we place cvParams, if at all)? -Simon- David Creasy wrote: > Hi Sean, > > Thanks very much - must have taken quite a while and is very useful. One > thing that may not be obvious to others is where the the > <SpectrumIdentificationResultSet> comes from. I believe that this was > just a 'rename' of PolypeptideResultSet made by the sub group that you > were in at Toledo. > > As we've usefully discussed, finding a way to communicate effectively is > an issue. So, to make 100% sure I've understood I'll talk back to you in > XML :) > > This is a cut down of an example for an ms-ms search of a single > spectrum with peptide results and protein inferencing. The protein > inferencing (impossibly - 'cos just one peptide!) has a couple of > similar proteins in the first group, and one in the second group. 
> > <pf:DataCollection> > <AnalyteDetectionResultSet type=MS_MS_peptide_matches> > <AnalyteDetectionResult> > <IdentificationResult> > <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/> > <IdentificationHypothesis id="pep_match_x1" > ref="peptide1_in_molecule_table"> > <pf:cvParam accession="PI:99999" name="score" value="62" /> > </IdentificationHypothesis> > <IdentificationHypothesis id="pep_match_x2" > ref="peptide2_in_molecule_table"> > <!-- A poorer match to same spectrum as "pep_match_x1" !> > <pf:cvParam accession="PI:99999" name="score" value="12" /> > </IdentificationHypothesis> > </IdentificationResult> > </AnalyteDetectionResult> > </AnalyteDetectionResultSet> > > <AnalyteDetectionResultSet type=Protein_inferencing> > <AnalyteDetectionResult id="protein_group_1"> > <IdentificationResult> > <SomeTagTBD id="PP" ref="pep_match_x"> > <pf:cvParam startpos = 23> > <pf:cvParam endpos = 29> > <SomeTagTBD /> > <IdentificationHypothesis id="TRYP_PIG" > ref="protein1_in_molecule_table"> > <pf:cvParam accession="PI:99999" name="score" value="162" /> > </IdentificationHypothesis> > <IdentificationHypothesis id="TRYP_BOV" > ref="protein2_in_molecule_table"> > <pf:cvParam accession="PI:99999" name="score" value="162" /> > </IdentificationHypothesis> > </IdentificationResult> > <IdentificationResult> # nothing doing here ? 
[SJH] > </IdentificationResult> # > </AnalyteDetectionResult> > </AnalyteDetectionResultSet> > <AnalyteDetectionResult id="protein_group_2"> > <IdentificationResult> > <SomeTagTBD id="PP" ref="pep_match_y"> > <pf:cvParam startpos = 123> > <pf:cvParam endpos = 129> > <SomeTagTBD /> > <IdentificationHypothesis id="DODGY" > ref="protein99_in_molecule_table"> > <pf:cvParam accession="PI:99999" name="score" value="1" /> > </IdentificationHypothesis> > </IdentificationResult> > <IdentificationResult> > </IdentificationResult> > </AnalyteDetectionResult> > </AnalyteDetectionResultSet> > </pf:DataCollection> > > Please correct where I haven't understood. > > Before, we had in peptide ID: > <PolypeptideResultItem identifier="1_1" > calculatedMassToCharge="670.86261" chargeState="2" > experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"> > New proposal is that calculatedMassToCharge, chargeState and > experimentalMassToCharge are all just CV? > > Likewise, for protein inferencing, we had: > <_resultItems> > <RelationResultItem identifier="" start="160" end="171" > polypeptideReference_ref="1_1" post="K" pre="I"> > </RelationResultItem> > <RelationResultItem identifier="" start="57" end="71" > polypeptideReference_ref="3_1" post="K" pre="R"> > </RelationResultItem> > > But start, end, post and pre would now be CV? > btw, Luisa recommends that we don't make too many things like this CV... > Having been enthusiastic about the change, I think I'm now going off it > - partly because with all the extra CV, file sizes may well explode. > Please persuade me otherwise! > (btw, I've 'read but ignored' the quantitation suggestions based on > decisions in Toledo.) 
>
>
> One minor comment:
>
> Slide 6: ..., but the results are always about the result from the
> user’s perspective – “What did I find and/or measure?”, rather than “How
> did I account for all of the spectra?”
> - Many users do want to try and account for all their spectra because
> they believe that they are missing something useful.
>
>
> David
>
> Sean L Seymour wrote:
>>
>> Hi all,
>>
>> After the wrap up Friday afternoon, the few remaining people in the PI
>> group had a short meeting where we discussed a potential
>> generalization to the results portion of the schema. The big question
>> that came out of this was whether or not we should keep the result
>> description for the ID of peptides from MS/MS spectra as it was by
>> midday Friday, or whether it made sense to restructure this so that it
>> followed the more general structure for results that we would use for
>> many other things, including protein inference from peptide IDs. I
>> agreed to outline the various use cases and try to lay out the issues.
>> I had hoped to send this out by Monday, but it's taken a lot longer
>> than planned. Apologies for being a day late, but I hope you'll see
>> that a lot of thought went into this.
>>
>> There are two documents. Please look at "AnalysisXML Results Design
>> Question.ppt" first. This lays out the specific schema change question
>> we face. One of the biggest concerns about this proposed change was
>> that it was not immediately obvious to any of us last Friday whether
>> this was a substantial restructuring or essentially a renaming
>> process. As you'll see in the slide showing the alignment, I now
>> believe that the change is largely a renaming process and not a large
>> change. The only real change is the insertion of one additional level,
>> but I can imagine a way around doing this. In fact, I think that the
>> reason for inserting this level is not specific to the question of the
>> schema change, rather it's simply making up for something that was
>> missing in the original model. There needs to be a way of having
>> things that are attributes of the overall identification rather than
>> an individual identification hypothesis - for example, the probability
>> that at least one of the identification hypotheses (hits/matches) is
>> correct for the spectrum. Assuming we agree that this is true, I think
>> there is zero difference in the schema other than using more generic
>> names, and my opinion is that we should really make this change.
>>
>> The second document, "AnalysisXML Results Use Cases.ppt" tries to
>> capture a lot of more specific use cases that demonstrate why the
>> proposed schema change may be the right thing to do. I've done this
>> using 'pseudo instance documents' which are explained in the slides. I
>> hope this is a useful communication mechanism, and may have some use
>> for documentation as well. If no one finds them useful, no big deal -
>> I was just trying to find a way to communicate clearly. Please excuse
>> inaccuracies in the details of some of the use cases. I was trying to
>> assess whether or not the constant AnalysisResult frame was robust to
>> a large number of variations. I think you'll see that it is, and it's
>> really not clear to me why we should have a special case of element
>> names for the ID of peptides from MS/MS spectra. The only good reason
>> I can see for it is that it's what we already had drawn up in the schema.
>>
>> Please feel free to add, modify, or correct any of this as you see fit!
>>
>> Sean
>>
>>
>> ------------------------------------------------------------------------
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
>> Don't miss this year's exciting event. There's still time to save $100.
>> Use priority code J8TL2D2.
>> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>
>
> --
> David Creasy
> Matrix Science
> 64 Baker Street
> London W1U 7GB, UK
> Tel: +44 (0)20 7486 1050
> Fax: +44 (0)20 7224 1344
>
> dc...@ma...
> http://www.matrixscience.com
>
> Matrix Science Ltd. is registered in England and Wales
> Company number 3533898

-- 
_______________________________________________________________
Dr. Simon Hubbard, Reader in Bioinformatics
Faculty of Life Sciences, The University of Manchester,
Michael Smith Building, Manchester M13 9PT
mailto:Sim...@ma...
http://www.ls.manchester.ac.uk/people/profile/index.asp?id=2524
TEL: +44 (0)161 306 8930  FAX: +44 (0)161 275 5082
|
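David's worry that moving calculatedMassToCharge, chargeState and experimentalMassToCharge from attributes into cvParams could make files "explode" can be measured rather than guessed. The following is a minimal sketch, not part of any proposal: it serialises the quoted PolypeptideResultItem both ways with Python's standard xml.etree.ElementTree and compares byte counts. The cvParam shape copies the accession/name/value form from David's example; the PI:0000x accessions are hypothetical placeholders (the thread itself only uses PI:99999).

```python
import xml.etree.ElementTree as ET

# Values copied from the PolypeptideResultItem quoted in the thread.
ATTRS = {
    "identifier": "1_1",
    "calculatedMassToCharge": "670.86261",
    "chargeState": "2",
    "experimentalMassToCharge": "671.9",
    "polypeptideReference_ref": "xxx",
}

# Hypothetical accessions for the terms that would move into the CV.
CV_TERMS = {
    "calculatedMassToCharge": "PI:00001",
    "chargeState": "PI:00002",
    "experimentalMassToCharge": "PI:00003",
}


def attribute_style() -> bytes:
    """Serialise the item with every value held in an attribute."""
    return ET.tostring(ET.Element("PolypeptideResultItem", ATTRS))


def cv_style() -> bytes:
    """Serialise the item with the measured values moved into cvParam children."""
    # Identifiers and references stay as attributes; only CV_TERMS move.
    kept = {k: v for k, v in ATTRS.items() if k not in CV_TERMS}
    item = ET.Element("PolypeptideResultItem", kept)
    for name, accession in CV_TERMS.items():
        ET.SubElement(item, "cvParam",
                      {"accession": accession, "name": name, "value": ATTRS[name]})
    return ET.tostring(item)


if __name__ == "__main__":
    a, c = attribute_style(), cv_style()
    print(f"attributes: {len(a)} bytes, cvParams: {len(c)} bytes, "
          f"ratio {len(c) / len(a):.2f}")
```

The per-item difference, multiplied by the number of result items in a real search, gives a rough estimate of the overhead; the sketch does not account for compression.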
From: Pierre-Alain B. <pie...@is...> - 2008-05-02 09:18:16
|
Hi all,
allow me to join the discussion (again).

Simon, I agree in principle on the start/end point. Let me refer to the extended FASTA format we are putting in place. There, sequences are split into separate entries for splice variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences into separate entries, the start and end would not change. If they do split them, the accession code will change and the start and end will refer to two entries, as if they originated from different genes, for instance.

Sean, nice exercise. Probably viable for ID.
Maybe I missed something, but in all cases where quantitation is done across more than one search, what is the mechanism to unify them in one document? (Label-free use cases as well as multiple SILAC runs, for instance.) The same is true when concatenating ID results (how do you report a Scaffold output?).
I have difficulty including the quant in the ID result section. I see issues with reporting global results of a quant analysis (global normalisation functions and outcomes, for instance), and as you already make an "exception" for the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retrieved from MS/MS spectra? Just use cases for you to consider.

Tiny comments:
- All elements you name xxxxSet in AnalysisML are xxxxList in mzML. Would you mind using the same naming for consistency?
- I agree to put calculatedMassToCharge, chargeState and experimentalMassToCharge into CV (this would also look similar to mzML). In mzML, we have a lot of terms in CV, and these would fall into that category as well.

And a question: in David's XML answer, in the IdentificationResult element, you described pf:cvParam ... Is this a namespace (also at the root of the DataCollection element) to refer to a vendor-specific cvParam, or do you intend to use userParams? I do not get this.
In mzML, if I'm right, we can refer to more than one CV, and these are recognised via different prefixes (PI:99999 vs MS:10000x). And there is the possibility to define userParams.

Pierre-Alain

Simon Hubbard wrote:
> David's XML speak is very useful, at least for me, to help understand
> the model and associated issues. Strictly, should the "ref" attribute
> in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x".
> (as below) to refer back to the earlier <IdentificationHypothesis
> id="pep_match_x1" ref="peptide1_in_molecule_table"> ?
>
> <AnalyteDetectionResultSet type=Protein_inferencing>
> <AnalyteDetectionResult id="protein_group_1">
> <IdentificationResult>
> <SomeTagTBD id="PP" ref="pep_match_x1">
> <pf:cvParam startpos = 23>
> <pf:cvParam endpos = 29>
> <SomeTagTBD />
>
> Also, if we have the cvParams for protein groups
> such as "startpos" and "endpos" (as shown above) there could
> be problems since they are protein (and not protein group)
> specific. For example, a protein group contains two versions of
> a protein, one with and one without the signal peptide. So any
> matching peptide (outside of the signal peptide) will have
> different starts in the two isoforms, but WILL match both
> proteins (and hence the group). As far as protein inference goes,
> one can't tell the two proteins apart and hence a protein group
> is important. Is this an issue (ie. where we place cvParams,
> if at all)?
>
> -Simon-
|
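Simon's point that startpos/endpos belong to a peptide-to-protein mapping rather than to a protein group can be checked with a toy example. In this minimal sketch both sequences and the peptide are invented (they are not real database entries): one isoform carries a ten-residue N-terminal signal peptide and the other is the mature form without it. The same peptide matches both, but at different 1-based start positions, so a single start/end pair attached at group level could only be correct for one member.

```python
from typing import Dict, Optional, Tuple

# Invented sequences for illustration only.
SIGNAL_PEPTIDE = "MKWVTFISLL"
MATURE = "AEPKDTGSEFRHQLVNK"
ISOFORMS = {
    "with_signal_peptide": SIGNAL_PEPTIDE + MATURE,
    "mature": MATURE,
}
PEPTIDE = "GSEFR"


def locate(peptide: str, protein: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based, inclusive (start, end) of the first match, or None."""
    index = protein.find(peptide)
    if index < 0:
        return None
    return index + 1, index + len(peptide)


def positions_per_protein(peptide: str,
                          proteins: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Map each protein containing the peptide to its own start/end pair."""
    hits = {}
    for name, sequence in proteins.items():
        span = locate(peptide, sequence)
        if span is not None:
            hits[name] = span
    return hits


if __name__ == "__main__":
    print(positions_per_protein(PEPTIDE, ISOFORMS))
    # {'with_signal_peptide': (17, 21), 'mature': (7, 11)}
```

The two starts differ by exactly the signal-peptide length. This suggests keeping start/end with each protein-level hypothesis rather than with the group; that placement is an assumption made for the sketch, not a decision recorded here.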
From: Jones, A. <And...@li...> - 2008-05-02 09:39:08
|
>in the IdentificationResult element, you described pf:cvParam ... Is this a namespace (also at the root of the DataCollection element) to refer to a vendor-specific CVparam or do you intend to use userParams? I do not get this. In mzML, if I'm right, we can refer to more than one CVs, that are recognised via different prefixes (PI:99999 vs MS:10000x). And there is the possibility to define userParams.

For consistency in analysisXML, I’ve put CVParam into the inherited “FuGE light” schema. “pf:” is the proposed namespace for the FuGE light schema.

I am waiting for a final decision on userParams and cvParam groups from the mzML working group, since it sounded like there were still a few issues to resolve. I can add the current mzML CV/user param part to the schema whenever it’s required.

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Pierre-Alain Binz
Sent: 02 May 2008 10:16
To: sim...@ma...
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo
|
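Pierre-Alain's question and Andy's answer separate two kinds of prefix that are easy to conflate: `pf:` is an XML namespace prefix (for the FuGE light schema), whereas `PI:` or `MS:` inside an accession value identifies which controlled vocabulary a term comes from. A minimal sketch with Python's xml.etree.ElementTree makes the difference visible. The namespace URI is a made-up placeholder, since none is given in the thread; PI:99999 is the thread's own placeholder accession, and the MS:1000041 ("charge state") term is included only to show a second vocabulary prefix.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Placeholder namespace URI; the real FuGE light URI is not given in the thread.
PF_NS = "http://example.org/fuge-light"

DOC = f"""<pf:DataCollection xmlns:pf="{PF_NS}">
  <IdentificationHypothesis id="pep_match_x1" ref="peptide1_in_molecule_table">
    <pf:cvParam accession="PI:99999" name="score" value="62"/>
    <pf:cvParam accession="MS:1000041" name="charge state" value="2"/>
  </IdentificationHypothesis>
</pf:DataCollection>"""


def cv_terms_by_vocabulary(xml_text: str) -> Dict[str, List[str]]:
    """Group cvParam names by the CV prefix of their accession."""
    root = ET.fromstring(xml_text)
    grouped: Dict[str, List[str]] = defaultdict(list)
    # ElementTree expands the pf: prefix to "{namespace URI}cvParam",
    # so the XML namespace never mixes with the accession's CV prefix.
    for param in root.iter(f"{{{PF_NS}}}cvParam"):
        vocabulary = param.get("accession", "").partition(":")[0]
        grouped[vocabulary].append(param.get("name"))
    return dict(grouped)


if __name__ == "__main__":
    print(ET.fromstring(DOC).tag)       # {http://example.org/fuge-light}DataCollection
    print(cv_terms_by_vocabulary(DOC))  # {'PI': ['score'], 'MS': ['charge state']}
```

However many vocabularies a file references, the parser resolves `pf:` once, from the xmlns declaration, while each accession prefix is simply part of an attribute value.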
From: David C. <dc...@ma...> - 2008-05-02 10:03:44
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=windows-1252" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi Pierre-Alain,<br> <br> Pierre-Alain Binz wrote: <blockquote cite="mid:481...@is..." type="cite"> <meta content="text/html;charset=windows-1252" http-equiv="Content-Type"> Hi all,<br> let allow me to join the discussion (again).<br> </blockquote> Yes please!<br> <blockquote cite="mid:481...@is..." type="cite"><br> Simon, agree on the start - end on the principle. Let me refer to the extended fasta format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences in separate entries, the start and end would not change. If they split, the accession code will change and the start and end refer to two entries, as if they were originating from different genes for instance.<br> <br> Sean, nice exercise. Probably viable for ID. <br> Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label free usecases as well as multiple silac runs for instance). Same is true when concatenating ID results (How do you report a Scaffold output?).<br> I have difficulties to include the quant in the id result section. I see issues to report global results on a quant analysis (global normalisation functions and outcomes, for instance) and as you already make an "exception" to the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retzrieved from MS/MS spectra? Just use cases for you to consider. <br> </blockquote> Gentle reminder that we have agreed: "Defer quantitation to v2. Should fit into existing framework. Attempt to guarantee back compatible. 
Work in parallel to produce a proposal."<br> <blockquote cite="mid:481...@is..." type="cite"><br> <br> Tiny comments:<br> - All elements you name xxxxSet in AnalysisML are named xxxxList in mzML. Would you mind using the same naming convention for consistency?<br> </blockquote> Sounds reasonable to me.<br> <blockquote cite="mid:481...@is..." type="cite">- I agree to put all of calculatedMassToCharge, chargeState and experimentalMassToCharge into CV terms (that would also look similar to mzML). In mzML, we have a lot of terms in the CV, and these would fit in there as well.<br> </blockquote> I'd rather that these were all attributes...<br> <br> David<br> <br> </body> </html> |
From: Martin E. <mar...@ru...> - 2008-05-02 11:31:06
|
Hi all,

thanks, Sean, for the very complete slides!

Some of the consequences of that solution (without assessing or valuing them):
- The schema documentation will not be self-contained, but will be as general as the solution itself.
- Attributes have to move to CV terms.
- Semantic validation is more laborious; we have to infer the ResultType from the AnalysisRef to validate correctness.

In my personal opinion, I want to have "special" result sections (so that syntax validation is possible!) at least for the "SpectrumIdentification", "ProteinDetection" and "QualityAssessment" analysis types, but I can imagine having only "SpectrumIdentificationResultSet" (with your or Alexandre's changes) and your solution as "generic results" containing all the others.

General: I will add an issue with this discussion to the Google Code project page.

Further: closely related to this discussion is the question of which "special" analysis types we need (see http://code.google.com/p/psi-pi/issues/detail?id=10). If we make both sections (analyses and results) "generic", we need a special "AnalysisType" attribute (or CV term) to be able to validate semantically or to parse meaningful information.

Bye
Martin

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On Behalf Of David Creasy
Sent: Friday, May 02, 2008 12:04 PM
To: Pierre-Alain Binz
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo

Hi Pierre-Alain,

Pierre-Alain Binz wrote:
Hi all, allow me to join the discussion (again).

Yes please!

Simon, I agree with the start/end approach in principle. Let me refer to the extended FASTA format we are putting in place. There, sequences are split in the case of splicing variants, but processing events (and mutations) are annotations of a single entry. Therefore, if the tools do not split the sequences into separate entries, the start and end would not change. If they split, the accession code will change and the start and end refer to two entries, as if they originated from different genes, for instance.

Sean, nice exercise. Probably viable for ID.

Maybe I missed something, but in all cases where the quant is made across more than one search, what is the mechanism to unify them in one document? (Label-free use cases as well as multiple SILAC runs, for instance.) The same is true when concatenating ID results (how do you report a Scaffold output?).

I have difficulty including the quant in the ID result section. I see issues in reporting global results of a quant analysis (global normalisation functions and outcomes, for instance), and as you already make an "exception" to the isobaric tag approach, how do you cope with 18O labelling when you want to use both survey scan information and data retrieved from MS/MS spectra? Just use cases for you to consider.

Gentle reminder that we have agreed: "Defer quantitation to v2. Should fit into existing framework. Attempt to guarantee backward compatibility. Work in parallel to produce a proposal."

Tiny comments:
- All elements you name xxxxSet in AnalysisML are named xxxxList in mzML. Would you mind using the same naming convention for consistency?

Sounds reasonable to me.

- I agree to put all of calculatedMassToCharge, chargeState and experimentalMassToCharge into CV terms (that would also look similar to mzML). In mzML, we have a lot of terms in the CV, and these would fit in there as well.

I'd rather that these were all attributes...

David
|
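To make the attribute-versus-CV trade-off discussed above concrete, here is a minimal Python sketch that serializes the same peptide result both ways and reads chargeState back from each. It reuses the PolypeptideResultItem values from David Creasy's example in this thread. Several things are assumptions and not the agreed schema: the pf: namespace prefix is dropped, "PI:99999" is only the thread's placeholder accession (not a real CV term), and the accession/name/value layout of cvParam is illustrative.

```python
import xml.etree.ElementTree as ET

# Values from the PolypeptideResultItem example in this thread.
# "PI:99999" is the thread's placeholder accession, not a real CV term.
FIELDS = {
    "calculatedMassToCharge": "670.86261",
    "chargeState": "2",
    "experimentalMassToCharge": "671.9",
}


def as_attributes() -> str:
    """Encode the values as XML attributes (the earlier draft)."""
    item = ET.Element("PolypeptideResultItem",
                      identifier="1_1", polypeptideReference_ref="xxx", **FIELDS)
    return ET.tostring(item, encoding="unicode")


def as_cv_params() -> str:
    """Encode the same values as child cvParam elements (the proposal)."""
    item = ET.Element("PolypeptideResultItem",
                      identifier="1_1", polypeptideReference_ref="xxx")
    for name, value in FIELDS.items():
        ET.SubElement(item, "cvParam", accession="PI:99999", name=name, value=value)
    return ET.tostring(item, encoding="unicode")


def charge_from_attributes(xml: str) -> int:
    # A fixed attribute: one direct lookup, and an XSD can declare its type.
    return int(ET.fromstring(xml).get("chargeState"))


def charge_from_cv_params(xml: str) -> int:
    # The reader (or a semantic validator) must search child terms by name.
    for param in ET.fromstring(xml).iter("cvParam"):
        if param.get("name") == "chargeState":
            return int(param.get("value"))
    raise ValueError("no chargeState cvParam found")


attr_xml = as_attributes()
cv_xml = as_cv_params()
print(len(attr_xml), "chars:", attr_xml)
print(len(cv_xml), "chars:", cv_xml)
```

The CV encoding carries the same three values but repeats an element name, an accession and a term name for each one, which is one source of the file-size growth raised in this thread. It also illustrates Martin's validation point: instead of checking a fixed attribute, a reader or validator has to look each term up by name (or accession).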
From: David C. <dc...@ma...> - 2008-05-02 09:45:49
|
Hi Simon,

Simon Hubbard wrote:
> David's XML speak is very useful, at least for me, to help understand
> the model and associated issues. Strictly, should the "ref" attribute
> in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x"
> (as below), to refer back to the earlier <IdentificationHypothesis
> id="pep_match_x1" ref="peptide1_in_molecule_table">?

Yes, it is a typo.

>
> <AnalyteDetectionResultSet type=Protein_inferencing>
>   <AnalyteDetectionResult id="protein_group_1">
>     <IdentificationResult>
>       <SomeTagTBD id="PP" ref="pep_match_x1">
>         <pf:cvParam startpos = 23>
>         <pf:cvParam endpos = 29>
>       <SomeTagTBD />
>
> Also, if we have the cvParams for protein groups
> such as "startpos" and "endpos" (as shown above), there could
> be problems, since they are protein (and not protein group)
> specific. For example, a protein group contains two versions of
> a protein, one with and one without the signal peptide. So any
> matching peptide (outside of the signal peptide) will have
> different starts in the two isoforms, but WILL match both
> proteins (and hence the group). As far as protein inference goes,
> one can't tell the two proteins apart, and hence a protein group
> is important. Is this an issue (i.e. where we place cvParams,
> if at all)?

Yes, you are correct. The <SomeTagTBD> sections should be inside the <IdentificationHypothesis>.

David

>
> -Simon-
>
> David Creasy wrote:
>> Hi Sean,
>>
>> Thanks very much - must have taken quite a while and is very useful. One
>> thing that may not be obvious to others is where the
>> <SpectrumIdentificationResultSet> comes from. I believe that this was
>> just a 'rename' of PolypeptideResultSet made by the sub group that you
>> were in at Toledo.
>>
>> As we've usefully discussed, finding a way to communicate effectively is
>> an issue. So, to make 100% sure I've understood, I'll talk back to you in
>> XML :)
>>
>> This is a cut-down example of an MS/MS search of a single
>> spectrum with peptide results and protein inferencing. The protein
>> inferencing (impossibly - 'cos just one peptide!) has a couple of
>> similar proteins in the first group, and one in the second group.
>>
>> <pf:DataCollection>
>>   <AnalyteDetectionResultSet type=MS_MS_peptide_matches>
>>     <AnalyteDetectionResult>
>>       <IdentificationResult>
>>         <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
>>         <IdentificationHypothesis id="pep_match_x1"
>>             ref="peptide1_in_molecule_table">
>>           <pf:cvParam accession="PI:99999" name="score" value="62" />
>>         </IdentificationHypothesis>
>>         <IdentificationHypothesis id="pep_match_x2"
>>             ref="peptide2_in_molecule_table">
>>           <!-- A poorer match to same spectrum as "pep_match_x1" !>
>>           <pf:cvParam accession="PI:99999" name="score" value="12" />
>>         </IdentificationHypothesis>
>>       </IdentificationResult>
>>     </AnalyteDetectionResult>
>>   </AnalyteDetectionResultSet>
>>
>>   <AnalyteDetectionResultSet type=Protein_inferencing>
>>     <AnalyteDetectionResult id="protein_group_1">
>>       <IdentificationResult>
>>         <SomeTagTBD id="PP" ref="pep_match_x">
>>           <pf:cvParam startpos = 23>
>>           <pf:cvParam endpos = 29>
>>         <SomeTagTBD />
>>         <IdentificationHypothesis id="TRYP_PIG"
>>             ref="protein1_in_molecule_table">
>>           <pf:cvParam accession="PI:99999" name="score" value="162" />
>>         </IdentificationHypothesis>
>>         <IdentificationHypothesis id="TRYP_BOV"
>>             ref="protein2_in_molecule_table">
>>           <pf:cvParam accession="PI:99999" name="score" value="162" />
>>         </IdentificationHypothesis>
>>       </IdentificationResult>
>>       <IdentificationResult>    # nothing doing here ? [SJH]
>>       </IdentificationResult>   #
>>     </AnalyteDetectionResult>
>>   </AnalyteDetectionResultSet>
>>     <AnalyteDetectionResult id="protein_group_2">
>>       <IdentificationResult>
>>         <SomeTagTBD id="PP" ref="pep_match_y">
>>           <pf:cvParam startpos = 123>
>>           <pf:cvParam endpos = 129>
>>         <SomeTagTBD />
>>         <IdentificationHypothesis id="DODGY"
>>             ref="protein99_in_molecule_table">
>>           <pf:cvParam accession="PI:99999" name="score" value="1" />
>>         </IdentificationHypothesis>
>>       </IdentificationResult>
>>       <IdentificationResult>
>>       </IdentificationResult>
>>     </AnalyteDetectionResult>
>>   </AnalyteDetectionResultSet>
>> </pf:DataCollection>
>>
>> Please correct where I haven't understood.
>>
>> Before, we had in peptide ID:
>> <PolypeptideResultItem identifier="1_1"
>>     calculatedMassToCharge="670.86261" chargeState="2"
>>     experimentalMassToCharge="671.9" polypeptideReference_ref="xxx">
>> The new proposal is that calculatedMassToCharge, chargeState and
>> experimentalMassToCharge are all just CV?
>>
>> Likewise, for protein inferencing, we had:
>> <_resultItems>
>>   <RelationResultItem identifier="" start="160" end="171"
>>       polypeptideReference_ref="1_1" post="K" pre="I">
>>   </RelationResultItem>
>>   <RelationResultItem identifier="" start="57" end="71"
>>       polypeptideReference_ref="3_1" post="K" pre="R">
>>   </RelationResultItem>
>>
>> But start, end, post and pre would now be CV?
>> btw, Luisa recommends that we don't make too many things like this CV...
>> Having been enthusiastic about the change, I think I'm now going off it
>> - partly because with all the extra CV, file sizes may well explode.
>> Please persuade me otherwise!
>> (btw, I've 'read but ignored' the quantitation suggestions based on
>> decisions in Toledo.)
>>
>>
>> One minor comment:
>>
>> Slide 6: ..., but the results are always about the result from the
>> user’s perspective: “What did I find and/or measure?”, rather than “How
>> did I account for all of the spectra?”
>> - Many users do want to try and account for all their spectra because
>> they believe that they are missing something useful.
>>
>>
>> David
>>
>> Sean L Seymour wrote:
>>> Hi all,
>>>
>>> After the wrap-up Friday afternoon, the few remaining people in the PI
>>> group had a short meeting where we discussed a potential
>>> generalization of the results portion of the schema. The big question
>>> that came out of this was whether or not we should keep the result
>>> description for the ID of peptides from MS/MS spectra as it was by
>>> midday Friday, or whether it made sense to restructure this so that it
>>> followed the more general structure for results that we would use for
>>> many other things, including protein inference from peptide IDs. I
>>> agreed to outline the various use cases and try to lay out the issues.
>>> I had hoped to send this out by Monday, but it's taken a lot longer
>>> than planned. Apologies for being a day late, but I hope you'll see
>>> that a lot of thought went into this.
>>>
>>> There are two documents. Please look at "AnalysisXML Results Design
>>> Question.ppt" first. This lays out the specific schema change question
>>> we face. One of the biggest concerns about this proposed change was
>>> that it was not immediately obvious to any of us last Friday whether
>>> this was a substantial restructuring or essentially a renaming
>>> process. As you'll see in the slide showing the alignment, I now
>>> believe that the change is largely a renaming process and not a large
>>> change. The only real change is the insertion of one additional level,
>>> but I can imagine a way around doing this. In fact, I think that the
>>> reason for inserting this level is not specific to the question of the
>>> schema change; rather, it's simply making up for something that was
>>> missing in the original model. There needs to be a way of having
>>> things that are attributes of the overall identification rather than
>>> of an individual identification hypothesis - for example, the probability
>>> that at least one of the identification hypotheses (hits/matches) is
>>> correct for the spectrum. Assuming we agree that this is true, I think
>>> there is zero difference in the schema other than using more generic
>>> names, and my opinion is that we should really make this change.
>>>
>>> The second document, "AnalysisXML Results Use Cases.ppt", tries to
>>> capture many more specific use cases that demonstrate why the
>>> proposed schema change may be the right thing to do. I've done this
>>> using 'pseudo instance documents', which are explained in the slides. I
>>> hope this is a useful communication mechanism, and may have some use
>>> for documentation as well. If no one finds them useful, no big deal -
>>> I was just trying to find a way to communicate clearly. Please excuse
>>> inaccuracies in the details of some of the use cases. I was trying to
>>> assess whether or not the constant AnalysisResult frame was robust to
>>> a large number of variations. I think you'll see that it is, and it's
>>> really not clear to me why we should have a special case of element
>>> names for the ID of peptides from MS/MS spectra. The only good reason
>>> I can see for it is that it's what we already had drawn up in the schema.
>>>
>>> Please feel free to add, modify, or correct any of this as you see fit!
>>>
>>> Sean
>>>
>>> _______________________________________________
>>> Psidev-pi-dev mailing list
>>> Psi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>>
>> --
>> David Creasy
>> Matrix Science
>> 64 Baker Street
>> London W1U 7GB, UK
>> Tel: +44 (0)20 7486 1050
>> Fax: +44 (0)20 7224 1344
>>
>> dc...@ma...
>> http://www.matrixscience.com
>>
>> Matrix Science Ltd. is registered in England and Wales
>> Company number 3533898
>
|
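David's reply to Simon settles where the peptide evidence goes: inside each <IdentificationHypothesis>, so that start/end positions belong to one protein rather than to the whole group. Below is a minimal Python sketch of that layout. It assumes well-formed name/value cvParams in place of the thread's `startpos = 23` pseudo-syntax and keeps SomeTagTBD as the placeholder element name. The TRYP_BOV positions are hypothetical, chosen only to show two isoforms giving different coordinates for the same peptide match.

```python
import xml.etree.ElementTree as ET

# (hypothesis id, molecule-table ref, score, start, end).
# TRYP_PIG reuses 23..29 from the thread; the TRYP_BOV positions are
# HYPOTHETICAL, as if one isoform carried a signal peptide the other lacks.
PROTEIN_HITS = [
    ("TRYP_PIG", "protein1_in_molecule_table", "162", 23, 29),
    ("TRYP_BOV", "protein2_in_molecule_table", "162", 43, 49),
]


def build_protein_group(group_id, peptide_ref, hits):
    """Build one protein group with per-protein peptide evidence."""
    result = ET.Element("AnalyteDetectionResult", id=group_id)
    ident = ET.SubElement(result, "IdentificationResult")
    for hyp_id, ref, score, start, end in hits:
        hyp = ET.SubElement(ident, "IdentificationHypothesis", id=hyp_id, ref=ref)
        ET.SubElement(hyp, "cvParam", accession="PI:99999", name="score", value=score)
        # Evidence sits inside the hypothesis, so start/end describe this
        # protein only; SomeTagTBD is the thread's placeholder element name.
        evidence = ET.SubElement(hyp, "SomeTagTBD", ref=peptide_ref)
        ET.SubElement(evidence, "cvParam", name="startpos", value=str(start))
        ET.SubElement(evidence, "cvParam", name="endpos", value=str(end))
    return result


def positions_by_protein(group):
    """Read back (start, end) for each protein hypothesis in a group."""
    positions = {}
    for hyp in group.iter("IdentificationHypothesis"):
        params = {p.get("name"): p.get("value") for p in hyp.find("SomeTagTBD")}
        positions[hyp.get("id")] = (int(params["startpos"]), int(params["endpos"]))
    return positions


group = build_protein_group("protein_group_1", "pep_match_x1", PROTEIN_HITS)
ET.indent(group)  # pretty-printing only; needs Python 3.9+
print(ET.tostring(group, encoding="unicode"))
print(positions_by_protein(group))
```

The thread's `id="PP"` is left off the evidence element: with one copy per hypothesis, a fixed id would appear twice, and how such elements should be identified is still open in the discussion.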