Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo

David's XML speak is very useful, at least for me, in helping to understand
the model and associated issues. Strictly, should the "ref" attribute
in the <SomeTagTBD> bit be "pep_match_x1" rather than "pep_match_x"
(as below), to refer back to the earlier <IdentificationHypothesis 
id="pep_match_x1" ref="peptide1_in_molecule_table">?

<AnalyteDetectionResultSet type="Protein_inferencing">
      <AnalyteDetectionResult id="protein_group_1">
        <IdentificationResult>
          <SomeTagTBD id="PP" ref="pep_match_x1">
            <pf:cvParam name="startpos" value="23" />
            <pf:cvParam name="endpos" value="29" />
          </SomeTagTBD>

Also, if we attach cvParams such as "startpos" and "endpos" (as shown
above) to protein groups, there could be problems, since they are
protein-specific (and not protein-group-specific). For example,
suppose a protein group contains two versions of a protein, one with
and one without the signal peptide. Then any matching peptide (outside
the signal peptide) will have different start positions in the two
isoforms, but WILL match both proteins (and hence the group). As far
as protein inference goes, one can't tell the two proteins apart, and
hence a protein group is important. Is this an issue (i.e. where we
place cvParams, if at all)?
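
To make the signal-peptide case concrete, here is a minimal Python sketch. The sequences, the peptide, and the helper name peptide_span are all invented for illustration; nothing here comes from the schema or from real data. It only shows that one peptide can map to different residue positions in two members of the same group.

```python
from typing import Optional, Tuple


def peptide_span(protein_seq: str, peptide: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (start, end) of the first occurrence, or None."""
    idx = protein_seq.find(peptide)
    if idx < 0:
        return None
    return idx + 1, idx + len(peptide)


# Hypothetical sequences, made up for this example only.
SIGNAL = "MKTLLLAVAVALA"     # invented signal peptide
MATURE = "GSWPEPTIDEKAAGR"   # invented mature chain
PEPTIDE = "PEPTIDEK"         # invented matching peptide

# One protein group holding both isoforms.
group = {
    "with_signal": SIGNAL + MATURE,
    "without_signal": MATURE,
}

# Same peptide, same group, but a different span per protein.
spans = {name: peptide_span(seq, PEPTIDE) for name, seq in group.items()}
offset = spans["with_signal"][0] - spans["without_signal"][0]
```

The peptide matches both members of the group, and the two start positions differ by exactly the length of the signal peptide. No single startpos/endpos pair is valid for the group as a whole, which is why it may be safer to hang positions on each protein hypothesis rather than on the group.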

-Simon-

David Creasy wrote:
> Hi Sean,
> 
> Thanks very much - must have taken quite a while and is very useful. One 
> thing that may not be obvious to others is where the 
> <SpectrumIdentificationResultSet> comes from. I believe that this was 
> just a 'rename' of PolypeptideResultSet made by the sub group that you 
> were in at Toledo.
> 
> As we've usefully discussed, finding a way to communicate effectively is 
> an issue. So, to make 100% sure I've understood I'll talk back to you in 
> XML :)
> 
> This is a cut down of an example for an ms-ms search of a single 
> spectrum with peptide results and protein inferencing. The protein 
> inferencing (impossibly - 'cos just one peptide!) has a couple of 
> similar proteins in the first group, and one in the second group.
> 
> <pf:DataCollection>
>   <AnalyteDetectionResultSet type="MS_MS_peptide_matches">
>     <AnalyteDetectionResult>
>       <IdentificationResult>
>         <SpectrumElement spectrumID="9" spectraDataInputRef_ref="file.1"/>
>         <IdentificationHypothesis id="pep_match_x1" 
> ref="peptide1_in_molecule_table">
>           <pf:cvParam accession="PI:99999" name="score" value="62" />
>         </IdentificationHypothesis>
>         <IdentificationHypothesis id="pep_match_x2" 
> ref="peptide2_in_molecule_table">
>           <!-- A poorer match to same spectrum as "pep_match_x1" -->
>           <pf:cvParam accession="PI:99999" name="score" value="12" />
>         </IdentificationHypothesis>
>       </IdentificationResult>
>     </AnalyteDetectionResult>
>   </AnalyteDetectionResultSet>
> 
>   <AnalyteDetectionResultSet type="Protein_inferencing">
>     <AnalyteDetectionResult id="protein_group_1">
>       <IdentificationResult>
>         <SomeTagTBD id="PP" ref="pep_match_x">
>           <pf:cvParam name="startpos" value="23" />
>           <pf:cvParam name="endpos" value="29" />
>         </SomeTagTBD>
>         <IdentificationHypothesis id="TRYP_PIG" 
> ref="protein1_in_molecule_table">
>           <pf:cvParam accession="PI:99999" name="score" value="162" />
>         </IdentificationHypothesis>
>         <IdentificationHypothesis id="TRYP_BOV" 
> ref="protein2_in_molecule_table">
>           <pf:cvParam accession="PI:99999" name="score" value="162" />
>         </IdentificationHypothesis>
>       </IdentificationResult>
>       <IdentificationResult>         <!-- nothing doing here ? [SJH] -->
>       </IdentificationResult>
>     </AnalyteDetectionResult>
>     <AnalyteDetectionResult id="protein_group_2">
>       <IdentificationResult>
>         <SomeTagTBD id="PP" ref="pep_match_y">
>           <pf:cvParam name="startpos" value="123" />
>           <pf:cvParam name="endpos" value="129" />
>         </SomeTagTBD>
>         <IdentificationHypothesis id="DODGY" 
> ref="protein99_in_molecule_table">
>           <pf:cvParam accession="PI:99999" name="score" value="1" />
>         </IdentificationHypothesis>
>       </IdentificationResult>
>       <IdentificationResult>
>       </IdentificationResult>
>     </AnalyteDetectionResult>
>   </AnalyteDetectionResultSet>
> </pf:DataCollection>
> 
> Please correct where I haven't understood.
> 
> Before, we had in peptide ID:
> <PolypeptideResultItem identifier="1_1"  
> calculatedMassToCharge="670.86261" chargeState="2" 
> experimentalMassToCharge="671.9" polypeptideReference_ref="xxx">
> New proposal is that calculatedMassToCharge, chargeState and 
> experimentalMassToCharge are all just CV?
> 
> Likewise, for protein inferencing, we had:
>           <_resultItems>
>             <RelationResultItem identifier="" start="160" end="171" 
> polypeptideReference_ref="1_1" post="K" pre="I">
>             </RelationResultItem>
>             <RelationResultItem identifier="" start="57" end="71" 
> polypeptideReference_ref="3_1" post="K" pre="R">
>             </RelationResultItem>
> 
> But start, end, post and pre would now be CV?
> btw, Luisa recommends that we don't make too many things like this CV...
> Having been enthusiastic about the change, I think I'm now going off it 
> - partly because with all the extra CV, file sizes may well explode. 
> Please persuade me otherwise!
> (btw, I've 'read but ignored' the quantitation suggestions based on 
> decisions in Toledo.)
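
The file-size worry can be checked roughly with a short Python sketch. It takes the first RelationResultItem from the example above and compares the attribute form with a form where start, end, post and pre each become a cvParam child. The cvParam layout (accession, name, value), the placeholder accession "PI:99999" and the helper name cv_param are assumptions borrowed from the pseudo instance earlier in this thread, not an agreed schema.

```python
# Attribute form, taken from the RelationResultItem example in this thread
# (written self-closing here so both forms are complete elements).
attribute_form = (
    '<RelationResultItem identifier="" start="160" end="171" '
    'polypeptideReference_ref="1_1" post="K" pre="I"/>'
)


def cv_param(name: str, value: str, accession: str = "PI:99999") -> str:
    """Build one cvParam element; the accession is only a placeholder."""
    return f'<pf:cvParam accession="{accession}" name="{name}" value="{value}"/>'


# Hypothetical CV form: the same four values moved into cvParam children.
moved = [("start", "160"), ("end", "171"), ("post", "K"), ("pre", "I")]
cv_form = (
    '<RelationResultItem identifier="" polypeptideReference_ref="1_1">'
    + "".join(cv_param(name, value) for name, value in moved)
    + "</RelationResultItem>"
)

# Compare the uncompressed sizes of the two encodings.
attr_bytes = len(attribute_form.encode("utf-8"))
cv_bytes = len(cv_form.encode("utf-8"))
ratio = cv_bytes / attr_bytes
```

Under these assumptions the CV form of this one item is more than twice the size of the attribute form before any compression, so the concern looks reasonable for files with many such items. Shorter accessions or omitting the name would narrow the gap, so this is an indication rather than a measurement.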
> 
> 
> One minor comment:
> 
> Slide 6: ..., but the results are always about the result from the 
> user’s perspective – “What did I find and/or measure?”, rather than “How 
> did I account for all of the spectra?”
>  - Many users do want to try and account for all their spectra because 
> they believe that they are missing something useful.
> 
> 
> David
> 
> Sean L Seymour wrote:
>>
>> Hi all,
>>
>> After the wrap up Friday afternoon, the few remaining people in the PI 
>> group had a short meeting where we discussed a potential 
>> generalization to the results portion of the schema. The big question 
>> that came out of this was whether or not we should keep the result 
>> description for the ID of peptides from MS/MS spectra as it was by 
>> midday Friday, or whether it made sense to restructure this so that it 
>> followed the more general structure for results that we would use for 
>> many other things, including protein inference from peptide IDs. I 
>> agreed to outline the various use cases and try to lay out the issues. 
>> I had hoped to send this out by Monday, but it's taken a lot longer 
>> than planned. Apologies for being a day late, but I hope you'll see 
>> that a lot of thought went into this.
>>
>> There are two documents. Please look at "AnalysisXML Results Design 
>> Question.ppt" first. This lays out the specific schema change question 
>> we face. One of the biggest concerns about this proposed change was 
>> that it was not immediately obvious to any of us last Friday whether 
>> this was a substantial restructuring or essentially a renaming 
>> process. As you'll see in the slide showing the alignment, I now 
>> believe that the change is largely a renaming process and not a large 
>> change. The only real change is the insertion of one additional level, 
>> but I can imagine a way around doing this. In fact, I think that the 
>> reason for inserting this level is not specific to the question of the 
>> schema change, rather it's simply making up for something that was 
>> missing in the original model. There needs to be a way of having 
>> things that are attributes of the overall identification rather than 
>> an individual identification hypothesis - for example, the probability 
>> that at least one of the identification hypotheses (hits/matches) is 
>> correct for the spectrum. Assuming we agree that this is true, I think 
>> there is zero difference in the schema other than using more generic 
>> names, and my opinion is that we should really make this change.
>>
>> The second document, "AnalysisXML Results Use Cases.ppt" tries to 
>> capture a lot of more specific use cases that demonstrate why the 
>> proposed schema change may be the right thing to do. I've done this 
>> using 'pseudo instance documents' which are explained in the slides. I 
>> hope this is a useful communication mechanism, and may have some use 
>> for documentation as well. If no one finds them useful, no big deal - 
>> I was just trying to find a way to communicate clearly. Please excuse 
>> inaccuracies in the details of some of the use cases. I was trying to 
>> assess whether or not the constant AnalysisResult frame was robust to 
>> a large number of variations. I think you'll see that it is, and it's 
>> really not clear to me why we should have a special case of element 
>> names for the ID of peptides from MS/MS spectra. The only good reason 
>> I can see for it is that it's what we already had drawn up in the schema.
>>
>> Please feel free to add, modify, or correct any of this as you see fit!
>>
>> Sean
>>
>>
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>   
> 
> -- 
> David Creasy
> Matrix Science
> 64 Baker Street
> London W1U 7GB, UK
> Tel: +44 (0)20 7486 1050
> Fax: +44 (0)20 7224 1344
> 
> dc...@ma...
> http://www.matrixscience.com
> 
> Matrix Science Ltd. is registered in England and Wales
> Company number 3533898
> 
> 

-- 
_______________________________________________________________
Dr. Simon Hubbard, Reader in Bioinformatics
Faculty of Life Sciences, The University of Manchester,
Michael Smith Building, Manchester M13 9PT
mailto:Sim...@ma...
http://www.ls.manchester.ac.uk/people/profile/index.asp?id=2524
TEL: +44 (0)161 306 8930  FAX: +44 (0)161 275 5082