Thread: Re: [Psidev-pi-dev] Results schema critical design question from Friday afternoon in Toledo


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Sean,<br>
<br>
Thanks very much - this must have taken quite a while, and it is very useful.
One thing that may not be obvious to others is where the
&lt;SpectrumIdentificationResultSet&gt; comes from. I believe that this
was just a 'rename' of PolypeptideResultSet made by the subgroup that
you were in at Toledo.<br>
<br>
As we've usefully discussed, finding a way to communicate effectively
is an issue. So, to make 100% sure I've understood I'll talk back to
you in XML :)<br>
<br>
This is a cut-down example of an MS/MS search of a single
spectrum with peptide results and protein inferencing. The protein
inferencing (impossibly - 'cos just one peptide!) has a couple of
similar proteins in the first group, and one in the second group.<br>
<br>
<tt>&lt;pf:DataCollection&gt;<br>
&nbsp; &lt;AnalyteDetectionResultSet type="MS_MS_peptide_matches"&gt;<br>
&nbsp;&nbsp;&nbsp; &lt;AnalyteDetectionResult&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationResult&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;SpectrumElement spectrumID="9"
spectraDataInputRef_ref="file.1"/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationHypothesis id="pep_match_x1"
ref="peptide1_in_molecule_table"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam accession="PI:99999" name="score" value="62"
/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationHypothesis&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationHypothesis id="pep_match_x2"
ref="peptide2_in_molecule_table"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;!-- A poorer match to the same spectrum as "pep_match_x1" --&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam accession="PI:99999" name="score" value="12"
/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationHypothesis&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationResult&gt;<br>
&nbsp;&nbsp;&nbsp; &lt;/AnalyteDetectionResult&gt;<br>
&nbsp; &lt;/AnalyteDetectionResultSet&gt;<br>
<br>
&nbsp; &lt;AnalyteDetectionResultSet type="Protein_inferencing"&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp; &lt;AnalyteDetectionResult id="protein_group_1"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationResult&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;SomeTagTBD id="PP" ref="pep_match_x"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam name="startpos" value="23" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam name="endpos" value="29" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/SomeTagTBD&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationHypothesis id="TRYP_PIG"
ref="protein1_in_molecule_table"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam accession="PI:99999" name="score" value="162"
/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationHypothesis&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationHypothesis id="TRYP_BOV"
ref="protein2_in_molecule_table"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam accession="PI:99999" name="score" value="162"
/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationHypothesis&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp; &lt;/AnalyteDetectionResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp; &lt;AnalyteDetectionResult id="protein_group_2"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationResult&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;SomeTagTBD id="PP" ref="pep_match_y"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam name="startpos" value="123" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam name="endpos" value="129" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/SomeTagTBD&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationHypothesis id="DODGY"
ref="protein99_in_molecule_table"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;pf:cvParam accession="PI:99999" name="score" value="1"
/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationHypothesis&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationResult&gt;<br>
</tt><tt>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;IdentificationResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/IdentificationResult&gt;<br>
</tt><tt>&nbsp;&nbsp;&nbsp; &lt;/AnalyteDetectionResult&gt;<br>
</tt><tt>&nbsp; &lt;/AnalyteDetectionResultSet&gt;<br>
</tt><tt>&lt;/pf:DataCollection&gt;<br>
<br>
</tt>Please correct where I haven't understood.<br>
<br>
Before, we had in peptide ID:<br>
&lt;PolypeptideResultItem identifier="1_1"&nbsp;
calculatedMassToCharge="670.86261" chargeState="2"
experimentalMassToCharge="671.9" polypeptideReference_ref="xxx"&gt;<br>
New proposal is that calculatedMassToCharge, chargeState and
experimentalMassToCharge are all just CV?<br>
<br>
Likewise, for protein inferencing, we had:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;_resultItems&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;RelationResultItem identifier="" start="160" end="171"
polypeptideReference_ref="1_1" post="K" pre="I"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/RelationResultItem&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;RelationResultItem identifier="" start="57" end="71"
polypeptideReference_ref="3_1" post="K" pre="R"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/RelationResultItem&gt;<br>
<br>
But start, end, post and pre would now be CV?<br>
btw, Luisa recommends that we don't make too many things like this CV...<br>
Having been enthusiastic about the change, I think I'm now going off it
- partly because with all the extra CV, file sizes may well explode.
Please persuade me otherwise!<br>
(btw, I've 'read but ignored' the quantitation suggestions based on
decisions in Toledo.)<br>
<br>
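A rough way to put numbers on the file-size worry above is to serialize one RelationResultItem both ways and compare byte counts. This is only a sketch under assumptions: the element and attribute names are copied from the examples in this message, "PI:99999" is the same placeholder accession used above, the un-prefixed "cvParam" tag and the helper names are made up for illustration, and the comparison ignores compression, which would narrow the gap.<br>
<pre>
```python
import xml.etree.ElementTree as ET

# Values taken from the first RelationResultItem example in this message.
FIELDS = {"start": "160", "end": "171", "pre": "I", "post": "K"}


def as_attributes():
    """One element carrying start/end/pre/post as plain XML attributes."""
    item = ET.Element("RelationResultItem", polypeptideReference_ref="1_1")
    for name, value in FIELDS.items():
        item.set(name, value)
    return ET.tostring(item)


def as_cv_params():
    """Same data, one cvParam child per field (accession is a placeholder)."""
    item = ET.Element("RelationResultItem", polypeptideReference_ref="1_1")
    for name, value in FIELDS.items():
        ET.SubElement(item, "cvParam", accession="PI:99999", name=name, value=value)
    return ET.tostring(item)


# Uncompressed size of each form, and the ratio between them.
attr_bytes = len(as_attributes())
cv_bytes = len(as_cv_params())
print(attr_bytes, cv_bytes, round(cv_bytes / attr_bytes, 1))
```
</pre>
The ratio printed is per item, so it says nothing about whole-file size once spectra and the molecule table are included; it only shows how much each attribute costs when it is moved into a cvParam child.<br>
<br>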
<br>
One minor comment:<br>
<br>
Slide 6: ..., but the results are always about the result from the
user&#8217;s perspective &#8211; &#8220;What did I find and/or measure?&#8221;, rather than
&#8220;How did I account for all of the spectra?&#8221; <br>
&nbsp;- Many users do want to try and account for all their spectra because
they believe that they are missing something useful.<br>
<br>
<br>
David<br>
<br>
Sean L Seymour wrote:
<blockquote
 cite="mid:OFC...@ap..."
 type="cite"><br>
  <font face="sans-serif" size="2">Hi all,</font>
  <br>
  <br>
  <font face="sans-serif" size="2">After the wrap up Friday afternoon,
the few remaining people in the PI group had a short meeting where we
discussed
a potential generalization to the results portion of the schema. The
big
question that came out of this was whether or not we should keep the
result
description for the ID of peptides from MS/MS spectra as it was by
midday
Friday, or whether it made sense to restructure this so that it
followed
the more general structure for results that we would use for many other
things, including protein inference from peptide IDs. I agreed to
outline
the various use cases and try to lay out the issues. I had hoped to
send
this out by Monday, but it's taken a lot longer than planned. Apologies
for being a day late, but I hope you'll see that a lot of thought went
into this.</font>
  <br>
  <br>
  <font face="sans-serif" size="2">There are two documents. Please look
at "AnalysisXML Results Design Question.ppt" first. This lays
out the specific schema change question we face. One of the biggest
concerns
about this proposed change was that it was not immediately obvious to
any
of us last Friday whether this was a substantial restructuring or
essentially
a renaming process. As you'll see in the slide showing the alignment, I
now believe that the change is largely a renaming process and not a
large
change. The only real change is the insertion of one additional level,
but I can imagine a way around doing this. In fact, I think that the
reason
for inserting this level is not specific to the question of the schema
change, rather it's simply making up for something that was missing in
the original model. There needs to be a way of having things that are
attributes
of the overall identification rather than an individual identification
hypothesis - for example, the probability that at least one of the
identification
hypotheses (hits/matches) is correct for the spectrum. Assuming we
agree
that this is true, I think there is zero difference in the schema other
than using more generic names, and my opinion is that we should really
make this change.</font>
  <br>
  <br>
  <font face="sans-serif" size="2">The second document, "AnalysisXML
Results Use Cases.ppt" tries to capture a lot of more specific use
cases that demonstrate why the proposed schema change may be the right
thing to do. I've done this using 'pseudo instance documents' which are
explained in the slides. I hope this is a useful communication
mechanism,
and may have some use for documentation as well. If no one finds them
useful,
no big deal - I was just trying to find a way to communicate clearly.
Please
excuse inaccuracies in the details of some of the use cases. I was
trying
to assess whether or not the constant AnalysisResult frame was robust
to
a large number of variations. I think you'll see that it is, and it's
really
not clear to me why we should have a special case of element names for
the ID of peptides from MS/MS spectra. The only good reason I can see
for
it is that it's what we already had drawn up in the schema. </font>
  <br>
  <br>
  <font face="sans-serif" size="2">Please feel free to add, modify, or
correct any of this as you see fit!</font>
  <br>
  <br>
  <font face="sans-serif" size="2">Sean</font>
  <br>
  <br>
  <br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">-- 
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

<a class="moz-txt-link-abbreviated" href="mailto:dc...@ma...">dc...@ma...</a>
<a class="moz-txt-link-freetext" href="http://www.matrixscience.com">http://www.matrixscience.com</a>

Matrix Science Ltd. is registered in England and Wales
Company number 3533898</pre>
</body>
</html>