From: Jones, A. <And...@li...> - 2008-06-27 13:54:43
|
Sorry, meant to send to the list.

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

id and name are standard for all elements that inherit from FuGE identifiable; whether the optional name attribute should be there is perhaps a separate discussion.

I agree that length may be useful. Is this just an integer value with no unit?

I'm less sure about pI and mass, since mass at least can be calculated very simply, and pI values (in my opinion) are pretty inaccurate and fairly meaningless, unless someone can convince me otherwise?

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
Why do we want name? Is this for, say, a description line?
(Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to flag whether they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

<DBSequence identifier="" name="" isDecoy="true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required?

I'll post a new version of the schema with other changes WRT PeptideEvidence shortly,

Cheers
Andy

_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
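The remark in this message that mass "can be calculated very simply" can be illustrated with a minimal sketch. It assumes the sequence itself is present in the file and uses the standard monoisotopic amino acid residue masses; the names `RESIDUE_MASS`, `WATER`, and `sequence_mass` are hypothetical and are not part of FuGE or AnalysisXML.

```python
# Standard monoisotopic residue masses in daltons.
# The table and helper names are illustrative only, not taken from any schema.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def sequence_mass(sequence):
    """Return the monoisotopic mass of an unmodified amino acid sequence.

    Returns None when no sequence is stored, in which case the mass
    cannot be derived and would have to be recorded explicitly.
    Raises KeyError for residue codes missing from the table.
    """
    if not sequence:
        return None
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


# The five residues shown in the <seq> example above (the real entry is longer).
print(sequence_mass("MCTMG"))
# A record written without its sequence gives no mass.
print(sequence_mass(None))
```

The `None` branch is the case that matters for the schema decision: if a writer omits the sequence to save space, a reader has nothing to compute the mass from.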
From: David C. <dc...@ma...> - 2008-06-27 14:12:26
|
Jones, Andy wrote:
> Sorry meant to send to the list
>
> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.
>
> I agree that length may be useful – is this just an integer value with no unit?

Yes, I think so.

> I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).

The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than amino acid residues.
Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of other wonderful things.

> – unless someone can convince me otherwise?
> Cheers
> Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
From: Jones, A. <And...@li...> - 2008-06-27 14:21:41
|
So how about including length as an attribute and letting everything else (pI, mass, etc.) go in the CV? |
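The proposal in this message (length as a plain attribute, everything else as CV terms) can be sketched as below. This is only an illustration under stated assumptions: the `cvParam` child element and its `name`/`value` attributes are hypothetical, the length and mass values are placeholders rather than real data for this accession, and the `pf:` namespace prefix on DatabaseReference is left out for simplicity.

```python
import xml.etree.ElementTree as ET


def build_db_sequence(identifier, accession, length, is_decoy,
                      seq=None, cv_params=()):
    """Build a DBSequence element along the lines discussed in the thread.

    length is a unitless integer attribute; optional properties such as
    mass or pI are passed as (name, value) pairs and written as
    hypothetical cvParam children instead of dedicated attributes.
    """
    elem = ET.Element("DBSequence", {
        "identifier": identifier,
        "length": str(length),
        "isDecoy": "true" if is_decoy else "false",
    })
    # The sequence is optional, since some writers omit it to save space.
    if seq is not None:
        ET.SubElement(elem, "seq").text = seq
    ET.SubElement(elem, "DatabaseReference",
                  {"Database_ref": "", "accession": accession})
    for name, value in cv_params:
        # Placeholder element shape; a real CV term would carry an accession.
        ET.SubElement(elem, "cvParam", {"name": name, "value": str(value)})
    return elem


# Placeholder values: identifier, length, and mass are made up for illustration.
elem = build_db_sequence(
    identifier="DBSeq_1",
    accession="Rev_IPI00013808.1",
    length=120,
    is_decoy=True,
    cv_params=[("mass", 13500.0)],
)
print(ET.tostring(elem, encoding="unicode"))
```

Keeping only length and isDecoy as attributes means protein-specific values such as mass and pI never have to appear on nucleic acid sequences; they are simply not supplied as CV terms.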
From: David C. <dc...@ma...> - 2008-06-27 14:45:50
|
OK by me.

Jones, Andy wrote:
> So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>
> From: Jones, Andy
> Sent: 27 June 2008 14:54
> To: 'David Creasy'
> Subject: RE: [Psidev-pi-dev] Representing Sequences
>
> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.
>
> I agree that length may be useful – is this just an integer value with no unit?
>
> Yes, I think so.
>
> I’m less sure about pI and mass since mass at least can be calculated very simply
>
> Only if you have the sequence... (we have residue masses in the file).
>
> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>
> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>
> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
> Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?
>
> – unless someone can convince me otherwise?
> Cheers
> Andy
>
> From: David Creasy [mailto:dc...@ma...]
> Sent: 27 June 2008 14:51
> To: Jones, Andy
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] Representing Sequences
>
> Hi Andy,
>
> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
> Why do we want name? Is this for, say, a description line?
> (Also, identifier -> id?)
>
> David
>
> Jones, Andy wrote:
> Hi all,
>
> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don’t really want to add another attribute to this (it’s less problematic cutting down FuGE than adding new things), so I’m wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:
>
> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>
> Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:
>
> <DBSequence identifier="" name="" isDecoy="true">
>   <seq>MCTMG...</seq>
>   <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> </DBSequence>
>
> Are any of the other attributes on Sequence actually required? I’ll post a new version of the schema with other changes WRT PeptideEvidence shortly,
> Cheers
> Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
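The trade-off discussed in this message (keep length as a plain attribute, push pI, mass and similar values into CV terms, and note that mass is only derivable when the sequence is actually written out) can be sketched in Python as follows. This is an illustration under stated assumptions, not the AnalysisXML design: the class and method names are invented here, the CV entry is a bare name/value pair rather than a real PEFF term, the length of 412 is a made-up value, and the residue table holds the standard monoisotopic masses rounded to five decimals rather than the residue masses a real file would carry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Standard monoisotopic residue masses in Da (amino acid minus water),
# rounded to five decimals. A real file would supply its own residue masses.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water accounts for the free N- and C-termini


@dataclass
class DBSequence:
    """Hypothetical in-memory form of the proposed element."""
    identifier: str
    is_decoy: bool = False
    length: Optional[int] = None   # plain integer, no unit
    seq: Optional[str] = None      # may be omitted to save space
    cv_params: Dict[str, str] = field(default_factory=dict)

    def effective_length(self) -> Optional[int]:
        """Prefer the real sequence; fall back to the stored length."""
        if self.seq is not None:
            return len(self.seq)
        return self.length

    def mass(self) -> Optional[float]:
        """Monoisotopic mass, or None when no sequence was written out."""
        if self.seq is None:
            return None
        return sum(RESIDUE_MASS[aa] for aa in self.seq) + WATER


# "MCTMG" is only the visible prefix of the sequence quoted in the thread.
with_seq = DBSequence(identifier="seq_1", seq="MCTMG")
# Illustrative entry that stores a length and a CV value but no sequence.
without_seq = DBSequence(
    identifier="seq_2", length=412, cv_params={"sequence type": "protein"}
)

print(with_seq.effective_length(), with_seq.mass())
print(without_seq.effective_length(), without_seq.mass())
```

A nucleic acid entry would simply carry no mass or pI terms in `cv_params`, which is the reason given above for not making them fixed attributes.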
From: Pierre-Alain B. <pie...@is...> - 2008-06-27 15:12:54
|
Good to me too (a few comments below).

David Creasy wrote:
> OK by me.
>
> Jones, Andy wrote:
>> So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>>
>> *From:* Jones, Andy
>> *Sent:* 27 June 2008 14:54
>> *To:* 'David Creasy'
>> *Subject:* RE: [Psidev-pi-dev] Representing Sequences
>>
>> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.
>>
>> I agree that length may be useful – is this just an integer value with no unit?
>>
>> Yes, I think so.
>>
>> I’m less sure about pI and mass since mass at least can be calculated very simply
>>
>> Only if you have the sequence... (we have residue masses in the file).
>>
... everyone uses the same algorithm for the pI...
>>
>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>
>> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>>
>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nuleic acid rather than residues.
>> Why not just allow CV there? We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene ID, and lots of wonderful other things?
>>
good. Therefore things like pI, mass, etc should be added in the PEFF CV? (as they might be representing a sequence entry information?)

Pierre-Alain
>>
>> – unless someone can convince me otherwise?
>>
>> Cheers
>>
>> Andy
>>
>> *From:* David Creasy [mailto:dc...@ma...]
>> *Sent:* 27 June 2008 14:51
>> *To:* Jones, Andy
>> *Cc:* psi...@li...
>> *Subject:* Re: [Psidev-pi-dev] Representing Sequences
>>
>> Hi Andy,
>>
>> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
>> Why do we want name? Is this for, say, a description line?
>> (Also, identifier -> id?)
>>
>> David
>>
>> Jones, Andy wrote:
>>
>> Hi all,
>>
>> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don’t really want to add another attribute to this (it’s less problematic cutting down FuGE than adding new things), so I’m wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:
>>
>> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>>
>> Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:
>>
>> <DBSequence identifier="" name="" isDecoy="true">
>>   <seq>MCTMG...</seq>
>>   <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
>> </DBSequence>
>>
>> Are any of the other attributes on Sequence actually required? I’ll post a new version of the schema with other changes WRT to PeptideEvidence shortly,
>>
>> Cheers
>>
>> Andy
>
> --
> David Creasy
> Matrix Science
> 64 Baker Street
> London W1U 7GB, UK
> Tel: +44 (0)20 7486 1050
> Fax: +44 (0)20 7224 1344
>
> dc...@ma...
> http://www.matrixscience.com
>
> Matrix Science Ltd. is registered in England and Wales
> Company number 3533898 |
From: Angel P. <an...@ma...> - 2008-06-27 14:58:25
|
my 2¢: You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?

On a personal level I couldn't care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy).

Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

> So how about including length as an attribute and then letting all other
> things go in the CV (pI, mass, etc.)?
>
> *From:* Jones, Andy
> *Sent:* 27 June 2008 14:54
> *To:* 'David Creasy'
> *Subject:* RE: [Psidev-pi-dev] Representing Sequences
>
> id and name are standard for all elements that inherit from FuGE
> identifiable – this is perhaps a separate discussion as to whether the
> optional name attribute should be there.
>
> I agree that length may be useful – is this just an integer value with no
> unit?
>
> Yes, I think so.
>
> I'm less sure about pI and mass since mass at least can be calculated
> very simply
>
> Only if you have the sequence... (we have residue masses in the file).
>
> , and pI values (in my opinion) are pretty inaccurate and fairly
> meaningless
>
> Scandalous! (I happen to agree, but now some people will never speak to
> either of us ever again).
>
> The main problem with mass and pI is that these are 'irrelevant' if the
> sequence is nucleic acid rather than residues.
> Why not just allow CV there? We can share the same CV as the PEFF format,
> which includes taxonomy, sequence type, gene ID, and lots of wonderful
> other things?
>
> – unless someone can convince me otherwise?
>
> Cheers
> Andy
>
> *From:* David Creasy [mailto:dc...@ma...]
> *Sent:* 27 June 2008 14:51
> *To:* Jones, Andy
> *Cc:* psi...@li...
> *Subject:* Re: [Psidev-pi-dev] Representing Sequences
>
> Hi Andy,
>
> length may be useful, because some people won't want to output the actual
> sequence for space reasons. The other things we wanted to add before were pI
> and mass.
> Why do we want name? Is this for, say, a description line?
> (Also, identifier -> id?)
>
> David
>
> Jones, Andy wrote:
>
> Hi all,
>
> It was decided on the call that we would like to flag that Sequences in the
> ConceptualMoleculeCollection should have a Boolean attribute to capture if
> they are decoy sequences. At the moment we are using the FuGE:Sequence
> element. I don't really want to add another attribute to this (it's less
> problematic cutting down FuGE than adding new things), so I'm wondering if
> we should define our own Sequence type in AnalysisXML. This would also allow
> us to choose exactly the relevant attributes. At the moment, Sequence can
> have all of the following:
>
> <pf:Sequence isCircular="true" sequence="String" length="0"
> isApproximateLength="true" SequenceAnnotationSet_ref="String"
> start="0" end="0" identifier="String" name="String">
>
> Several of these attributes were created to represent concepts that
> probably will never be required or implemented in AnalysisXML. How about the
> following:
>
> <DBSequence identifier="" name="" isDecoy="true">
>   <seq>MCTMG...</seq>
>   <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> </DBSequence>
>
> Are any of the other attributes on Sequence actually required? I'll post a
> new version of the schema with other changes WRT PeptideEvidence shortly.
>
> Cheers
> Andy

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
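The primary-key concern in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are toy tuples, the sequence strings are invented, and `index_by` is a hypothetical helper rather than anything from AnalysisXML or FuGE. It shows what happens if a target and a decoy entry are given the same accession.

```python
# Toy records as (accession, is_decoy, sequence). The shared accession and the
# sequence strings are made up to illustrate the scenario, not real entries.
RECORDS = [
    ("IPI00013808.1", False, "MCTMG"),
    ("IPI00013808.1", True, "GMTCM"),
]


def index_by(records, key):
    """Build a dict keyed by key(record); a repeated key overwrites the earlier record."""
    index = {}
    for record in records:
        index[key(record)] = record
    return index


# Keyed on accession alone, the second record replaces the first.
by_accession = index_by(RECORDS, key=lambda r: r[0])

# Keyed on (accession, is_decoy), both records survive.
by_composite = index_by(RECORDS, key=lambda r: (r[0], r[1]))

print(len(by_accession), len(by_composite))
```

If decoy accessions always carry their own prefix (as in the `Rev_IPI00013808.1` example from the thread), the accession stays unique by itself and the composite key is not needed.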
From: Jones, A. <And...@li...> - 2008-06-27 15:23:40
|
I think Angel’s response below might not have made it round the list yet.

I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, for cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢: You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?

On a personal level I couldn't care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy).

Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.

I agree that length may be useful – is this just an integer value with no unit?

Yes, I think so.

I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

, and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).

The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

– unless someone can convince me otherwise?

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
Why do we want name? Is this for, say, a description line?
(Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

<DBSequence identifier="" name="" isDecoy="true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT PeptideEvidence shortly.

Cheers
Andy

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
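The decoy_string alternative proposed in the message above could be consumed roughly as follows. This is a sketch, not the AnalysisXML schema: the fragment reuses element names from the thread (`DBSequence`, `DatabaseReference`) without the `pf:` prefix, the wrapper element and the `id` values are invented, and `decoy_ids` is a hypothetical helper. It assumes the decoy string is a prefix on the accession, as in the `Rev_IPI00013808.1` example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only. Element names follow the draft
# discussed in the thread; the wrapper and the id values are assumptions.
DOC = """
<SequenceCollection>
  <DBSequence id="seq_1">
    <DatabaseReference accession="IPI00013808.1"/>
  </DBSequence>
  <DBSequence id="seq_2">
    <DatabaseReference accession="Rev_IPI00013808.1"/>
  </DBSequence>
</SequenceCollection>
"""


def decoy_ids(xml_text, decoy_string):
    """Return ids of DBSequence elements whose accession starts with decoy_string."""
    root = ET.fromstring(xml_text.strip())
    ids = []
    for seq in root.iter("DBSequence"):
        ref = seq.find("DatabaseReference")
        if ref is not None and ref.get("accession", "").startswith(decoy_string):
            ids.append(seq.get("id"))
    return ids


# The value "Rev" would come from the single decoy_string parameter.
print(decoy_ids(DOC, "Rev"))
```

The trade-off is the one raised in the thread: the decoy status is stated once per file instead of once per sequence, but every reader has to apply the prefix rule itself.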
From: Jones, A. <And...@li...> - 2008-06-27 15:58:21
|
Hi all,

I’ve updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
  o In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I’ve added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel’s response below might not have made it round the list yet.

I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, for cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢: You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?

On a personal level I couldn't care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy).

Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.

I agree that length may be useful – is this just an integer value with no unit?

Yes, I think so.

I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

, and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).

The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

– unless someone can convince me otherwise?

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
Why do we want name? Is this for, say, a description line?
(Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

<DBSequence identifier="" name="" isDecoy="true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT PeptideEvidence shortly.

Cheers
Andy

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
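To make two points from the update above concrete (length kept as a plain attribute, and cvParam allowed without a value), here is a small Python sketch. Everything in the fragment is an assumption for illustration: the attribute names, the cvParam `name` strings, and the pI number are invented and are not taken from the schema in SVN. `summarize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical DBSequence with no <seq> child: length is still available as an
# attribute, and other properties are carried as cvParams. One cvParam has no
# value, which is allowed once the value attribute is optional.
FRAGMENT = """
<DBSequence id="seq_1" length="5">
  <cvParam name="protein sequence"/>
  <cvParam name="pI" value="6.2"/>
</DBSequence>
"""


def summarize(xml_text):
    """Return (length, {cvParam name: value or None}) for one DBSequence fragment."""
    elem = ET.fromstring(xml_text.strip())
    params = {p.get("name"): p.get("value") for p in elem.iter("cvParam")}
    return int(elem.get("length")), params


length, params = summarize(FRAGMENT)
print(length, params)
```

A reader has to treat a missing value as "term asserted, no value given" rather than as an error, which is why `params` maps such terms to `None`.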
From: David C. <dc...@ma...> - 2008-06-29 21:04:23
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=windows-1252" http-equiv="Content-Type"> </head> <body bgcolor="#ffffff" text="#000000"> Thanks Andy,<br> <br> I've added an updated example document to SVN:<br> <a class="moz-txt-link-freetext" href="http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml">http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml</a><br> <br> Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.<br> <br> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection<br> Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?<br> I think we need to discuss this again at the next telecon.<br> <br> David<br> <br> Jones, Andy wrote: <blockquote cite="mid:08D...@EV..." 
type="cite"> <meta http-equiv="Content-Type" content="text/html; "> <meta name="Generator" content="Microsoft Word 12 (filtered medium)"> <!--[if !mso]> <style> v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style> <![endif]--> <style> <!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;} @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} @font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0cm; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman","serif";} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p {mso-style-priority:99; mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; font-size:12.0pt; font-family:"Times New Roman","serif";} pre {mso-style-priority:99; mso-style-link:"HTML Preformatted Char"; margin:0cm; margin-bottom:.0001pt; font-size:10.0pt; font-family:"Courier New";} p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph {mso-style-priority:34; margin-top:0cm; margin-right:0cm; margin-bottom:0cm; margin-left:36.0pt; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman","serif";} span.HTMLPreformattedChar {mso-style-name:"HTML Preformatted Char"; mso-style-priority:99; mso-style-link:"HTML Preformatted"; font-family:Consolas;} span.EmailStyle20 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle21 {mso-style-type:personal-reply; font-family:"Calibri","sans-serif"; color:#1F497D;} .MsoChpDefault {mso-style-type:export-only; 
font-size:10.0pt;} @page Section1 {size:612.0pt 792.0pt; margin:72.0pt 72.0pt 72.0pt 72.0pt;} div.Section1 {page:Section1;} /* List Definitions */ @list l0 {mso-list-id:723259958; mso-list-type:hybrid; mso-list-template-ids:100932132 2015425440 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;} @list l0:level1 {mso-level-start-at:0; mso-level-number-format:bullet; mso-level-text:-; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-18.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:Calibri; mso-bidi-font-family:"Times New Roman";} @list l0:level2 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-18.0pt; font-family:"Courier New";} ol {margin-bottom:0cm;} ul {margin-bottom:0cm;} --> </style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1" /> </o:shapelayout></xml><![endif]--> <div class="Section1"> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Hi all,<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I’ve updated the schema in SVN with the following main changes:<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; 
font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Added DBSequence that should be used instead of Sequence (following some of the discussion below)<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide<o:p></o:p></span></p> <p class="MsoListParagraph" style="margin-left: 72pt; text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Courier New"; color: rgb(31, 73, 125);"><span style="">o<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; 
font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">In FuGE on cvParam, the value attribute is no longer mandatory<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I’ve added a simple example that validates under examples\schema_usecase_examples\working27June<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Feel free to mail me any changes to make on Monday,<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Cheers<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Andy<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; 
font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p class="MsoNormal"><b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US">From:</span></b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a> [<a class="moz-txt-link-freetext" href="mailto:psi...@li...">mailto:psi...@li...</a>] <b>On Behalf Of </b>Jones, Andy<br> <b>Sent:</b> 27 June 2008 16:24<br> <b>To:</b> Angel Pizarro<br> <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] FW: Representing Sequences<o:p></o:p></span></p> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I think Angel’s response below might not have made it round the list yet.<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I tend to agree that isDecoy is redundant information and perhaps this is not the best place to 
encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p class="MsoNormal"><b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US">From:</span></b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:an...@it...">an...@it...</a> [<a class="moz-txt-link-freetext" href="mailto:an...@it...">mailto:an...@it...</a>] <b>On Behalf Of </b>Angel Pizarro<br> <b>Sent:</b> 27 June 2008 15:59<br> <b>To:</b> Jones, Andy<br> <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] FW: Representing Sequences<o:p></o:p></span></p> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal" style="margin-bottom: 12pt;">my 2¢ : <br> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a suclass 
of the conceptual molecule element? <br> <br> Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? <br> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy) <br> <br> Do we want to go there? <o:p></o:p></p> <div> <p class="MsoNormal">On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <<a moz-do-not-send="true" href="mailto:And...@li..." target="_blank">And...@li...</a>> wrote:<o:p></o:p></p> <div> <div> <p><span style="color: rgb(31, 73, 125);">So how about including length as an attribute and then let all other things go in the CV (pI, mass, etc.)?</span><o:p></o:p></p> <div> <div> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p><b><span style="font-size: 10pt;" lang="EN-US">From:</span></b><span style="font-size: 10pt;" lang="EN-US"> Jones, Andy <br> <b>Sent:</b> 27 June 2008 14:54<br> <b>To:</b> 'David Creasy'<br> <b>Subject:</b> RE: [Psidev-pi-dev] Representing Sequences</span><o:p></o:p></p>
</div> </div> <p><span style="color: rgb(31, 73, 125);">id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">I agree that length may be useful – is this just an integer value with no unit? </span><o:p></o:p></p> </div> <p style="margin-bottom: 12pt;">Yes, I think so.<o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">I'm less sure about pI and mass since mass at least can be calculated very simply</span><o:p></o:p></p> </div> <p>Only if you have the sequence... (we have residue masses in the file).<o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">, and pI values (in my opinion) are pretty inaccurate and fairly meaningless </span><o:p></o:p></p> </div> <p style="margin-bottom: 12pt;">Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).<br> <br> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.<br> Why not just allow CV there?
We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene ID, and lots of wonderful other things?<br> <br> <o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">– unless someone can convince me otherwise?</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">Cheers</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">Andy</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p><b><span style="font-size: 10pt;" lang="EN-US">From:</span></b><span style="font-size: 10pt;" lang="EN-US"> David Creasy [<a moz-do-not-send="true" href="mailto:dc...@ma..." target="_blank">mailto:dc...@ma...</a>] <br> <b>Sent:</b> 27 June 2008 14:51<br> <b>To:</b> Jones, Andy<br> <b>Cc:</b> <a moz-do-not-send="true" href="mailto:psi...@li..." target="_blank">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] Representing Sequences</span><o:p></o:p></p> </div> </div> <p> <o:p></o:p></p> <p>Hi Andy,<br> <br> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. <br> Why do we want name? 
Is this for, say, a description line?<br> (Also, identifier -> id?)<br> <br> David<br> <br> Jones, Andy wrote: <o:p></o:p></p> <p>Hi all,<o:p></o:p></p> <p> <o:p></o:p></p> <p>It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:<o:p></o:p></p> <p> <o:p></o:p></p> <p><span style="background: white none repeat scroll 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <span style="color: blue;"><</span><span style="color: maroon;">pf:Sequence</span><span style="color: red;"> isCircular</span><span style="color: blue;">="</span>true<span style="color: blue;">"</span><span style="color: red;"> sequence</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: red;"> length</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> isApproximateLength</span><span style="color: blue;">="</span>true<span style="color: blue;">"</span><span style="color: red;"> SequenceAnnotationSet_ref</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: red;"> start</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> end</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> identifier</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: 
red;"> name</span><span style="color: blue;">="</span>String<span style="color: blue;">"></span></span><o:p></o:p></p> <p><span style="color: blue;"> </span><o:p></o:p></p> <p>Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:<o:p></o:p></p> <p> <o:p></o:p></p> <p><DBSequence identifier = "" name = "" isDecoy = "true"><o:p></o:p></p> <p> <seq>MCTMG...</seq><o:p></o:p></p> <p> <span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"><</span><span style="background: white none repeat scroll 0%; color: maroon; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">pf:DatabaseReference</span><span style="background: white none repeat scroll 0%; color: red; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> Database_ref</span><span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">=""</span><span style="background: white none repeat scroll 0%; color: red; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> accession</span><span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">="Rev_</span><span style="background: white none repeat scroll 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">IPI00013808.1<span style="color: blue;">"/></span></span><o:p></o:p></p> <p><span style="color: 
</DBSequence></span>">
blue;"></DBSequence></span><o:p></o:p></p> <p>Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT to PeptideEvidence shortly,<o:p></o:p></p> <p>Cheers<o:p></o:p></p> <p>Andy<o:p></o:p></p> </div> </div> </div> </div> </div> </div> </div> <p class="MsoNormal" style="margin-bottom: 12pt;"><br> _______________________________________________<br> Psidev-pi-dev mailing list<br> <a moz-do-not-send="true" href="mailto:Psi...@li..."
target="_blank">Psi...@li...</a><br> <a moz-do-not-send="true" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev" target="_blank">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a><o:p></o:p></p> </div> <p class="MsoNormal"><br> <br clear="all"> <br> -- <br> Angel Pizarro<br> Director, ITMAT Bioinformatics Facility<br> 806 Biological Research Building<br> 421 Curie Blvd.<br> Philadelphia, PA 19104-6160<br> 215-573-3736 <o:p></o:p></p> </div> </div> </div> <pre wrap=""> <hr size="4" width="90%"> ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. <a class="moz-txt-link-freetext" href="http://sourceforge.net/services/buy/index.php">http://sourceforge.net/services/buy/index.php</a></pre> <pre wrap=""> <hr size="4" width="90%"> _______________________________________________ Psidev-pi-dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Psi...@li...">Psi...@li...</a> <a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a> </pre> </blockquote> <br> <pre class="moz-signature" cols="72">-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 <a class="moz-txt-link-abbreviated" href="mailto:dc...@ma...">dc...@ma...</a> <a class="moz-txt-link-freetext" href="http://www.matrixscience.com">http://www.matrixscience.com</a> Matrix Science Ltd. is registered in England and Wales Company number 3533898</pre> </body> </html> |
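The cvParam alternative discussed in this message (declare the decoy string once, then infer decoy status from each accession) can be sketched as follows. This is only an illustration under stated assumptions: the XML fragment is hypothetical and not schema-valid AnalysisXML, the placement of the cvParam inside a SequenceCollection is invented for brevity, and the function name `decoy_flags` is not part of any proposal in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names loosely follow the
# DBSequence proposal in the thread; this is NOT a schema-valid document.
FRAGMENT = """
<SequenceCollection>
  <cvParam name="decoy_string" value="Rev"/>
  <DBSequence id="DBSeq_1" accession="IPI00013808.1"/>
  <DBSequence id="DBSeq_2" accession="Rev_IPI00013808.1"/>
</SequenceCollection>
"""


def decoy_flags(xml_text):
    """Map each accession to True if it starts with the declared decoy string."""
    root = ET.fromstring(xml_text)
    prefix = root.find("cvParam[@name='decoy_string']").get("value")
    flags = {}
    for seq in root.findall("DBSequence"):
        accession = seq.get("accession")
        # The accession alone stays the key; a repeated accession is an error
        # rather than something disambiguated by a separate isDecoy attribute.
        if accession in flags:
            raise ValueError(f"duplicate accession: {accession}")
        flags[accession] = accession.startswith(prefix)
    return flags


flags = decoy_flags(FRAGMENT)
for accession, is_decoy in flags.items():
    print(accession, "decoy" if is_decoy else "target")
```

Because decoy status is derived from the accession, two sequences can never share an accession and differ only in a flag, which is the (accession, isDecoy) composite-key situation Angel warns about.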
From: Pierre-Alain B. <pie...@is...> - 2008-07-02 09:20:27
|
Thanks David.
A couple of questions, just to make sure:

1) In the case of a top-down approach, do we have to duplicate
sequenceCollection information? As SpectrumIdentificationResult contains
a PeptideEvidence referring to a Peptide element (and not to a
DBSequence), is the identification necessarily a Peptide?

2) What about spectral library searches? Do we have to have Peptide
elements with possibly undefined explicit sequences to refer to from the
SpectrumIdentificationResult (because the match is non-peptidic, or
because it is not identified but is a good spectrum)?

3) In the Peptide element, the Modifications are defined in a much more
detailed manner than in ModificationParams (PSI-MOD is there, for
instance). Does this simply mean that the ModificationParams encodes the
search engine settings and the Peptide includes the formal PSI
definition of the Mod? And is the only reference the ModName value?

4) None of the mass values (sequenceMass, calculatedMassToCharge,
experimentalMassToCharge) specifies whether it is monoisotopic or
averaged. Do we assume that averaged does not exist anymore?

5) Is sequenceMass the mass value with or without the mods? If with, the
name might be misleading (peptideMass would be more appropriate).

6) In case the DBSequence is nucleotide, is there a tag for saying this?
(NB: MS on nucleotide molecules can be performed and analysed, not only
MS on AA sequences that are interpreting nucleotide sequences.) Or do we
neglect MS experiments done on nucleotide molecules (and, by the way, on
glycans...) and only represent the DBSequences as AA sequences (frame
translations)? This can probably be solved if one can replace
SequenceCollection by something else if needed
(SmallMoleculeCollection, GlycanCollection, MoleculeCollection)... but
the validator might not like this.

7) In case the DBSequence is nucleotide, do we represent the Peptide as
an AA sequence in the case of MS done on proteins?

That's all for the sequence representation so far.

Cheers,
Pierre-Alain


David Creasy wrote:
> Thanks Andy,
>
> I've added an updated example document to SVN:
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>
> Problem is that we have now removed the main point of these recent
> changes which was to add the decoy flag... I think that we need to add
> isDecoy to SpectrumIdentificationItem.
>
> And yes, I suspect that we should go back to using the
> ConceptualMoleculeCollection
> Um, and since we've not actually ended up adding anything to
> DBSequence... we haven't actually achieved anything?
> I think we need to discuss this again at the next telecon.
>
> David
>
> Jones, Andy wrote:
>>
>> Hi all,
>>
>> I’ve updated the schema in SVN with the following main changes:
>>
>> - PeptideEvidence is now part of SpectrumIdentificationItem
>> as discussed on the call (simple mappings to proteins are done at
>> this level)
>>
>> - Added DBSequence that should be used instead of Sequence
>> (following some of the discussion below)
>>
>> - Created a new collection class SequenceCollection (rather
>> than ConceptualMoleculeCollection) so that only references can be
>> given to DBSequence and Peptide
>>
>> o In fact, I’m not sure if this is sensible since it prevents other
>> types of ConceptualMolecule being added later... to discuss
>>
>> - In FuGE on cvParam, the value attribute is no longer mandatory
>>
>> I’ve added a simple example that validates under
>> examples\schema_usecase_examples\working27June
>>
>> Feel free to mail me any changes to make on Monday,
>>
>> Cheers
>>
>> Andy
|
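Questions 4 and 5 in the message above turn on how a sequence mass is computed. The following is a minimal sketch, assuming sequenceMass means the unmodified mass (the sum of residue masses plus one water) and that the file states whether values are monoisotopic or average. The residue masses are approximate standard values covering only the residues of the thread's example sequence MCTMG, and the names `sequence_mass`, `MONOISOTOPIC` and `AVERAGE` are illustrative, not part of the schema.

```python
# Approximate residue masses in Da, limited to the residues of the example
# sequence "MCTMG"; a real file would carry its own residue mass table.
MONOISOTOPIC = {"G": 57.02146, "T": 101.04768, "C": 103.00919, "M": 131.04049}
AVERAGE = {"G": 57.0519, "T": 101.1051, "C": 103.1388, "M": 131.1926}

# Mass of the water added for the free N- and C-termini.
WATER = {"monoisotopic": 18.01056, "average": 18.01528}


def sequence_mass(sequence, mass_type="monoisotopic"):
    """Return the unmodified mass of a peptide sequence (no modifications)."""
    if mass_type not in WATER:
        raise ValueError(f"unknown mass type: {mass_type}")
    table = MONOISOTOPIC if mass_type == "monoisotopic" else AVERAGE
    return sum(table[residue] for residue in sequence) + WATER[mass_type]


mono = sequence_mass("MCTMG", "monoisotopic")
avg = sequence_mass("MCTMG", "average")
print(f"monoisotopic: {mono:.4f} Da, average: {avg:.4f} Da")
```

The two values differ by about half a dalton even for this five-residue peptide, which is why a reader of the file needs to know which convention a reported mass follows.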
From: David C. <dc...@ma...> - 2008-07-03 14:22:01
|
Hi Pierre-Alain,

Pierre-Alain Binz wrote:
> Thanks David.
> a couple of questions, just to make sure:

In England, "couple" means 2. In the US, I believe it can mean any number. So, being English, I'll just answer the first two questions ;) - actually, the rest of the questions are all good too!

>
> 1) in case of top-down approach, do we have to duplicate
> sequenceCollection information? as SpectrumIdentificationResult contains
> a PeptideEvidence referring to a Peptide element (and not to a
> DBSequence), identification is obligatory a Peptide?

I guess so, yes.

>
> 2) and what about spectral library searches, do we have to have Peptide
> elements with possibly undefined explicit sequences to refer to from the
> SpectrumIdentificationResult (because non peptidic, or because not
> identified but good spectrum)

PeptideEvidence is not a required element. However, we don't have an example instance document for spectral library searches. Would you like to volunteer?

>
> 3) in the Peptide element, the Modifications are defined in a much more
> detailed manner than in ModificationParams (PSI-MOD is there for
> instance). Does this simply mean that The ModificationParams codes the
> search engine settings and the Peptide includes the formal PSI
> definition of the Mod? And the only reference is the ModName value?

The example document is not yet complete here... and yes, it needs a little more thought. However, we expect to provide a PSI mod definition.

>
> 4) all mass values (sequenceMass, calculatedMassToCharge,
> experimentalMassToCharge, are not specified whether monoisotopic or
> averaged. Do we assume that averaged does not exist anymore?

No, average is still allowed. I've added this to
http://code.google.com/p/psi-pi/issues/detail?id=13

>
> 5) is sequenceMass the mass value with/without the mods? If with, the
> name might be misleading (peptideMass would be more appropriate)

Yes, it is without mods. I've added this to
http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

>
> 6) in case the DBSequence is nucleotide, is there a tag for saying this?

Up for discussion at the telecon today I hope.

> (NB: MS on nucleotide molecules can be performed and analysed, not only
> MS on AA sequences that are interpreting nucleotide sequences). Or do we
> neglect MS experiments done on nucleotide molecules (and by the way on
> glycans...) and only represent the DBSequences as AA sequences (frame
> translations)? (and what about glycans?) Probably can be solved if one
> can replace SequenceCollection by something else if needed
> (SmallMoleculeCollection, GlycanCollection, MoleculeCollection)... but
> the validator might not like this.
>
> 7) in case that DBSequence is nucleotide, do we represent the Peptide as
> AA sequence in case of MS done on proteins?

Yes - see the first item in:
http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

Thanks very much for the questions.

David

>
> That's all for the sequence representation so far
>
> Cheers,
> Pierre-Alain
>
>
> David Creasy wrote:
>> Thanks Andy,
>>
>> I've added an updated example document to SVN:
>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>
>> Problem is that we have now removed the main point of these recent
>> changes which was to add the decoy flag... I think that we need to add
>> isDecoy to SpectrumIdentificationItem.
>>
>> And yes, I suspect that we should go back to using the
>> ConceptualMoleculeCollection
>> Um, and since we've not actually ended up adding anything to
>> DBSequence... we haven't actually achieved anything?
>> I think we need to discuss this again at the next telecon.
|
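The point in the reply above that sequenceMass is the mass of the sequence without mods can be illustrated with a minimal Python sketch. It assumes monoisotopic values and a free N- and C-terminus (one water added); the function name and the table layout are made up for illustration and are not part of the schema. An average-mass table and average water mass could be passed in the same way, since average masses are still allowed.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MONOISOTOPIC_WATER = 18.01056  # H2O accounting for the two free termini


def sequence_mass(sequence,
                  residue_masses=MONOISOTOPIC_RESIDUE_MASS,
                  water=MONOISOTOPIC_WATER):
    """Hypothetical helper: mass of the bare sequence, no modifications.

    Sums the residue masses and adds one water. Modification masses are
    deliberately left out, matching the 'without mods' reading above.
    """
    return sum(residue_masses[residue] for residue in sequence) + water


if __name__ == "__main__":
    # "MCTMG" is the sequence fragment used in the DBSequence proposal.
    print(f"{sequence_mass('MCTMG'):.3f}")
```

For MCTMG this comes to roughly 541.17 Da. A writer that reports a modified peptide would still put this unmodified value in sequenceMass under the reading given above.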
From: Martin E. <mar...@ru...> - 2008-07-30 12:05:21
|
Hi Pierre-Alain,

quite old posting, but I saw no answer yet, so I will try:

>2nd July, 2008:
>a couple of questions, just to make sure:
>1) in case of top-down approach, do we have to duplicate sequenceCollection information?

I hope not, by referencing the same identifier.

>as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>(and not to a DBSequence), identification is obligatory a Peptide?

At the moment I think it's possible to directly reference a DBSeq. At the time the foreign key definitions are implemented we can forbid that.
But we should have in mind that a peptide is a sequence plus modifications, so if top-down identifies only a sequence, we should allow that, and if top-down identifies with mods, we should forbid that.
It would be quite helpful to have a top-down instance doc. To check whether our thoughts are really deep enough...

>2) and what about spectral library searches, do we have to have Peptide
>elements with possibly undefined explicit sequences to refer to
>from the SpectrumIdentificationResult (because non peptidic, or because not identified
>but good spectrum)

At the moment the sequence element can be empty or even left out. User or CV params are allowed.
How do they report results in spectral lib search if they identify non-peptidic or unidentified? We need CV terms for that...

>3) in the Peptide element, the Modifications are defined in a much more
>detailed manner than in ModificationParams (PSI-MOD is there for
>instance). Does this simply mean that the ModificationParams codes
>the search engine settings and the Peptide includes the formal PSI
>definition of the Mod? And the only reference is the ModName value?

I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or they can define their own.

>4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>are not specified whether monoisotopic or averaged.
>Do we assume that averaged does not exist anymore?

No, we decided to have only one type of masses in the whole analysisXML. But I cannot find a note for that or a schema attribute... I will add an issue for that.

>5) is sequenceMass the mass value with/without the mods? If with, the
>name might be misleading (peptideMass would be more appropriate)

It is indeed the mass of the sequence without mods. THAT is described in
http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

>6) in case the DBSequence is nucleotide, is there a tag for saying
>this? (NB: MS on nucleotide molecules can be performed and analysed,
>not only MS on AA sequences that are interpreting nucleotide sequences).
>Or do we neglect MS experiments done on nucleotide molecules (and by
>the way on glycans...) and only represent the DBSequences as AA
>sequences (frame translations)? (and what about glycans?)
>Probably can be solved if one can replace SequenceCollection by
>something else if needed (SmallMoleculeCollection, GlycanCollection,
>MoleculeCollection)... but the validator might not like this.

Hmm, these can be extensions, I think they are not possible at the moment. But a tag for the type can indeed be useful, it could be a CV param. I will create an issue for that.

>7) in case that DBSequence is nucleotide, do we represent the
>Peptide as AA sequence in case of MS done on proteins?

I hope the following answers this:

<DBSequence> is the nucleotide seq from the nucleotide DB,
<Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
<PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
(The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
If a protein detection is performed, there are <PeptideHypothesis> elements referencing PeptideEvidence elements from SpectrumIdentificationItem sections.

Bye
Martin

David Creasy wrote:

Thanks Andy,

I've added an updated example document to SVN:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml

Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.

And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?
I think we need to discuss this again at the next telecon.

David

Jones, Andy wrote:

Hi all,

I've updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
  o In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,
Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel's response below might not have made it round the list yet.
|
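The answer to question 7 above describes how a nucleotide DBSequence, an amino acid Peptide, and a PeptideEvidence carrying a frame fit together. A minimal Python sketch of that reference chain follows. The element names come from the message, but the fragment itself is hypothetical: the wrapper element, the id attribute, the exact spelling of the reference attributes, the peptideSequence child, and all values are assumptions for illustration, not taken from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment. Attribute spellings such as
# DBSequence_ref, TranslationTable_ref and frame are assumptions.
FRAGMENT = """
<Example>
  <SequenceCollection>
    <DBSequence id="dbseq_1" accession="EST0001">
      <seq>ATGTGCACCATGGGC</seq>
    </DBSequence>
    <Peptide id="pep_1">
      <peptideSequence>MCTMG</peptideSequence>
    </Peptide>
  </SequenceCollection>
  <SpectrumIdentificationItem id="sii_1" Peptide_ref="pep_1">
    <PeptideEvidence id="pe_1" DBSequence_ref="dbseq_1" frame="1"
                     TranslationTable_ref="tt_1"/>
  </SpectrumIdentificationItem>
</Example>
"""


def resolve_item(root, item_id):
    """Follow the references of one SpectrumIdentificationItem.

    Returns a list of (peptide sequence, DB accession, frame) tuples,
    one per PeptideEvidence child of the item.
    """
    # Index every element that carries an id so references can be looked up.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    item = by_id[item_id]
    peptide = by_id[item.get("Peptide_ref")]
    resolved = []
    for evidence in item.findall("PeptideEvidence"):
        db_sequence = by_id[evidence.get("DBSequence_ref")]
        resolved.append((
            peptide.findtext("peptideSequence"),
            db_sequence.get("accession"),
            int(evidence.get("frame")),
        ))
    return resolved


if __name__ == "__main__":
    print(resolve_item(ET.fromstring(FRAGMENT), "sii_1"))
```

The peptide stays an amino acid sequence in every case; only the PeptideEvidence needs to know that the referenced DBSequence is nucleotide and which frame was used to reach it.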
From: Jones, A. <And...@li...> - 2008-07-30 13:36:09
|
Hi all,

> >as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> >(and not to a DBSequence), identification is obligatory a Peptide?
> At the moment I think it's possible to directly reference a DBSeq. At the time the
> foreign key definitions are implemented we can forbid that.
> But we should have in mind that a peptide is a sequence plus modifications, so if
> top-down identifies only a sequence, we should allow that, and if top-down
> identifies with mods, we should forbid that.
> It would be quite helpful to have a top-down instance doc. To check
> whether our thoughts are really deep enough...

As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand Martin's response about Mods. We have to focus on what use cases we state we are supporting...

Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a Peptide sequence and its mods) plus PeptideEvidence, which is a reference to the part of the ProteinSequence this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g. SourceProtein - this would save some space and would appear to be a logically more sensible model...

I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?

> >2) and what about spectral library searches, do we have to have Peptide
> >elements with possibly undefined explicit sequences to refer to
> >from the SpectrumIdentificationResult (because non peptidic, or because not identified
> >but good spectrum)
> At the moment the sequence element can be empty or even left out.
> User or CV params are allowed.
> How do they report results in spectral lib search if they identify non-peptidic or
> unidentified?
> We need CV terms for that...

I don't quite get this point. What is reported from a spectral library search if it is unidentified - how does this differ from no result? In terms of non-peptidic, are we talking about identifying small molecules? This is analysisXML version 2 :-)

> >3) in the Peptide element, the Modifications are defined in a much more
> >detailed manner than in ModificationParams (PSI-MOD is there for
> >instance). Does this simply mean that the ModificationParams codes
> >the search engine settings and the Peptide includes the formal PSI
> >definition of the Mod? And the only reference is the ModName value?
> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV
> or they can define their own.

Mods proposal coming from Angel.

> >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
> >are not specified whether monoisotopic or averaged.
> >Do we assume that averaged does not exist anymore?
> No, we decided to have only one type of masses in the whole analysisXML.
> But I cannot find a note for that or a schema attribute... I will add an issue for that.

It is a database search parameter:

<AdditionalSearchParams>
    <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
</AdditionalSearchParams>

> >6) in case the DBSequence is nucleotide, is there a tag for saying
> >this?

DBSequence can have cvParams, so we could easily add a sequenceType = Nucleic acid CV term.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
> Sent: 30 July 2008 13:05
> To: 'Pierre-Alain Binz'
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> > > > From: Jones, Andy > Sent: 27 June 2008 14:54 > To: 'David Creasy' > Subject: RE: [Psidev-pi-dev] Representing Sequences > > id and name are standard for all elements that inherit from FuGE identifiable – this > is perhaps a separate discussion as > to whether the optional name attribute should be there. > > I agree that length may be useful – is this just an integer value with no unit? > Yes, I think so. > I'm less sure about pI and mass since mass at least can be calculated very simply > Only if you have the sequence... (we have residue masses in the file). > > > , and pI values (in my opinion) are pretty inaccurate and fairly meaningless > Scandalous! (I happen to agree, but now some people will never speak to either of > us ever again). > > The main problem with mass and pI is that these are 'irrelevant' if the sequence is > nuleic acid rather than residues. > Why not just allow CV there? We can share the same CV as the PEFF format, > which includes, taxonomy, sequence type, gene > ID, and lots of wonderful other things? > > > – unless someone can convince me otherwise? > Cheers > Andy > > > From: David Creasy [mailto:dc...@ma...] > Sent: 27 June 2008 14:51 > To: Jones, Andy > Cc: psi...@li... > Subject: Re: [Psidev-pi-dev] Representing Sequences > > Hi Andy, > > length may be useful, because some people won't want to output the actual > sequence for space reasons. The other things > we wanted to add before were pI and mass. > Why do we want name? Is this for, say, a description line? > (Also, identifier -> id?) > > David > > Jones, Andy wrote: > Hi all, > > It was decided on the call that we would like to flag that Sequences in the > ConceptualMoleculeCollection should have a > Boolean attribute to capture if they are decoy sequences. At the moment we are > using the FuGE:Sequence element. 
I don't > really want to add another attribute to this (it's less problematic cutting down FuGE > than adding new things), so I'm > wondering if we should define our own Sequence type in AnalysisXML. This > would also allow us to choose exactly the > relevant attributes. At the moment, Sequence can have all of the following: > > <pf:Sequence isCircular="true" sequence="String" length="0" > isApproximateLength="true" > SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" > name="String"> > > Several of these attributes were created to represent concepts that probably will > never be required or implemented in > AnalysisXML. How about the following: > > <DBSequence identifier = "" name = "" isDecoy = "true"> > <seq>MCTMG...</seq> > <pf:DatabaseReference Database_ref="" > accession="Rev_IPI00013808.1"/> > </DBSequence> > > Are any of the other attributes on Sequence actually required? I'll post a new > version of the schema with other changes > WRT to PeptideEvidence shortly, > Cheers > Andy > > -- > David Creasy > Matrix Science > 64 Baker Street > London W1U 7GB, UK > Tel: +44 (0)20 7486 1050 > Fax: +44 (0)20 7224 1344 > > dc...@ma... > http://www.matrixscience.com > > Matrix Science Ltd. is registered in England and Wales > Company number 3533898 > > -- > Angel Pizarro > Director, ITMAT Bioinformatics Facility > 806 Biological Research Building > 421 Curie Blvd. > Philadelphia, PA 19104-6160 > 215-573-3736 |
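The decoy question raised in this thread (a per-sequence isDecoy attribute versus a single "decoy_string" parameter such as "Rev") can be illustrated with a small sketch. It is only a sketch under stated assumptions: the element and attribute names DBSequence, DatabaseReference, accession and cvParam are taken from the messages above, but the document layout, the ids, the target accession IPI00013808.1 and the helper `decoy_flags` are hypothetical, and namespace prefixes such as `pf:` are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment. Names follow the thread where possible;
# the ids, the target accession and the nesting are invented for illustration.
DOC = """
<AnalysisXML>
  <SpectrumIdentification id="SI_1">
    <cvParam name="decoy_string" value="Rev"/>
  </SpectrumIdentification>
  <SequenceCollection>
    <DBSequence id="DBSeq_1">
      <DatabaseReference accession="IPI00013808.1"/>
    </DBSequence>
    <DBSequence id="DBSeq_2">
      <DatabaseReference accession="Rev_IPI00013808.1"/>
    </DBSequence>
  </SequenceCollection>
</AnalysisXML>
"""


def decoy_flags(xml_text):
    """Map each accession to True/False, derived from the decoy prefix."""
    root = ET.fromstring(xml_text)
    # Look up the single search-level parameter instead of a per-sequence flag.
    param = root.find(".//SpectrumIdentification/cvParam[@name='decoy_string']")
    prefix = param.get("value") if param is not None else None
    flags = {}
    for seq in root.iter("DBSequence"):
        accession = seq.find("DatabaseReference").get("accession")
        # Without a declared prefix nothing is treated as a decoy.
        flags[accession] = bool(prefix) and accession.startswith(prefix)
    return flags


FLAGS = decoy_flags(DOC)
```

Under this approach the accession alone stays a unique key, which sidesteps the (accession, isDecoy) compound key that Angel warns about, at the cost of relying on a naming convention in the database.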
From: Martin E. <mar...@ru...> - 2008-07-31 11:35:29
|
Hi Andy, hi all,

> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand

Yes, I agree; but I understood Pierre-Alain's question as a hint that top-down
identifies protein sequences, so we would have to double information, referencing a protein sequence
as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from
<ProteinDetectionResult>. But I might be wrong and we definitely have to wait for
a top-down instance doc.

> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
> SourceProtein - this would save some space and would appear to be a logically more sensible model...

You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection? But missedcleavages
is only well-defined in relation to a search (using an enzyme)!

> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?

Yes, if <PeptideEvidence> stays optional.

> > >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
> > >are not specified whether monoisotopic or averaged.
> > >Do we assume that averaged does not exist anymore?
> > No, we decided to have only one type of masses in the whole analysisXML.
> > But I cannot find a note for that or a schema attribute... I will add an issue for that.
>
> It is a database search parameter:
> <AdditionalSearchParams>
> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>

Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
http://code.google.com/p/psi-pi/issues/detail?id=37

bye
Martin
|
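Martin's concern, that a mass type declared as a search parameter can conflict once a file holds more than one SpectrumIdentification, can be made concrete with a small check. This is a sketch under assumptions, not part of the schema: AdditionalSearchParams, cvParam and the PRIDE:0000162 term come from the message above, while the wrapper element SpectrumIdentificationProtocol, the ids, the EXAMPLE:0000001 accession and the name "Mass value type setting average" are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with two searches that disagree on the mass type.
# Only the PRIDE:0000162 term is quoted from the thread; the wrapper element,
# the ids and the "average" term are invented for illustration.
DOC = """
<AnalysisXML>
  <SpectrumIdentificationProtocol id="SIP_1">
    <AdditionalSearchParams>
      <cvParam accession="PRIDE:0000162"
               name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
    </AdditionalSearchParams>
  </SpectrumIdentificationProtocol>
  <SpectrumIdentificationProtocol id="SIP_2">
    <AdditionalSearchParams>
      <cvParam accession="EXAMPLE:0000001"
               name="Mass value type setting average" cvRef="EXAMPLE"/>
    </AdditionalSearchParams>
  </SpectrumIdentificationProtocol>
</AnalysisXML>
"""

MASS_TYPE_PREFIX = "Mass value type setting "


def mass_types_by_protocol(xml_text):
    """Return {protocol id: declared mass type} for every search protocol."""
    root = ET.fromstring(xml_text)
    result = {}
    for protocol in root.iter("SpectrumIdentificationProtocol"):
        for param in protocol.iter("cvParam"):
            name = param.get("name", "")
            if name.startswith(MASS_TYPE_PREFIX):
                # Keep only the trailing word, e.g. "monoisotopic".
                result[protocol.get("id")] = name[len(MASS_TYPE_PREFIX):]
    return result


def has_conflict(mass_types):
    """True when the searches in one file declare different mass types."""
    return len(set(mass_types.values())) > 1


MASS_TYPES = mass_types_by_protocol(DOC)
CONFLICT = has_conflict(MASS_TYPES)
```

A validator could run a check of this kind if the group decides that one mass type must hold for the whole file; a single file-level attribute would make the check unnecessary.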
From: David C. <dc...@ma...> - 2008-07-31 13:50:22
|
Hi,

Martin Eisenacher wrote:
> Hi Andy, hi all,
>
>> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand
> Yes, I agree; but I understood Pierre-Alain's question as a hint that top-down
> identifies protein sequences, so we would have to double information, referencing a protein sequence
> as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from
> <ProteinDetectionResult>. But I might be wrong and we definitely have to wait for
> a top-down instance doc.

Sorry for the delay ;) I've put one here:
http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
It's not so bad really. In the case of signal peptides, or leading methionine (as in this example), the protein that was analysed may be different from the sequence in the database, and there must be a way of representing this.

>> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
>> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
>> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
>> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
>> SourceProtein - this would save some space and would appear to be a logically more sensible model...
> You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection? But missedcleavages
> is only well-defined in relation to a search (using an enzyme)!
>
>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> Yes, if <PeptideEvidence> stays optional.

What about de novo, where there is no database...

>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
>>>> are not specified whether monoisotopic or averaged.
>>>> Do we assume that averaged does not exist anymore?
>>> No, we decided to have only one type of masses in the whole analysisXML.
>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>> It is a database search parameter:
>> <AdditionalSearchParams>
>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>
> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> http://code.google.com/p/psi-pi/issues/detail?id=37

I'm not sure I understand whether this is OK or not now?
(And why use Pride CV?)

David

> bye
> Martin
>
>>> -----Original Message----- >>> From: psi...@li... [mailto:psidev-pi-dev- >>> bo...@li...] On Behalf Of Martin Eisenacher >>> Sent: 30 July 2008 13:05 >>> To: 'Pierre-Alain Binz' >>> Cc: psi...@li... >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences >>> >>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try: >>> >>>> 2nd July, 2008: >>>> a couple of questions, just to make sure: >>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection >>> information? >>> I hope not, by referencing the same identifier. >>> >>>> as SpectrumIdentificationResult contains a PeptideEvidence refering to a Peptide >>> element >>>> (and not to a DBSequence), identification is obligatory a Peptide? >>> At the moment I think it's possible to directly reference a DBSeq. At the time the >>> foreign key definitions are implemented we can forbid that. >>> But we should have in mind, that a peptide is a sequence plus modifications, so if >>> top-down >>> identifies only a sequence, we should allow that and if top-down identifies with >>> mods, >>> we should forbid that. >>> It would be quite helpful to have a top-down instance doc.
>>> To check whether our thoughts are really deep enough...
>>>
>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>> elements with possibly undefined explicit sequences to refer to
>>>> from the SpectrumIdentificationResult (because non-peptidic, or because not identified
>>>> but good spectrum)
>>> At the moment the sequence element can be empty or even left out.
>>> User or CV params are allowed.
>>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
>>> We need CV terms for that...
>>>
>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>> instance). Does this simply mean that the ModificationParams codes
>>>> the search engine settings and the Peptide includes the formal PSI
>>>> definition of the Mod? And the only reference is the ModName value?
>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
>>> they can define their own.
>>>
>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>>>> are not specified whether monoisotopic or averaged.
>>>> Do we assume that averaged does not exist anymore?
>>> No, we decided to have only one type of masses in the whole analysisXML.
>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>
>>>> 5) is sequenceMass the mass value with/without the mods? If with, the
>>>> name might be misleading (peptideMass would be more appropriate)
>>> It is indeed the mass of the sequence without mods.
>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>
>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>> this?
>>>> (NB: MS on nucleotide molecules can be performed and analysed,
>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>> the way on glycans...) and only represent the DBSequences as AA
>>>> sequences (frame translations)? (and what about glycans?)
>>>> Probably can be solved if one can replace SequenceCollection by
>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>> MoleculeCollection)... but the validator might not like this.
>>> Mh, these can be extensions, I think they are not possible at the moment.
>>> But a tag for the type can indeed be useful, it could be a CV param.
>>> I will create an issue for that.
>>>
>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>> Peptide as AA sequence in case of MS done on proteins?
>>> I hope the following answers this:
>>>
>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>> <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
>>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
>>> PeptideEvidence elements from SpectrumIdentificationItem sections.
>>>
>>> Bye
>>> Martin
>>>
>>> David Creasy wrote:
>>> Thanks Andy,
>>>
>>> I've added an updated example document to SVN:
>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>>
>>> Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think
>>> that we need to add isDecoy to SpectrumIdentificationItem.
>>>
>>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
>>> Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?
>>> I think we need to discuss this again at the next telecon.
>>>
>>> David
>>>
>>> Jones, Andy wrote:
>>> Hi all,
>>>
>>> I’ve updated the schema in SVN with the following main changes:
>>>
>>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
>>> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
>>> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
>>> In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
>>> In FuGE on cvParam, the value attribute is no longer mandatory
>>>
>>> I’ve added a simple example that validates under
>>> examples\schema_usecase_examples\working27June
>>>
>>> Feel free to mail me any changes to make on Monday,
>>> Cheers
>>> Andy
>>>
>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
>>> Sent: 27 June 2008 16:24
>>> To: Angel Pizarro
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> I think Angel’s response below might not have made it round the list yet.
>>>
>>> I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic
>>> information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string”
>>> value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific
>>> attribute type (isDecoy) to Sequence.
>>>
>>> From: an...@it...
>>> [mailto:an...@it...] On Behalf Of Angel Pizarro
>>> Sent: 27 June 2008 15:59
>>> To: Jones, Andy
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> my 2¢ :
>>> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that
>>> this would be a subclass of the conceptual molecule element?
>>>
>>> Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort
>>> into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as
>>> part of our schema?
>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to
>>> encode the same accession for two different sequences, effectively making the primary key of the sequence object
>>> (accession, isDecoy)
>>>
>>> Do we want to go there?
>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
>>> So how about including length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>>>
>>> From: Jones, Andy
>>> Sent: 27 June 2008 14:54
>>> To: 'David Creasy'
>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>
>>> id and name are standard for all elements that inherit from FuGE identifiable - this is perhaps a separate discussion as
>>> to whether the optional name attribute should be there.
>>>
>>> I agree that length may be useful - is this just an integer value with no unit?
>>> Yes, I think so.
>>> I'm less sure about pI and mass since mass at least can be calculated very simply
>>> Only if you have the sequence... (we have residue masses in the file).
>>>
>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>>>
>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
>>> Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene
>>> ID, and lots of wonderful other things?
>>>
>>> - unless someone can convince me otherwise?
>>> Cheers
>>> Andy
>>>
>>> From: David Creasy [mailto:dc...@ma...]
>>> Sent: 27 June 2008 14:51
>>> To: Jones, Andy
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>
>>> Hi Andy,
>>>
>>> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things
>>> we wanted to add before were pI and mass.
>>> Why do we want name? Is this for, say, a description line?
>>> (Also, identifier -> id?)
>>>
>>> David
>>>
>>> Jones, Andy wrote:
>>> Hi all,
>>>
>>> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a
>>> Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't
>>> really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm
>>> wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the
>>> relevant attributes. At the moment, Sequence can have all of the following:
>>>
>>> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
>>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>>>
>>> Several of these attributes were created to represent concepts that probably will never be required or implemented in
>>> AnalysisXML.
>>> How about the following:
>>>
>>> <DBSequence identifier = "" name = "" isDecoy = "true">
>>> <seq>MCTMG...</seq>
>>> <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
>>> </DBSequence>
>>>
>>> Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes
>>> WRT to PeptideEvidence shortly,
>>> Cheers
>>> Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
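The alternative Andy raises in the quoted thread (a decoy_string parameter with value "Rev" on SpectrumIdentification, instead of an isDecoy attribute on every sequence) can be sketched roughly as follows. This is only an illustration under assumptions: the element and attribute names follow the DBSequence proposal quoted in this thread rather than any released schema, namespace prefixes are left out, the unprefixed accession is made up for contrast, and `decoy_flags` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the DBSequence proposal in this thread.
# Namespace prefixes (pf:) are omitted to keep the sketch self-contained.
EXAMPLE = """
<SequenceCollection>
  <DBSequence identifier="seq1" name="">
    <DatabaseReference Database_ref="db1" accession="IPI00013808.1"/>
  </DBSequence>
  <DBSequence identifier="seq2" name="">
    <DatabaseReference Database_ref="db1" accession="Rev_IPI00013808.1"/>
  </DBSequence>
</SequenceCollection>
"""


def decoy_flags(xml_text, decoy_string="Rev"):
    """Map each DBSequence identifier to True if its accession starts with decoy_string."""
    root = ET.fromstring(xml_text)
    flags = {}
    for seq in root.iter("DBSequence"):
        ref = seq.find("DatabaseReference")
        # A DBSequence without a DatabaseReference is treated as non-decoy here.
        accession = ref.get("accession", "") if ref is not None else ""
        flags[seq.get("identifier")] = accession.startswith(decoy_string)
    return flags


if __name__ == "__main__":
    print(decoy_flags(EXAMPLE))
```

With this approach the decoy status is derived from the accession itself, so the accession stays unique and the (accession, isDecoy) composite key that Angel worries about does not arise.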
From: Martin E. <mar...@ru...> - 2008-07-31 14:25:46
|
> > But I might be wrong and we definitely have to wait for
> > a top-down instance doc.
> Sorry for the delay ;) I've put one here:
> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
> It's not so bad really. In the case of signal peptides, or leading
> methionine (as in this example), the protein that was analysed may be
> different from the sequence in the database, and there must be a way of
> representing this.

So you think it's okay like it is and no duplication. Or can I derive an issue from the "not so bad really" phrase ;-)

> >> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> >> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> > Yes, if <PeptideEvidence> stays optional.
> What about de novo, where there is no database...

That is an argument to have PeptideEvidence optional, isn't it?
But DBSequence_ref as attribute of it should be mandatory.

> >> It is a database search parameter:
> >> <AdditionalSearchParams>
> >> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
> > Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> > http://code.google.com/p/psi-pi/issues/detail?id=37
>
> I'm not sure I understand whether this is OK or not now? (And why use
> Pride CV?)

I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another, so it is not well-defined for the masses in elements or attributes.
We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .

Bye
Martin
|
From: David C. <dc...@ma...> - 2008-07-31 14:46:06
|
Martin Eisenacher wrote:
>>> But I might be wrong and we definitely have to wait for
>>> a top-down instance doc.
>> Sorry for the delay ;) I've put one here:
>> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
>> It's not so bad really. In the case of signal peptides, or leading
>> methionine (as in this example), the protein that was analysed may be
>> different from the sequence in the database, and there must be a way of
>> representing this.
> So you think it's okay like it is and no doubling. Or can I derive an issue from
> the "not so bad really" phrase ;-)

I think it has to be the way it is with potential for the sequence to be in the document twice. Otherwise, how do we cope with the case of signal peptides and/or a leading methionine?

>>>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>>>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
>>> Yes, if <PeptideEvidence> stays optional.
>> What about denovo where there is no database...
> That is an argument to have PeptideEvidence optional, isn't it?
> But DBSequence_ref as attribute of it should be mandatory.

Doh, sorry, yes you are totally correct. It should be mandatory.

>(and it is missing from the instance docs).

I believe it's in all the Mascot ones?

>>>> It is a database search parameter:
>>>> <AdditionalSearchParams>
>>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
>>> http://code.google.com/p/psi-pi/issues/detail?id=37
>> I'm not sure I understand whether this is OK or not now? (And why use
>> Pride CV?)
> I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
> so it is not well-defined for the masses in elements or attributes.
> We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .

I think it's actually _required_ to be like this. For example, at least one search engine allows you to specify mono for masses below x and average for masses above x. So, in this case, the output should be similar to the N15 example that I've supplied, with two separate mass tables. Maybe you could look at the Mascot_N15_example.xml and see if you think that this is OK.

Talk soon,

David

> Bye
> Martin
>
>
>>>>> -----Original Message-----
>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
>>>>> Sent: 30 July 2008 13:05
>>>>> To: 'Pierre-Alain Binz'
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>>>>>
>>>>>> 2nd July, 2008:
>>>>>> a couple of questions, just to make sure:
>>>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
>>>>> I hope not, by referencing the same identifier.
>>>>>
>>>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>>>>>> (and not to a DBSequence), identification is obligatory a Peptide?
>>>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
>>>>> foreign key definitions are implemented we can forbid that.
>>>>> But we should have in mind, that a peptide is a sequence plus modifications, so if top-down
>>>>> identifies only a sequence, we should allow that and if top-down identifies with mods,
>>>>> we should forbid that.
>>>>> It would be quite helpful to have a top-down instance doc. To check
>>>>> whether our thoughts are really deep enough...
>>>>>
>>>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>>>> elements with possibly undefined explicit sequences to refer to
>>>>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
>>>>>> but good spectrum)
>>>>> At the moment the sequence element can be empty or even left out.
>>>>> User or CV params are allowed.
>>>>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
>>>>> We need CV terms for that...
>>>>>
>>>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>>>> instance). Does this simply mean that The ModificationParams codes
>>>>>> the search engine settings and the Peptide includes the formal PSI
>>>>>> definition of the Mod? And the only reference is the ModName value?
>>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
>>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
>>>>> they can define their own.
>>>>>
>>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
>>>>>> are not specified whether monoisotopic or averaged.
>>>>>> Do we assume that averaged does not exist anymore?
>>>>> No, we decided to have only one type of masses in the whole analysisXML.
>>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>>>
>>>>>> 5) is sequenceMass the mass value with/without the mods? If with, the
>>>>>> name might be misleading (peptideMass would be more appropriate)
>>>>> It is indeed the mass of the sequence without mods.
>>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>>>
>>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>>>> this?
>>>>>> (NB: MS on nucleotide molecules can be performed and analysed,
>>>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>>>> the way on glycans...) and only represent the DBSequences as AA
>>>>>> sequences (frame translations)? (and what about glycans?)
>>>>>> Probably can be solved if one can replace SequenceCollection by
>>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>>>> MoleculeCollection)... but the validator might not like this.
>>>>> Mh, these can be extensions, I think they are not possible at the moment.
>>>>> But a tag for the type can indeed be useful, it could be a CV param.
>>>>> I will create an issue for that.
>>>>>
>>>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>>>> Peptide as AA sequence in case of MS done on proteins?
>>>>> I hope the following answers this:
>>>>>
>>>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>>>> <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
>>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
>>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
>>>>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
>>>>> PeptideEvidence elements from SpectrumIdentificationItem sections.
>>>>>
>>>>> Bye
>>>>> Martin
>>>>>
>>>>> David Creasy wrote:
>>>>> Thanks Andy,
>>>>>
>>>>> I've added an updated example document to SVN:
>>>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>>>>
>>>>> Problem is that we have now removed the main point of these recent changes
>>>>> which was to add the decoy flag...
>>>>> I think that we need to add isDecoy to SpectrumIdentificationItem.
>>>>>
>>>>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
>>>>> Um, and since we've not actually ended up adding anything to DBSequence... we
>>>>> haven't actually achieved anything?
>>>>> I think we need to discuss this again at the next telecon.
>>>>>
>>>>> David
>>>>>
>>>>> Jones, Andy wrote:
>>>>> Hi all,
>>>>>
>>>>> I’ve updated the schema in SVN with the following main changes:
>>>>>
>>>>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
>>>>> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
>>>>> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
>>>>> In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
>>>>> In FuGE on cvParam, the value attribute is no longer mandatory
>>>>>
>>>>> I’ve added a simple example that validates under
>>>>> examples\schema_usecase_examples\working27June
>>>>>
>>>>> Feel free to mail me any changes to make on Monday,
>>>>> Cheers
>>>>> Andy
>>>>>
>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
>>>>> Sent: 27 June 2008 16:24
>>>>> To: Angel Pizarro
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> I think Angel’s response below might not have made it round the list yet.
>>>>>
>>>>> I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic
>>>>> information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string”
>>>>> value = “Rev”.
>>>>> This would be a more compact representation and we would not
>>>>> have to add what is quite a specific attribute type (isDecoy) to Sequence.
>>>>>
>>>>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
>>>>> Sent: 27 June 2008 15:59
>>>>> To: Jones, Andy
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> my 2¢ :
>>>>> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that
>>>>> this would be a subclass of the conceptual molecule element?
>>>>>
>>>>> Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort
>>>>> into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as
>>>>> part of our schema?
>>>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to
>>>>> encode the same accession for two different sequences, effectively making the primary key of the sequence object
>>>>> (accession, isDecoy)
>>>>>
>>>>> Do we want to go there?
>>>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
>>>>> So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>>>>>
>>>>> From: Jones, Andy
>>>>> Sent: 27 June 2008 14:54
>>>>> To: 'David Creasy'
>>>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>>>
>>>>> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as
>>>>> to whether the optional name attribute should be there.
>>>>>
>>>>> I agree that length may be useful – is this just an integer value with no unit?
>>>>> Yes, I think so.
>>>>> I'm less sure about pI and mass since mass at least can be calculated very simply
>>>>> Only if you have the sequence... (we have residue masses in the file).
>>>>>
>>>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>>>> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>>>>>
>>>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
>>>>> Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene
>>>>> ID, and lots of wonderful other things?
>>>>>
>>>>> – unless someone can convince me otherwise?
>>>>> Cheers
>>>>> Andy
>>>>>
>>>>> From: David Creasy [mailto:dc...@ma...]
>>>>> Sent: 27 June 2008 14:51
>>>>> To: Jones, Andy
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things
>>>>> we wanted to add before were pI and mass.
>>>>> Why do we want name? Is this for, say, a description line?
>>>>> (Also, identifier -> id?)
>>>>>
>>>>> David
>>>>>
>>>>> Jones, Andy wrote:
>>>>> Hi all,
>>>>>
>>>>> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a
>>>>> Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't
>>>>> really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm
>>>>> wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the
>>>>> relevant attributes.
>>>>> At the moment, Sequence can have all of the following:
>>>>>
>>>>> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
>>>>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>>>>>
>>>>> Several of these attributes were created to represent concepts that probably will never be required or implemented in
>>>>> AnalysisXML. How about the following:
>>>>>
>>>>> <DBSequence identifier = "" name = "" isDecoy = "true">
>>>>> <seq>MCTMG...</seq>
>>>>> <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
>>>>> </DBSequence>
>>>>>
>>>>> Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes
>>>>> WRT to PeptideEvidence shortly,
>>>>> Cheers
>>>>> Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
From: Martin E. <mar...@ru...> - 2008-08-07 14:17:30
|
> >>>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> >>>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> >>> Yes, if <PeptideEvidence> stays optional.
> >> What about denovo where there is no database...
> > That is an argument to have PeptideEvidence optional, isn't it?
> > But DBSequence_ref as attribute of it should be mandatory.
> Doh, sorry, yes you are totally correct. It should be mandatory.

Now it is.
And next problem: we have two "Sequence_Ref" attributes, in <PeptideEvidence> and <ProteinHypothesis> (now both mandatory).
What if they are contradictory (validator?)?
If they are not contradictory, at least the one in <ProteinHypothesis> is redundant.

> >(and it is missing from the instance docs).
> I believe it's in all the Mascot ones?

It is.

> >>>> It is a database search parameter:
> >>>> <AdditionalSearchParams>
> >>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
> >>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> >>> http://code.google.com/p/psi-pi/issues/detail?id=37
> >> I'm not sure I understand whether this is OK or not now? (And why use
> >> Pride CV?)
> > I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
> > so it is not well-defined for the masses in elements or attributes.
> > We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .
> I think it's actually _required_ to be like this. For example, at least
> one search engine allows you to specify mono for masses below x and
> average for masses above x. So, in this case, the output should be
> similar to the N15 example that I've supplied, with two separate mass
> tables. Maybe you could look at the Mascot_N15_example.xml and see if
> you think that this is OK.

It is okay with me; to answer Pierre-Alain's original question then:
all mass values for peptides then depend on the type of search performed and the residue table used. ;-)

Bye
Martin
> >>>>> How do they report results in spectral lib search if they identify non-peptidic or > >>>>> unidentified? > >>>>> We need CV terms for that... > >>>>> > >>>>>> 3) in the Peptide element, the Modifications are defined in a much more > >>>>>> detailed manner than in ModificationParams (PSI-MOD is there for > >>>>>> instance). Does this simply mean that The ModificationParams codes > >>>>>> the search engine settings and the Peptide includes the formal PSI > >>>>>> definition of the Mod? And the only reference is the ModName value? > >>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms > >>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV > >>>>> or > >>>>> they can define their own. > >>>>> > >>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, > >>>>> experimentalMassToCharge, > >>>>>> are not specified whether monoisotopic or averaged. > >>>>>> Do we assume that averaged does not exist anymore? > >>>>> No, we decided to have only one type of masses in the whole analysisXML. > >>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that. > >>>>> > >>>>> > >>>>>> 5) is sequenceMass the mass value with/without the mods? If with, the > >>>>>> name might be missleading (peptideMass would be more appropriate) > >>>>> It is indeed the mass of the sequence without mods. > >>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation > >>>>> > >>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying > >>>>>> this? (NB: MS on nucleotide molecules can be performed and analysed, > >>>>>> not only MS on AA sequences that are interpreting nucleotide sequences). > >>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by > >>>>>> the way on glycans...) and only represent the DBSequences as AA > >>>>>> sequences (frame translations)? (and what about glycans?) 
> >>>>>> Probaly can be solved if one can replace SequenceCollection by > >>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection, > >>>>>> MoleculeCollection)... but the validator might not like this. > >>>>> Mh, these can be extensions, I think they are not possible at the moment. > >>>>> But a tag for the type can indeed be useful, it could be a CV param. > >>>>> I will create an issue for that. > >>>>> > >>>>>> 7) in case that DBSequence is nucleotide, do we represent the > >>>>>> Peptide as AA sequence in case of MS done on proteins? > >>>>> I hope the following answers this: > >>>>> > >>>>> <DBSequence> is the nucleotide seq from the nucleotide DB, > >>>>> <Peptide> is the identified amino acid sequence plus mods (without any translation > >>>>> frame or something). > >>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a > >>>>> TranslationTable_Ref attribute. > >>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB > >>>>> case.) > >>>>> If a protein detection is performed, there are <PeptideHypothesis> elements > >>>>> referencing > >>>>> PeptideEvidence elements from SpectrumIdentificationItem sections. > >>>>> > >>>>> > >>>>> > >>>>> Bye > >>>>> Martin > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> David Creasy wrote: > >>>>> Thanks Andy, > >>>>> > >>>>> I've added an updated example document to SVN: > >>>>> http://code.google.com/p/psi- > >>>>> pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F00 > >>>>> 1350.xml > >>>>> > >>>>> Problem is that we have now removed the main point of these recent changes > >>>>> which was to add the decoy flag... I think > >>>>> that we need to add isDecoy to SpectrumIdentificationItem. > >>>>> > >>>>> And yes, I suspect that we should go back to using the > >>>>> ConceptualMoleculeCollection > >>>>> Um, and since we've not actually ended up adding anything to DBSequence... we > >>>>> haven't actually achieved anything? 
> >>>>> I think we need to discuss this again at the next telecon. > >>>>> > >>>>> David > >>>>> > >>>>> Jones, Andy wrote: > >>>>> Hi all, > >>>>> > >>>>> Ive updated the schema in SVN with the following main changes: > >>>>> > >>>>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the > >>>>> call (simple mappings to proteins are done > >>>>> at this level) > >>>>> Added DBSequence that should be used instead of Sequence (following some of > >>>>> the discussion below) > >>>>> Created a new collection class SequenceCollection (rather than > >>>>> ConceptualMoleculeCollection) so that only references can > >>>>> be given to DBSequence and Peptide > >>>>> In fact, Im not sure if this is sensible since it prevents other types of > >>>>> ConceptualMolecule being added later... to > >>>>> discuss > >>>>> In FuGE on cvParam, the value attribute is no longer mandatory > >>>>> > >>>>> Ive added a simple example that validates under > >>>>> examples\schema_usecase_examples\working27June > >>>>> > >>>>> Feel free to mail me any changes to make on Monday, > >>>>> Cheers > >>>>> Andy > >>>>> > >>>>> > >>>>> > >>>>> From: psi...@li... [mailto:psidev-pi-dev- > >>>>> bo...@li...] On Behalf Of > >>>>> Jones, Andy > >>>>> Sent: 27 June 2008 16:24 > >>>>> To: Angel Pizarro > >>>>> Cc: psi...@li... > >>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences > >>>>> > >>>>> I think Angels response below might not have made it round the list yet. > >>>>> > >>>>> I tend to agree that isDecoy is redundant information and perhaps this is not the > >>>>> best place to encode semantic > >>>>> information. An alternative would be to have a parameter, say on > >>>>> SpectrumIdentification for cvParam = decoy_string > >>>>> value = Rev. This would be a more compact representation and we would not > >>>>> have to add what is quite a specific > >>>>> attribute type (isDecoy) to Sequence. > >>>>> > >>>>> > >>>>> > >>>>> From: an...@it... [mailto:an...@it...] 
> >>>>> On Behalf Of Angel Pizarro
> >>>>> Sent: 27 June 2008 15:59
> >>>>> To: Jones, Andy
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>>>
> >>>>> my 2¢ :
> >>>>> You need to be able to extend this to all molecule types, or am I missing the
> >>>>> point of this thread, and you mean that this would be a subclass of the
> >>>>> conceptual molecule element?
> >>>>>
> >>>>> Second, and this is tangentially related, but are decoy sequences really a
> >>>>> problem we should be putting our effort into? Is it in our domain to encode
> >>>>> semantic information about a sequence, and possibly relating reported
> >>>>> sequences as part of our schema?
> >>>>> On a personal level I could care less if "isDecoy" is an attribute or not,
> >>>>> but the temptation then would be for folks to encode the same accession for
> >>>>> two different sequences, effectively making the primary key of the sequence
> >>>>> object (accession, isDecoy)
> >>>>>
> >>>>> Do we want to go there?
> >>>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
> >>>>> So how about include length as an attribute and then let all other things go
> >>>>> in the CV (pI, mass, etc.)?
> >>>>>
> >>>>> From: Jones, Andy
> >>>>> Sent: 27 June 2008 14:54
> >>>>> To: 'David Creasy'
> >>>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
> >>>>>
> >>>>> id and name are standard for all elements that inherit from FuGE identifiable
> >>>>> – this is perhaps a separate discussion as to whether the optional name
> >>>>> attribute should be there.
> >>>>>
> >>>>> I agree that length may be useful – is this just an integer value with no unit?
> >>>>> Yes, I think so.
> >>>>> I'm less sure about pI and mass since mass at least can be calculated very simply
> >>>>> Only if you have the sequence... (we have residue masses in the file).
> >>>>>
> >>>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
> >>>>> Scandalous! (I happen to agree, but now some people will never speak to
> >>>>> either of us ever again).
> >>>>>
> >>>>> The main problem with mass and pI is that these are 'irrelevant' if the
> >>>>> sequence is nucleic acid rather than residues.
> >>>>> Why not just allow CV there? We can share the same CV as the PEFF format,
> >>>>> which includes taxonomy, sequence type, gene ID, and lots of wonderful other
> >>>>> things?
> >>>>>
> >>>>> – unless someone can convince me otherwise?
> >>>>> Cheers
> >>>>> Andy
> >>>>>
> >>>>> From: David Creasy [mailto:dc...@ma...]
> >>>>> Sent: 27 June 2008 14:51
> >>>>> To: Jones, Andy
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
> >>>>>
> >>>>> Hi Andy,
> >>>>>
> >>>>> length may be useful, because some people won't want to output the actual
> >>>>> sequence for space reasons. The other things we wanted to add before were pI
> >>>>> and mass.
> >>>>> Why do we want name? Is this for, say, a description line?
> >>>>> (Also, identifier -> id?)
> >>>>>
> >>>>> David
> >>>>>
> >>>>> Jones, Andy wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> It was decided on the call that we would like to flag that Sequences in the
> >>>>> ConceptualMoleculeCollection should have a Boolean attribute to capture if
> >>>>> they are decoy sequences. At the moment we are using the FuGE:Sequence
> >>>>> element. I don't really want to add another attribute to this (it's less
> >>>>> problematic cutting down FuGE than adding new things), so I'm wondering if we
> >>>>> should define our own Sequence type in AnalysisXML. This would also allow us
> >>>>> to choose exactly the relevant attributes.
> >>>>> At the moment, Sequence can have all of the following:
> >>>>>
> >>>>> <pf:Sequence isCircular="true" sequence="String" length="0"
> >>>>> isApproximateLength="true"
> >>>>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String"
> >>>>> name="String">
> >>>>>
> >>>>> Several of these attributes were created to represent concepts that probably
> >>>>> will never be required or implemented in AnalysisXML. How about the following:
> >>>>>
> >>>>> <DBSequence identifier = "" name = "" isDecoy = "true">
> >>>>> <seq>MCTMG...</seq>
> >>>>> <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> >>>>> </DBSequence>
> >>>>>
> >>>>> Are any of the other attributes on Sequence actually required? I'll post a
> >>>>> new version of the schema with other changes WRT to PeptideEvidence shortly,
> >>>>> Cheers
> >>>>> Andy
> >>>>>
> >>>>> _______________________________________________
> >>>>> Psidev-pi-dev mailing list
> >>>>> Psi...@li...
> >>>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
> >>>>>
> >>>>> --
> >>>>> Angel Pizarro
> >>>>> Director, ITMAT Bioinformatics Facility
> >>>>> 806 Biological Research Building
> >>>>> 421 Curie Blvd.
> >>>>> Philadelphia, PA 19104-6160
> >>>>> 215-573-3736
> >>>>>
> >>>>> _______________________________________________
> >>>>> Psidev-pi-dev mailing list
> >>>>> Psi...@li...
> >>>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>
> --
> David Creasy
> Matrix Science
> 64 Baker Street
> London W1U 7GB, UK
> Tel: +44 (0)20 7486 1050
> Fax: +44 (0)20 7224 1344
>
> dc...@ma...
> http://www.matrixscience.com
>
> Matrix Science Ltd. is registered in England and Wales
> Company number 3533898 |
From: David C. <dc...@ma...> - 2008-08-07 14:38:22
|
Martin Eisenacher wrote:
>>>>>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>>>>>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
>>>>>>
>>>>> Yes, if <PeptideEvidence> stays optional.
>>>>>
>>>> What about denovo where there is no database...
>>>>
>>> That is an argument to have PeptideEvidence optional, isn't it?
>>> But DBSequence_ref as attribute of it should be mandatory.
>>>
>> Doh, sorry, yes you are totally correct. It should be mandatory.
>>
> Now it is.
> And next problem: we have two "Sequence_Ref" attributes,
> in <PeptideEvidence> and <ProteinHypothesis> (now both mandatory).
> What if they are contradictory (validator?)?
> If they are not contradictory, at least the one in <ProteinHypothesis> is redundant.
>
With current use cases, I think it's always redundant, but I'm trying to
think of a case where it wouldn't be. However, since the
<ProteinDetectionHypothesis> has to have at least one
<PeptideHypothesis>, you must be right.
Suggest that we remove the reference from <ProteinHypothesis>.

>> >(and it is missing from the instance docs).
>> I believe it's in all the Mascot ones?
>>
> It is.
>
>>>>>> It is a database search parameter:
>>>>>> <AdditionalSearchParams>
>>>>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>>>>>>
>>>>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
>>>>> http://code.google.com/p/psi-pi/issues/detail?id=37
>>>>>
>>>> I'm not sure I understand whether this is OK or not now? (And why use
>>>> Pride CV?)
>>>>
>>> I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
>>> so it is not well-defined for the masses in elements or attributes.
>>> We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .
>>>
>> I think it's actually _required_ to be like this. For example, at least
>> one search engine allows you to specify mono for masses below x and
>> average for masses above x. So, in this case, the output should be
>> similar to the N15 example that I've supplied, with two separate mass
>> tables. Maybe you could look at the Mascot_N15_example.xml and see if
>> you think that this is OK.
>>
> It is okay with me; to answer Pierre-Alain's original question then:
> all mass values for peptides then depend on
> the type of search performed and the residue table used. ;-)
>
> Bye
> Martin
>
>> Talk soon,
>>
>> David
>>
>>> Bye
>>> Martin
>>>
>>>>>>> -----Original Message-----
>>>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
>>>>>>> Sent: 30 July 2008 13:05
>>>>>>> To: 'Pierre-Alain Binz'
>>>>>>> Cc: psi...@li...
>>>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>>>
>>>>>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>>>>>>>
>>>>>>>> 2nd July, 2008:
>>>>>>>> a couple of questions, just to make sure:
>>>>>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
>>>>>>>>
>>>>>>> I hope not, by referencing the same identifier.
>>>>>>>
>>>>>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>>>>>>>> (and not to a DBSequence), identification is obligatory a Peptide?
>>>>>>>>
>>>>>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
>>>>>>> foreign key definitions are implemented we can forbid that.
>>>>>>> But we should have in mind, that a peptide is a sequence plus modifications, so if
>>>>>>> top-down identifies only a sequence, we should allow that and if top-down
>>>>>>> identifies with mods, we should forbid that.
>>>>>>> It would be quite helpful to have a top-down instance doc. To check
>>>>>>> whether our thoughts are really deep enough...
>>>>>>>
>>>>>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>>>>>> elements with possibly undefined explicit sequences to refer to
>>>>>>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
>>>>>>>> but good spectrum)
>>>>>>>>
>>>>>>> At the moment the sequence element can be empty or even left out.
>>>>>>> User or CV params are allowed.
>>>>>>> How do they report results in spectral lib search if they identify non-peptidic or
>>>>>>> unidentified?
>>>>>>> We need CV terms for that...
>>>>>>>
>>>>>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>>>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>>>>>> instance). Does this simply mean that The ModificationParams codes
>>>>>>>> the search engine settings and the Peptide includes the formal PSI
>>>>>>>> definition of the Mod? And the only reference is the ModName value?
>>>>>>>>
>>>>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
>>>>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV
>>>>>>> or they can define their own.
>>>>>>>
>>>>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
>>>>>>>> are not specified whether monoisotopic or averaged.
>>>>>>>> Do we assume that averaged does not exist anymore?
>>>>>>>>
>>>>>>> No, we decided to have only one type of masses in the whole analysisXML.
>>>>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>>>>>
>>>>>>>> 5) is sequenceMass the mass value with/without the mods?
>>>>>>>> If with, the name might be misleading (peptideMass would be more appropriate)
>>>>>>>>
>>>>>>> It is indeed the mass of the sequence without mods.
>>>>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>>>>>
>>>>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>>>>>> this? (NB: MS on nucleotide molecules can be performed and analysed,
>>>>>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>>>>>> the way on glycans...) and only represent the DBSequences as AA
>>>>>>>> sequences (frame translations)? (and what about glycans?)
>>>>>>>> Probably can be solved if one can replace SequenceCollection by
>>>>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>>>>>> MoleculeCollection)... but the validator might not like this.
>>>>>>>>
>>>>>>> Mh, these can be extensions, I think they are not possible at the moment.
>>>>>>> But a tag for the type can indeed be useful, it could be a CV param.
>>>>>>> I will create an issue for that.
>>>>>>>
>>>>>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>>>>>> Peptide as AA sequence in case of MS done on proteins?
>>>>>>>>
>>>>>>> I hope the following answers this:
>>>>>>>
>>>>>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>>>>>> <Peptide> is the identified amino acid sequence plus mods (without any
>>>>>>> translation frame or something).
>>>>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a
>>>>>>> TranslationTable_Ref attribute.
>>>>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid
>>>>>>> DB case.)
>>>>>>> If a protein detection is performed, there are <PeptideHypothesis> elements
>>>>>>> referencing PeptideEvidence elements from SpectrumIdentificationItem sections.
>>>>>>>
>>>>>>> Bye
>>>>>>> Martin

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
From: Jones, A. <And...@li...> - 2008-08-08 12:06:38
|
Hi David,

I'm trying to understand how best to model differential labelling, e.g. the N15 file. I think we should view this as similar to searching for different variable mods. For example, if this was an ICAT experiment, we would search for C+ICAT light and C+ICAT heavy within one single search. As such, I don't think it makes sense to separate out an N15 labelling into two different search protocols.

I think we should allow multiple mass tables within one SpectrumIdentificationProtocol. The MassTable will also need an id and optional CV terms to explain the purpose of this particular mass table, and perhaps to say whether it is monoisotopic or average. We could then add a MassTable_ref to Peptide, documented as being required only if more than one MassTable has been given.

If we also have:

<FragmentationTable>
<Measure id="m_mz">
<pf:cvParam cvRef="PSI-PI" accession="PI:xxxx" name="product ion monoisotopic m/z"/>

to say that reported fragment ions are monoisotopic, does this also solve the problem if someone wants to report that precursor ions are average mass and product ions are monoisotopic?

Just to be clear, however, in SpectrumIdentificationProtocol we have:

<SpectrumIdentificationProtocol id="SIP_1" AnalysisSoftware_ref="AS_mascot_server">
<AdditionalSearchParams>
...
<pf:cvParam accession="PI:00211" name="mass type setting monoisotopic" cvRef="PSI-PI"/>

I didn't understand why we couldn't have two CV terms here, one for the product and one for the precursor mass setting; this is how X!Tandem sets up its searches. Was there something else that I missed here that would stop this working?

Cheers
Andy |
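A minimal sketch of how a reader of the file might apply the MassTable_ref rule proposed above, assuming a protocol that carries two mass tables. The element and attribute names follow the proposal; the ids, the cvParam names and the glycine masses are made up for illustration, and the pf: prefix is dropped so that the fragment parses without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: ids, cvParam names and masses are illustrative only.
PROTOCOL_XML = """
<SpectrumIdentificationProtocol id="SIP_1">
  <MassTable id="MT_14N">
    <cvParam name="unlabelled, monoisotopic"/>
    <Residue code="G" mass="57.02146"/>
  </MassTable>
  <MassTable id="MT_15N">
    <cvParam name="15N labelled, monoisotopic"/>
    <Residue code="G" mass="58.01850"/>
  </MassTable>
</SpectrumIdentificationProtocol>
"""


def resolve_mass_table(protocol, peptide):
    """Return the MassTable element that applies to a Peptide element."""
    tables = {t.get("id"): t for t in protocol.findall("MassTable")}
    ref = peptide.get("MassTable_ref")
    if ref is None:
        # The reference may be omitted only when the choice is unambiguous.
        if len(tables) != 1:
            raise ValueError("MassTable_ref is required when several MassTables are given")
        return next(iter(tables.values()))
    if ref not in tables:
        raise KeyError(f"unknown MassTable_ref: {ref}")
    return tables[ref]


protocol = ET.fromstring(PROTOCOL_XML)
labelled = ET.fromstring('<Peptide id="pep_1" MassTable_ref="MT_15N"/>')
unreferenced = ET.fromstring('<Peptide id="pep_2"/>')

print(resolve_mass_table(protocol, labelled).get("id"))
```

With two tables present, the peptide without a MassTable_ref is rejected rather than silently assigned to one of them, which is one possible reading of "only required if more than one MassTable has been given".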
From: David C. <dc...@ma...> - 2008-08-08 12:41:03
|
Hi Andy,

Jones, Andy wrote:
> Hi David,
>
> I'm trying to understand how best to model differential labelling e.g. the N15 file. I think we should view this as similar to searching for different variable mods. For example, if this was an ICAT, we would search for C+ICATlight and C+ICAT heavy within one single search.

It's not quite the same really. The problem is that any (naturally occurring) modification that contains nitrogen will also have a different delta for that modification. And you wouldn't want to consider a modification with 15N masses occurring in a peptide made from 14N masses. However, for the sake of simplicity, I think that we can ignore this little detail.

> As such, I don't think it makes sense to separate out an N15 labelling into two different search protocols.

Yes (and I think that this is what we agreed at the conference call).

> I think we should allow multiple mass tables within one SpectrumIdentificationProtocol. The MassTable also will need an id and an optional cv terms to explain the purpose of this particular mass table and perhaps to say if it is monoisotopic or average. We could then add a MassTable_ref to Peptide, documented saying that it is only required if more than one MassTable has been given.

Yes, this is what we agreed on the call: http://www.psidev.info/index.php?q=node/361

> If we also have:
>
> <FragmentationTable>
> <Measure id="m_mz">
> <pf:cvParam cvRef="PSI-PI" accession="PI:xxxx" name="product ion monoisotopic m/z"/>
>
> To say that reported fragment ions are monoisotopic, does this also solve the problem if someone wants to report that precursor ions are average mass and product ions are monoisotopic?

Yes.

> Just to be clear however, in SpectrumIdentificationProtocol we have:
>
> <SpectrumIdentificationProtocol id="SIP_1" AnalysisSoftware_ref="AS_mascot_server">
> <AdditionalSearchParams>
> ...
> <pf:cvParam accession="PI:00211" name="mass type setting monoisotopic" cvRef="PSI-PI"/>
>
> I didn't understand why we couldn't have two CV terms here for product and precursor mass setting, this is how X!Tandem sets up its searches.

Yes, this is also what we agreed on the call - Phil's suggestion (although it didn't make it into the minutes). I've added the change request to the CV to: http://code.google.com/p/psi-pi/issues/detail?id=42#c2

> Was there something else that I missed here that would stop this working?

Can't think of anything (= = probably!) Once you've made the schema changes, I'll update the N15 example.

Thanks,
David

> Cheers
> Andy

-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
From: Jones, A. <And...@li...> - 2008-08-08 13:10:19
|
I had a vague recollection we discussed it on the call but I couldn't remember what we agreed :-) Schema now updated with these changes, cheers Andy > -----Original Message----- > From: David Creasy [mailto:dc...@ma...] > Sent: 08 August 2008 13:41 > To: Jones, Andy > Cc: psi...@li... > Subject: Re: N15 example |
From: Martin E. <mar...@ru...> - 2008-07-10 13:14:38
|
Dear PSI-PI workers!

I'm confused about the new PeptideHypothesis element and the new location of the PeptideEvidence elements. Is it for the case where the same peptide (sequence) is part of several proteins? But then this information is only relevant if both proteins are reported as ProteinDetection results (as AnalysisXML is only for reporting "final" results and not to allow information extraction). Then the PeptideEvidence elements are better placed under ProteinDetectionHypothesis (as agreed to after weeks of discussion ;-) )

If there is a convincing argument I missed, please state it here and I can put it into the wiki doc.

Many thanks!
Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: Friday, June 27, 2008 5:36 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

Hi all,

I've updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
  o In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,
Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel's response below might not have made it round the list yet.
I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, for cvParam = "decoy_string" value = "Rev". This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢: You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy). Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

> id and name are standard for all elements that inherit from FuGE identifiable - this is perhaps a separate discussion as to whether the optional name attribute should be there.
>
> I agree that length may be useful - is this just an integer value with no unit?

Yes, I think so.

> I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again). The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues. Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

> - unless someone can convince me otherwise?
>
> Cheers
> Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. Why do we want name? Is this for, say, a description line? (Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

<DBSequence identifier = "" name = "" isDecoy = "true">
<seq>MCTMG...</seq>
<pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required?

I'll post a new version of the schema with other changes WRT to PeptideEvidence shortly,
Cheers
Andy |
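A small sketch of the decoy_string alternative discussed in this thread: instead of storing an isDecoy attribute on every sequence, a consumer derives the flag from the accession and the single declared parameter. Treating the value as a prefix of the accession is an assumption based on the example accession "Rev_IPI00013808.1"; the function name is hypothetical.

```python
def is_decoy(accession: str, decoy_string: str) -> bool:
    """Assume a sequence is a decoy when its accession starts with decoy_string."""
    return accession.startswith(decoy_string)


# The accession alone stays the key of a sequence; no (accession, isDecoy) pair is needed.
accessions = ["IPI00013808.1", "Rev_IPI00013808.1"]
flags = {acc: is_decoy(acc, "Rev") for acc in accessions}

for acc, flag in flags.items():
    print(acc, flag)
```

Under this reading the target and decoy entries keep distinct accessions, which sidesteps the composite-key concern raised above.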
From: Jones, A. <And...@li...> - 2008-07-10 14:17:06
|
Hi Martin, This alteration came about because we realised that this provided a good solution to two problems: representing reverse database hits and translated sequences. The false discovery rate might need to be reported for peptide idents only, in which case you need to know which peptide sequences came from which proteins – previously this mapping was only provided in the Protein evidence. Similarly, for translated sequence searches, there may not be any Protein hypotheses, yet the mapping back to positions within the original sequence and the translation frame must be reported. Hope this makes sense, hopefully we included something in the minutes about this. Looks like I’m not going to make the call today (and on holiday next week...) so can someone else look after the schema updates? Cheers Andy From: psi...@li... [mailto:psi...@li...] On Behalf Of Martin Eisenacher Sent: 10 July 2008 14:15 To: psi...@li... Subject: [Psidev-pi-dev] PeptideHypothesis and PeptideEvidence |
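To illustrate why the peptide-to-sequence mapping is useful at the identification level, here is a rough sketch of a peptide-level false discovery rate computed without any protein hypotheses. The class and field names are hypothetical stand-ins for SpectrumIdentificationItem and its PeptideEvidence, the peptide sequences are made up, the "Rev" prefix convention is an assumption, and the simple decoys/targets ratio is only one common estimate, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectrumIdentificationItem:
    peptide_sequence: str
    evidence_accessions: List[str]  # accessions of the sequences the peptide maps to


def is_decoy_hit(item: SpectrumIdentificationItem, decoy_string: str) -> bool:
    """Count a hit as decoy only if every mapped sequence is a decoy."""
    accs = item.evidence_accessions
    return bool(accs) and all(a.startswith(decoy_string) for a in accs)


def peptide_level_fdr(items: List[SpectrumIdentificationItem], decoy_string: str = "Rev") -> float:
    """Estimate FDR as decoy hits divided by target hits (an assumed estimator)."""
    decoys = sum(1 for i in items if is_decoy_hit(i, decoy_string))
    targets = len(items) - decoys
    return decoys / targets if targets else 0.0


items = [
    SpectrumIdentificationItem("MCTMGK", ["IPI00013808.1"]),
    SpectrumIdentificationItem("LSEVAR", ["IPI00013808.1", "Rev_IPI00020000.1"]),
    SpectrumIdentificationItem("GTDWLK", ["IPI00020000.1"]),
    SpectrumIdentificationItem("KGMTCM", ["Rev_IPI00013808.1"]),
]

print(round(peptide_level_fdr(items), 3))
```

A peptide that maps to both a target and a decoy sequence is counted as a target here; other conventions are possible and would change the estimate.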
From: Martin E. <mar...@ru...> - 2008-07-10 14:50:42
|
Okay, that seems to work for our use case (although not having peptide FDR and translation table). I've put your explanation on the wiki. Bye Martin From: Jones, Andy [mailto:And...@li...] Sent: Thursday, July 10, 2008 4:17 PM To: Martin Eisenacher; psi...@li... Subject: RE: [Psidev-pi-dev] PeptideHypothesis and PeptideEvidence |