From: Martin E. <mar...@ru...> - 2008-07-10 14:50:42
Okay, that seems to work for our use case (although we don't have peptide FDR or a translation table). I've put your explanation on the wiki.

Bye
Martin

From: Jones, Andy [mailto:And...@li...]
Sent: Thursday, July 10, 2008 4:17 PM
To: Martin Eisenacher; psi...@li...
Subject: RE: [Psidev-pi-dev] PeptideHypothesis and PeptideEvidence

Hi Martin,

This alteration came about because we realised that it provided a good solution to two problems: representing reverse database hits and translated sequences. The false discovery rate might need to be reported for peptide identifications only, in which case you need to know which peptide sequences came from which proteins; previously this mapping was only provided in the protein evidence. Similarly, for translated sequence searches, there may not be any protein hypotheses, yet the mapping back to positions within the original sequence and the translation frame must still be reported.

Hope this makes sense; hopefully we included something about this in the minutes. It looks like I'm not going to make the call today (and I'm on holiday next week...), so can someone else look after the schema updates?

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Martin Eisenacher
Sent: 10 July 2008 14:15
To: psi...@li...
Subject: [Psidev-pi-dev] PeptideHypothesis and PeptideEvidence

Dear PSI-PI workers!

I'm confused about the new PeptideHypothesis element and the new location of the PeptideEvidence elements. Is it for the case where the same peptide (sequence) is part of several proteins? But then this information is only relevant if both proteins are reported as ProteinDetection results (as AnalysisXML is only for reporting final results, not for enabling information extraction). In that case the PeptideEvidence elements are better placed under ProteinDetectionHypothesis (as agreed after weeks of discussion ;-) ).

If there is a convincing argument I missed, please state it here and I can put it into the wiki doc.

Many thanks!
Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: Friday, June 27, 2008 5:36 PM
To: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

Hi all,

I've updated the schema in SVN with the following main changes:
- PeptideEvidence is now part of SpectrumIdentificationItem, as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence, which should be used instead of Sequence (following some of the discussion below)
- Created a new collection class, SequenceCollection (rather than ConceptualMoleculeCollection), so that only references can be given to DBSequence and Peptide
  o In fact, I'm not sure this is sensible, since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE, on cvParam, the value attribute is no longer mandatory

I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday.

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel's response below might not have made it round the list yet. I tend to agree that isDecoy is redundant information, and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, of cvParam = decoy_string, value = Rev. This would be a more compact representation, and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢:

You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?
Second, and this is only tangentially related: are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly to relate reported sequences to one another, as part of our schema? On a personal level I couldn't care less whether "isDecoy" is an attribute or not, but the temptation would then be for folks to use the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy). Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

> id and name are standard for all elements that inherit from FuGE Identifiable; whether the optional name attribute should be there is perhaps a separate discussion. I agree that length may be useful; is this just an integer value with no unit?

Yes, I think so.

> I'm less sure about pI and mass, since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again.) The main problem with mass and pI is that they are 'irrelevant' if the sequence is nucleic acid rather than amino acid residues. Why not just allow CV terms there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of other wonderful things.

> unless someone can convince me otherwise?
> Cheers
> Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. Why do we want name? Is this for, say, a description line? (Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to flag whether they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to it (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that will probably never be required or implemented in AnalysisXML. How about the following:

<DBSequence identifier="" name="" isDecoy="true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required?

I'll post a new version of the schema with other changes WRT PeptideEvidence shortly.

Cheers
Andy

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
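Andy's alternative to an isDecoy attribute (a decoy_string parameter with value "Rev" on SpectrumIdentification) and his argument for putting PeptideEvidence under SpectrumIdentificationItem can be illustrated with a minimal Python sketch. It assumes the decoy_string value is a prefix of every decoy accession, as in "Rev_IPI00013808.1". The classes DBSequence and PeptideIdentification are hypothetical stand-ins for the schema elements, not an AnalysisXML API, and the FDR shown is the simple decoys/targets estimate, which is only one of several target-decoy conventions. The point it shows is that a peptide-level FDR needs only the peptide-to-sequence mapping, with no protein hypotheses involved.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class DBSequence:
    # Hypothetical stand-in for the proposed DBSequence element; only the
    # accession from its pf:DatabaseReference is modelled.
    accession: str


@dataclass(frozen=True)
class PeptideIdentification:
    # Hypothetical stand-in for a SpectrumIdentificationItem: the peptide,
    # its score, and the DBSequences its PeptideEvidence elements refer to.
    peptide: str
    score: float
    evidence: Tuple[DBSequence, ...]


def is_decoy_sequence(seq: DBSequence, decoy_string: str) -> bool:
    # Assumption: decoy accessions start with the decoy_string value.
    return seq.accession.startswith(decoy_string)


def is_decoy_identification(item: PeptideIdentification, decoy_string: str) -> bool:
    # Assumption: a peptide is a decoy hit only if it has evidence and all of
    # that evidence points at decoy sequences; shared target/decoy peptides
    # are counted as targets.
    return bool(item.evidence) and all(
        is_decoy_sequence(s, decoy_string) for s in item.evidence
    )


def peptide_fdr(items: Iterable[PeptideIdentification], threshold: float,
                decoy_string: str = "Rev") -> float:
    """Simple target-decoy FDR estimate (decoys / targets) above a score threshold."""
    accepted = [item for item in items if item.score >= threshold]
    decoys = sum(1 for item in accepted if is_decoy_identification(item, decoy_string))
    targets = len(accepted) - decoys
    if targets == 0:
        # Estimate is undefined without targets; report 1.0 if any decoys passed.
        return 1.0 if decoys else 0.0
    return decoys / targets


# Illustrative data only: the target accession is an assumed counterpart of
# the "Rev_IPI00013808.1" example from the thread.
target = DBSequence("IPI00013808.1")
decoy = DBSequence("Rev_IPI00013808.1")
example_items = [
    PeptideIdentification("PEPTIDEK", 55.0, (target,)),
    PeptideIdentification("SAMPLER", 48.0, (target, decoy)),  # shared: counted as target
    PeptideIdentification("KEDITPEP", 41.0, (decoy,)),
    PeptideIdentification("ELVISK", 12.0, (decoy,)),
]

if __name__ == "__main__":
    print(peptide_fdr(example_items, threshold=40.0))
```

Because the prefix already makes target and decoy accessions distinct, the accession alone can remain the key for a sequence in this sketch, which avoids the (accession, isDecoy) composite key Angel raises. How to count a peptide shared between a target and a decoy protein is a design choice; the sketch simply treats it as a target hit.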
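David's reply that mass can be calculated simply "only if you have the sequence" can be made concrete with a small Python sketch. It assumes a DBSequence-like record that always carries length but may omit the residue string for space reasons, as discussed above. SequenceRecord and monoisotopic_mass are hypothetical names, not part of the schema; the table holds standard monoisotopic amino-acid residue masses, so the calculation only applies to protein sequences, which matches the concern that mass and pI are irrelevant for nucleic acid sequences.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Standard monoisotopic amino-acid residue masses in daltons.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once for the free N- and C-termini


@dataclass
class SequenceRecord:
    # Hypothetical model of the proposed DBSequence: length is always
    # reported, while the residue string (seq) may be left out to save space.
    accession: str
    length: int
    seq: Optional[str] = None


def monoisotopic_mass(record: SequenceRecord) -> Optional[float]:
    """Return the unmodified monoisotopic mass, or None if it cannot be derived."""
    if record.seq is None:
        # Length alone is not enough to compute a mass.
        return None
    # Raises KeyError for non-standard letters (e.g. X, B, Z) in this sketch.
    return sum(RESIDUE_MASS[residue] for residue in record.seq) + WATER_MASS


if __name__ == "__main__":
    full = SequenceRecord("EXAMPLE_1", 8, "PEPTIDEK")
    length_only = SequenceRecord("EXAMPLE_2", 8)
    print(round(monoisotopic_mass(full), 4))
    print(monoisotopic_mass(length_only))
```

In this sketch a consumer can always read length, but mass is only recoverable when seq is present; anything beyond that (pI, modifications, nucleic acid sequences) would need to be reported explicitly, for example as CV terms as David suggests.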