From: Martin E. <mar...@ru...> - 2008-07-30 12:05:21
Hi Pierre-Alain,

quite an old posting, but I saw no answer yet, so I will try:

>2nd July, 2008:
>a couple of questions, just to make sure:
>1) in case of top-down approach, do we have to duplicate sequenceCollection information?

I hope not, by referencing the same identifier.

>as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>(and not to a DBSequence), is an identification necessarily a Peptide?

At the moment I think it's possible to reference a DBSequence directly. Once the foreign key definitions are implemented we can forbid that. But we should keep in mind that a peptide is a sequence plus modifications, so if top-down identifies only a sequence, we should allow that, and if top-down identifies with mods, we should forbid that. It would be quite helpful to have a top-down instance document, to check whether our thoughts are really deep enough...

>2) and what about spectral library searches, do we have to have Peptide
>elements with possibly undefined explicit sequences to refer to
>from the SpectrumIdentificationResult (because non peptidic, or because not identified
>but good spectrum)

At the moment the sequence element can be empty or even left out. User or CV params are allowed. How do they report results in a spectral library search if they identify something non-peptidic or unidentified? We need CV terms for that...

>3) in the Peptide element, the Modifications are defined in a much more
>detailed manner than in ModificationParams (PSI-MOD is there for
>instance). Does this simply mean that the ModificationParams codes
>the search engine settings and the Peptide includes the formal PSI
>definition of the Mod? And the only reference is the ModName value?

I think that has changed in the meantime; in the MPC use case I used PSI-MOD terms for both. If a search engine has its "own" mods, we need CV terms for them in the PSI-PI CV, or they can define their own.

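To illustrate the answer to question 3, here is a rough sketch of what "PSI-MOD terms for both" could look like. Only ModificationParams, Peptide and PSI-MOD come from the discussion above; the wrapper element, SearchModification, Modification, the location attribute, the seq child of Peptide, the identifier value and the MOD:XXXXX placeholder accession are assumptions made for illustration and are not taken from the schema.

```xml
<!-- Illustrative sketch only. Element and attribute names other than
     ModificationParams and Peptide are hypothetical, and MOD:XXXXX is a
     placeholder, not a real PSI-MOD accession. -->
<Sketch>
  <!-- Search engine settings: the modification the search was told to consider -->
  <ModificationParams>
    <SearchModification>
      <cvParam cvRef="PSI-MOD" accession="MOD:XXXXX" name="placeholder term name"/>
    </SearchModification>
  </ModificationParams>

  <!-- Result: the identified peptide carries the same PSI-MOD term -->
  <Peptide identifier="peptide_1">
    <Modification location="2">
      <cvParam cvRef="PSI-MOD" accession="MOD:XXXXX" name="placeholder term name"/>
    </Modification>
    <seq>MCTMG</seq>
  </Peptide>
</Sketch>
```

With a layout like this, the search settings and the identified peptide would carry the same CV term, so a link through a ModName string would not be needed.
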
>4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>are not specified whether monoisotopic or averaged.
>Do we assume that averaged does not exist anymore?

No, we decided to have only one type of mass in the whole analysisXML. But I cannot find a note for that or a schema attribute... I will add an issue for that.

>5) is sequenceMass the mass value with/without the mods? If with, the
>name might be misleading (peptideMass would be more appropriate)

It is indeed the mass of the sequence without mods. THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

>6) in case the DBSequence is nucleotide, is there a tag for saying
>this? (NB: MS on nucleotide molecules can be performed and analysed,
>not only MS on AA sequences that are interpreting nucleotide sequences).
>Or do we neglect MS experiments done on nucleotide molecules (and by
>the way on glycans...) and only represent the DBSequences as AA
>sequences (frame translations)? (and what about glycans?)
>Probably can be solved if one can replace SequenceCollection by
>something else if needed (SmallMoleculeCollection, GlycanCollection,
>MoleculeCollection)... but the validator might not like this.

Hm, these could be extensions; I think they are not possible at the moment. But a tag for the type could indeed be useful, and it could be a CV param. I will create an issue for that.

>7) in case that DBSequence is nucleotide, do we represent the
>Peptide as AA sequence in case of MS done on proteins?

I hope the following answers this:
<DBSequence> is the nucleotide sequence from the nucleotide DB.
<Peptide> is the identified amino acid sequence plus mods (without any translation frame or the like).
<PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
(The Peptide_Ref is done in SpectrumIdentificationItem, as in the amino acid DB case.)

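The nucleotide case in answer 7 might be sketched roughly as follows. DBSequence, Peptide, PeptideEvidence, SpectrumIdentificationItem, SequenceCollection, DBSequence_Ref, Peptide_Ref and TranslationTable_Ref are the names used in this thread; the wrapper element, all identifier values, the spelling and value of the frame attribute, the seq child of Peptide and the exact nesting are assumptions and are not copied from the schema.

```xml
<!-- Illustrative sketch only. Identifiers, the "frame" value, the wrapper
     element and the nesting are assumptions. The translation table that
     "tt_standard" points to would be defined elsewhere and is not shown. -->
<Sketch>
  <SequenceCollection>
    <!-- Nucleotide entry as stored in the nucleotide database -->
    <DBSequence identifier="dbseq_nt_1" name="hypothetical nucleotide entry">
      <seq>ATGTGCACCATGGGC</seq>
    </DBSequence>
    <!-- Identified amino acid sequence; it carries no frame information itself -->
    <Peptide identifier="peptide_1">
      <seq>MCTMG</seq>
    </Peptide>
  </SequenceCollection>

  <!-- The item points at the Peptide; the evidence points at the nucleotide
       DBSequence and says how to translate it -->
  <SpectrumIdentificationItem identifier="sii_1" Peptide_Ref="peptide_1">
    <PeptideEvidence DBSequence_Ref="dbseq_nt_1" frame="1" TranslationTable_Ref="tt_standard"/>
  </SpectrumIdentificationItem>
</Sketch>
```

In this sketch the 15 nucleotides translate to MCTMG in the first frame under the standard genetic code, which is what the frame and translation table on PeptideEvidence are meant to record.
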
If a protein detection is performed, there are <PeptideHypothesis> elements referencing PeptideEvidence elements from SpectrumIdentificationItem sections.

Bye
Martin

David Creasy wrote:

Thanks Andy,

I've added an updated example document to SVN:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml

Problem is that we have now removed the main point of these recent changes, which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.

And yes, I suspect that we should go back to using the ConceptualMoleculeCollection.

Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?

I think we need to discuss this again at the next telecon.

David

Jones, Andy wrote:

Hi all,

I've updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem, as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence, which should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide. In fact, I'm not sure if this is sensible, since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory
- I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel's response below might not have made it round the list yet.

I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, with cvParam = decoy_string and value = Rev. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢:

You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?

On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy). Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

>id and name are standard for all elements that inherit from FuGE identifiable;
>whether the optional name attribute should be there is perhaps a separate discussion.
>I agree that length may be useful; is this just an integer value with no unit?

Yes, I think so.

>I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

>, and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again). The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues. Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

>unless someone can convince me otherwise?
>
>Cheers
>Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.

Why do we want name? Is this for, say, a description line? (Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to flag whether they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

<pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
    SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML.

How about the following:

<DBSequence identifier="" name="" isDecoy="true">
    <seq>MCTMG...</seq>
    <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required?

I'll post a new version of the schema with other changes WRT PeptideEvidence shortly,

Cheers
Andy

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736