From: David C. <dc...@ma...> - 2008-07-31 13:50:22
Hi,

Martin Eisenacher wrote:
> Hi Andy, hi all,
>
>> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand
> Yes, I agree; but I understood Pierre-Alain's question as a hint that top-down
> identifies protein sequences, so we would have to duplicate information, referencing a protein sequence
> as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from
> <ProteinDetectionResult>. But I might be wrong and we definitely have to wait for
> a top-down instance doc.

Sorry for the delay ;) I've put one here:
http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
It's not so bad really. In the case of signal peptides, or a leading methionine (as in this example), the protein that was analysed may differ from the sequence in the database, and there must be a way of representing this.

>
>> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
>> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
>> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
>> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
>> SourceProtein - this would save some space and would appear to be a logically more sensible model...
> You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection? But missedcleavages
> is only well-defined in relation to a search (using an enzyme)!
>
>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> Yes, if <PeptideEvidence> stays optional. What about de novo, where there is no database...
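
For what it's worth, the DBSequence_ref question can be checked mechanically against an instance doc. A minimal sketch in Python, assuming a heavily simplified fragment: only the element names and the Peptide_ref / DBSequence_ref attributes come from this thread, while the ids, the start/end attributes and the two helper functions are made up for illustration and are not taken from the schema in SVN.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. SII_2 has evidence without a
# DBSequence_ref; SII_3 has no evidence at all (e.g. a de novo result).
FRAGMENT = """
<SpectrumIdentificationResult id="SIR_1">
  <SpectrumIdentificationItem id="SII_1" Peptide_ref="pep_1">
    <PeptideEvidence id="PE_1" DBSequence_ref="dbseq_1" start="2" end="15"/>
  </SpectrumIdentificationItem>
  <SpectrumIdentificationItem id="SII_2" Peptide_ref="pep_2">
    <PeptideEvidence id="PE_2" start="1" end="9"/>
  </SpectrumIdentificationItem>
  <SpectrumIdentificationItem id="SII_3" Peptide_ref="pep_3"/>
</SpectrumIdentificationResult>
"""


def evidence_missing_dbsequence_ref(xml_text):
    """Return the ids of PeptideEvidence elements that lack DBSequence_ref."""
    root = ET.fromstring(xml_text)
    return [pe.get("id") for pe in root.iter("PeptideEvidence")
            if not pe.get("DBSequence_ref")]


def items_without_evidence(xml_text):
    """Return the ids of SpectrumIdentificationItems with no PeptideEvidence."""
    root = ET.fromstring(xml_text)
    return [sii.get("id") for sii in root.iter("SpectrumIdentificationItem")
            if sii.find("PeptideEvidence") is None]


if __name__ == "__main__":
    print(evidence_missing_dbsequence_ref(FRAGMENT))
    print(items_without_evidence(FRAGMENT))
```

With DBSequence_ref mandatory and <PeptideEvidence> optional, the first list would have to be empty for a valid document, while the second list is where de novo results would end up.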
>
>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>>>> are not specified whether monoisotopic or averaged.
>>>> Do we assume that averaged does not exist anymore?
>>> No, we decided to have only one type of masses in the whole analysisXML.
>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>> It is a database search parameter:
>> <AdditionalSearchParams>
>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>
> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> http://code.google.com/p/psi-pi/issues/detail?id=37

I'm not sure I understand whether this is OK or not now?
(And why use Pride CV?)

David

>
> bye
> Martin
>
>>> -----Original Message-----
>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
>>> Sent: 30 July 2008 13:05
>>> To: 'Pierre-Alain Binz'
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>>>
>>>> 2nd July, 2008:
>>>> a couple of questions, just to make sure:
>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
>>> I hope not, by referencing the same identifier.
>>>
>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>>>> (and not to a DBSequence), is the identification necessarily a Peptide?
>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
>>> foreign key definitions are implemented we can forbid that.
>>> But we should keep in mind that a peptide is a sequence plus modifications, so if top-down
>>> identifies only a sequence, we should allow that and if top-down identifies with mods,
>>> we should forbid that.
>>> It would be quite helpful to have a top-down instance doc.
>>> To check whether our thoughts are really deep enough...
>>>
>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>> elements with possibly undefined explicit sequences to refer to
>>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
>>>> but good spectrum)
>>> At the moment the sequence element can be empty or even left out.
>>> User or CV params are allowed.
>>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
>>> We need CV terms for that...
>>>
>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>> instance). Does this simply mean that the ModificationParams codes
>>>> the search engine settings and the Peptide includes the formal PSI
>>>> definition of the Mod? And the only reference is the ModName value?
>>> I think that has changed meanwhile; in the MPC use case I used PSI-MOD terms
>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
>>> they can define their own.
>>>
>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>>>> are not specified whether monoisotopic or averaged.
>>>> Do we assume that averaged does not exist anymore?
>>> No, we decided to have only one type of masses in the whole analysisXML.
>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>
>>>> 5) is sequenceMass the mass value with/without the mods? If with, the
>>>> name might be misleading (peptideMass would be more appropriate)
>>> It is indeed the mass of the sequence without mods.
>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>
>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>> this?
>>>> (NB: MS on nucleotide molecules can be performed and analysed,
>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>> the way on glycans...) and only represent the DBSequences as AA
>>>> sequences (frame translations)? (and what about glycans?)
>>>> Probably can be solved if one can replace SequenceCollection by
>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>> MoleculeCollection)... but the validator might not like this.
>>> Mh, these can be extensions, I think they are not possible at the moment.
>>> But a tag for the type can indeed be useful, it could be a CV param.
>>> I will create an issue for that.
>>>
>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>> Peptide as AA sequence in case of MS done on proteins?
>>> I hope the following answers this:
>>>
>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>> <Peptide> is the identified amino acid sequence plus mods (without any translation
>>> frame or something).
>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a
>>> TranslationTable_Ref attribute.
>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
>>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
>>> PeptideEvidence elements from SpectrumIdentificationItem sections.
>>>
>>> Bye
>>> Martin
>>>
>>> David Creasy wrote:
>>> Thanks Andy,
>>>
>>> I've added an updated example document to SVN:
>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>>
>>> Problem is that we have now removed the main point of these recent changes,
>>> which was to add the decoy flag... I think
>>> that we need to add isDecoy to SpectrumIdentificationItem.
>>>
>>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
>>> Um, and since we've not actually ended up adding anything to DBSequence... we
>>> haven't actually achieved anything?
>>> I think we need to discuss this again at the next telecon.
>>>
>>> David
>>>
>>> Jones, Andy wrote:
>>> Hi all,
>>>
>>> I’ve updated the schema in SVN with the following main changes:
>>>
>>> - PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the
>>>   call (simple mappings to proteins are done at this level)
>>> - Added DBSequence that should be used instead of Sequence (following some of
>>>   the discussion below)
>>> - Created a new collection class SequenceCollection (rather than
>>>   ConceptualMoleculeCollection) so that only references can
>>>   be given to DBSequence and Peptide
>>>   In fact, I’m not sure if this is sensible since it prevents other types of
>>>   ConceptualMolecule being added later... to discuss
>>> - In FuGE on cvParam, the value attribute is no longer mandatory
>>>
>>> I’ve added a simple example that validates under
>>> examples\schema_usecase_examples\working27June
>>>
>>> Feel free to mail me any changes to make on Monday,
>>> Cheers
>>> Andy
>>>
>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
>>> Sent: 27 June 2008 16:24
>>> To: Angel Pizarro
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> I think Angel’s response below might not have made it round the list yet.
>>>
>>> I tend to agree that isDecoy is redundant information and perhaps this is not the
>>> best place to encode semantic information. An alternative would be to have a parameter, say on
>>> SpectrumIdentification for cvParam = “decoy_string” value = “Rev”.
>>> This would be a more compact representation and we would not
>>> have to add what is quite a specific attribute type (isDecoy) to Sequence.
>>>
>>> From: an...@it...
>>> [mailto:an...@it...] On Behalf Of Angel Pizarro
>>> Sent: 27 June 2008 15:59
>>> To: Jones, Andy
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> my 2¢ :
>>> You need to be able to extend this to all molecule types, or am I missing the point
>>> of this thread, and you mean that
>>> this would be a subclass of the conceptual molecule element?
>>>
>>> Second, and this is tangentially related, but are decoy sequences really a
>>> problem we should be putting our effort
>>> into? Is it in our domain to encode semantic information about a sequence, and
>>> possibly relating reported sequences as part of our schema?
>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the
>>> temptation then would be for folks to
>>> encode the same accession for two different sequences, effectively making the
>>> primary key of the sequence object (accession, isDecoy)
>>>
>>> Do we want to go there?
>>>
>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
>>> So how about including length as an attribute and then letting all other things go in the
>>> CV (pI, mass, etc.)?
>>>
>>> From: Jones, Andy
>>> Sent: 27 June 2008 14:54
>>> To: 'David Creasy'
>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>
>>> id and name are standard for all elements that inherit from FuGE identifiable - this
>>> is perhaps a separate discussion as
>>> to whether the optional name attribute should be there.
>>>
>>> I agree that length may be useful - is this just an integer value with no unit?
>>> Yes, I think so.
>>> I'm less sure about pI and mass since mass at least can be calculated very simply
>>> Only if you have the sequence... (we have residue masses in the file).
>>>
>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>> Scandalous! (I happen to agree, but now some people will never speak to either of
>>> us ever again).
>>>
>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is
>>> nucleic acid rather than residues.
>>> Why not just allow CV there? We can share the same CV as the PEFF format,
>>> which includes taxonomy, sequence type, gene
>>> ID, and lots of wonderful other things?
>>>
>>> - unless someone can convince me otherwise?
>>> Cheers
>>> Andy
>>>
>>> From: David Creasy [mailto:dc...@ma...]
>>> Sent: 27 June 2008 14:51
>>> To: Jones, Andy
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>
>>> Hi Andy,
>>>
>>> length may be useful, because some people won't want to output the actual
>>> sequence for space reasons. The other things
>>> we wanted to add before were pI and mass.
>>> Why do we want name? Is this for, say, a description line?
>>> (Also, identifier -> id?)
>>>
>>> David
>>>
>>> Jones, Andy wrote:
>>> Hi all,
>>>
>>> It was decided on the call that we would like to flag that Sequences in the
>>> ConceptualMoleculeCollection should have a
>>> Boolean attribute to capture if they are decoy sequences. At the moment we are
>>> using the FuGE:Sequence element. I don't
>>> really want to add another attribute to this (it's less problematic cutting down FuGE
>>> than adding new things), so I'm
>>> wondering if we should define our own Sequence type in AnalysisXML. This
>>> would also allow us to choose exactly the
>>> relevant attributes. At the moment, Sequence can have all of the following:
>>>
>>> <pf:Sequence isCircular="true" sequence="String" length="0"
>>>     isApproximateLength="true"
>>>     SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String"
>>>     name="String">
>>>
>>> Several of these attributes were created to represent concepts that probably will
>>> never be required or implemented in AnalysisXML.
>>> How about the following:
>>>
>>> <DBSequence identifier="" name="" isDecoy="true">
>>>     <seq>MCTMG...</seq>
>>>     <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
>>> </DBSequence>
>>>
>>> Are any of the other attributes on Sequence actually required? I'll post a new
>>> version of the schema with other changes
>>> WRT PeptideEvidence shortly,
>>> Cheers
>>> Andy
>>>
>>> -------------------------------------------------------------------------
>>> Check out the new SourceForge.net Marketplace.
>>> It's the best place to buy or sell services for
>>> just about anything Open Source.
>>> http://sourceforge.net/services/buy/index.php
>>> _______________________________________________
>>> Psidev-pi-dev mailing list
>>> Psi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>>
>>> --
>>> David Creasy
>>> Matrix Science
>>> 64 Baker Street
>>> London W1U 7GB, UK
>>> Tel: +44 (0)20 7486 1050
>>> Fax: +44 (0)20 7224 1344
>>>
>>> dc...@ma...
>>> http://www.matrixscience.com
>>>
>>> Matrix Science Ltd. is registered in England and Wales
>>> Company number 3533898
>>>
>>> --
>>> Angel Pizarro
>>> Director, ITMAT Bioinformatics Facility
>>> 806 Biological Research Building
>>> 421 Curie Blvd.
>>> Philadelphia, PA 19104-6160
>>> 215-573-3736
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
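
P.S. On the decoy question quoted above: the "decoy_string" alternative amounts to deriving the flag from the accession instead of storing isDecoy on every sequence. A minimal sketch in Python, assuming the decoy string is used as an accession prefix (as in the Rev_IPI00013808.1 example); the function name is made up for illustration, and nothing here is defined by the schema.

```python
def is_decoy(accession, decoy_string="Rev"):
    """Return True if the accession starts with the decoy marker.

    Assumption: the search engine marks decoys by prefixing the accession
    with the decoy string; other conventions (suffixes, separate decoy
    databases) would need a different test.
    """
    return accession.startswith(decoy_string)


# The target accession and its reversed counterpart from the thread.
accessions = ["IPI00013808.1", "Rev_IPI00013808.1"]
flags = {acc: is_decoy(acc) for acc in accessions}

if __name__ == "__main__":
    for acc, flag in flags.items():
        print(acc, flag)
```

Because the decoy keeps its own accession, the accession alone stays a usable key and the (accession, isDecoy) composite key never arises. The cost is that every reader has to know the decoy string that was used for the search.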