From: Martin E. <mar...@ru...> - 2008-07-31 14:25:46
> > But I might be wrong and we definitely have to wait for
> > a top-down instance doc.
> Sorry for the delay ;) I've put one here:
> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
> It's not so bad really. In the case of signal peptides, or leading
> methionine (as in this example), the protein that was analysed may be
> different from the sequence in the database, and there must be a way of
> representing this.

So you think it's okay like it is and no doubling.
Or can I derive an issue from the "not so bad really" phrase ;-)

> >> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> >> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> > Yes, if <PeptideEvidence> stays optional.
> What about denovo where there is no database...

That is an argument to have PeptideEvidence optional, isn't it?
But DBSequence_ref as attribute of it should be mandatory.

> >> It is a database search parameter:
> >> <AdditionalSearchParams>
> >> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
> > Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> > http://code.google.com/p/psi-pi/issues/detail?id=37
>
> I'm not sure I understand whether this is OK or not now? (And why use
> Pride CV?)

I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
so it is not well-defined for the masses in elements or attributes.
We need a global attribute :-) or element.
Or it can be done later in semantic validation :-( .

Bye
Martin

> >>> -----Original Message-----
> >>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
> >>> Sent: 30 July 2008 13:05
> >>> To: 'Pierre-Alain Binz'
> >>> Cc: psi...@li...
> >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>
> >>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
> >>>
> >>>> 2nd July, 2008:
> >>>> a couple of questions, just to make sure:
> >>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
> >>> I hope not, by referencing the same identifier.
> >>>
> >>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> >>>> (and not to a DBSequence), is an identification necessarily a Peptide?
> >>> At the moment I think it's possible to directly reference a DBSeq. At the time the
> >>> foreign key definitions are implemented we can forbid that.
> >>> But we should keep in mind that a peptide is a sequence plus modifications, so if top-down
> >>> identifies only a sequence, we should allow that, and if top-down identifies with mods,
> >>> we should forbid that.
> >>> It would be quite helpful to have a top-down instance doc. To check
> >>> whether our thoughts are really deep enough...
> >>>
> >>>> 2) and what about spectral library searches, do we have to have Peptide
> >>>> elements with possibly undefined explicit sequences to refer to
> >>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
> >>>> but good spectrum)
> >>> At the moment the sequence element can be empty or even left out.
> >>> User or CV params are allowed.
> >>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
> >>> We need CV terms for that...
> >>>
> >>>> 3) in the Peptide element, the Modifications are defined in a much more
> >>>> detailed manner than in ModificationParams (PSI-MOD is there for
> >>>> instance). Does this simply mean that the ModificationParams codes
> >>>> the search engine settings and the Peptide includes the formal PSI
> >>>> definition of the Mod? And the only reference is the ModName value?
> >>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
> >>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
> >>> they can define their own.
> >>>
> >>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
> >>>> are not specified whether monoisotopic or averaged.
> >>>> Do we assume that averaged does not exist anymore?
> >>> No, we decided to have only one type of masses in the whole analysisXML.
> >>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
> >>>
> >>>> 5) is sequenceMass the mass value with/without the mods? If with, the
> >>>> name might be misleading (peptideMass would be more appropriate)
> >>> It is indeed the mass of the sequence without mods.
> >>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
> >>>
> >>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
> >>>> this? (NB: MS on nucleotide molecules can be performed and analysed,
> >>>> not only MS on AA sequences that are interpreting nucleotide sequences).
> >>>> Or do we neglect MS experiments done on nucleotide molecules (and by
> >>>> the way on glycans...) and only represent the DBSequences as AA
> >>>> sequences (frame translations)? (and what about glycans?)
> >>>> Probably can be solved if one can replace SequenceCollection by
> >>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
> >>>> MoleculeCollection)... but the validator might not like this.
> >>> Mh, these can be extensions, I think they are not possible at the moment.
> >>> But a tag for the type can indeed be useful, it could be a CV param.
> >>> I will create an issue for that.
> >>>
> >>>> 7) in case that DBSequence is nucleotide, do we represent the
> >>>> Peptide as AA sequence in case of MS done on proteins?
> >>> I hope the following answers this:
> >>>
> >>> <DBSequence> is the nucleotide seq from the nucleotide DB,
> >>> <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
> >>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
> >>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
> >>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
> >>> PeptideEvidence elements from SpectrumIdentificationItem sections.
> >>>
> >>> Bye
> >>> Martin
> >>>
> >>> David Creasy wrote:
> >>> Thanks Andy,
> >>>
> >>> I've added an updated example document to SVN:
> >>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
> >>>
> >>> Problem is that we have now removed the main point of these recent changes
> >>> which was to add the decoy flag... I think
> >>> that we need to add isDecoy to SpectrumIdentificationItem.
> >>>
> >>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
> >>> Um, and since we've not actually ended up adding anything to DBSequence... we
> >>> haven't actually achieved anything?
> >>> I think we need to discuss this again at the next telecon.
> >>>
> >>> David
> >>>
> >>> Jones, Andy wrote:
> >>> Hi all,
> >>>
> >>> I've updated the schema in SVN with the following main changes:
> >>>
> >>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
> >>> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
> >>> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
> >>> In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
> >>> In FuGE on cvParam, the value attribute is no longer mandatory
> >>>
> >>> I've added a simple example that validates under
> >>> examples\schema_usecase_examples\working27June
> >>>
> >>> Feel free to mail me any changes to make on Monday,
> >>> Cheers
> >>> Andy
> >>>
> >>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
> >>> Sent: 27 June 2008 16:24
> >>> To: Angel Pizarro
> >>> Cc: psi...@li...
> >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>
> >>> I think Angel's response below might not have made it round the list yet.
> >>>
> >>> I tend to agree that isDecoy is redundant information and perhaps this is not the
> >>> best place to encode semantic information. An alternative would be to have a parameter, say on
> >>> SpectrumIdentification for cvParam = decoy_string value = Rev.
> >>> This would be a more compact representation and we would not
> >>> have to add what is quite a specific attribute type (isDecoy) to Sequence.
> >>>
> >>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
> >>> Sent: 27 June 2008 15:59
> >>> To: Jones, Andy
> >>> Cc: psi...@li...
> >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>
> >>> my 2¢ :
> >>> You need to be able to extend this to all molecule types, or am I missing the point
> >>> of this thread, and you mean that
> >>> this would be a subclass of the conceptual molecule element?
> >>>
> >>> Second, and this is tangentially related, but are decoy sequences really a
> >>> problem we should be putting our effort into?
> >>> Is it in our domain to encode semantic information about a sequence, and
> >>> possibly relating reported sequences as part of our schema?
> >>> On a personal level I could care less if "isDecoy" is an attribute or not, but the
> >>> temptation then would be for folks to
> >>> encode the same accession for two different sequences, effectively making the
> >>> primary key of the sequence object (accession, isDecoy)
> >>>
> >>> Do we want to go there?
> >>>
> >>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
> >>> So how about including length as an attribute and then let all other things go in the
> >>> CV (pI, mass, etc.)?
> >>>
> >>> From: Jones, Andy
> >>> Sent: 27 June 2008 14:54
> >>> To: 'David Creasy'
> >>> Subject: RE: [Psidev-pi-dev] Representing Sequences
> >>>
> >>> id and name are standard for all elements that inherit from FuGE identifiable - this
> >>> is perhaps a separate discussion as
> >>> to whether the optional name attribute should be there.
> >>>
> >>> I agree that length may be useful - is this just an integer value with no unit?
> >>> Yes, I think so.
> >>> I'm less sure about pI and mass since mass at least can be calculated very simply
> >>> Only if you have the sequence... (we have residue masses in the file).
> >>>
> >>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
> >>> Scandalous! (I happen to agree, but now some people will never speak to either of
> >>> us ever again).
> >>>
> >>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is
> >>> nucleic acid rather than residues.
> >>> Why not just allow CV there? We can share the same CV as the PEFF format,
> >>> which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?
> >>>
> >>> unless someone can convince me otherwise?
> >>> Cheers
> >>> Andy
> >>>
> >>> From: David Creasy [mailto:dc...@ma...]
> >>> Sent: 27 June 2008 14:51
> >>> To: Jones, Andy
> >>> Cc: psi...@li...
> >>> Subject: Re: [Psidev-pi-dev] Representing Sequences
> >>>
> >>> Hi Andy,
> >>>
> >>> length may be useful, because some people won't want to output the actual
> >>> sequence for space reasons. The other things
> >>> we wanted to add before were pI and mass.
> >>> Why do we want name? Is this for, say, a description line?
> >>> (Also, identifier -> id?)
> >>>
> >>> David
> >>>
> >>> Jones, Andy wrote:
> >>> Hi all,
> >>>
> >>> It was decided on the call that we would like to flag that Sequences in the
> >>> ConceptualMoleculeCollection should have a
> >>> Boolean attribute to capture if they are decoy sequences. At the moment we are
> >>> using the FuGE:Sequence element. I don't
> >>> really want to add another attribute to this (it's less problematic cutting down FuGE
> >>> than adding new things), so I'm
> >>> wondering if we should define our own Sequence type in AnalysisXML. This
> >>> would also allow us to choose exactly the
> >>> relevant attributes. At the moment, Sequence can have all of the following:
> >>>
> >>> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
> >>>     SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
> >>>
> >>> Several of these attributes were created to represent concepts that probably will
> >>> never be required or implemented in AnalysisXML.
> >>> How about the following:
> >>>
> >>> <DBSequence identifier = "" name = "" isDecoy = "true">
> >>>     <seq>MCTMG...</seq>
> >>>     <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> >>> </DBSequence>
> >>>
> >>> Are any of the other attributes on Sequence actually required? I'll post a new
> >>> version of the schema with other changes
> >>> WRT to PeptideEvidence shortly,
> >>> Cheers
> >>> Andy
> >>>
> >>> _______________________________________________
> >>> Psidev-pi-dev mailing list
> >>> Psi...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
> >>>
> >>> --
> >>> David Creasy
> >>> Matrix Science
> >>> 64 Baker Street
> >>> London W1U 7GB, UK
> >>> Tel: +44 (0)20 7486 1050
> >>> Fax: +44 (0)20 7224 1344
> >>>
> >>> dc...@ma...
> >>> http://www.matrixscience.com
> >>>
> >>> Matrix Science Ltd. is registered in England and Wales
> >>> Company number 3533898
> >>> --
> >>> Angel Pizarro
> >>> Director, ITMAT Bioinformatics Facility
> >>> 806 Biological Research Building
> >>> 421 Curie Blvd.
> >>> Philadelphia, PA 19104-6160
> >>> 215-573-3736
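
Martin's reply at the top of this message notes that the schema allows "average" in one SpectrumIdentification and "mono" in another, and that this could be caught later in semantic validation. A minimal sketch of such a check follows, assuming a much simplified document layout. Only the accession PRIDE:0000162 and the element names AdditionalSearchParams and cvParam come from the thread; the root element, the namespace URI, the accession used for "average" and all function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Accession for "monoisotopic" is the one quoted in the thread.
# The "average" accession is a placeholder invented for this sketch.
MONOISOTOPIC = "PRIDE:0000162"
AVERAGE = "EXAMPLE:0000001"  # hypothetical, not a real CV term
MASS_TYPE_ACCESSIONS = {MONOISOTOPIC, AVERAGE}


def local_name(tag):
    """Drop an XML namespace prefix: '{uri}cvParam' -> 'cvParam'."""
    return tag.rsplit("}", 1)[-1]


def declared_mass_types(xml_text):
    """Collect mass-type accessions from every AdditionalSearchParams block."""
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "AdditionalSearchParams":
            continue
        for param in elem:  # direct children only
            accession = param.get("accession")
            if local_name(param.tag) == "cvParam" and accession in MASS_TYPE_ACCESSIONS:
                found.add(accession)
    return found


def mass_type_is_well_defined(xml_text):
    """True if the document declares exactly one mass type overall."""
    return len(declared_mass_types(xml_text)) == 1


def make_doc(*accessions):
    """Build a toy document with one search block per accession (hypothetical layout)."""
    blocks = "".join(
        '<SpectrumIdentification><AdditionalSearchParams>'
        f'<pf:cvParam accession="{acc}" name="mass type" cvRef="PRIDE"/>'
        '</AdditionalSearchParams></SpectrumIdentification>'
        for acc in accessions
    )
    return f'<AnalysisXML xmlns:pf="urn:example:fuge">{blocks}</AnalysisXML>'


if __name__ == "__main__":
    # Two searches that agree, then two that conflict.
    print(mass_type_is_well_defined(make_doc(MONOISOTOPIC, MONOISOTOPIC)))
    print(mass_type_is_well_defined(make_doc(MONOISOTOPIC, AVERAGE)))
```

A check like this sits outside the schema, which is the less attractive option in the reply; a single global attribute or element would make the conflicting case impossible to express in the first place.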