From: Martin E. <mar...@ru...> - 2008-08-07 14:17:30
> >>>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> >>>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> >>> Yes, if <PeptideEvidence> stays optional.
> >> What about de novo where there is no database...
> > That is an argument to have PeptideEvidence optional, isn't it?
> > But DBSequence_ref as attribute of it should be mandatory.
> Doh, sorry, yes you are totally correct. It should be mandatory. Now it is.

And next problem: we have two "Sequence_Ref" attributes, in <PeptideEvidence> and <ProteinHypothesis> (now both mandatory).
What if they are contradictory (validator?)?
If they are not contradictory, at least the one in <ProteinHypothesis> is redundant.

> >(and it is missing from the instance docs).
> I believe it's in all the Mascot ones?

It is.

> >>>> It is a database search parameter:
> >>>> <AdditionalSearchParams>
> >>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
> >>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> >>> http://code.google.com/p/psi-pi/issues/detail?id=37
> >> I'm not sure I understand whether this is OK or not now? (And why use
> >> Pride CV?)
> > I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
> > so it is not well-defined for the masses in elements or attributes.
> > We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .
> I think it's actually _required_ to be like this. For example, at least
> one search engine allows you to specify mono for masses below x and
> average for masses above x. So, in this case, the output should be
> similar to the N15 example that I've supplied, with two separate mass
> tables. Maybe you could look at the Mascot_N15_example.xml and see if
> you think that this is OK.
It is okay with me; to answer Pierre-Alain's original question then: all mass values for peptides
depend on the type of search performed and the residue table used. ;-)

Bye
Martin

> Talk soon,
>
> David
>
> > Bye
> > Martin
> >
> >>>>> -----Original Message-----
> >>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
> >>>>> Sent: 30 July 2008 13:05
> >>>>> To: 'Pierre-Alain Binz'
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>>>
> >>>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
> >>>>>
> >>>>>> 2nd July, 2008:
> >>>>>> a couple of questions, just to make sure:
> >>>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
> >>>>> I hope not, by referencing the same identifier.
> >>>>>
> >>>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> >>>>>> (and not to a DBSequence), identification is obligatorily a Peptide?
> >>>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
> >>>>> foreign key definitions are implemented we can forbid that.
> >>>>> But we should have in mind that a peptide is a sequence plus modifications, so if top-down
> >>>>> identifies only a sequence, we should allow that and if top-down identifies with mods,
> >>>>> we should forbid that.
> >>>>> It would be quite helpful to have a top-down instance doc. To check
> >>>>> whether our thoughts are really deep enough...
> >>>>>
> >>>>>> 2) and what about spectral library searches, do we have to have Peptide
> >>>>>> elements with possibly undefined explicit sequences to refer to
> >>>>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
> >>>>>> but good spectrum)
> >>>>> At the moment the sequence element can be empty or even left out.
> >>>>> User or CV params are allowed.
> >>>>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
> >>>>> We need CV terms for that...
> >>>>>
> >>>>>> 3) in the Peptide element, the Modifications are defined in a much more
> >>>>>> detailed manner than in ModificationParams (PSI-MOD is there for
> >>>>>> instance). Does this simply mean that the ModificationParams codes
> >>>>>> the search engine settings and the Peptide includes the formal PSI
> >>>>>> definition of the Mod? And the only reference is the ModName value?
> >>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
> >>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
> >>>>> they can define their own.
> >>>>>
> >>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
> >>>>>> are not specified whether monoisotopic or averaged.
> >>>>>> Do we assume that averaged does not exist anymore?
> >>>>> No, we decided to have only one type of masses in the whole analysisXML.
> >>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
> >>>>>
> >>>>>> 5) is sequenceMass the mass value with/without the mods? If with, the
> >>>>>> name might be misleading (peptideMass would be more appropriate)
> >>>>> It is indeed the mass of the sequence without mods.
> >>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
> >>>>>
> >>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
> >>>>>> this? (NB: MS on nucleotide molecules can be performed and analysed,
> >>>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
> >>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
> >>>>>> the way on glycans...) and only represent the DBSequences as AA
> >>>>>> sequences (frame translations)? (and what about glycans?)
> >>>>>> Probably can be solved if one can replace SequenceCollection by
> >>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
> >>>>>> MoleculeCollection)... but the validator might not like this.
> >>>>> Mh, these can be extensions, I think they are not possible at the moment.
> >>>>> But a tag for the type can indeed be useful, it could be a CV param.
> >>>>> I will create an issue for that.
> >>>>>
> >>>>>> 7) in case that DBSequence is nucleotide, do we represent the
> >>>>>> Peptide as AA sequence in case of MS done on proteins?
> >>>>> I hope the following answers this:
> >>>>>
> >>>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
> >>>>> <Peptide> is the identified amino acid sequence plus mods (without any translation
> >>>>> frame or something).
> >>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a
> >>>>> TranslationTable_Ref attribute.
> >>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
> >>>>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
> >>>>> PeptideEvidence elements from SpectrumIdentificationItem sections.
> >>>>>
> >>>>> Bye
> >>>>> Martin
> >>>>>
> >>>>> David Creasy wrote:
> >>>>> Thanks Andy,
> >>>>>
> >>>>> I've added an updated example document to SVN:
> >>>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
> >>>>>
> >>>>> Problem is that we have now removed the main point of these recent changes
> >>>>> which was to add the decoy flag... I think
> >>>>> that we need to add isDecoy to SpectrumIdentificationItem.
> >>>>>
> >>>>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
> >>>>> Um, and since we've not actually ended up adding anything to DBSequence... we
> >>>>> haven't actually achieved anything?
> >>>>> I think we need to discuss this again at the next telecon.
> >>>>>
> >>>>> David
> >>>>>
> >>>>> Jones, Andy wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I've updated the schema in SVN with the following main changes:
> >>>>>
> >>>>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call
> >>>>> (simple mappings to proteins are done at this level)
> >>>>> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
> >>>>> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection)
> >>>>> so that only references can be given to DBSequence and Peptide
> >>>>> In fact, I'm not sure if this is sensible since it prevents other types of
> >>>>> ConceptualMolecule being added later... to discuss
> >>>>> In FuGE on cvParam, the value attribute is no longer mandatory
> >>>>>
> >>>>> I've added a simple example that validates under
> >>>>> examples\schema_usecase_examples\working27June
> >>>>>
> >>>>> Feel free to mail me any changes to make on Monday,
> >>>>> Cheers
> >>>>> Andy
> >>>>>
> >>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
> >>>>> Sent: 27 June 2008 16:24
> >>>>> To: Angel Pizarro
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>>>
> >>>>> I think Angel's response below might not have made it round the list yet.
> >>>>>
> >>>>> I tend to agree that isDecoy is redundant information and perhaps this is not the
> >>>>> best place to encode semantic information. An alternative would be to have a parameter,
> >>>>> say on SpectrumIdentification for cvParam = decoy_string value = Rev.
> >>>>> This would be a more compact representation and we would not have to add what is
> >>>>> quite a specific attribute type (isDecoy) to Sequence.
> >>>>>
> >>>>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
> >>>>> Sent: 27 June 2008 15:59
> >>>>> To: Jones, Andy
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >>>>>
> >>>>> my 2¢ :
> >>>>> You need to be able to extend this to all molecule types, or am I missing the point
> >>>>> of this thread, and you mean that this would be a subclass of the conceptual molecule element?
> >>>>>
> >>>>> Second, and this is tangentially related, but are decoy sequences really a
> >>>>> problem we should be putting our effort into? Is it in our domain to encode semantic
> >>>>> information about a sequence, and possibly relating reported sequences as part of our schema?
> >>>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the
> >>>>> temptation then would be for folks to encode the same accession for two different
> >>>>> sequences, effectively making the primary key of the sequence object (accession, isDecoy)
> >>>>>
> >>>>> Do we want to go there?
> >>>>>
> >>>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
> >>>>> So how about include length as an attribute and then let all other things go in the
> >>>>> CV (pI, mass, etc.)?
> >>>>>
> >>>>> From: Jones, Andy
> >>>>> Sent: 27 June 2008 14:54
> >>>>> To: 'David Creasy'
> >>>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
> >>>>>
> >>>>> id and name are standard for all elements that inherit from FuGE identifiable - this
> >>>>> is perhaps a separate discussion as to whether the optional name attribute should be there.
> >>>>>
> >>>>> I agree that length may be useful - is this just an integer value with no unit?
> >>>>> Yes, I think so.
> >>>>> I'm less sure about pI and mass since mass at least can be calculated very simply
> >>>>> Only if you have the sequence... (we have residue masses in the file).
> >>>>>
> >>>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
> >>>>> Scandalous! (I happen to agree, but now some people will never speak to either of
> >>>>> us ever again).
> >>>>>
> >>>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is
> >>>>> nucleic acid rather than residues.
> >>>>> Why not just allow CV there? We can share the same CV as the PEFF format,
> >>>>> which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?
> >>>>>
> >>>>> unless someone can convince me otherwise?
> >>>>> Cheers
> >>>>> Andy
> >>>>>
> >>>>> From: David Creasy [mailto:dc...@ma...]
> >>>>> Sent: 27 June 2008 14:51
> >>>>> To: Jones, Andy
> >>>>> Cc: psi...@li...
> >>>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
> >>>>>
> >>>>> Hi Andy,
> >>>>>
> >>>>> length may be useful, because some people won't want to output the actual
> >>>>> sequence for space reasons. The other things we wanted to add before were pI and mass.
> >>>>> Why do we want name? Is this for, say, a description line?
> >>>>> (Also, identifier -> id?)
> >>>>>
> >>>>> David
> >>>>>
> >>>>> Jones, Andy wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> It was decided on the call that we would like to flag that Sequences in the
> >>>>> ConceptualMoleculeCollection should have a Boolean attribute to capture if they are
> >>>>> decoy sequences. At the moment we are using the FuGE:Sequence element. I don't
> >>>>> really want to add another attribute to this (it's less problematic cutting down FuGE
> >>>>> than adding new things), so I'm wondering if we should define our own Sequence type
> >>>>> in AnalysisXML. This would also allow us to choose exactly the relevant attributes.
> >>>>> At the moment, Sequence can have all of the following:
> >>>>>
> >>>>> <pf:Sequence isCircular="true" sequence="String" length="0"
> >>>>> isApproximateLength="true"
> >>>>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String"
> >>>>> name="String">
> >>>>>
> >>>>> Several of these attributes were created to represent concepts that probably will
> >>>>> never be required or implemented in AnalysisXML. How about the following:
> >>>>>
> >>>>> <DBSequence identifier = "" name = "" isDecoy = "true">
> >>>>> <seq>MCTMG...</seq>
> >>>>> <pf:DatabaseReference Database_ref=""
> >>>>> accession="Rev_IPI00013808.1"/>
> >>>>> </DBSequence>
> >>>>>
> >>>>> Are any of the other attributes on Sequence actually required? I'll post a new
> >>>>> version of the schema with other changes WRT to PeptideEvidence shortly,
> >>>>> Cheers
> >>>>> Andy
> >>>>>
> >>>>> -------------------------------------------------------------------------
> >>>>> Check out the new SourceForge.net Marketplace.
> >>>>> It's the best place to buy or sell services for
> >>>>> just about anything Open Source.
> >>>>> http://sourceforge.net/services/buy/index.php
> >>>>> _______________________________________________
> >>>>> Psidev-pi-dev mailing list
> >>>>> Psi...@li...
> >>>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
> >>>>>
> >>>>> --
> >>>>> David Creasy
> >>>>> Matrix Science
> >>>>> 64 Baker Street
> >>>>> London W1U 7GB, UK
> >>>>> Tel: +44 (0)20 7486 1050
> >>>>> Fax: +44 (0)20 7224 1344
> >>>>>
> >>>>> dc...@ma...
> >>>>> http://www.matrixscience.com
> >>>>>
> >>>>> Matrix Science Ltd. is registered in England and Wales
> >>>>> Company number 3533898
> >>>>>
> >>>>> --
> >>>>> Angel Pizarro
> >>>>> Director, ITMAT Bioinformatics Facility
> >>>>> 806 Biological Research Building
> >>>>> 421 Curie Blvd.
> >>>>> Philadelphia, PA 19104-6160
> >>>>> 215-573-3736
> >>>>>
> >>>>> -------------------------------------------------------------------------
> >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> >>>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
> >>>>> Grand prize is a trip for two to an Open Source event anywhere in the world
> >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
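
[Editorial sketch] The "validator?" question raised at the top of this message (two "Sequence_Ref" attributes that might contradict each other) could be checked in semantic validation roughly as follows. This is only a minimal sketch under stated assumptions: the toy document has no namespaces, <PeptideHypothesis> is assumed to sit inside <ProteinHypothesis>, and the PeptideEvidence_Ref attribute name and the function name find_contradictory_refs are hypothetical, chosen for illustration rather than taken from the schema.

```python
import xml.etree.ElementTree as ET

# Toy instance document. The nesting and the PeptideEvidence_Ref attribute
# name are assumptions for illustration, not the actual AnalysisXML schema.
SAMPLE = """
<AnalysisXML>
  <PeptideEvidence id="PE1" DBSequence_Ref="DBSeq_A"/>
  <PeptideEvidence id="PE2" DBSequence_Ref="DBSeq_B"/>
  <ProteinHypothesis id="PH1" DBSequence_Ref="DBSeq_A">
    <PeptideHypothesis PeptideEvidence_Ref="PE1"/>
    <PeptideHypothesis PeptideEvidence_Ref="PE2"/>
  </ProteinHypothesis>
</AnalysisXML>
"""


def find_contradictory_refs(xml_text):
    """Return (hypothesis id, evidence id, hypothesis ref, evidence ref) tuples
    for every PeptideHypothesis whose evidence points at a different
    DBSequence than the enclosing ProteinHypothesis."""
    root = ET.fromstring(xml_text)

    # Map each PeptideEvidence id to the DBSequence it references.
    evidence_refs = {
        pe.get("id"): pe.get("DBSequence_Ref")
        for pe in root.iter("PeptideEvidence")
    }

    problems = []
    for hyp in root.iter("ProteinHypothesis"):
        hyp_ref = hyp.get("DBSequence_Ref")
        for pep in hyp.iter("PeptideHypothesis"):
            ev_id = pep.get("PeptideEvidence_Ref")
            # A dangling evidence reference yields None and is reported too.
            ev_ref = evidence_refs.get(ev_id)
            if ev_ref != hyp_ref:
                problems.append((hyp.get("id"), ev_id, hyp_ref, ev_ref))
    return problems


if __name__ == "__main__":
    for problem in find_contradictory_refs(SAMPLE):
        print("contradictory Sequence_Ref:", problem)
```

If the two attributes can never legitimately differ, a check like this reporting nothing for every valid file would support the point that the one in <ProteinHypothesis> is redundant.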