From: David C. <dc...@ma...> - 2008-08-07 14:38:22
Martin Eisenacher wrote:
>>>>>> I notice also that there is a small error in the schema in that on
>>>>>> PeptideEvidence DBSequence_ref should be mandatory (and it is missing
>>>>>> from the instance docs). I can fix this if there is agreement on this?
>>>>>
>>>>> Yes, if <PeptideEvidence> stays optional.
>>>>
>>>> What about de novo, where there is no database...
>>>
>>> That is an argument to have PeptideEvidence optional, isn't it?
>>> But DBSequence_ref as attribute of it should be mandatory.
>>
>> Doh, sorry, yes you are totally correct. It should be mandatory.
>
> Now it is.
> And next problem: we have two "Sequence_Ref" attributes,
> in <PeptideEvidence> and <ProteinHypothesis> (now both mandatory).
> What if they are contradictory (validator?)?
> If they are not contradictory, at least the one in <ProteinHypothesis> is redundant.

With current use cases, I think it's always redundant, but I'm trying to
think of a case where it wouldn't be. However, since the
<ProteinDetectionHypothesis> has to have at least one <PeptideHypothesis>,
you must be right. Suggest that we remove the reference from <ProteinHypothesis>.

>> >(and it is missing from the instance docs).
>> I believe it's in all the Mascot ones?
>
> It is.

>>>>>> It is a database search parameter:
>>>>>> <AdditionalSearchParams>
>>>>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>>>>>
>>>>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
>>>>> http://code.google.com/p/psi-pi/issues/detail?id=37
>>>>
>>>> I'm not sure I understand whether this is OK or not now? (And why use
>>>> Pride CV?)
>>>
>>> I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in
>>> another, so it is not well-defined for the masses in elements or attributes.
>>> We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .
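As a rough illustration of the "semantic validation" option mentioned above, the following sketch lists the mass value type declared in the search parameters of each SpectrumIdentification, so that a mix of monoisotopic and average settings can at least be reported. Whether such a mix is an error is exactly what is under discussion, so the sketch only reports it. The namespace URI for the `pf` prefix, the nesting of `AdditionalSearchParams` directly under `SpectrumIdentification`, the `id` values, the second cvParam and the idea of detecting the type from the cvParam `name` text are all assumptions made for illustration; only the monoisotopic cvParam is taken from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "pf" prefix; the real one is not given in this thread.
PF = "urn:example:fuge"

# Hypothetical fragment: the nesting, the ids and the second cvParam are made up.
DOC = f"""
<AnalysisXML xmlns:pf="{PF}">
  <SpectrumIdentification id="SI_1">
    <AdditionalSearchParams>
      <pf:cvParam accession="PRIDE:0000162"
                  name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
    </AdditionalSearchParams>
  </SpectrumIdentification>
  <SpectrumIdentification id="SI_2">
    <AdditionalSearchParams>
      <pf:cvParam accession="EXAMPLE:0000000"
                  name="Mass value type setting average" cvRef="EXAMPLE"/>
    </AdditionalSearchParams>
  </SpectrumIdentification>
</AnalysisXML>
"""


def mass_types(root):
    """Map each SpectrumIdentification id to the mass types its cvParams declare."""
    found = {}
    for si in root.iter("SpectrumIdentification"):
        types = set()
        for param in si.iter(f"{{{PF}}}cvParam"):
            name = param.get("name", "").lower()
            if "monoisotopic" in name:
                types.add("monoisotopic")
            elif "average" in name:
                types.add("average")
        found[si.get("id")] = types
    return found


def is_mixed(found):
    """True if more than one mass type is declared across the whole document."""
    return len(set().union(*found.values())) > 1 if found else False


root = ET.fromstring(DOC)
declared = mass_types(root)
for si_id, types in sorted(declared.items()):
    print(si_id, sorted(types))
print("mixed mass types:", is_mixed(declared))
```

A report like this leaves the decision to the reader: a file with two mass tables would show up as mixed without being rejected.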
>> I think it's actually _required_ to be like this. For example, at least
>> one search engine allows you to specify mono for masses below x and
>> average for masses above x. So, in this case, the output should be
>> similar to the N15 example that I've supplied, with two separate mass
>> tables. Maybe you could look at the Mascot_N15_example.xml and see if
>> you think that this is OK.
>
> It is okay with me; to answer Pierre-Alain's original question then:
> all mass values for peptides then depend on
> the type of search performed and the residue table used. ;-)
>
> Bye
> Martin

>> Talk soon,
>>
>> David

>>> Bye
>>> Martin

>>>>>>> -----Original Message-----
>>>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
>>>>>>> Sent: 30 July 2008 13:05
>>>>>>> To: 'Pierre-Alain Binz'
>>>>>>> Cc: psi...@li...
>>>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>>>
>>>>>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>>>>>>>
>>>>>>>> 2nd July, 2008:
>>>>>>>> a couple of questions, just to make sure:
>>>>>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection
>>>>>>>> information?
>>>>>>> I hope not, by referencing the same identifier.
>>>>>>>
>>>>>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide
>>>>>>>> element (and not to a DBSequence), identification is obligatory a Peptide?
>>>>>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
>>>>>>> foreign key definitions are implemented we can forbid that.
>>>>>>> But we should have in mind that a peptide is a sequence plus modifications, so if
>>>>>>> top-down identifies only a sequence, we should allow that and if top-down identifies
>>>>>>> with mods, we should forbid that.
>>>>>>> It would be quite helpful to have a top-down instance doc. To check
>>>>>>> whether our thoughts are really deep enough...
>>>>>>>
>>>>>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>>>>>> elements with possibly undefined explicit sequences to refer to
>>>>>>>> from the SpectrumIdentificationResult (because non peptidic, or because not
>>>>>>>> identified but good spectrum)
>>>>>>> At the moment the sequence element can be empty or even left out.
>>>>>>> User or CV params are allowed.
>>>>>>> How do they report results in spectral lib search if they identify non-peptidic or
>>>>>>> unidentified?
>>>>>>> We need CV terms for that...
>>>>>>>
>>>>>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>>>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>>>>>> instance). Does this simply mean that the ModificationParams codes
>>>>>>>> the search engine settings and the Peptide includes the formal PSI
>>>>>>>> definition of the Mod? And the only reference is the ModName value?
>>>>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
>>>>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV
>>>>>>> or they can define their own.
>>>>>>>
>>>>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge,
>>>>>>>> experimentalMassToCharge) are not specified whether monoisotopic or averaged.
>>>>>>>> Do we assume that averaged does not exist anymore?
>>>>>>> No, we decided to have only one type of masses in the whole analysisXML.
>>>>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>>>>>
>>>>>>>> 5) is sequenceMass the mass value with/without the mods?
>>>>>>>> If with, the name might be misleading (peptideMass would be more appropriate)
>>>>>>> It is indeed the mass of the sequence without mods.
>>>>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>>>>>
>>>>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>>>>>> this? (NB: MS on nucleotide molecules can be performed and analysed,
>>>>>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>>>>>> the way on glycans...) and only represent the DBSequences as AA
>>>>>>>> sequences (frame translations)? (and what about glycans?)
>>>>>>>> Probably can be solved if one can replace SequenceCollection by
>>>>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>>>>>> MoleculeCollection)... but the validator might not like this.
>>>>>>> Mh, these can be extensions, I think they are not possible at the moment.
>>>>>>> But a tag for the type can indeed be useful, it could be a CV param.
>>>>>>> I will create an issue for that.
>>>>>>>
>>>>>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>>>>>> Peptide as AA sequence in case of MS done on proteins?
>>>>>>> I hope the following answers this:
>>>>>>>
>>>>>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>>>>>> <Peptide> is the identified amino acid sequence plus mods (without any translation
>>>>>>> frame or something).
>>>>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a
>>>>>>> TranslationTable_Ref attribute.
>>>>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB
>>>>>>> case.)
>>>>>>> If a protein detection is performed, there are <PeptideHypothesis> elements
>>>>>>> referencing PeptideEvidence elements from SpectrumIdentificationItem sections.
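The reference chain described in the answer to question 7 can be sketched as a small check that every reference points at an existing element. This is only an illustration under stated assumptions: the element names and the `Peptide_Ref`, `DBSequence_Ref`, `frame` and `TranslationTable_Ref` attributes come from the description above, while the `id` attributes, the `PeptideEvidence_Ref` attribute name on `PeptideHypothesis`, the nesting and all values are hypothetical. `TranslationTable_Ref` is carried along but not resolved, since the thread does not say where translation tables live.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment; ids, nesting, values and the
# PeptideEvidence_Ref attribute name are assumptions for illustration.
DOC = """
<AnalysisXML>
  <SequenceCollection>
    <DBSequence id="DBSeq_1"/>
    <Peptide id="Pep_1"/>
  </SequenceCollection>
  <SpectrumIdentificationItem id="SII_1" Peptide_Ref="Pep_1">
    <PeptideEvidence id="PE_1" DBSequence_Ref="DBSeq_1" frame="2"
                     TranslationTable_Ref="TT_1"/>
  </SpectrumIdentificationItem>
  <ProteinDetectionHypothesis id="PDH_1">
    <PeptideHypothesis PeptideEvidence_Ref="PE_1"/>
    <PeptideHypothesis PeptideEvidence_Ref="PE_404"/>
  </ProteinDetectionHypothesis>
</AnalysisXML>
"""

# (element carrying the reference, reference attribute, element it should point to)
REFERENCES = [
    ("SpectrumIdentificationItem", "Peptide_Ref", "Peptide"),
    ("PeptideEvidence", "DBSequence_Ref", "DBSequence"),
    ("PeptideHypothesis", "PeptideEvidence_Ref", "PeptideEvidence"),
]


def dangling_references(root):
    """Return (element, attribute, value) for every reference with no matching id."""
    problems = []
    for source, attr, target in REFERENCES:
        known_ids = {el.get("id") for el in root.iter(target)}
        for el in root.iter(source):
            ref = el.get(attr)
            if ref not in known_ids:
                problems.append((source, attr, ref))
    return problems


root = ET.fromstring(DOC)
for source, attr, ref in dangling_references(root):
    print(f"{source}/@{attr} points at unknown id {ref!r}")
```

The second `PeptideHypothesis` deliberately points at a non-existent id so that the check has something to report. A missing reference attribute is reported the same way, with a value of `None`.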
>>>>>>> Bye
>>>>>>> Martin
>>>>>>>
>>>>>>> David Creasy wrote:
>>>>>>> Thanks Andy,
>>>>>>>
>>>>>>> I've added an updated example document to SVN:
>>>>>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>>>>>>
>>>>>>> Problem is that we have now removed the main point of these recent changes
>>>>>>> which was to add the decoy flag... I think
>>>>>>> that we need to add isDecoy to SpectrumIdentificationItem.
>>>>>>>
>>>>>>> And yes, I suspect that we should go back to using the
>>>>>>> ConceptualMoleculeCollection
>>>>>>> Um, and since we've not actually ended up adding anything to DBSequence... we
>>>>>>> haven't actually achieved anything?
>>>>>>> I think we need to discuss this again at the next telecon.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Jones, Andy wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I’ve updated the schema in SVN with the following main changes:
>>>>>>>
>>>>>>> - PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the
>>>>>>>   call (simple mappings to proteins are done at this level)
>>>>>>> - Added DBSequence that should be used instead of Sequence (following some of
>>>>>>>   the discussion below)
>>>>>>> - Created a new collection class SequenceCollection (rather than
>>>>>>>   ConceptualMoleculeCollection) so that only references can
>>>>>>>   be given to DBSequence and Peptide
>>>>>>>   - In fact, I’m not sure if this is sensible since it prevents other types of
>>>>>>>     ConceptualMolecule being added later... to discuss
>>>>>>> - In FuGE on cvParam, the value attribute is no longer mandatory
>>>>>>>
>>>>>>> I’ve added a simple example that validates under
>>>>>>> examples\schema_usecase_examples\working27June
>>>>>>>
>>>>>>> Feel free to mail me any changes to make on Monday,
>>>>>>> Cheers
>>>>>>> Andy
>>>>>>>
>>>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...]
>>>>>>> On Behalf Of Jones, Andy
>>>>>>> Sent: 27 June 2008 16:24
>>>>>>> To: Angel Pizarro
>>>>>>> Cc: psi...@li...
>>>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>>>
>>>>>>> I think Angel’s response below might not have made it round the list yet.
>>>>>>>
>>>>>>> I tend to agree that isDecoy is redundant information and perhaps this is not the
>>>>>>> best place to encode semantic information. An alternative would be to have a
>>>>>>> parameter, say on SpectrumIdentification for cvParam = “decoy_string”
>>>>>>> value = “Rev”. This would be a more compact representation and we would not
>>>>>>> have to add what is quite a specific attribute type (isDecoy) to Sequence.
>>>>>>>
>>>>>>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
>>>>>>> Sent: 27 June 2008 15:59
>>>>>>> To: Jones, Andy
>>>>>>> Cc: psi...@li...
>>>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>>>
>>>>>>> my 2¢ :
>>>>>>> You need to be able to extend this to all molecule types, or am I missing the point
>>>>>>> of this thread, and you mean that
>>>>>>> this would be a subclass of the conceptual molecule element?
>>>>>>>
>>>>>>> Second, and this is tangentially related, but are decoy sequences really a
>>>>>>> problem we should be putting our effort
>>>>>>> into? Is it in our domain to encode semantic information about a sequence, and
>>>>>>> possibly relating reported sequences as
>>>>>>> part of our schema?
>>>>>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the
>>>>>>> temptation then would be for folks to
>>>>>>> encode the same accession for two different sequences, effectively making the
>>>>>>> primary key of the sequence object
>>>>>>> (accession, isDecoy)
>>>>>>>
>>>>>>> Do we want to go there?
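The alternative suggested above (one decoy string declared once at the search level instead of an isDecoy attribute on every sequence) could look roughly like the sketch below. Matching the decoy string as a prefix of the accession is an assumption, as is the function name; the value "Rev" is the one proposed in the message, and the accessions are only example values.

```python
def is_decoy(accession: str, decoy_string: str) -> bool:
    """Treat an accession as a decoy if it starts with the search-level decoy string.

    An empty decoy string means "no decoy convention declared", so nothing matches.
    """
    return bool(decoy_string) and accession.startswith(decoy_string)


DECOY_STRING = "Rev"  # value proposed in the message above

# Example accessions; a target entry and its reversed counterpart.
accessions = ["IPI00013808.1", "Rev_IPI00013808.1"]
flags = {acc: is_decoy(acc, DECOY_STRING) for acc in accessions}
for acc, flag in flags.items():
    print(acc, "decoy" if flag else "target")
```

With this convention the accession alone stays a unique key, which sidesteps the (accession, isDecoy) composite key concern. The trade-off is that a genuine accession that happens to start with the decoy string would be misclassified.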
>>>>>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
>>>>>>> So how about including length as an attribute and then let all other things go in the
>>>>>>> CV (pI, mass, etc.)?
>>>>>>>
>>>>>>> From: Jones, Andy
>>>>>>> Sent: 27 June 2008 14:54
>>>>>>> To: 'David Creasy'
>>>>>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>>>>>
>>>>>>> id and name are standard for all elements that inherit from FuGE identifiable – this
>>>>>>> is perhaps a separate discussion as to whether the optional name attribute should be there.
>>>>>>>
>>>>>>> I agree that length may be useful – is this just an integer value with no unit?
>>>>>>> Yes, I think so.
>>>>>>> I'm less sure about pI and mass since mass at least can be calculated very simply
>>>>>>> Only if you have the sequence... (we have residue masses in the file).
>>>>>>>
>>>>>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>>>>>> Scandalous! (I happen to agree, but now some people will never speak to either of
>>>>>>> us ever again).
>>>>>>>
>>>>>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is
>>>>>>> nucleic acid rather than residues.
>>>>>>> Why not just allow CV there? We can share the same CV as the PEFF format,
>>>>>>> which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?
>>>>>>>
>>>>>>> – unless someone can convince me otherwise?
>>>>>>> Cheers
>>>>>>> Andy
>>>>>>>
>>>>>>> From: David Creasy [mailto:dc...@ma...]
>>>>>>> Sent: 27 June 2008 14:51
>>>>>>> To: Jones, Andy
>>>>>>> Cc: psi...@li...
>>>>>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>>>>>
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> length may be useful, because some people won't want to output the actual
>>>>>>> sequence for space reasons. The other things
>>>>>>> we wanted to add before were pI and mass.
>>>>>>> Why do we want name?
>>>>>>> Is this for, say, a description line?
>>>>>>> (Also, identifier -> id?)
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Jones, Andy wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> It was decided on the call that we would like to flag that Sequences in the
>>>>>>> ConceptualMoleculeCollection should have a
>>>>>>> Boolean attribute to capture if they are decoy sequences. At the moment we are
>>>>>>> using the FuGE:Sequence element. I don't
>>>>>>> really want to add another attribute to this (it's less problematic cutting down FuGE
>>>>>>> than adding new things), so I'm
>>>>>>> wondering if we should define our own Sequence type in AnalysisXML. This
>>>>>>> would also allow us to choose exactly the
>>>>>>> relevant attributes. At the moment, Sequence can have all of the following:
>>>>>>>
>>>>>>> <pf:Sequence isCircular="true" sequence="String" length="0"
>>>>>>> isApproximateLength="true"
>>>>>>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String"
>>>>>>> name="String">
>>>>>>>
>>>>>>> Several of these attributes were created to represent concepts that probably will
>>>>>>> never be required or implemented in
>>>>>>> AnalysisXML. How about the following:
>>>>>>>
>>>>>>> <DBSequence identifier = "" name = "" isDecoy = "true">
>>>>>>> <seq>MCTMG...</seq>
>>>>>>> <pf:DatabaseReference Database_ref=""
>>>>>>> accession="Rev_IPI00013808.1"/>
>>>>>>> </DBSequence>
>>>>>>>
>>>>>>> Are any of the other attributes on Sequence actually required? I'll post a new
>>>>>>> version of the schema with other changes
>>>>>>> WRT to PeptideEvidence shortly,
>>>>>>> Cheers
>>>>>>> Andy
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Psidev-pi-dev mailing list
>>>>>>> Psi...@li...
>>>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898