From: Martin E. <mar...@ru...> - 2008-07-31 11:35:29
Hi Andy, hi all,

> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand

Yes, I agree; but I understood Pierre-Alain's question as a hint that top-down identifies protein sequences, so we would have to duplicate information, referencing a protein sequence as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from <ProteinDetectionResult>. But I might be wrong, and we definitely have to wait for a top-down instance doc.

> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
> SourceProtein - this would save some space and would appear to be a logically more sensible model...

You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection? But missedcleavages is only well-defined in relation to a search (using an enzyme)!

> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?

Yes, if <PeptideEvidence> stays optional.

> > >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
> > >are not specified whether monoisotopic or averaged.
> > >Do we assume that averaged does not exist anymore?
> > No, we decided to have only one type of masses in the whole analysisXML.
> > But I cannot find a note for that or a schema attribute... I will add an issue for that.
>
> It is a database search parameter:
> <AdditionalSearchParams>
> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>

Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
http://code.google.com/p/psi-pi/issues/detail?id=37

bye
Martin

>
> > -----Original Message-----
> > From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
> > Sent: 30 July 2008 13:05
> > To: 'Pierre-Alain Binz'
> > Cc: psi...@li...
> > Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >
> > Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
> >
> > >2nd July, 2008:
> > >a couple of questions, just to make sure:
> >
> > >1) in case of top-down approach, do we have to duplicate sequenceCollection information?
> > I hope not, by referencing the same identifier.
> >
> > >as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> > >(and not to a DBSequence), is an identification necessarily a Peptide?
> > At the moment I think it's possible to directly reference a DBSeq. Once the foreign key definitions are implemented we can forbid that.
> > But we should keep in mind that a peptide is a sequence plus modifications, so if top-down identifies only a sequence, we should allow that, and if top-down identifies with mods, we should forbid that.
> > It would be quite helpful to have a top-down instance doc, to check whether our thoughts are really deep enough...
> >
> > >2) and what about spectral library searches, do we have to have Peptide
> > >elements with possibly undefined explicit sequences to refer to
> > >from the SpectrumIdentificationResult (because non peptidic, or because not identified
> > >but good spectrum)
> > At the moment the sequence element can be empty or even left out.
> > User or CV params are allowed.
> > How do they report results in spectral lib search if they identify non-peptidic or unidentified?
> > We need CV terms for that...
> >
> > >3) in the Peptide element, the Modifications are defined in a much more
> > >detailed manner than in ModificationParams (PSI-MOD is there for
> > >instance). Does this simply mean that the ModificationParams encodes
> > >the search engine settings and the Peptide includes the formal PSI
> > >definition of the Mod? And the only reference is the ModName value?
> > I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or they can define their own.
> >
> > >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
> > >are not specified whether monoisotopic or averaged.
> > >Do we assume that averaged does not exist anymore?
> > No, we decided to have only one type of masses in the whole analysisXML.
> > But I cannot find a note for that or a schema attribute... I will add an issue for that.
> >
> > >5) is sequenceMass the mass value with/without the mods? If with, the
> > >name might be misleading (peptideMass would be more appropriate)
> > It is indeed the mass of the sequence without mods.
> > THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
> >
> > >6) in case the DBSequence is nucleotide, is there a tag for saying
> > >this? (NB: MS on nucleotide molecules can be performed and analysed,
> > >not only MS on AA sequences that are interpreting nucleotide sequences).
> > >Or do we neglect MS experiments done on nucleotide molecules (and by
> > >the way on glycans...) and only represent the DBSequences as AA
> > >sequences (frame translations)? (and what about glycans?)
> > >Probably can be solved if one can replace SequenceCollection by
> > >something else if needed (SmallMoleculeCollection, GlycanCollection,
> > >MoleculeCollection)...
> > >but the validator might not like this.
> > Mh, these can be extensions, I think they are not possible at the moment.
> > But a tag for the type can indeed be useful, it could be a CV param.
> > I will create an issue for that.
> >
> > >7) in case that DBSequence is nucleotide, do we represent the
> > >Peptide as AA sequence in case of MS done on proteins?
> > I hope the following answers this:
> >
> > <DBSequence> is the nucleotide seq from the nucleotide DB,
> > <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
> > <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
> > (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
> > If a protein detection is performed, there are <PeptideHypothesis> elements referencing PeptideEvidence elements from SpectrumIdentificationItem sections.
> >
> > Bye
> > Martin
> >
> > David Creasy wrote:
> > Thanks Andy,
> >
> > I've added an updated example document to SVN:
> > http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
> >
> > Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.
> >
> > And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
> > Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?
> > I think we need to discuss this again at the next telecon.
> >
> > David
> >
> > Jones, Andy wrote:
> > Hi all,
> >
> > I've updated the schema in SVN with the following main changes:
> >
> > PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
> > Added DBSequence that should be used instead of Sequence (following some of the discussion below)
> > Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
> > In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
> > In FuGE on cvParam, the value attribute is no longer mandatory
> >
> > I've added a simple example that validates under examples\schema_usecase_examples\working27June
> >
> > Feel free to mail me any changes to make on Monday,
> > Cheers
> > Andy
> >
> > From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
> > Sent: 27 June 2008 16:24
> > To: Angel Pizarro
> > Cc: psi...@li...
> > Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >
> > I think Angel's response below might not have made it round the list yet.
> >
> > I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = decoy_string value = Rev. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.
> >
> > From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
> > Sent: 27 June 2008 15:59
> > To: Jones, Andy
> > Cc: psi...@li...
> > Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
> >
> > my 2¢:
> > You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?
> >
> > Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?
> > On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy)
> >
> > Do we want to go there?
> >
> > On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
> > So how about including length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
> >
> > From: Jones, Andy
> > Sent: 27 June 2008 14:54
> > To: 'David Creasy'
> > Subject: RE: [Psidev-pi-dev] Representing Sequences
> >
> > id and name are standard for all elements that inherit from FuGE identifiable; this is perhaps a separate discussion as to whether the optional name attribute should be there.
> >
> > I agree that length may be useful; is this just an integer value with no unit?
> > Yes, I think so.
> > I'm less sure about pI and mass since mass at least can be calculated very simply
> > Only if you have the sequence... (we have residue masses in the file).
> >
> > , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
> > Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
> >
> > The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
> > Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?
> >
> > unless someone can convince me otherwise?
> > Cheers
> > Andy
> >
> > From: David Creasy [mailto:dc...@ma...]
> > Sent: 27 June 2008 14:51
> > To: Jones, Andy
> > Cc: psi...@li...
> > Subject: Re: [Psidev-pi-dev] Representing Sequences
> >
> > Hi Andy,
> >
> > length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
> > Why do we want name? Is this for, say, a description line?
> > (Also, identifier -> id?)
> >
> > David
> >
> > Jones, Andy wrote:
> > Hi all,
> >
> > It was decided on the call that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to flag whether they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:
> >
> > <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
> > SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
> >
> > Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:
> >
> > <DBSequence identifier = "" name = "" isDecoy = "true">
> > <seq>MCTMG...</seq>
> > <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> > </DBSequence>
> >
> > Are any of the other attributes on Sequence actually required?
> > I'll post a new version of the schema with other changes WRT PeptideEvidence shortly,
> > Cheers
> > Andy
> >
> > _______________________________________________
> > Psidev-pi-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
> >
> > --
> > David Creasy
> > Matrix Science
> > 64 Baker Street
> > London W1U 7GB, UK
> > Tel: +44 (0)20 7486 1050
> > Fax: +44 (0)20 7224 1344
> >
> > dc...@ma...
> > http://www.matrixscience.com
> >
> > Matrix Science Ltd. is registered in England and Wales
> > Company number 3533898
> >
> > --
> > Angel Pizarro
> > Director, ITMAT Bioinformatics Facility
> > 806 Biological Research Building
> > 421 Curie Blvd.
> > Philadelphia, PA 19104-6160
> > 215-573-3736
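
For illustration only: the decoy_string idea raised in the thread (a single cvParam on SpectrumIdentification with value Rev, instead of an isDecoy attribute on every sequence) could be consumed roughly as sketched below. This is a minimal sketch and not part of the schema or any PSI tool; the function names `is_decoy` and `split_targets_and_decoys` are hypothetical, and it assumes the decoy string is always a prefix of the accession, as in the Rev_IPI00013808.1 example.

```python
from typing import Iterable, List, Tuple


def is_decoy(accession: str, decoy_string: str = "Rev") -> bool:
    """Return True if the accession starts with the decoy string.

    Assumption: decoys are marked by a prefix (e.g. "Rev_IPI00013808.1"),
    which is how the example accession in the thread is written.
    """
    return accession.startswith(decoy_string)


def split_targets_and_decoys(
    accessions: Iterable[str], decoy_string: str = "Rev"
) -> Tuple[List[str], List[str]]:
    """Partition accessions into (targets, decoys), preserving input order."""
    targets: List[str] = []
    decoys: List[str] = []
    for accession in accessions:
        if is_decoy(accession, decoy_string):
            decoys.append(accession)
        else:
            targets.append(accession)
    return targets, decoys


# The target accession here is hypothetical, derived from the decoy example.
targets, decoys = split_targets_and_decoys(["IPI00013808.1", "Rev_IPI00013808.1"])
```

With this convention the target and decoy entries keep distinct accessions, so the accession alone can stay the key rather than the (accession, isDecoy) pair discussed above. The obvious weakness is that a genuine accession that happens to begin with the decoy string would be misclassified.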