From: Matthew C. <mat...@va...> - 2008-06-18 22:12:20
Hi all,

The prediction and matching rules in database search engines are arbitrarily complex algorithms, and it is quite intractable to come up with a file-scope way to specify the algorithm. Thinking again about result-scope specification of the predicted and matched ions, it seems possible to come up with an optimized format (minimizing markup bloat). Is there value in annotating the predicted but missed ions, or is it only desirable to annotate the matched ions? Also, is there an established format for specifying the composition of peptide fragments (accounting for losses, radicals, and charge)?

There was some suggestion on the last call of referring to a script which could run the algorithm. I think that's a reasonable approach if the algorithms are put in the CV and there is a version parameter when they are referenced by a results file. I would also hope that we maintain a link to the script in the CV definition. Possibly, the version would have to be part of the term itself, or there could be some convention where the script provides a version parameter? Given these nasty versioning issues, the result-scope approach starts to look better...

-Matt
From: Jones, A. <And...@li...> - 2008-06-20 10:40:11
Hi all,

I've made an update to the schema to try out a spec for translation table and committed to SVN. There is an example under schema_usecase_examples\working20June.

It differs slightly from David's proposal in two ways:

1) I have not included any model for a custom translation table. My opinion is that this would be overkill in the schema, and it would be better to create an external reference, e.g. to a web page (as a CV ref), if someone invents a new translation table. Any opinions?

2) I have not included translation frame on Sequence, since the translation frame is a property of the result, not of the sequence itself.

I decided to follow the same pattern as for searches of protein sequence databases, as follows:

<!-- Standard DNA sequence entry -->
<pf:Sequence identifier="EST_1" length="251" sequence="ATATAGCAGCTAGCTGTAGCTAGCTAGCTACGTACGATCGATCGTACGTACGTAGCTACGATCGTAGTAGTATACGATCGGCGCTAGCTATCGCTACGATCGGCTATCGATCGTC"/>

<!-- Standard peptide entry -->
<Peptide identifier="peptide_1_1" sequenceMass="1341.72522" sequenceLength="12">
  <peptideSequence>DAGTISGLNVLR</peptideSequence>
</Peptide>

<!-- Standard SpectrumIdentificationItem -->
<SpectrumIdentificationResult identifier="ident_pep_1_1">
  <SpectrumIdentificationItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" Peptide_ref="peptide_1_1">
    <pf:cvParam accession="PI:99999" name="mascot_ions_score" cvRef="PSI-PI" value="62.72" />
    <pf:cvParam accession="PI:99999" name="mascot_expect_value" cvRef="PSI-PI" value="0.00273766445377971" />
    <pf:cvParam accession="PI:99999" name="mascot_rank" cvRef="PSI-PI" value="1" />
  </SpectrumIdentificationItem>
  <SpectrumElement spectrumID="dp210198c 21-Jan-98 DERIVED SPECTRUM #9" SpectraData_ref="IF1"/>
</SpectrumIdentificationResult>

<!-- Translation table specified in the search protocol -->
<DatabaseTranslationFrames frames="1,2,3,4,5,6">
  <TranslationTable identifier="Table_1">
    <pf:cvParam accession="transl_table=2" name="The Vertebrate Mitochondrial Code" value="" cvRef="NCBI">
    </pf:cvParam>
  </TranslationTable>
</DatabaseTranslationFrames>

<!-- Then a new element (subclass of PeptideEvidence) under ProteinDetectionHypothesis -->
<ProteinAmbiguityGroup identifier="hit_1" >
  <ProteinDetectionHypothesis identifier="prot_1" Sequence_ref = "EST_1">
    <TranslatedPeptideEvidence start="160" end="171" SpectrumIdentificationItem_ref="1_1" post="K" pre="I" frame = "3" TranslationTable_ref="Table_1" />
  </ProteinDetectionHypothesis>
</ProteinAmbiguityGroup>

Does this make sense?

I haven't yet added anything to the schema for cleavage enzyme (Issue 30); I think this needs more discussion on the list. Simon H makes a good point about nomenclature. I think we should try to get people on the call next week to work through the options...?

Cheers
Andy
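To make the roles of the frame attribute and the translation table concrete, here is a minimal sketch of translating a nucleotide sequence in a given frame with NCBI genetic code 2 (the vertebrate mitochondrial code referenced above). The function name `translate` and the frame numbering (1-3 on the forward strand, 4-6 on the reverse complement) are assumptions made for illustration; the thread does not define how frames 4-6 are numbered, and this is not taken from any search engine.

```python
# Illustrative sketch only; names and frame convention are assumptions.
BASES = "TCAG"

# NCBI genetic code 2 (vertebrate mitochondrial), codons in the order
# TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

CODON_TO_AA = {
    a + b + c: TABLE_2[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str, frame: int) -> str:
    """Translate dna in frame 1-6 (assumed: 4-6 read the reverse complement)."""
    if not 1 <= frame <= 6:
        raise ValueError("frame must be between 1 and 6")
    if frame > 3:
        dna = dna.translate(COMPLEMENT)[::-1]
        frame -= 3
    offset = frame - 1
    # Unknown or ambiguous codons are reported as 'X'; stop codons as '*'.
    return "".join(
        CODON_TO_AA.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


print(translate("ATGATAAGATGA", 1))  # ATA reads as M, AGA as stop, TGA as W
```

Because the same nucleotide entry yields a different protein sequence for each frame and table, a peptide's start and end positions are only interpretable once both are known, which is why they are attached to the result here.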
From: David C. <dc...@ma...> - 2008-06-24 10:33:39
Hi Andy,

Jones, Andy wrote:
> Hi all,
>
> I've made an update to the schema to try out a spec for translation table and committed to SVN. There is an example under schema_usecase_examples\working20June.

Brilliant, thanks.

> It differs slightly from David's proposal in two ways:
>
> 1) I have not included any model for a custom translation table. My opinion is that this would be overkill in the schema, and it would be better to create an external reference e.g. to a web page (as a CV ref) if someone invents a new translation table. Any opinions?

I think this is fine.

> 2) I have not included translation frame on Sequence, since the translation frame is a property of the result, not of the sequence itself.

That's also fine - in fact this is what I asked for. (Translation frame as a property of the result, but translation table as a property of the sequence). Ah, I see now that you want both as a property of the result. That's also fine by me, and probably clearer (although slightly more verbose).

> I decided to follow the same pattern as for searches of protein sequences databases, as follows:
>
> <!-- Standard DNA sequence entry -->
> <pf:Sequence identifier="EST_1" length="251" sequence="ATATAGCAGCTAGCTGTAGCTAGCTAGCTACGTACGATCGATCGTACGTACGTAGCTACGATCGTAGTAGTATACGATCGGCGCTAGCTATCGCTACGATCGGCTATCGATCGTC"/>
>
> <!-- Standard peptide entry -->
> <Peptide identifier="peptide_1_1" sequenceMass="1341.72522" sequenceLength="12">
> <peptideSequence>DAGTISGLNVLR</peptideSequence>
> </Peptide>
>
> <!-- Standard SpectrumIdentificationItem -->
> <SpectrumIdentificationResult identifier="ident_pep_1_1">
> <SpectrumIdentificationItem identifier="1_1" calculatedMassToCharge="670.86261" chargeState="2" experimentalMassToCharge="671.9" Peptide_ref="peptide_1_1">
> <pf:cvParam accession="PI:99999" name="mascot_ions_score" cvRef="PSI-PI" value="62.72" />
> <pf:cvParam accession="PI:99999" name="mascot_expect_value" cvRef="PSI-PI" value="0.00273766445377971" />
> <pf:cvParam accession="PI:99999" name="mascot_rank" cvRef="PSI-PI" value="1" />
> </SpectrumIdentificationItem>
> <SpectrumElement spectrumID="dp210198c 21-Jan-98 DERIVED SPECTRUM #9" SpectraData_ref="IF1"/>
> </SpectrumIdentificationResult>
>
>
> <!-- Translation table specified in the search protocol -->
> <DatabaseTranslationFrames frames="1,2,3,4,5,6">
> <TranslationTable identifier="Table_1">
> <pf:cvParam accession="transl_table=2" name="The Vertebrate Mitochondrial Code" value="" cvRef="NCBI">

Why not
<pf:cvParam accession="transl_table" name="The Vertebrate Mitochondrial Code" value="2" cvRef="NCBI">
??

> </pf:cvParam>
> </TranslationTable>
> </DatabaseTranslationFrames>
>
>
> <!-- Then a new element (subclass of PeptideEvidence) under ProteinDetectionHypothesis
> <ProteinAmbiguityGroup identifier="hit_1" >
> <ProteinDetectionHypothesis identifier="prot_1" Sequence_ref = "EST_1">
>
> <TranslatedPeptideEvidence start="160" end="171" SpectrumIdentificationItem_ref="1_1" post="K" pre="I" frame = "3" TranslationTable_ref="Table_1" />
> </ProteinDetectionHypothesis>
>
>
> Does this make sense?

Yes. But doesn't it make things harder for a parser, which then has to look for both TranslatedPeptideEvidence and PeptideEvidence? Why not just add optional attributes to PeptideEvidence?

> I haven't yet added anything to the schema for cleavage enzyme (Issue 30), I think this needs more discussion on the list. Simon H makes a good point about nomenclature. I think we should try get people on the call next week to work through the options...?

Sounds good,

David

> Cheers
> Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
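The parser concern can be sketched as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`; the second `PeptideEvidence` element and its attribute values are hypothetical and added purely to show both element names side by side, and `read_evidence` is an assumed helper name, not part of any existing library for this format.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the example in the thread. The plain PeptideEvidence
# element below is hypothetical and exists only for illustration.
FRAGMENT = """
<ProteinDetectionHypothesis identifier="prot_1" Sequence_ref="EST_1">
  <TranslatedPeptideEvidence start="160" end="171"
      SpectrumIdentificationItem_ref="1_1" post="K" pre="I"
      frame="3" TranslationTable_ref="Table_1"/>
  <PeptideEvidence start="12" end="23"
      SpectrumIdentificationItem_ref="1_2" post="K" pre="R"/>
</ProteinDetectionHypothesis>
"""

# With a separate subclass element, a reader has to accept two tag names.
EVIDENCE_TAGS = ("PeptideEvidence", "TranslatedPeptideEvidence")


def read_evidence(xml_text):
    """Collect evidence records from either element name."""
    root = ET.fromstring(xml_text)
    records = []
    for child in root:
        if child.tag not in EVIDENCE_TAGS:
            continue
        frame = child.get("frame")
        records.append({
            "kind": child.tag,
            "start": int(child.get("start")),
            "end": int(child.get("end")),
            # Absent on plain PeptideEvidence, so these may be None.
            "frame": int(frame) if frame is not None else None,
            "table": child.get("TranslationTable_ref"),
        })
    return records


for record in read_evidence(FRAGMENT):
    print(record)
```

In this sketch the extra cost of the second element name is one more entry in `EVIDENCE_TAGS`; with a single element and optional attributes, the same `None` handling would still be needed for `frame` and `table`.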
From: Martin E. <mar...@ru...> - 2008-07-02 12:00:56
Hi!

> > <!-- Then a new element (subclass of PeptideEvidence) under ProteinDetectionHypothesis
> > <ProteinAmbiguityGroup identifier="hit_1" >
> > <ProteinDetectionHypothesis identifier="prot_1" Sequence_ref = "EST_1">
> > <TranslatedPeptideEvidence start="160" end="171"
> > SpectrumIdentificationItem_ref="1_1" post="K" pre="I" frame = "3"
> > TranslationTable_ref="Table_1"
> >
> > Does this make sense?
> Yes. But it makes it harder for a parser to have to look for TranslatedPeptideEvidence and
> PeptideEvidence?
> Why not just add optional attributes to PeptideEvidence?

With optional attributes it would be possible to encode peptide results that reference a nucleotide sequence but lack the frame and translation table attributes (the frame can eventually be reconstructed). With mandatory attributes (which can be enforced in the schema if we have TranslatedPeptideEvidence!) this can be avoided. So I vote for Andy's proposal.

Bye
Martin
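The difference between optional and mandatory attributes can be illustrated with a small check. This is a sketch of the kind of omission that schema validation would reject if frame and TranslationTable_ref were declared mandatory on TranslatedPeptideEvidence; the helper name `missing_translation_attributes` and both sample elements are assumptions for illustration, not part of the schema under discussion.

```python
import xml.etree.ElementTree as ET

# Attributes that would be mandatory on TranslatedPeptideEvidence
# under the proposal (assumed from the example in the thread).
REQUIRED = ("frame", "TranslationTable_ref")


def missing_translation_attributes(element):
    """Return the names of required translation attributes that are absent."""
    return [name for name in REQUIRED if element.get(name) is None]


# Hypothetical sample elements for illustration only.
complete = ET.fromstring(
    '<TranslatedPeptideEvidence start="160" end="171" '
    'frame="3" TranslationTable_ref="Table_1"/>'
)
incomplete = ET.fromstring('<PeptideEvidence start="160" end="171"/>')

print(missing_translation_attributes(complete))
print(missing_translation_attributes(incomplete))
```

With optional attributes on a single PeptideEvidence element, the second case is a valid document and every consumer has to perform a check like this itself; with a dedicated element and mandatory attributes, the schema validator can do it once.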