From: Jones, A. <And...@li...> - 2017-02-01 13:18:32
Hi all,

I think on balance we should go with option 2, and add the following new terms to the CV:

Term: “unmapped peptide”
Def: Within the context of a proteogenomics approach, a peptide sequence that has not been mapped to a genomic location
is_a: MS:1002636 ! proteogenomics attribute

Term: “unmapped protein”
Def: Within the context of a proteogenomics approach, a protein sequence that has not been mapped to a genomic location
is_a: MS:1002636 ! proteogenomics attribute

Please shout if you’d like to change these; otherwise we will update the specs (say by the end of the week) and the validator.

Best wishes
Andy

From: mayerg97 [mailto:ger...@ru...]
Sent: 01 February 2017 08:00
To: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: ProteoAnnotator_1_2.mzid

Hi all,

from the validation side, it does not make much difference whether option 1 or 2 is chosen, but option 2 is much clearer to read.

Cheers,
Gerhard

On 31.01.2017 at 16:20, Jones, Andy wrote:

Hi all,

The validation of proteogenomics example files has revealed a minor flaw in the proteogenomics specs. We use a MUST rule that says every peptide and protein that is not a decoy must state its genomic location. This doesn’t make sense for two reasons:

- Some peptides may have been identified but cannot be mapped to a chromosome because of the approach taken, i.e. the protein database and gene models are not consistent, for sensible reasons.
- A result file may be merged from hits to Ensembl and Uniprot, as ours is (Uniprot hits are not mapped).

Solutions:

1. Relax the MUST rule, to say that CV terms should be added for all mapped peptides/proteins.
   a. Downside: hard to encode this logic in the validator
2. Introduce another CV term for “unmapped peptide” and “unmapped protein” to cater for this case explicitly.

Option 2 seems more formally sensible, but makes more work for data exporters, who must add CV terms to every peptide/protein even if they only intended to map one subset.

If possible, can people give opinions fairly quickly.
We are now under time pressure to get the MCP paper resubmitted within around 2 weeks; otherwise it counts as a new submission.

Best wishes
Andy

From: mayerg97 [mailto:ger...@ru...]
Sent: 26 January 2017 13:33
To: Jones, Andy <jo...@li...>; Ghali, Fawaz <fg...@li...>
Subject: Re: ProteoAnnotator_1_2.mzid

Hi Fawaz and Andy,

e.g.

  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000004093|" id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>

is referenced by

  <PeptideEvidence dBSequence_ref="dbseq_generic|B_GENSCAN00000004093|" peptide_ref="FAALDNEEEDK_" start="241" end="251" pre="K" post="E" isDecoy="false" id="FAALDNEEEDK_generic|B_GENSCAN00000004093|_241_251"></PeptideEvidence>

This PeptideEvidence is not a decoy and has no genome mapping information defined, but the specification document defines in Figure 5 that the CV terms must be present on every PeptideEvidence unless isDecoy="true".

Best wishes,
Gerhard

On 26.01.2017 at 13:52, mayerg97 wrote:

Hi Fawaz and Andy,

it's because the file contains DBSequences that have no genome mapping defined, e.g.

  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000004093|" id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000027223|" id="dbseq_generic|B_GENSCAN00000027223|"></DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000009965|" id="dbseq_generic|B_GENSCAN00000009965|"></DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000034417|" id="dbseq_generic|B_GENSCAN00000034417|"></DBSequence>

If we want to allow both sequences with and without genome mapping here, we can change the ProteogenomicsDBSequence_must_rule into a SHOULD rule instead.

Best wishes,
Gerhard

On 26.01.2017 at 13:14, Jones, Andy wrote:

Hi Fawaz,

I don't see anything wrong with this - Gerhard, do you have any ideas?
Thanks
Andy

-----Original Message-----
From: Ghali, Fawaz
Sent: 26 January 2017 11:55
To: Jones, Andy <jo...@li...>
Subject: ProteoAnnotator_1_2.mzid

Hi Andy,

ProteoAnnotator_1_2.mzid has an error:

Message 1:
Rule ID: ProteogenomicsDBSequence_must_rule
Level: ERROR
Context(/cvParam/@accession ) in 380 locations
--> None of the given CvTerms were found at '/MzIdentML/SequenceCollection/DBSequence/cvParam/@accession' because no values were found:
- The sole term MS:1002637 (chromosome name) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1002638 (chromosome strand) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- The sole term MS:1002644 (genome reference version) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.

Example:

  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|A_ENSP00000395953|" id="dbseq_generic|A_ENSP00000395953|">
    <cvParam cvRef="PSI-MS" accession="MS:1002637" name="chromosome name" value="11"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002638" name="chromosome strand" value="+"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002644" name="genome reference version" value="Homo_sapiens.GRCh38.77.gff3"></cvParam>
  </DBSequence>

Why is it complaining about the name?

Best wishes,
Fawaz

--
--------------------------------------------------------------------
Dipl. Inform. med., Dipl. Wirtsch. Inf.
GERHARD MAYER
PhD student
Medizinisches Proteom-Center
DEPARTMENT Medical Bioinformatics
Building ZKF E.049a | Universitätsstraße 150 | D-44801 Bochum
Fon +49 (0)234 32-21006 | Fax +49 (0)234 32-14554
E-mail ger...@ru...
www.medizinisches-proteom-center.de

_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
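
The check implied by option 2 in this thread could look roughly like the Python sketch below: a DBSequence passes if it carries all three mapping terms named in the validator message (MS:1002637, MS:1002638, MS:1002644) or the proposed "unmapped protein" term. This is only an illustration, not the actual validator code. The thread gives no accession for the new term, so `UNMAPPED_PROTEIN` is a made-up placeholder, and `check_db_sequences`, `local_name`, and the third sample entry are hypothetical names. Decoy handling and the equivalent PeptideEvidence check are left out.

```python
import xml.etree.ElementTree as ET

# Accessions quoted in the validator message in this thread.
MAPPING_TERMS = {"MS:1002637", "MS:1002638", "MS:1002644"}

# ASSUMPTION: placeholder only. The thread does not give an accession
# for the proposed "unmapped protein" term.
UNMAPPED_PROTEIN = "MS:UNMAPPED_PROTEIN_PLACEHOLDER"


def local_name(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def check_db_sequences(xml_text):
    """Return ids of DBSequence elements that are neither fully mapped
    nor explicitly flagged as unmapped (decoys are not considered)."""
    root = ET.fromstring(xml_text.strip())
    failures = []
    for elem in root.iter():
        if local_name(elem.tag) != "DBSequence":
            continue
        # Collect the accessions of all cvParam children.
        accessions = {
            child.get("accession")
            for child in elem
            if local_name(child.tag) == "cvParam"
        }
        mapped = MAPPING_TERMS <= accessions
        unmapped = UNMAPPED_PROTEIN in accessions
        if not (mapped or unmapped):
            failures.append(elem.get("id"))
    return failures


# Two entries taken from the thread, plus one hypothetical entry that
# uses the placeholder "unmapped protein" term.
SAMPLE = """
<SequenceCollection>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|A_ENSP00000395953|" id="dbseq_generic|A_ENSP00000395953|">
    <cvParam cvRef="PSI-MS" accession="MS:1002637" name="chromosome name" value="11"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002638" name="chromosome strand" value="+"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002644" name="genome reference version" value="Homo_sapiens.GRCh38.77.gff3"></cvParam>
  </DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000004093|" id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="hypothetical_entry" id="dbseq_hypothetical_entry">
    <cvParam cvRef="PSI-MS" accession="MS:UNMAPPED_PROTEIN_PLACEHOLDER" name="unmapped protein"></cvParam>
  </DBSequence>
</SequenceCollection>
"""

if __name__ == "__main__":
    for seq_id in check_db_sequences(SAMPLE):
        print("missing mapping or unmapped term:", seq_id)
```

Under these assumptions, only the B_GENSCAN entry is reported, which matches the situation Gerhard describes: it is the absence of the terms that triggers the rule, not the value of the name attribute.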