From: mayerg97 <ger...@ru...> - 2017-02-01 08:00:33
Hi all,

from the validation side, it makes little difference whether option 1 or option 2 is chosen, but option 2 is much clearer to read.

Cheers,
Gerhard

On 31.01.2017 at 16:20, Jones, Andy wrote:
> Hi all,
>
> The validation of proteogenomics example files has revealed a minor
> flaw in the proteogenomics specs. We use a MUST rule that says every
> peptide and protein that is not a decoy must state its genomic
> location. This doesn’t make sense for two reasons:
>
> - Some peptides may have been identified but not be mappable to a
>   chromosome because of the approach taken, i.e. the protein database
>   and gene models are not consistent, for sensible reasons.
>
> - As in our case, a result file may merge hits to Ensembl
>   and Uniprot (Uniprot hits are not mapped).
>
> Solutions:
>
> 1. Relax the MUST rule, to say that CV terms should be added for all
>    mapped peptides/proteins.
>
>    a. Downside: hard to encode this logic in the validator
>
> 2. Introduce another CV term for “unmapped peptide” and “unmapped
>    protein” to cater for this case explicitly.
>
> Option 2 seems more formally sensible, but makes more work for data
> exporters, who must add CV terms to every peptide/protein even if they
> only intended to map one subset.
>
> If possible, can people give opinions fairly quickly? We are now under
> time pressure to get the MCP paper resubmitted within around 2 weeks;
> otherwise it is a new submission.
>
> Best wishes
>
> Andy
>
> From: mayerg97 [mailto:ger...@ru...]
> Sent: 26 January 2017 13:33
> To: Jones, Andy <jo...@li...>; Ghali, Fawaz <fg...@li...>
> Subject: Re: ProteoAnnotator_1_2.mzid
>
> Hi Fawaz and Andy,
>
> e.g.
>
> <DBSequence searchDatabase_ref="SearchDB_1"
>   accession="generic|B_GENSCAN00000004093|"
>   id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>
>
> is referenced by
>
> <PeptideEvidence
>   dBSequence_ref="dbseq_generic|B_GENSCAN00000004093|"
>   peptide_ref="FAALDNEEEDK_" start="241" end="251" pre="K" post="E"
>   isDecoy="false"
>   id="FAALDNEEEDK_generic|B_GENSCAN00000004093|_241_251"></PeptideEvidence>
>
> but this PeptideEvidence is not a decoy and also has no genome mapping
> information defined, whereas Figure 5 of the specification document
> states that the CV terms must be present on every PeptideEvidence
> unless isDecoy="true".
>
> Best wishes,
> Gerhard
>
> On 26.01.2017 at 13:52, mayerg97 wrote:
>
> Hi Fawaz and Andy,
>
> it's because the file contains DBSequences that have no genome
> mapping defined, e.g.
>
> <DBSequence searchDatabase_ref="SearchDB_1"
>   accession="generic|B_GENSCAN00000004093|"
>   id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>
> <DBSequence searchDatabase_ref="SearchDB_1"
>   accession="generic|B_GENSCAN00000027223|"
>   id="dbseq_generic|B_GENSCAN00000027223|"></DBSequence>
> <DBSequence searchDatabase_ref="SearchDB_1"
>   accession="generic|B_GENSCAN00000009965|"
>   id="dbseq_generic|B_GENSCAN00000009965|"></DBSequence>
> <DBSequence searchDatabase_ref="SearchDB_1"
>   accession="generic|B_GENSCAN00000034417|"
>   id="dbseq_generic|B_GENSCAN00000034417|"></DBSequence>
>
> If we want to allow sequences both with and without genome
> mapping here, we can change the
>
> ProteogenomicsDBSequence_must_rule
>
> into a SHOULD rule instead.
>
> Best wishes,
> Gerhard
>
> On 26.01.2017 at 13:14, Jones, Andy wrote:
>
> Hi Fawaz,
>
> I don't see anything wrong with this - Gerhard, do you have any ideas?
>
> Thanks
>
> Andy
>
> -----Original Message-----
> From: Ghali, Fawaz
> Sent: 26 January 2017 11:55
> To: Jones, Andy <jo...@li...>
> Subject: ProteoAnnotator_1_2.mzid
>
> Hi Andy,
>
> ProteoAnnotator_1_2.mzid has an error:
>
> Message 1:
>   Rule ID: ProteogenomicsDBSequence_must_rule
>   Level: ERROR
>   Context(/cvParam/@accession ) in 380 locations
>   --> None of the given CvTerms were found at '/MzIdentML/SequenceCollection/DBSequence/cvParam/@accession' because no values were found:
>   - The sole term MS:1002637 (chromosome name) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
>   - The sole term MS:1002638 (chromosome strand) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
>   - The sole term MS:1002644 (genome reference version) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
>
> Example:
>
> <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|A_ENSP00000395953|" id="dbseq_generic|A_ENSP00000395953|">
>   <cvParam cvRef="PSI-MS" accession="MS:1002637" name="chromosome name" value="11"></cvParam>
>   <cvParam cvRef="PSI-MS" accession="MS:1002638" name="chromosome strand" value="+"></cvParam>
>   <cvParam cvRef="PSI-MS" accession="MS:1002644" name="genome reference version" value="Homo_sapiens.GRCh38.77.gff3"></cvParam>
> </DBSequence>
>
> Why is it complaining about the name?
>
> Best wishes,
> Fawaz
>
> --
> *Dipl. Inform. med., Dipl. Wirtsch. Inf.
> GERHARD MAYER*
> *PhD student*
> *Medizinisches Proteom-Center*
> *DEPARTMENT Medical Bioinformatics*
> *Building* ZKF E.049a | Universitätsstraße 150 | D-44801 Bochum
> *Fon* +49 (0)234 32-21006 | *Fax* +49 (0)234 32-14554
> *E-mail* ger...@ru...
> www.medizinisches-proteom-center.de
>
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
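
For anyone who wants to see which entries trigger the rule discussed in this thread, here is a minimal Python sketch. It lists the DBSequence elements that lack any of the three genome mapping CV terms named in the validator message (MS:1002637, MS:1002638, MS:1002644). It is not the validator's implementation. The names `unmapped_dbsequences` and `local_name` and the namespace-free `SequenceCollection` wrapper in the sample are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Accessions quoted in the validator message in this thread.
REQUIRED_ACCESSIONS = {"MS:1002637", "MS:1002638", "MS:1002644"}


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}DBSequence' (hypothetical helper)."""
    return tag.rsplit("}", 1)[-1]


def unmapped_dbsequences(xml_text):
    """Return the ids of DBSequence elements missing any required cvParam."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        if local_name(elem.tag) != "DBSequence":
            continue
        # Collect the accessions of the direct cvParam children.
        found = {
            child.get("accession")
            for child in elem
            if local_name(child.tag) == "cvParam"
        }
        if not REQUIRED_ACCESSIONS <= found:
            missing.append(elem.get("id"))
    return missing


# Two entries taken from the thread: one mapped, one without mapping terms.
# The wrapper element is simplified and carries no namespace (assumption).
SAMPLE = """<SequenceCollection>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|A_ENSP00000395953|" id="dbseq_generic|A_ENSP00000395953|">
    <cvParam cvRef="PSI-MS" accession="MS:1002637" name="chromosome name" value="11"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002638" name="chromosome strand" value="+"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002644" name="genome reference version" value="Homo_sapiens.GRCh38.77.gff3"></cvParam>
  </DBSequence>
  <DBSequence searchDatabase_ref="SearchDB_1" accession="generic|B_GENSCAN00000004093|" id="dbseq_generic|B_GENSCAN00000004093|"></DBSequence>
</SequenceCollection>"""

if __name__ == "__main__":
    for seq_id in unmapped_dbsequences(SAMPLE):
        print(seq_id)
```

Under option 1, the ids returned by such a check would presumably be reported as warnings rather than errors. Under option 2, they are the entries an exporter would need to tag with the proposed "unmapped" term.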