From: <cod...@go...> - 2009-06-23 10:45:01
Comment #81 on issue 42 by dcreasy: Issues with the CV
http://code.google.com/p/psi-pi/issues/detail?id=42

We seem to have two different ways of specifying taxonomy:

  <SequenceCollection>
    <DBSequence id="DBSeq_HSP7D_MANSE" length="652" SearchDatabase_ref="SDB_SwissProt" accession="HSP7D_MANSE" >
      <seq>MAKAPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTDRLIGDAAKNQVAMNP...</seq>
      <cvParam accession="MS:1001088" name="protein description" cvRef="PSI-MS" value="Heat shock 70 kDa protein cognate... - Manduca sexta ..." />
      <cvParam accession="MS:1001469" name="taxonomy: scientific name" cvRef="PSI-MS" value="Manduca sexta"/>
      <cvParam accession="MS:1001467" name="taxonomy: NCBI TaxID" cvRef="PSI-MS" value="7130"/>
    </DBSequence>

and

  <cv id="NCBI-TAXONOMY" fullName="NCBI-TAXONOMY" URI="ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"></cv>
  . . .
  <DatabaseFilters>
    <Filter>
      <FilterType>
        <cvParam accession="MS:1001020" name="DB filter taxonomy" cvRef="PSI-MS" />
      </FilterType>
      <Include>
        <cvParam accession="NCBI:33208" name="Metazoa" cvRef="NCBI-TAXONOMY" />
      </Include>
    </Filter>
  </DatabaseFilters>

Obviously we should be consistent and get rid of one of the methods. The other CV terms that could be used in the first example are:

  id: MS:1001470
  name: taxonomy: Swiss-Prot ID

  id: MS:1001468
  name: taxonomy: common name

I suggest that we ditch the second method and allow:

  MS:1001467 - taxonomy: NCBI TaxID
  MS:1001468 - taxonomy: common name
  MS:1001469 - taxonomy: scientific name
  MS:1001470 - taxonomy: Swiss-Prot ID

to be included in DatabaseFilters/Filter/Include.

For MS:1001468, MS:1001469, MS:1001470, I would like to see something like this added to the def:

  Recommend using MS:1001467 where possible

For MS:1001467, the type should be an unsigned 32-bit integer.
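
As a rough sketch only (not taken from the schema or from any agreed revision), the Metazoa filter from the second example might look like this if the suggestion above were adopted. It assumes the TaxID would go in a value attribute, mirroring the DBSequence example, and reuses 33208 from the NCBI:33208 accession in the original snippet:

```xml
<DatabaseFilters>
  <Filter>
    <FilterType>
      <cvParam accession="MS:1001020" name="DB filter taxonomy" cvRef="PSI-MS" />
    </FilterType>
    <Include>
      <!-- Assumption: the PSI-MS term replaces the NCBI-TAXONOMY cvRef; the value is the NCBI TaxID for Metazoa -->
      <cvParam accession="MS:1001467" name="taxonomy: NCBI TaxID" cvRef="PSI-MS" value="33208" />
    </Include>
  </Filter>
</DatabaseFilters>
```

With this form the separate <cv id="NCBI-TAXONOMY" ...> declaration would presumably no longer be needed for the filter, and the value would be checked as an unsigned 32-bit integer.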