Re: [Treebase-devel] Columns in the taxon intelligence files

On Jan 29, 2010, at 1:29 PM, Vladimir Gapeyev wrote:

> Technically, I see how to do step 4 below, but I am not clear why it will work. (This is clearly a tangent, in the direction of data quality, I guess.) As I understood before, the natural key in the TAXONLABEL table is the pair (study_id, taxonlabel), and I thought the reason was that two studies, say 981 and 1207, might use the same label, say "Abelia", but mean different taxa by it. In this case, TB2 should connect (981, "Abelia") and (1207, "Abelia") to different variant_ids and taxon_ids. This, however, does not mesh with your instruction to connect the TAXONVARIANT and TAXONLABEL tables based on the value of TAXONLABEL.taxonlabel alone, relying on the fact that the taxonlabel field is unique within the taxon_labels.tab file. Under this arrangement, there is no way "Abelia" from different studies can be connected to different taxonvariants! I actually expected to see a study_id field in taxon_labels.tab.
> 
> I misunderstood something, but what?

Although we have taxon labels that *potentially* could point to two different TAXONVARIANT records, in no current cases do two instances of the name point to different ones.  So in terms of the migrated TreeBASE1 data, we are okay. But indeed, in future, we can expect to get homonyms -- e.g. an "Aotus" as the taxon label on a tree of monkeys and an "Aotus" as the taxon label on a tree of plants.

So in summary: TreeBASE1 did not allow homonyms anywhere in the database, and therefore none within a study (hence only one record for each unique taxon label); TreeBASE2 allows homonyms across different studies but not within the same study.
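
To make the distinction concrete, here is a minimal sketch (not the actual TB2 schema) of a check for labels that could no longer be joined on taxonlabel alone. The table layout, the taxonvariant_id column name, the variant ids, and study ids 2001 and 2002 are assumptions made up for illustration; only the (study_id, taxonlabel) key and the "Abelia" / "Aotus" examples come from this thread.

```python
import sqlite3

def find_homonyms(rows):
    """Return labels that map to more than one variant across studies.

    rows: iterable of (study_id, taxonlabel, taxonvariant_id) tuples.
    The schema below is hypothetical; it only mirrors the
    (study_id, taxonlabel) natural key discussed above.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonlabel ("
        " study_id INTEGER,"
        " taxonlabel TEXT,"
        " taxonvariant_id INTEGER,"
        " PRIMARY KEY (study_id, taxonlabel))"  # no homonyms within a study
    )
    conn.executemany("INSERT INTO taxonlabel VALUES (?, ?, ?)", rows)
    # A label is a homonym if different studies point it at different variants.
    cur = conn.execute(
        "SELECT taxonlabel FROM taxonlabel"
        " GROUP BY taxonlabel"
        " HAVING COUNT(DISTINCT taxonvariant_id) > 1"
        " ORDER BY taxonlabel"
    )
    labels = [r[0] for r in cur]
    conn.close()
    return labels

# Migrated TreeBASE1-style data: same label, same variant in both studies.
migrated = [(981, "Abelia", 1), (1207, "Abelia", 1)]

# Hypothetical future data: "Aotus" the monkey vs. "Aotus" the plant.
future = migrated + [(2001, "Aotus", 2), (2002, "Aotus", 3)]

print(find_homonyms(migrated))
print(find_homonyms(future))
```

As long as a check like this comes back empty, joining on the label alone is safe; once it returns anything, the join would need study_id as well.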

bp