Re: [Treebase-devel] Columns in the taxon intelligence files

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Jan 28, 2010, at 6:19 PM, Vladimir Gapeyev wrote:

> Here is one important piece of info I could not find: What is the  
> meaning of each column in the taxon intelligence files  
> (taxon_labels.tab, taxon_variants.tab, taxa.tab)?

taxon:
	taxon_id (PK)
	namebankid
	namestring
	ncbi_taxid

taxon_variants:
	taxon_variant_id (PK)
	taxon_id (FK)
	namebankid
	namestring
	fullnamestring
	lexicalqualifier

taxon_labels:
	taxon_label_id (PK)
	taxon_variant_id (FK)
	legacy_id (from TreeBASE1)
	taxon_label (from TreeBASE1 -- useful for mapping output from headless Mesquite to TV table)

I thing the process should go something like this:

1. In the taxonlabel table, update all values in the taxonvariant_id column to NULL
2. Delete all records in the taxonvariant and taxon tables. 
3. The first two files can be sucked straight into the taxa and taxonvariant tables (the mapping among columns should be fairly obvious -- let me know if you have questions)
4. For mapping the taxonlabel table with the taxonvariant table, I'd suggest that you upload the third file into a temporary table, and then write an update statement to do the matching: use the last column of the taxon_labels file to match against the taxonlabel column in the taxonlabel table. Wherever there is a match, put the value of the second column in the taxon_labels file in the taxonvariant_id column of the taxonlabel table. Recall that in the third file, each value in the last column (the "taxon_label") is unique; by contrast, the taxonlabel column in the taxonlabel table are not unique. 
5. Drop the temporary table. 

bp