From: William P. <wil...@ya...> - 2010-01-29 04:27:08
|
On Jan 28, 2010, at 6:19 PM, Vladimir Gapeyev wrote: > Here is one important piece of info I could not find: What is the > meaning of each column in the taxon intelligence files > (taxon_labels.tab, taxon_variants.tab, taxa.tab)? taxon: taxon_id (PK) namebankid namestring ncbi_taxid taxon_variants: taxon_variant_id (PK) taxon_id (FK) namebankid namestring fullnamestring lexicalqualifier taxon_labels: taxon_label_id (PK) taxon_variant_id (FK) legacy_id (from TreeBASE1) taxon_label (from TreeBASE1 -- useful for mapping output from headless Mesquite to TV table) I thing the process should go something like this: 1. In the taxonlabel table, update all values in the taxonvariant_id column to NULL 2. Delete all records in the taxonvariant and taxon tables. 3. The first two files can be sucked straight into the taxa and taxonvariant tables (the mapping among columns should be fairly obvious -- let me know if you have questions) 4. For mapping the taxonlabel table with the taxonvariant table, I'd suggest that you upload the third file into a temporary table, and then write an update statement to do the matching: use the last column of the taxon_labels file to match against the taxonlabel column in the taxonlabel table. Wherever there is a match, put the value of the second column in the taxon_labels file in the taxonvariant_id column of the taxonlabel table. Recall that in the third file, each value in the last column (the "taxon_label") is unique; by contrast, the taxonlabel column in the taxonlabel table are not unique. 5. Drop the temporary table. bp |