From: Vladimir G. <vla...@du...> - 2010-01-29 18:29:20
|
Bill, thanks for the info! I did intend import this data in a way similar to what you suggest. Technically, I see how to do step 4 below, but I am not clear why it will work. (This is clearly a tangent, in the direction of data quality, i guess.) As I understood before, the natural key in the TAXONLABEL table is the pair (study_id, taxonlabel), and I thought the reason was that two studies, say 981 and 1207, might use the same label, say "Abelia", but mean different taxa by it. In this case, TB2 should connect (981, "Abelia") and (1207, "Abelia") to different variant_ids and taxon_ids. This, however, does not mesh with your instruction to connect TAXONVARIANT and TAXONLABEL tables based on the value of TAXONLABEL.taxonlabel alone, relying on the fact that taxonlabel field is unique within the taxon_labels.tab file. Under this arrangement, there is no way "Abelia" from different studies can be connected to different taxonvariants! I actually expected to see a study_id field in taxon_labels.tab. I misunderstood something, but what? --VG On Jan 28, 2010, at 11:27 PM, William Piel wrote: > > On Jan 28, 2010, at 6:19 PM, Vladimir Gapeyev wrote: > >> Here is one important piece of info I could not find: What is the >> meaning of each column in the taxon intelligence files >> (taxon_labels.tab, taxon_variants.tab, taxa.tab)? > > > taxon: > taxon_id (PK) > namebankid > namestring > ncbi_taxid > > taxon_variants: > taxon_variant_id (PK) > taxon_id (FK) > namebankid > namestring > fullnamestring > lexicalqualifier > > taxon_labels: > taxon_label_id (PK) > taxon_variant_id (FK) > legacy_id (from TreeBASE1) > taxon_label (from TreeBASE1 -- useful for mapping output from > headless Mesquite to TV table) > > I thing the process should go something like this: > > 1. In the taxonlabel table, update all values in the taxonvariant_id > column to NULL > 2. Delete all records in the taxonvariant and taxon tables. > 3. The first two files can be sucked straight into the taxa and > taxonvariant tables (the mapping among columns should be fairly > obvious -- let me know if you have questions) > 4. For mapping the taxonlabel table with the taxonvariant table, I'd > suggest that you upload the third file into a temporary table, and > then write an update statement to do the matching: use the last > column of the taxon_labels file to match against the taxonlabel > column in the taxonlabel table. Wherever there is a match, put the > value of the second column in the taxon_labels file in the > taxonvariant_id column of the taxonlabel table. Recall that in the > third file, each value in the last column (the "taxon_label") is > unique; by contrast, the taxonlabel column in the taxonlabel table > are not unique. > 5. Drop the temporary table. > > bp > > > > > > > > > ------------------------------------------------------------------------------ > The Planet: dedicated and managed hosting, cloud storage, colocation > Stay online with enterprise data centers and the best network in the > business > Choose flexible plans and management services without long-term > contracts > Personal 24x7 support from experience hosting pros just a phone call > away. > http://p.sf.net/sfu/theplanet-com > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel |