From: <vga...@us...> - 2010-03-09 20:45:40
|
Revision: 533 http://treebase.svn.sourceforge.net/treebase/?rev=533&view=rev Author: vgapeyev Date: 2010-03-09 20:45:33 +0000 (Tue, 09 Mar 2010) Log Message: ----------- Scripts for loading data from TB1 dumps Added Paths: ----------- trunk/treebase-core/db/tb1load/ trunk/treebase-core/db/tb1load/README.txt trunk/treebase-core/db/tb1load/apple/ trunk/treebase-core/db/tb1load/data/ trunk/treebase-core/db/tb1load/doc-mjd.txt trunk/treebase-core/db/tb1load/mesquite/ trunk/treebase-core/db/tb1load/scripts/ trunk/treebase-core/db/tb1load/scripts/all_steps.sh trunk/treebase-core/db/tb1load/scripts/clean.sh trunk/treebase-core/db/tb1load/scripts/common.sh trunk/treebase-core/db/tb1load/scripts/copy_tb2code.sh trunk/treebase-core/db/tb1load/scripts/dbinfo.sh trunk/treebase-core/db/tb1load/scripts/deletematrix.sql trunk/treebase-core/db/tb1load/scripts/step1a_load_matrices.sh trunk/treebase-core/db/tb1load/scripts/step1b_load_trees.sh trunk/treebase-core/db/tb1load/scripts/step1c_fix_matrices.sh trunk/treebase-core/db/tb1load/scripts/step1d_fix_trees.sh trunk/treebase-core/db/tb1load/scripts/step1e_load_dump.sh trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sh trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sql trunk/treebase-core/db/tb1load/scripts/step3_load_citations.sh trunk/treebase-core/db/tb1load/tb2classes/ trunk/treebase-core/db/tb1load/tb2jars/ Added: trunk/treebase-core/db/tb1load/README.txt =================================================================== --- trunk/treebase-core/db/tb1load/README.txt (rev 0) +++ trunk/treebase-core/db/tb1load/README.txt 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,151 @@ +Instructions and scripts for performing data loading from TB1 dumps to a TB2 postrges instance. +[Put together by Vladimir Gapeyev in Feb-Mar 2010, based on Java tools developed Mark-Jason Dominus (see doc-mjd.txt)] + +The are two stages to the process: (1) setting up working environment and (2) running the loading scripts. The goal of the first stage is to extract a subset of the TB2 development environment sufficient to run loading on the server -- this circumvents the more complex task of re-creating a full-fledged development environment on the server. + + +Contents of this directory +========================== + + tb2jars/ - placeholder for dependent JARs of Treebase2 + tb2classes/ - placeholder for Treebase2 classes + apple/ - placeholder for MacOSX-specific JARs + mesquite/ - placeholder for Mesquite code + data/ - placeholder for the data files to be migrated + + scripts/ - shell and SQL scripts to run for loading; + the scripts invoke code from tb2classes, which + relies on jars in tb2jars, mesquite, and apple. + + +Stage 1: Set up working environment +=================================== + +You must have checked out from SVN and configured a working TB2 build environment (for the treebase, treebase-core, and treebase-web projects). This directory (treebase-core/db/tb1load) is a part of this environment and some of the following instructions rely on its relative location. + +Bring in necessary Treebase2 code +--------------------------------- + + cd scripts; ./copy_tb2code.sh + +Assuming Treebase has been built, this script + copies JARS from treebase-web/target/treebase-web/WEB-INF/lib/ + to tb2jars/ + copies contents of treebase-core/target/classes/ + to tb2classes/ + +Bring in Mesquite code +---------------------- + + copy contents of your Mesquite installation + to mesquite/ + +- If you already have a working TB2 instance running under tomcat, it should point to a Mesquite installation from its config file $CATALINA_HOME/conf/Catalina/localhost/treebase-web.xml. Copy the contents of that directory into mesquite/. + +- Alternatively, if you have successfully built TB2 with Maven, you will have Mesquite in ~/.m2/repository/mesquite/mesquite/2.01.tb/mesquite-2.01.tb.jar. +Unzip this archive and place its *contents* into mesquite/. + +Bring in Apple JARs +------------------- + +Only if the run location will not be a Mac OSX unix: + JARs MRJToolkit.jar and ui.jar should be placed in tb2apple/ + +[These are Apple-specific JARs, apparently loaded by Mesquite. Since Mesquite runs in "headless" mode, code from these JARs probably is not executed, so they are harmless.] + +On a Mac OSX machine, these could be at: + +/System/Library/Java/Extensions/MRJToolkit.jar /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Classes/ui.jar + +Relocate to the run location +---------------------------- + +* Copy the whole tb1load/ directory (the current directory) to the run location, i.e. the location from which loading will be performed. + + *** The rest of these instructions are for the run location. *** + +[Tip: The run location should probably be at the server machine that runs the postgresl instance, to avoid sending large amounts of data over the network.] + + +Specify DB credentials and the Mesquite location +------------------------------------------------ + +* Edit tb2classes/jdbc.properties file, providing appropriate values for the properties: + jdbc.url + jdbc.username + jdbc.password + mesquite.folder_dir + +If the file does not exist, create it from tb2classes/jdbc.properties.example. + +mesquite.folder_dir should contain a path to the mesquite/ directory populated above (a relative path might work). + +* Edit scripts/dbinfo.sh to provide the same DB credentials, to be used in Step 2. Note: The user that runs this step must have full DDL rights, as it creates and removes several tables, constraints and functions. + + +Stage 2: Prepare the data +======================== + +Download the data to be migrated and place it in the data/ directory, under the exact file names as listed below. The scripts expect these locations and file names. + + data/ + characters/ - contains Nexus matrix files ***.nex + trees/ - contains Nexus tree files ***.tre + dump.txt - metadata about studies + TI/ - taxon intelligence, 3 tab-separated files + taxa.tab + taxon_variants.tab + taxon_labels.tab + citations.txt - tab-separated EndNote file + +[Note: If you have several data sets, e.g. a testing set or several separate delta data sets, it could make sense to keep them in separate directories (with this structure) and link or move them to data/ when it is time to run the scripts.] + + +Stage : Run the scripts +======================== + +There is a script, all_steps.sh that performs full loading. + + cd scripts; ./all_steps.sh + +However, each loading step can be run separately. Consider doing this, as well as intermittent backups, if failures at each step are likely. + +On sizable data, the scripts can take hours (especially Steps 1a,b). Consider running them via cron, by putting in crontab something like +35 11 * * * (cd ..../tb1load/scripts; ./step3_load_citations.sh ) &> migr_log.log + + +Steps 1a,b: load matrices and trees +----------------------------------- + ./step1a_load_matrices.sh + ./step1b_load_trees.sh + +These load matrices and trees into a study with the name "UPLOAD", creating one if needed. If you already have a "UPLOAD" study and do not want it to be affected, rename it first. + +Code for Step1a appears to contain memory leak and may run out oh heap space midway. The matrix being uploaded at that time will be left inconsistent. It should be scraped out of the DB (use deletematrix.sql) and reloaded in a separate run of Step1a. + + +Step 1c: fix counts in the uploaded data +---------------------------------------- + ./step1c_fix_matrices.sh + ./step1d_fix_trees.sh + + +Step 1e: load the dump containing metadata for studies +------------------------------------------------------ + ./step1e_load_dump.sh + + +Step 2: load taxon intelligence +------------------------------- + ./step2_taxon_intell.sh +or + psql -f step2_taxon_intell.sql -h dbhost -U dbuser dbname + +This one requies the DB schema to be at least at Patch 0005. + + +Step 3: load citations +---------------------- + ./step3_load_citations.sh + +This code does not work for "Book Section" enties. Comment them out and enter manually later. Added: trunk/treebase-core/db/tb1load/doc-mjd.txt =================================================================== --- trunk/treebase-core/db/tb1load/doc-mjd.txt (rev 0) +++ trunk/treebase-core/db/tb1load/doc-mjd.txt 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,188 @@ +Most scripts in scripts/* invoke Java tools developed Mark-Jason Dominus. +These tools are in treebase-core, in the package org.cipres.treebase.util. +They are described in a message from Mark posted at +http://sourceforge.net/apps/mediawiki/treebase/index.php?title=DataDumps#Migrating_TB1_-.3E_TB2_data +and copied below. + +Understanding of this text is not necessary in order to run scripts/*, but may help if you need to work on the underlying Java tools. + + +---------------------------------------------------------------------- + += Migrating TB1 -> TB2 data = + +1. Bill supplied the following data: + + a. Nexus files, each containing either one tree or one matrix. + + i. Matrix files are in ~/treebase-data/matrix/ + ii. Tree files are in ~/treebase-data/tree/ + + b. "Dump.txt" file, describing the structure of studies, analyses, and + including author and citation information. + + i. Most recent version in ~/treebase-data/Dump.txt + ii. Older versions in ~/treebase-data/Dump-YYYYMMDD.txt + + c. Taxon data, as three tab-separated files in ~/treebase-data/taxon/: + + i. T.tax, the taxon data + ii. TV.tab, the taxon variant data + iii. TL.tab, the taxon label data + + Bill had a document explaining the format of these files, but I + don't know where my copy is; get it from him if you need it. The + format is also documented in the util/LoadTaxonData.java file. + +2. My importing programs are in three places: + + a. Nexus file importer and miscellaneous utilities are in: + + treebase-core/src/main/java/org/cipres/treebase/util + + henceforward called "~util". + + b. Dump.txt loader: + + treebase-core/src/main/java/org/cipres/treebase/auxdata + + henceforward called "~auxdata". + + c. Perl utilities: + + /home/mjd/t/treebase-core/src/main/perl + + henceforward called "~perl". There are a number of useful + utilities in ~perl/bin that you should take a look at. + +3. Running standalone programs + + It was not clear to me that there was an easy way to run a + standalone command-line-type utility in the Spring/Hibernate + environment without binding it to a web page. I built the + Standalone interface to address this. + + To run one of these programs, say BulkUpload, do this: + + cd treebase-core/target/classes + jrun -D M2_REPO=path/to/m2_repository org.cipress.treebase.util.BulkUpload <args...> + + + I use $HOME/.m2/repository for M2_REPO. + + The purpose of "jrun" is to assemble the correct classpath (by + examining the .classpath files written out by Eclipse) and execute + the Java interpreter. The complete classpath is really big, and it + was helpful to have this utility to figure it out. + + "jrun" is in ~perl/jrun. It has some other command-line options; + see the source code. + +3. To import data: + + a. Load in the nexus files with ~util/BulkUpload. BulkUpload gets + a list of file or directory names as arguments. If given a + directory name, it processes all the files contained in the + directory. + + b. The code Jin wrote to import nexus files forgets to set the row + and column count on matrices, and the node count on trees. So + now run ~util/SetMatrixNChar and ~util/SetTreeNChar to adjust + these correctly. These get no arguments. + + c. Process the Dump.txt file. The utility for this in + util/AuxiliaryDataImporter. To invoke: + + jrun -D M2_REPO=... org.cipres.treebase.util.AuiliaryDataImporter [flags...] Dump.txt + + If you omit "Dump.txt", it will try to read from some hardwired + default path. + + The program will read and parse the Dump.txt file, which + contains a series of sections, one for each study. The program + will construct a representation of each study, and perform an + action for each one. + + The default action, defined by CommitStudy.java, is to create a + study and its associated analyses, link the appropriate Matrix + and Tree objects, and generally try to build the study as + defined in the Dump.txt file. If a section calls for it to + create a study with a TB1 legacy ID number that is already in + the database, it will skip that section. + + Command-line flags: + + -a ActionClass. The ActionClass must implement interface + CompleteStudyAction. The default is CommitStudy. + + -l logFileName. Write diagnostics to specified log file instead + of standard error. + + -n. Say what would be done, but don't actually do anything. + + -t. Set "testing mode" flag. Interpretation of this is up to + the specific action class. For the default, it probebly tells + the action to set the "TestMode" flag in Study and Submission + objects that it creates. + + -s ID,ID,ID... Process only those studies with the specified + legacy ID numbers. (Default: Process all studies.) + +4. Check results + + a. Look for missing objects. + + util/WhatsMissing checks the contents of the database against a + census file that lists the objects that should be there + post-migration. It uses the Java resrouce path to find the + census files. The most recent census files I have are in + treebase-core/src/main/resources/{matrices,studies,trees}.lst. + + Command-line options: + + -s : Check for missing studie. + -m : Check for missing matrices. + -t : Check for missing trees. + + If none of -s, -m, or -t is supplied, the program checks all three. + + -Q : Quiet mode. + + b. Check database consistency. + + perl/bin/check is a program that understands a lot of the + proper structure of TB2 objects and does a variety of + consistency checks on the database. To run: + + check ObjectType ID# + + Where ObjectType is something like "Study" or "Matrix". Use + "check -X" to get a list of legal values for this argument. + + To extend the checker to understand more object types, add them + to ~perl/lib/CIPRES/TreeBase/TreeBaseObjects.pm. Email me if + you need to do this; I will be able to explain what is needed. + + I suggest that you do something like this: + + for id in `sel study_id from study`; do + echo $id + check Study $id > S$id.out 2> S$id.err + done + + And then go away for a while. Any consistency failures will be + reported on a line with a leading '***'. The program in + ~perl/bin/digester may be useful in going over the consistency + checker's reports: It aggregates similar error messages into + groups, and creates a file for each group of messages. So after + running 'check' as above, you may + + grep '\*\*\*' S*.err | digester + + to get an overview of what sorts of failures occurred, and how + many times each. Use '-d directoryName' to have it put the + digested reports into some subdirectory. + + + +As always, please email me if you have any questions. Added: trunk/treebase-core/db/tb1load/scripts/all_steps.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/all_steps.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/all_steps.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,18 @@ +#!/bin/bash -e +# -e Exit immediately if a command exits with a non-zero status. + + +./step1a_load_matrices.sh + +./step1b_load_trees.sh + +./step1c_fix_matrices.sh + +./step1d_fix_trees.sh + +./step1e_load_dump.sh + +./step2_taxon_intell.sh + +./step3_load_citations.sh + Property changes on: trunk/treebase-core/db/tb1load/scripts/all_steps.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/clean.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/clean.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/clean.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,7 @@ +#!/bin/bash + +rm -rf ../mesquite/* +rm -rf ../apple/* +rm -rf ../tb2classes/* +rm -rf ../tb2jars/* +rm -rf ../data/* Property changes on: trunk/treebase-core/db/tb1load/scripts/clean.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/common.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/common.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/common.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,13 @@ + +TB2JARS=../tb2jars +TB2CLASSES=../tb2classes +APPLE=../apple +DATA=../data + +CP=$TB2JARS/spring-2.0.7.jar:$TB2JARS/hibernate-3.2.6.ga.jar:$TB2JARS/log4j-1.2.13.jar:$TB2JARS/commons-logging-1.1.jar:$TB2JARS/c3p0-0.9.1.2.jar:$TB2JARS/dom4j-1.6.1.jar:$TB2JARS/hibernate-annotations-3.3.1.GA.jar:$TB2JARS/jta-1.1.jar:$TB2JARS/hibernate-commons-annotations-3.0.0.ga.jar:$TB2JARS/acegi-security-1.0.1.jar:$TB2JARS/ejb3-persistence-1.0.1.GA.jar:$TB2JARS/commons-collections-3.1.jar:$TB2JARS/jdom-1.1.jar:$TB2JARS/postgresql-8.3-603.jdbc3.jar:$TB2JARS/ehcache-1.2.3.jar:$TB2JARS/cglib-nodep-2.1_3.jar:$TB2JARS/mesquite-2.01.tb.jar:$TB2JARS/antlr-2.7.2.jar:$TB2JARS/tolbaseclasses-mesquite.jar:$TB2JARS/nexml-1.5-SNAPSHOT.jar + +CP=$CP:$APPLE/MRJToolkit.jar:$APPLE/ui.jar + +CP=$CP:$TB2CLASSES + +JAVA="java -Xmx2048M -cp $CP" Property changes on: trunk/treebase-core/db/tb1load/scripts/common.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/copy_tb2code.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/copy_tb2code.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/copy_tb2code.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,12 @@ +#!/bin/bash + +# must be executed from data/scripts + +CMD="cp -R ../../../../treebase-web/target/treebase-web/WEB-INF/lib/ ../tb2jars/" +echo $CMD +$CMD + +CMD="cp -R ../../../../treebase-core/target/classes/ ../tb2classes/" +echo $CMD +$CMD + Property changes on: trunk/treebase-core/db/tb1load/scripts/copy_tb2code.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/dbinfo.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/dbinfo.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/dbinfo.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,3 @@ +DBHOST=localhost +DBNAME=xxxx +DBUSER=yyyy Property changes on: trunk/treebase-core/db/tb1load/scripts/dbinfo.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/deletematrix.sql =================================================================== --- trunk/treebase-core/db/tb1load/scripts/deletematrix.sql (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/deletematrix.sql 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,44 @@ +-- Use this script to delete a matrix with id NNNN. +-- (Useful for matrices in which Step1a broke.) +-- Replace NNNN by your matrix_id + +begin transaction; + +delete from codonnoncoding_colrange r + using codonpositionset p, matrix m + where r.codonpositionset_id = p.codonpositionset_id + and p.matrix_id = m.matrix_id + and m.matrix_id = NNNN +returning *; + +update matrix + set codonpositionset_id = null +where matrix_id = NNNN +returning *; + +delete from codonpositionset where matrix_id = NNNN +returning *; + +delete from sub_matrix where matrix_id = NNNN +returning *; + +delete from matrixelement e +using matrixrow r, matrixcolumn c +where e.matrixrow_id = r.matrixrow_id and r.matrix_id = NNNN + and e.matrixcolumn_id = c.matrixcolumn_id and c.matrix_id = NNNN +returning *; + + +delete from matrixcolumn where matrix_id = NNNN +returning *; + +delete from matrixrow where matrix_id = NNNN +returning *; + + +delete from matrix where matrix_id = NNNN +returning *; + + +commit transaction; + Added: trunk/treebase-core/db/tb1load/scripts/step1a_load_matrices.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step1a_load_matrices.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step1a_load_matrices.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,13 @@ +#!/bin/bash -e + +source common.sh + +MATRICES=$DATA/characters + +echo "######################################################################" +echo "############ Step 1a: load character matrices from $MATRICES #########" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.BulkUpload $MATRICES" +echo $CMD +$CMD Property changes on: trunk/treebase-core/db/tb1load/scripts/step1a_load_matrices.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step1b_load_trees.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step1b_load_trees.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step1b_load_trees.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,13 @@ +#!/bin/bash -e + +source common.sh + +TREES=$DATA/trees + +echo "######################################################################" +echo "############ Step 1b: load trees from $TREES ############" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.BulkUpload $TREES" +echo $CMD +$CMD Property changes on: trunk/treebase-core/db/tb1load/scripts/step1b_load_trees.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step1c_fix_matrices.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step1c_fix_matrices.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step1c_fix_matrices.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,12 @@ +#!/bin/bash -e + +source common.sh + +echo "######################################################################" +echo "############ Step 1c: fix counts in matrices ############" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.SetMatrixNChar" +echo $CMD +$CMD + Property changes on: trunk/treebase-core/db/tb1load/scripts/step1c_fix_matrices.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step1d_fix_trees.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step1d_fix_trees.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step1d_fix_trees.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,11 @@ +#!/bin/bash -e + +source common.sh + +echo "######################################################################" +echo "############ Step 1c: fix counts in trees ############" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.SetTreeNChar" +echo $CMD +$CMD Property changes on: trunk/treebase-core/db/tb1load/scripts/step1d_fix_trees.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step1e_load_dump.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step1e_load_dump.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step1e_load_dump.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,11 @@ +#!/bin/bash -e + +source common.sh + +echo "######################################################################" +echo "############ Step 1d: Loading Study info from the dump ############" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.AuxiliaryDataImporter $DATA/dump.txt" +echo $CMD +$CMD Property changes on: trunk/treebase-core/db/tb1load/scripts/step1e_load_dump.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,11 @@ +#!/bin/bash -e + +source common.sh +source dbinfo.sh + +echo "######################################################################" +echo "############ Step 2: load taxon intelligence ############" +echo "######################################################################" + +psql -f step2_taxon_intell.sql -h $DBHOST -U $DBUSER $DBNAME + Property changes on: trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sh ___________________________________________________________________ Added: svn:executable + * Added: trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sql =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sql (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step2_taxon_intell.sql 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,117 @@ +-- Upload Taxon Intelligence dumps into DB. +-- Input data comes from several locations hard-wired into the 3 \copy commands. + +-- This script uses DDL to create temporary tables and functions, but it takes care +-- to clean after itself, so there is no net effect on the schema. + +BEGIN TRANSACTION; + +---------------------------------------------------------------------- +\echo 'Clean old TI data' +update taxonlabel + set taxonvariant_id = NULL; + +-- Drop the constraints temporarily. +-- Their presence slows down immencely deletions from TaxonVariant Taxon. + +alter table taxonlabel drop constraint taxonlabel_fkto_taxonvariant; + +delete from taxonvariant; + +alter table taxonlabel + add constraint taxonlabel_fkto_taxonvariant + FOREIGN KEY (taxonvariant_id) REFERENCES taxonvariant (taxonvariant_id); + + +alter table taxonset_taxon drop constraint taxonset_taxon_fkto_taxon; +alter table taxonlink drop constraint taxonlink_fkto_taxon; +alter table taxonvariant drop constraint taxonvariant_fkto_taxon; + +delete from taxon; + +alter table taxonset_taxon + add constraint taxonset_taxon_fkto_taxon + FOREIGN KEY (taxon_id) REFERENCES taxon (taxon_id); +alter table taxonlink + add constraint taxonlink_fkto_taxon + FOREIGN KEY (taxon_id) REFERENCES taxon (taxon_id); +alter table taxonvariant + add constraint taxonvariant_fkto_taxon + FOREIGN KEY (taxon_id) REFERENCES taxon (taxon_id); + + +---------------------------------------------------------------------- +\echo 'Upload Taxa and Taxon Variants into their tables' + +\copy taxon (taxon_id, ubionamebankid, name, ncbitaxid) from '../data/TI/taxa.tab' + +\copy taxonvariant (taxonvariant_id, taxon_id, namebankid, name, fullname, lexicalqualifier) from '../data/TI/taxon_variants.tab' + +update taxon set + version = 1, + tb1legacyid = taxon_id; +update taxonvariant set + version = 1, + tb1legacyid = taxonvariant_id; + + +---------------------------------------------------------------------- +\echo 'Upload TaxonLabels into a temporary table tb1taxonlabel' +CREATE TABLE tb1taxonlabel +( + tb1taxonlabel_id bigint NOT NULL, + taxonvariant_id bigint, + legacy_id character varying(20), + taxonlabel character varying(255), + CONSTRAINT tb1taxonlabel_pkey PRIMARY KEY (tb1taxonlabel_id), + CONSTRAINT tb1taxonlabel_fkto_taxonvariant FOREIGN KEY (taxonvariant_id) + REFERENCES taxonvariant (taxonvariant_id) +); + +\copy tb1taxonlabel (tb1taxonlabel_id, taxonvariant_id, legacy_id, taxonlabel) from '../data/TI/taxon_labels.tab' + + +---------------------------------------------------------------------- +\echo 'Update fields in TaxonLabel with values from tb1taxonlabel' +update taxonlabel as tb2 + set taxonvariant_id = tb1.taxonvariant_id, + tb1legacyid = tb1.legacy_id, + version = 1 +from tb1taxonlabel as tb1 +where tb2.taxonlabel = tb1.taxonlabel; + + +---------------------------------------------------------------------- +\echo 'Drop the temporary table tb1taxonlabel' +drop table tb1taxonlabel; + + +---------------------------------------------------------------------- +\echo 'Recompute taxon_id_sequence and taxonvariant_id_sequence' +create or replace function recompute_sequence (tablename varchar) returns void as $$ +BEGIN + execute 'create table tmp_newstart(val bigint)'; + execute 'insert into tmp_newstart '|| + 'select (div(max('||tablename||'_id), 10000)+1)*10000 from '||tablename; + DECLARE + start_cursor cursor for select * from tmp_newstart; + start_value bigint; + BEGIN + open start_cursor; + fetch start_cursor into start_value; + close start_cursor; + execute 'alter sequence '||tablename||'_id_sequence restart with '||start_value; + execute 'drop table tmp_newstart'; + raise notice 'New start value for %_id_sequence: %', tablename, start_value; + END; +END; +$$ language 'plpgsql'; + +select recompute_sequence ('taxon'); +select recompute_sequence ('taxonvariant'); + +drop function recompute_sequence(varchar); + +---------------------------------------------------------------------- + +COMMIT TRANSACTION; Added: trunk/treebase-core/db/tb1load/scripts/step3_load_citations.sh =================================================================== --- trunk/treebase-core/db/tb1load/scripts/step3_load_citations.sh (rev 0) +++ trunk/treebase-core/db/tb1load/scripts/step3_load_citations.sh 2010-03-09 20:45:33 UTC (rev 533) @@ -0,0 +1,14 @@ +#!/bin/bash -e + +source common.sh + +CITATIONS=$DATA/citations.txt + + +echo "######################################################################" +echo "############ Step 3: load citations $CITATIONS ############" +echo "######################################################################" + +CMD="$JAVA org.cipres.treebase.util.CitationDataImporter $CITATIONS" +echo $CMD +$CMD Property changes on: trunk/treebase-core/db/tb1load/scripts/step3_load_citations.sh ___________________________________________________________________ Added: svn:executable + * This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |