Re: [Treebase-devel] Final migration increment


On Mar 9, 2010, at 10:21 AM, Vladimir Gapeyev wrote:

>   (4) Some bibliography entries in the release will have all info  
> crammed into the title field.  This will be fixed after the release.
> 
> I am worried about the logistics of (4), and would prefer to have it  
> done prior to the release.

I thought this was the "normal" way the migration proceeds, i.e. in the following three stages:

1. Parse and upload the dump.txt file and the associated trees and matrices. TB1 stores each citation as a single line, which ends up crammed into the title field. Authors' full names and email addresses are stored in separate tables, but without information on author order.

2. Replace the existing taxon_variant and taxon tables with the latest TI dump.

3. Update the citation information from the latest Endnote file. Here, author names are abbreviated (per Endnote conventions), but author order is known.
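Since stage 1 supplies full author names without order and stage 3 supplies ordered but abbreviated names, one way to combine the two is to match on an Endnote-style key. This is a hypothetical sketch, not the migration's actual code: it assumes Endnote names look like "Smith, J. P." and that each TB1 author row carries a first name, last name, and email. `Tb1Author`, `abbreviate`, and `order_authors` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tb1Author:
    # Hypothetical shape of a TB1 author row: full names, but no author order.
    first_name: str
    last_name: str
    email: str


def abbreviate(first_name: str, last_name: str) -> str:
    """Build an Endnote-style key such as 'Smith, J. P.' (assumed format)."""
    initials = " ".join(part[0] + "." for part in first_name.split() if part)
    return f"{last_name}, {initials}"


def order_authors(endnote_names, tb1_authors):
    """Return TB1 authors in Endnote order; names with no match map to None."""
    by_key = {abbreviate(a.first_name, a.last_name): a for a in tb1_authors}
    return [by_key.get(name) for name in endnote_names]
```

Two TB1 authors that abbreviate to the same key would collide in the lookup, so such cases (and any `None` results) would still need a human check.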

Is this the same basic order of tasks that you used for the Dec09 migration?

The only difference here is that we can go live before task 3 is performed. Note also that we could have our undergrads edit the citation info directly in the TreeBASE2 interface instead of first editing an Endnote file; that is, we would do away with step 3 as it currently stands. (That said, since our student metadata help has already improved the metadata for all TreeBASE studies considerably in Endnote, we'll probably want to run a citation update script at some point anyway.)

Do we need to think about how we will run update and data-cleansing scripts in the future? I'm not sure what the point of "stage" is, other than the ability to test new builds against a (slightly older) copy of the production data. Update and data-cleansing scripts will need to be applied directly to production (after first triggering a pg_dump, of course).
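A minimal sketch of the "pg_dump first, then apply" routine for production, assuming a database named `treebase` and a script file `cleanup_citations.sql` (both hypothetical placeholders, not names from the actual setup). It uses standard `pg_dump` and `psql` options; with `dry_run=True` it only returns the commands so they can be reviewed before anything touches production.

```python
import subprocess
from datetime import datetime, timezone


def backup_then_apply(dbname, script_path, dry_run=True, now=None):
    """Build (and, unless dry_run, run) a pg_dump followed by the SQL script."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_file = f"{dbname}-{stamp}.dump"  # hypothetical naming scheme
    commands = [
        # Custom-format dump, restorable later with pg_restore.
        ["pg_dump", "--format=custom", "--file", dump_file, "--dbname", dbname],
        # Apply the script in one transaction and stop at the first error.
        ["psql", "--dbname", dbname, "--single-transaction",
         "-v", "ON_ERROR_STOP=1", "--file", script_path],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises on failure, so the script never runs without a dump.
            subprocess.run(cmd, check=True)
    return commands


# Example (dry run; names are placeholders):
# backup_then_apply("treebase", "cleanup_citations.sql")
```

Because `subprocess.run(..., check=True)` raises when a command fails, a failed dump stops the routine before the script runs, and `--single-transaction` with `ON_ERROR_STOP=1` rolls the whole script back if any statement fails.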

bp