Re: [Treebase-devel] Migration update

On Mar 3, 2010, at 3:20 PM, Vladimir Gapeyev wrote:

> Bill, I'd like to check with you on the number of new matrices and  
> trees that were expected to be in the delta. The import tools are  
> written to skip files with the names that are already in the  
> database.  So, they uploaded about 590 new matrices and 720 new trees  
> (compared, respectively, to 4348 files in the characters directory and  
> 5151 files in the trees directory).  Does this outcome look about  
> right?  That is, did the data directories contain files that were  
> loaded into the database earlier and did you NOT expect for them to  
> affect the database?

So, this dump file:

http://www.treebase.org/treebase/migration/Dec-09/dump_Dec09_utf8.txt

...only refers to a subset of the data found in /Dec-09/characters/ and /Dec-09/trees/.

I had assumed that the migration scripts would read the dump file and then import each matrix or tree only as needed (or instructed) by the dump file. That is, the migration scripts would not need to skip over anything, because dump_Dec09_utf8.txt has already done that for you.

But I guess the migration scripts actually work differently: I'm guessing that they first upload all new matrices and trees, and only afterwards wire them into their proper study records after parsing the dump file.

The December 09 dump file contains instructions to upload 284 studies, 560 matrices, and 714 trees.  So it's a bit odd that the migration scripts decided that there were 590 matrices and 720 trees to upload. This means that there are 30 matrices and 6 trees that will be uploaded, yet there is no info in the dump file about what studies or analyses they belong to. If you can save a list of these "orphaned" matrices and trees, I can look into what study they should belong to. 
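
To make the comparison concrete, here is a rough Python sketch of how the orphan list could be produced. It is only an illustration, not how the migration scripts work: the function names and the sample file names are hypothetical, and it assumes you already have the list of uploaded file names and the list of names referenced in the dump file (I am not assuming anything about how the dump file is parsed). The count check also assumes every item referenced in the dump file was actually uploaded.

```python
def find_orphans(uploaded_names, referenced_names):
    """Return, sorted, the uploaded names that the dump file never references."""
    return sorted(set(uploaded_names) - set(referenced_names))


def orphan_count(uploaded_total, referenced_total):
    """Number of orphans, assuming every referenced item was uploaded."""
    return uploaded_total - referenced_total


# Counts reported in this thread: uploaded vs. referenced in the dump file.
matrix_orphans = orphan_count(590, 560)
tree_orphans = orphan_count(720, 714)

# Hypothetical file names, for illustration only.
example_orphans = find_orphans(
    ["M1.nex", "M2.nex", "M3.nex"],  # names the import tools uploaded
    ["M1.nex", "M3.nex"],            # names the dump file refers to
)
```

Running the same set difference separately on the matrix names and the tree names would give the two lists I am asking for.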

bp