From: Mark D. <mjd...@ge...> - 2009-04-29 20:58:37
Rutger Vos wrote:

> One possibility is that we use the scripts we've been using to import
> TreeBASE1 data into TreeBASE2. Unfortunately, loading the data that
> way takes a fair amount of time (think weeks) and human intervention.

This is definitely my last choice.

> A second possibility would be to write a program that, from NESCent,
> connects to the SDSC instance through JDBC and loads the data record
> by record. This might take a long time too,

I discussed this at some length in my message of 24 April, and I was
hoping for a response from Hilmar. Here are my comments from my earlier
message:

> The drawback of this comes if we need to import the SDSC data a second
> time for some reason. It would all have to be transferred over the
> network a second time. A dump file need only be transferred once, and
> then could be stored at NESCent and loaded as many times as needed. The
> benefit would be that there would be no need to worry about escape code
> conventions or strange characters or anything like that, and there would
> be no need to ship around a bunch of big files.

I regret that I wasn't clear about the size of the "big files". The
MATRIXELEMENT table has about 3e8 (300 million) records of at least 61
bytes each, which means the dump file is at least 18.4 GB.

> it'll depend on the
> JDBC connection staying up for that entire time.

No; we would write the program to do the loading incrementally. (A rough
sketch of what I mean is at the end of this message.) We would have to do
that anyway, since we would have the same problem just copying the dump
file: we cannot expect to copy a 20 GB dump file cross-country in one
piece.

> To kick off the discussion I'd like to suggest a third possibility: we
> implement functionality where each table can be dumped to some
> delimited format (CSV, say);

Hilmar specifically suggested dumping the data in SQL format:

> Ideally we can get DB2 to dump the data as SQL standard-compliant
> INSERT statements,

CSV is a poor format, and if we are going to write custom dump-and-load
programs, I hope we do something better.

DB2 will already dump the data into compact "IXF" files; I am still
waiting for a reply from Hilmar about whether this will be useful to us.
DB2 does not have any other useful built-in dumping capabilities.
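
For what it's worth, if the SDSC instance is a recent enough DB2 release,
the IXF export can probably be kicked off from a remote JDBC client through
the SYSPROC.ADMIN_CMD stored procedure, so nobody would need shell access to
the SDSC host. The output path below is made up, and the file lands on the
server's filesystem, not the client's:

    import java.sql.*;

    /**
     * Sketch: kicking off DB2's built-in EXPORT (IXF format) from a JDBC
     * client via the SYSPROC.ADMIN_CMD stored procedure. Assumes a DB2
     * release recent enough for ADMIN_CMD to accept EXPORT; the output path
     * is illustrative and is written on the server, not the client.
     */
    public class IxfExport {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(args[0]); // SDSC DB2 instance
            CallableStatement cs = con.prepareCall("CALL SYSPROC.ADMIN_CMD(?)");
            cs.setString(1,
                "EXPORT TO /dump/matrixelement.ixf OF IXF SELECT * FROM MATRIXELEMENT");
            cs.execute();
            cs.close();
            con.close();
        }
    }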
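
And to make the "incremental loading" point above concrete, here is a rough
sketch of the kind of restartable loader I have in mind. The column names
(MATRIXELEMENT_ID, MATRIX_ID, VALUE) and the COPY_PROGRESS bookmark table are
invented for illustration; only MATRIXELEMENT is a real table. The point is
simply that each batch and its bookmark commit together, so when the
connection drops we restart from the last committed key instead of from
scratch:

    import java.sql.*;

    /**
     * Rough sketch of an incremental, restartable table copy over JDBC.
     * Rows are read from the source in primary-key order and inserted into
     * the destination in batches; the highest key copied so far is committed
     * in the same transaction as each batch, so a dropped connection costs at
     * most one batch on restart. Column names and the COPY_PROGRESS bookmark
     * table are placeholders, not the real TreeBASE schema.
     */
    public class IncrementalCopy {

        static final int BATCH = 10000;

        public static void main(String[] args) throws SQLException {
            Connection src = DriverManager.getConnection(args[0]); // SDSC DB2 instance
            Connection dst = DriverManager.getConnection(args[1]); // NESCent instance
            dst.setAutoCommit(false);

            // DB2 understands FETCH FIRST n ROWS ONLY; the destination only
            // sees ordinary parameterized INSERTs.
            PreparedStatement read = src.prepareStatement(
                "SELECT MATRIXELEMENT_ID, MATRIX_ID, VALUE FROM MATRIXELEMENT" +
                " WHERE MATRIXELEMENT_ID > ? ORDER BY MATRIXELEMENT_ID" +
                " FETCH FIRST " + BATCH + " ROWS ONLY");
            PreparedStatement write = dst.prepareStatement(
                "INSERT INTO MATRIXELEMENT (MATRIXELEMENT_ID, MATRIX_ID, VALUE)" +
                " VALUES (?, ?, ?)");
            // Assumes a COPY_PROGRESS row for MATRIXELEMENT was seeded with LAST_ID = 0.
            PreparedStatement mark = dst.prepareStatement(
                "UPDATE COPY_PROGRESS SET LAST_ID = ? WHERE TABLE_NAME = 'MATRIXELEMENT'");

            long lastId = lastCopiedId(dst);
            while (true) {
                read.setLong(1, lastId);
                ResultSet rs = read.executeQuery();
                int n = 0;
                while (rs.next()) {
                    lastId = rs.getLong(1);
                    write.setLong(1, lastId);
                    write.setLong(2, rs.getLong(2));
                    write.setString(3, rs.getString(3));
                    write.addBatch();
                    n++;
                }
                rs.close();
                if (n == 0) break;          // nothing left to copy
                write.executeBatch();
                mark.setLong(1, lastId);
                mark.executeUpdate();
                dst.commit();               // batch and bookmark succeed or fail together
            }
            src.close();
            dst.close();
        }

        static long lastCopiedId(Connection c) throws SQLException {
            Statement s = c.createStatement();
            ResultSet rs = s.executeQuery(
                "SELECT LAST_ID FROM COPY_PROGRESS WHERE TABLE_NAME = 'MATRIXELEMENT'");
            long id = rs.next() ? rs.getLong(1) : 0L;
            rs.close();
            s.close();
            return id;
        }
    }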