From: Mark D. <mjd...@ge...> - 2009-04-24 18:22:01
Hilmar Lapp wrote:
> On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:
>> I understand there is probably some way to dump the TB2-format data as
>> it currently exists at SDSC, transfer the dump files to NESCent, and
>> bulk-load them into the database on the NESCent side.
>
> In theory yes, but in practice each RDBMS has its own dump format.
> Ideally we can get DB2 to dump the data as SQL standard-compliant
> INSERT statements, but I don't know DB2 enough yet to know whether it
> does that, and aside from that there's more than the data itself, such
> as the sequence(s), grants, etc. that may not dump in a format that's
> readily ingestible by Pg.

It dumps the sequences, grants, foreign key constraints, and so forth, as SQL; see trunk/schema.sql. But for dumping the data, it seems we can get any format we want, as long as it is IXF.

It then occurred to me that it would not be hard to write a program that scans all the records in a table and writes out a series of SQL INSERT statements. But rather than do that, it might make more sense to skip the text representation entirely and write a program that runs at NESCent, scans the tables over the network, and executes the appropriate INSERT statements directly, without ever serializing the data in between.

The drawback is that if we ever need to import the SDSC data a second time, it would all have to be transferred over the network again. A dump file need only be transferred once; it could then be stored at NESCent and loaded as many times as needed.

The benefit is that there would be no need to worry about escape-code conventions or strange characters or anything like that, and no need to ship around a bunch of big files.
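The direct network-copy idea above can be sketched roughly as follows. This is a hypothetical illustration, not TreeBASE code: it uses Python with in-memory SQLite databases standing in for the DB2 source and the Pg target, and a made-up table name; real drivers (e.g. a DB2 driver on one side, psycopg2 on the other) differ in connection setup and placeholder syntax (`?` vs `%s`). The point it demonstrates is that parameterized INSERTs let the rows flow straight from one connection to the other, so escaping and strange characters are handled by the drivers and nothing is ever written to a file.

```python
import sqlite3

def copy_table(src, dst, table, columns):
    """Stream every row of `table` from src to dst without any
    intermediate text representation.  Parameterized statements mean
    the driver handles quoting, so odd characters need no escaping."""
    collist = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # psycopg2 would use %s
    insert = "INSERT INTO %s (%s) VALUES (%s)" % (table, collist, placeholders)
    for row in src.execute("SELECT %s FROM %s" % (collist, table)):
        dst.execute(insert, row)
    dst.commit()

# Demo with stand-in databases; "study" is a hypothetical table name.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE study (id INTEGER, name TEXT)")
src.executemany("INSERT INTO study VALUES (?, ?)",
                [(1, "O'Brien 'quoted'"), (2, "plain")])
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE study (id INTEGER, name TEXT)")

copy_table(src, dst, "study", ["id", "name"])
print(dst.execute("SELECT * FROM study ORDER BY id").fetchall())
```

Note the embedded single quote in the first row survives the copy untouched, which is exactly the escaping headache a textual dump file would have to deal with.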