From: Jon A. <jon...@du...> - 2009-04-29 21:36:06

Ok, I'll pick up this thread instead of the one from earlier today that
hijacked the conversation. I was swamped last week with other 'stuff', so
I did not jump into the discussion.

My experience with PostgreSQL-to-PostgreSQL dumps has been spotty.
PostgreSQL can dump to plain text, tar, or a custom binary format. In
text mode, the sequences sometimes get imported before the tables that
use them and the whole import fails, though some databases import just
fine that way. That's why the CSV file seemed like a good approach to
try. It's also the method documented on PostgreSQL's wiki:

http://wiki.postgresql.org/wiki/Image:DB2UDB-to-PG.pdf

Of course, you would have to be careful to pick a delimiter that does not
occur in the data values.
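For the load on the Pg side, something like this is what I have in mind
(an untested sketch from a sys admin, not a programmer; the database,
table, and file names are placeholders, not the real TB2 ones):

    # Load a delimited dump into Pg (untested sketch; "treebase",
    # "study", and "study.csv" are placeholder names).
    import psycopg2

    conn = psycopg2.connect("dbname=treebase user=treebase")
    cur = conn.cursor()

    # Use a delimiter that has been verified never to occur in the
    # data values -- a tab here. COPY FROM STDIN keeps the file on
    # the client side, so there are no server file-permission issues.
    with open("study.csv") as f:
        cur.copy_expert(
            "COPY study FROM STDIN WITH DELIMITER E'\\t' NULL '' CSV",
            f)

    conn.commit()
    cur.close()
    conn.close()

The DELIMITER clause is where the careful delimiter choice comes in:
anything that might show up in the data (commas in titles, etc.) is out.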
I'm not opposed to the web service slurping the data in over the wire,
but it seems like more work and would be more difficult to troubleshoot.
Of course, I'm not much of a programmer, so I'm looking at things from a
sys admin's point of view. That's also a good point about encoding.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------

On Apr 24, 2009, at 2:21 PM, Mark Dominus wrote:

> Hilmar Lapp wrote:
>> On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:
>>> I understand there is probably some way to dump the TB2-format data
>>> as it currently exists at SDSC, transfer the dump files to NESCent,
>>> and bulk-load them into the database on the NESCent side.
>>
>> In theory yes, but in practice each RDBMS has its own dump format.
>> Ideally we can get DB2 to dump the data as SQL standard-compliant
>> INSERT statements, but I don't know DB2 well enough yet to know
>> whether it does that, and aside from that there's more than the data
>> itself, such as the sequence(s), grants, etc. that may not dump in a
>> format that's readily ingestible by Pg.
>
> It dumps the sequences, grants, foreign key constraints, and so forth,
> as SQL; see trunk/schema.sql .
>
> But for dumping the data, it seems as though we can get any format we
> want, as long as it is IXF.
>
> So it then occurred to me that it would not be hard to write a program
> that would scan all the records in a table and write out a series of
> SQL INSERT statements.
>
> But rather than do that, it seems to me that it might make more sense
> to skip the text representation and just write a program that would
> run at NESCent, scan the tables over the network, and execute the
> appropriate INSERT statements directly, without ever serializing the
> data in between.
>
> The drawback of this comes if we need to import the SDSC data a second
> time for some reason. It would all have to be transferred over the
> network a second time. A dump file need only be transferred once, and
> then could be stored at NESCent and loaded as many times as needed.
> The benefit would be that there would be no need to worry about escape
> code conventions or strange characters or anything like that, and
> there would be no need to ship around a bunch of big files.
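If I follow the "scan the records and write out INSERT statements" idea
above, it would be something roughly like this (a very rough, untested
sketch; I'm assuming pyodbc talking to a DB2 ODBC DSN, and the DSN and
table names are made up):

    # Scan a DB2 table and write its rows out as SQL INSERT statements
    # (untested sketch; "treebase_db2" and "study" are placeholders).
    import pyodbc

    def quote(v):
        # Minimal value quoting: NULLs, numbers, and strings with
        # embedded single quotes doubled.
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    conn = pyodbc.connect("DSN=treebase_db2")
    cur = conn.cursor()
    cur.execute("SELECT * FROM study")
    cols = ", ".join(d[0] for d in cur.description)

    with open("study.sql", "w") as out:
        for row in cur:
            vals = ", ".join(quote(v) for v in row)
            out.write("INSERT INTO study (%s) VALUES (%s);\n"
                      % (cols, vals))

Hand-rolled quoting like that is exactly where the escape-code worries
come from, which I take to be part of your point in favor of the direct
approach.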
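The no-serialization variant would read from DB2 and write straight into
Pg in one pass, something like this (same caveats; parameterized INSERTs
let the Pg driver do the escaping, which sidesteps the strange-character
problem):

    # Stream rows from DB2 at SDSC directly into Pg at NESCent
    # (untested sketch; connection strings and table name are made up).
    import pyodbc
    import psycopg2

    src = pyodbc.connect("DSN=treebase_db2")
    dst = psycopg2.connect("dbname=treebase user=treebase")
    rcur, wcur = src.cursor(), dst.cursor()

    rcur.execute("SELECT * FROM study")
    cols = [d[0] for d in rcur.description]
    sql = "INSERT INTO study (%s) VALUES (%s)" % (
        ", ".join(cols), ", ".join(["%s"] * len(cols)))

    while True:
        rows = rcur.fetchmany(1000)   # batch to limit round trips
        if not rows:
            break
        wcur.executemany(sql, [tuple(r) for r in rows])

    dst.commit()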