Re: [Treebase-devel] TreeBASE migration project

It would be very useful if we did have a dump format (I imagined
something simple like CSV or some other delimited format). Some databases
offer this for downloads (e.g. NCBI Taxonomy, ITIS, "Mammal Species of
the World") and it's a very useful feature. If we want TreeBASE to be
more than a place where trees go to die, this would be one way to
facilitate meta-analyses and such. DB2::Admin on CPAN
(http://search.cpan.org/dist/DB2-Admin/) has a facility to dump DB2
tables as delimited files, so we could write a cron job script to do
just that and make the output available as an archive.
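
To make the idea concrete, here is a rough sketch of the dump step. It is
in Python rather than Perl and does not use DB2::Admin at all; it only
assumes a DB-API style connection. The in-memory SQLite database and the
`study` table with its two columns are made-up stand-ins so the example
runs anywhere; the real thing would connect to DB2 and loop over the
actual TreeBASE tables.

```python
import csv
import io
import sqlite3


def dump_table_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, header row first."""
    # Table names cannot be bound as query parameters, so only accept
    # plain identifiers before interpolating the name into the query.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    rows = 0
    for row in cur:
        writer.writerow(row)  # the csv module handles quoting and commas
        rows += 1
    return rows


# Stand-in data (hypothetical table and columns, not the TreeBASE schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [(1, "Primates, revisited"), (2, 'A "quoted" title')],
)
buf = io.StringIO()
n = dump_table_csv(conn, "study", buf)
```

A cron job would call something like this once per table, write to files
instead of a string buffer, and then archive the results.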

On Fri, Apr 24, 2009 at 11:21 AM, Mark Dominus
<mjd...@ge...> wrote:
> Hilmar Lapp wrote:
>> On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:
>>> I understand there is probably some way to dump the TB2-format data as
>>> it currently exists at SDSC, transfer the dump files to NESCent, and
>>> bulk-load them into the database on the NESCent side.
>>
>> In theory yes, but in practice each RDBMS has its own dump format.
>> Ideally we can get DB2 to dump the data as SQL standard-compliant
>> INSERT statements, but I don't know DB2 enough yet to know whether it
>> does that, and aside from that there's more than the data itself, such
>> as the sequence(s), grants, etc that may not dump in a format that's
>> readily ingestible by Pg.
>
> It dumps the sequences, grants, foreign key constraints, and so forth,
> as SQL; see trunk/schema.sql .
>
> But for dumping the data, it seems as though we can get any format we
> want, as long as it is IXF.
>
> So it then occurred to me that it would not be hard to write a program
> that would scan all the records in a table and write out a series of SQL
> INSERT statements.
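
A minimal sketch of such a program, to show roughly how little is
involved. This is Python against a generic DB-API connection, with an
in-memory SQLite database standing in for DB2; the `study` table and the
helper names `sql_literal` and `insert_statements` are invented for the
example. It only handles NULLs, numbers, booleans and text, and it
assumes the target database uses standard-conforming string literals
(single quotes doubled, backslashes not special).

```python
import sqlite3


def sql_literal(value):
    """Render one Python value as an SQL literal (limited type support)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"no literal form for {type(value).__name__}")


def insert_statements(conn, table):
    """Yield one INSERT statement per row of `table`."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    columns = ", ".join(col[0] for col in cur.description)
    for row in cur:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Stand-in source data (hypothetical table, not the TreeBASE schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE study (id INTEGER, title TEXT)")
src.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [(1, "O'Brien's tree"), (2, None)],
)
statements = list(insert_statements(src, "study"))
```

Dates, binary columns and large text fields would each need their own
rule in `sql_literal`, which is where most of the real work would be.
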
>
> But rather than do that, it seems to me that it might make more sense to
> skip the text representation and just write a program that would run at
> NESCent, scan the tables over the network, and execute the appropriate
> INSERT statements directly, without ever serializing the data in between.
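
A sketch of that direct-copy approach, under some assumptions: both ends
are reachable through DB-API connections, the table already exists on
the target with the same column names, and the driver uses `?`
placeholders (SQLite does; a PostgreSQL driver such as psycopg2 uses
`%s` instead). Two in-memory SQLite databases stand in for DB2 and Pg
here, and `copy_table` and the `study` table are names made up for the
example.

```python
import sqlite3


def copy_table(src, dst, table, batch_size=1000):
    """Copy all rows of `table` from `src` to `dst` in batches."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    read = src.cursor()
    read.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in read.description]
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    write = dst.cursor()
    copied = 0
    while True:
        rows = read.fetchmany(batch_size)
        if not rows:
            break
        # Values are bound as parameters, so no escaping is needed.
        write.executemany(sql, rows)
        copied += len(rows)
    dst.commit()
    return copied


# Stand-in databases (hypothetical table, not the TreeBASE schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE study (id INTEGER, title TEXT)")
src.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [(1, "It's a tree"), (2, None), (3, "Line one\nline two")],
)
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE study (id INTEGER, title TEXT)")
copied = copy_table(src, dst, "study", batch_size=2)
```

Because the values travel as bound parameters, quotes and newlines in
the data never pass through a text format.
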
>
> The drawback of this comes if we need to import the SDSC data a second
> time for some reason.  It would all have to be transferred over the
> network a second time.  A dump file need only be transferred once, and
> then could be stored at NESCent and loaded as many times as needed.  The
> benefit would be that there would be no need to worry about escape code
> conventions or strange characters or anything like that, and there would
> be no need to ship around a bunch of big files.

-- 
Dr. Rutger A. Vos
Department of Zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com