From: Rutger V. <rut...@gm...> - 2009-04-29 19:01:27
(Forwarding to mailing list.)

On Wed, Apr 29, 2009 at 11:48 AM, Jon Auman <jon...@du...> wrote:

> As the sysadmin, I've got a preference for #3. Importing from a CSV file
> in PostgreSQL is trivial. You also avoid import aborts caused by
> PostgreSQL errors, and you always know what stage you are at during the
> import.
>
> How big do you think the total dump files will be? For PostgreSQL, our
> data dump sizes depend upon the type of data in the database. We've got
> a 400 MB database with a fair amount of binary data that dumps to a
> 200 MB file, and we've got a 1 GB database with no binary data that
> dumps to a 20 MB file. Do you have an idea of the size of the current
> DB2 database on disk, and what kind of data is in there (text or
> binary)?

I don't know the exact size of the current database, but it's larger than
your cases.

> Also, will this be a one-time operation or ongoing?

One-time-ish.

Rutger

> On Apr 29, 2009, at 1:51 PM, Rutger Vos wrote:
>
> Hi all,
>
> Bill, Mark, Val and I just had a conference call on pending issues
> pre-beta. We decided there really aren't any: we're ready to start
> beta testing. The topic then moved to what to do after testing. One of
> the main issues is how we will move the actual data in TreeBASE2 from
> the SDSC database instance (i.e. DB2 sitting on a computer in San
> Diego) to the NESCent instance (i.e. PG sitting on a computer in
> Durham).
>
> One possibility is that we use the scripts we've been using to import
> TreeBASE1 data into TreeBASE2. Unfortunately, loading the data that
> way takes a fair amount of time (think weeks) and human intervention.
>
> A second possibility would be to write a program that, from NESCent,
> connects to the SDSC instance through JDBC and loads the data record
> by record. This might take a long time too, and it'll depend on the
> JDBC connection staying up for that entire time.
>
> To kick off the discussion I'd like to suggest a third possibility: we
> implement functionality where each table can be dumped to some
> delimited format (CSV, say); the dumps are made available for download
> as a compressed archive; and the NESCent machine downloads that archive
> and loads the tables into PG. It seems to me that we want the
> dump+zip+serve-up functionality anyway, so this would be a good way to
> make that happen.
>
> Any thoughts?
>
> Thanks,
>
> Rutger
>
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
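
P.S. To make option #3 a bit more concrete, here is a rough sketch of the
kind of thing I have in mind for the dump side, with the PG-side load shown
as a comment at the end. The connection URL, the credentials and the "study"
table name are placeholders rather than the actual TreeBASE2 schema, and
proper handling of NULLs and LOB columns is glossed over, so please read it
as illustrative only (it assumes the DB2 JDBC driver jar is on the
classpath).

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableToCsv {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- not the real SDSC instance.
        String url = "jdbc:db2://db2.example.org:50000/TREEBASE";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement st = conn.createStatement();
             // "study" is a made-up table name, one dump per table.
             ResultSet rs = st.executeQuery("SELECT * FROM study");
             PrintWriter out = new PrintWriter(new FileWriter("study.csv"))) {

            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append(',');
                    String val = rs.getString(i);
                    // Minimal CSV escaping: quote every field, double embedded quotes.
                    row.append('"')
                       .append(val == null ? "" : val.replace("\"", "\"\""))
                       .append('"');
                }
                out.println(row);
            }
        }
        // On the PG side the file would then load with something like
        //   \copy study FROM 'study.csv' WITH CSV
        // from psql -- the "trivial" CSV import Jon mentions.
    }
}

The read side of option #2 (record-by-record over JDBC) would look much the
same; the only difference is whether the rows go straight into PG over a
second connection or into a CSV file first.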