From: Rutger V. <rut...@gm...> - 2009-04-29 19:01:27
(Forwarding to mailing list.)

On Wed, Apr 29, 2009 at 11:48 AM, Jon Auman <jon...@du...> wrote:

> As the sysadmin, I've got a preference for #3. Importing from a CSV file
> in PostgreSQL is trivial. You also avoid import aborts caused by
> PostgreSQL errors, and you always know what stage you are at during the
> import.
>
> How big do you think the total dump files will be? For PostgreSQL, our
> data dump sizes depend upon the type of data in the database. We've got
> a 400 MB database with a fair amount of binary data that dumps to a
> 200 MB file, and we've got a 1 GB database with no binary data that
> dumps to a 20 MB file. Do you have an idea of the size of the current
> DB2 database on disk, and what kind of data is in there (text or
> binary)?

I don't know the exact size of the current database, but it's larger than
your cases.

> Also, will this be a one-time operation or ongoing?

One-time-ish.

Rutger

> On Apr 29, 2009, at 1:51 PM, Rutger Vos wrote:
>
> Hi all,
>
> Bill, Mark, Val and I just had a conference call on pending issues
> pre-beta. We decided there really aren't any: we're ready to start
> beta testing. The topic then moved to what to do after testing. One of
> the main issues is how we will move the actual data in TreeBASE2 from
> the SDSC database instance (i.e. DB2 sitting on a computer in San
> Diego) to the NESCent instance (i.e. PG sitting on a computer in
> Durham).
>
> One possibility is that we use the scripts we've been using to import
> TreeBASE1 data into TreeBASE2. Unfortunately, loading the data that
> way takes a fair amount of time (think weeks) and human intervention.
>
> A second possibility would be to write a program that, from NESCent,
> connects to the SDSC instance through JDBC and loads the data record
> by record. This might take a long time too, and it'll depend on the
> JDBC connection staying up for that entire time.
>
> To kick off the discussion I'd like to suggest a third possibility: we
> implement functionality where each table can be dumped to some
> delimited format (CSV, say); the dumps are made available for download
> as a compressed archive; and the NESCent machine downloads that archive
> and loads the tables into PG. It seems to me that we want the
> dump+zip+serve-up functionality anyway, so this would be a good way to
> make that happen.
>
> Any thoughts?
>
> Thanks,
>
> Rutger
>
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
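
P.S. To make option #3 a bit more concrete, here is a rough sketch of the
kind of thing I have in mind for the dump side, with the PG-side load shown
as a comment at the end. The connection URL, the credentials and the "study"
table name are placeholders rather than the actual TreeBASE2 schema, and
proper handling of NULLs and LOB columns is glossed over, so please read it
as illustrative only (it assumes the DB2 JDBC driver jar is on the
classpath).

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableToCsv {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- not the real SDSC instance.
        String url = "jdbc:db2://db2.example.org:50000/TREEBASE";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement st = conn.createStatement();
             // "study" is a made-up table name, one dump per table.
             ResultSet rs = st.executeQuery("SELECT * FROM study");
             PrintWriter out = new PrintWriter(new FileWriter("study.csv"))) {

            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append(',');
                    String val = rs.getString(i);
                    // Minimal CSV escaping: quote every field, double embedded quotes.
                    row.append('"')
                       .append(val == null ? "" : val.replace("\"", "\"\""))
                       .append('"');
                }
                out.println(row);
            }
        }
        // On the PG side the file would then load with something like
        //   \copy study FROM 'study.csv' WITH CSV
        // from psql -- the "trivial" CSV import Jon mentions.
    }
}

The read side of option #2 (record-by-record over JDBC) would look much the
same; the only difference is whether the rows go straight into PG over a
second connection or into a CSV file first.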