From: Rutger V. <rut...@gm...> - 2009-04-24 20:01:39
It would be very useful if we did have a dump format (I imagined
something simple like CSV or some other delimited format). Some
databases offer this for downloads (e.g. NCBI Taxonomy, ITIS, "Mammal
Species of the World") and it's a very useful feature. If we want
TreeBASE to be more than a place where trees go to die, this would be
one way to facilitate meta-analyses and such.

DB2::Admin on CPAN (http://search.cpan.org/dist/DB2-Admin/) has a
facility to dump DB2 tables as delimited files, so we could write a
cron job script to do just that and make the output available as an
archive.

On Fri, Apr 24, 2009 at 11:21 AM, Mark Dominus <mjd...@ge...> wrote:
> Hilmar Lapp wrote:
>> On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:
>>> I understand there is probably some way to dump the TB2-format data as
>>> it currently exists at SDSC, transfer the dump files to NESCent, and
>>> bulk-load them into the database on the NESCent side.
>>
>> In theory yes, but in practice each RDBMS has its own dump format.
>> Ideally we can get DB2 to dump the data as SQL standard-compliant
>> INSERT statements, but I don't know DB2 enough yet to know whether it
>> does that, and aside from that there's more than the data itself, such
>> as the sequence(s), grants, etc that may not dump in a format that's
>> readily ingestible by Pg.
>
> It dumps the sequences, grants, foreign key constraints, and so forth,
> as SQL; see trunk/schema.sql .
>
> But for dumping the data, it seems as though we can get any format we
> want, as long as it is IXF.
>
> So it then occurred to me that it would not be hard to write a program
> that would scan all the records in a table and write out a series of SQL
> INSERT statements.
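
To make the "scan the records and write out INSERT statements" idea concrete, here is a minimal sketch, not anyone's actual migration script. It is written in Python and uses the standard-library sqlite3 module purely as a stand-in for a DB2 connection; the table name `taxon`, its columns, and the helper names `sql_literal` and `dump_inserts` are all hypothetical. Quoting is deliberately simplified (NULL, numbers, and strings only), so real data with binary or date columns would need more care.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (simplified: NULL, numbers, text)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    # Standard SQL escapes a single quote inside a string by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def dump_inserts(conn, table):
    """Yield one INSERT statement per row of `table`.

    `table` is assumed to be a trusted identifier, not user input.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = ", ".join(col[0] for col in cur.description)
    for row in cur:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({columns}) VALUES ({values});"


def demo():
    # Hypothetical table and rows, standing in for a real source table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (id INTEGER, name TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?)",
        [(1, "Homo sapiens", None), (2, "O'Brien's shrew", "hypothetical")],
    )
    return list(dump_inserts(conn, "taxon"))


if __name__ == "__main__":
    for statement in demo():
        print(statement)
```

The string-escaping step is exactly the "escape code conventions" worry raised in the quoted text that follows: a text dump has to get it right for every column type, whereas a direct copy with bound parameters sidesteps it.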
>
> But rather than do that, it seems to me that it might make more sense to
> skip the text representation and just write a program that would run at
> NESCent, scan the tables over the network, and execute the appropriate
> INSERT statements directly, without ever serializing the data in between.
>
> The drawback of this comes if we need to import the SDSC data a second
> time for some reason. It would all have to be transferred over the
> network a second time. A dump file need only be transferred once, and
> then could be stored at NESCent and loaded as many times as needed. The
> benefit would be that there would be no need to worry about escape code
> conventions or strange characters or anything like that, and there would
> be no need to ship around a bunch of big files.

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
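
P.S. The cron-job dump proposed at the top of this message could look roughly like the sketch below. It is in Python rather than Perl, and it uses the standard-library sqlite3 module as a stand-in for DB2, so it does not show the DB2::Admin interface at all. The function names `dump_table_csv` and `dump_archive`, and the table name used in the example, are assumptions made for illustration.

```python
import csv
import io
import sqlite3
import zipfile


def dump_table_csv(conn, table):
    """Return the contents of `table` as CSV text with a header row.

    `table` is assumed to be a trusted identifier, not user input.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)  # one CSV record per database row
    return buf.getvalue()


def dump_archive(conn, tables, archive_path):
    """Write one <table>.csv member per table into a zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for table in tables:
            archive.writestr(f"{table}.csv", dump_table_csv(conn, table))


if __name__ == "__main__":
    # Hypothetical data; a real job would connect to the production database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, "Homo sapiens"), (2, "Pan, sp.")],
    )
    dump_archive(conn, ["taxon"], "dump.zip")
```

One caveat with this sketch: Python's csv writer emits NULL as an empty field, so NULL and the empty string become indistinguishable in the output. That is fine for a public download, but a dump meant for reloading would need an explicit NULL marker.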