From: Mark D. <mjd...@ge...> - 2009-04-24 18:22:01
Hilmar Lapp wrote:
> On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:
>> I understand there is probably some way to dump the TB2-format data as
>> it currently exists at SDSC, transfer the dump files to NESCent, and
>> bulk-load them into the database on the NESCent side.
>
> In theory yes, but in practice each RDBMS has its own dump format.
> Ideally we can get DB2 to dump the data as SQL standard-compliant
> INSERT statements, but I don't know DB2 enough yet to know whether it
> does that, and aside from that there's more than the data itself, such
> as the sequence(s), grants, etc. that may not dump in a format that's
> readily ingestible by Pg.

It dumps the sequences, grants, foreign key constraints, and so forth, as SQL; see trunk/schema.sql. But for dumping the data, it seems we can get any format we want, as long as it is IXF.

It then occurred to me that it would not be hard to write a program that scans all the records in a table and writes out a series of SQL INSERT statements. But rather than do that, it might make more sense to skip the text representation entirely and write a program that runs at NESCent, scans the tables over the network, and executes the appropriate INSERT statements directly, without ever serializing the data in between.

The drawback is that if we ever need to import the SDSC data a second time, it would all have to be transferred over the network again. A dump file need only be transferred once; it could then be stored at NESCent and loaded as many times as needed.

The benefit is that there would be no need to worry about escape-code conventions or strange characters or anything like that, and no need to ship around a bunch of big files.
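The direct network-copy idea above can be sketched roughly as follows. This is a hypothetical illustration, not TreeBASE code: it uses Python with in-memory SQLite databases standing in for the DB2 source and the Pg target, and a made-up table name; real drivers (e.g. a DB2 driver on one side, psycopg2 on the other) differ in connection setup and placeholder syntax (`?` vs `%s`). The point it demonstrates is that parameterized INSERTs let the rows flow straight from one connection to the other, so escaping and strange characters are handled by the drivers and nothing is ever written to a file.

```python
import sqlite3

def copy_table(src, dst, table, columns):
    """Stream every row of `table` from src to dst without any
    intermediate text representation.  Parameterized statements mean
    the driver handles quoting, so odd characters need no escaping."""
    collist = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # psycopg2 would use %s
    insert = "INSERT INTO %s (%s) VALUES (%s)" % (table, collist, placeholders)
    for row in src.execute("SELECT %s FROM %s" % (collist, table)):
        dst.execute(insert, row)
    dst.commit()

# Demo with stand-in databases; "study" is a hypothetical table name.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE study (id INTEGER, name TEXT)")
src.executemany("INSERT INTO study VALUES (?, ?)",
                [(1, "O'Brien 'quoted'"), (2, "plain")])
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE study (id INTEGER, name TEXT)")

copy_table(src, dst, "study", ["id", "name"])
print(dst.execute("SELECT * FROM study ORDER BY id").fetchall())
```

Note the embedded single quote in the first row survives the copy untouched, which is exactly the escaping headache a textual dump file would have to deal with.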