[cx-oracle-users] speeding up TB data download
From: Chuck W. <chu...@ch...> - 2010-07-26 03:44:37
Hello -- I am trying to export data from an Oracle 10 db to a Netezza db. The Oracle db has ~100 tables (uncompressed ~5TB), ranging in size from a few KB to 1TB. I am using Python 2.7/cx_Oracle 5.0.4 to download the data as bzipped CSV files and then load them into Netezza. The code to download the data looks like this:

=====================
import bz2
import csv
import cx_Oracle

self.connection = cx_Oracle.Connection(conninfo)
...
cursor = self.connection.cursor()
cursor.numbersAsStrings = True
cursor.arraysize = 500
...
cursor.execute(query)
fil_out = bz2.BZ2File(("%s.csv.bz2" % fname).lower(), "wb")
csv.register_dialect("test", delimiter="|", quoting=csv.QUOTE_NONE,
                     escapechar="\\", doublequote=False)
csv_out = csv.writer(fil_out, "test")
for row in cursor:
    for n, (pars, cell) in enumerate(zip(col_parser, row)):
        if cell:
            ...
        else:
            ...
    csv_out.writerow(rowparsed)
cursor.close()
=====================

The reason each row is parsed is that I am trying to record the min and max size of every field so that I can make the Netezza schema efficient. I am using numbersAsStrings so that I can easily measure the size of the numbers. The method above transfers about 3.5 GB/hour uncompressed, or 0.6 GB/hour bz2-compressed. Using the multiprocessing library I can spawn four of these processes on a quad-core machine, which definitely helps! Is there any way to speed this up further? Thanks.
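One idea I have been toying with is fetching in batches with fetchmany() and writing each batch with writerows(), instead of iterating row by row, to cut the per-row Python overhead. A rough sketch (dump_table is a made-up helper, and the field-size tracking is left out here -- see the next sketch):

=====================
import bz2
import csv

def dump_table(connection, query, fname):
    cursor = connection.cursor()
    cursor.numbersAsStrings = True
    # A bigger arraysize means fewer network round trips per batch.
    cursor.arraysize = 5000
    cursor.execute(query)
    fil_out = bz2.BZ2File("%s.csv.bz2" % fname.lower(), "wb")
    csv.register_dialect("test", delimiter="|", quoting=csv.QUOTE_NONE,
                         escapechar="\\", doublequote=False)
    csv_out = csv.writer(fil_out, "test")
    while True:
        rows = cursor.fetchmany()   # returns up to cursor.arraysize rows
        if not rows:
            break
        csv_out.writerows(rows)     # one call per batch instead of per row
    cursor.close()
    fil_out.close()
=====================

fetchmany() with no argument returns up to cursor.arraysize rows, so bumping arraysize well above 500 should also reduce round trips; the best value presumably depends on row width and available memory.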
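For the min/max field sizes, something like the following could replace the per-cell parser. This is only a sketch and assumes every cell can be measured with len(str(cell)) -- with numbersAsStrings the numbers already arrive as strings, but DATE columns, for instance, come back as datetime objects:

=====================
import sys

def init_widths(ncols):
    # (min_len, max_len) per column; min starts high, max starts at 0.
    return [(sys.maxint, 0)] * ncols

def track_widths(widths, row):
    for i, cell in enumerate(row):
        if cell is None:
            continue
        n = len(str(cell))
        lo, hi = widths[i]
        widths[i] = (min(lo, n), max(hi, n))
=====================

The idea would be to call track_widths(widths, row) once per fetched row inside the loop above, with widths initialized from len(cursor.description).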
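For context, the multiprocessing side is essentially a Pool over the table list -- this sketch is simplified, TABLES and CONNINFO are placeholders for my table list and connect string, and dump_table is the helper from the first sketch:

=====================
import multiprocessing
import cx_Oracle

def export_one(table):
    # Each worker opens its own connection; connections
    # cannot be shared across processes.
    conn = cx_Oracle.connect(CONNINFO)
    try:
        dump_table(conn, "SELECT * FROM %s" % table, table)
    finally:
        conn.close()

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    pool.map(export_one, TABLES)
    pool.close()
    pool.join()
=====================

Sorting TABLES largest-first before the map might help the load balancing, since the 1TB table is going to dominate one worker either way.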