Re: [cx-oracle-users] speeding up TB data download
Brought to you by:
atuining
From: Anthony T. <ant...@gm...> - 2010-07-26 13:55:40
|
Hi, Just a suggestion: avoid the parsing completely if you can do it. If all you're trying to do is calculate the maximum size of fields, you can easily write something that does this: select max(length(<COL_NAME>)) from <TABLE_NAME> or for numbers select max(length(to_char(<COL_NAME>)) from <TABLE_NAME> This sort of query is far more efficient and can be done independently of your extracts. You can quite easily generate all of these queries by reading from user_tab_columns, all_tab_columns or dba_tab_columns. If you need help with that, let me know. Anthony On Sun, Jul 25, 2010 at 9:44 PM, Chuck White <chu...@ch...> wrote: > Hello -- I am trying to export data from an Oracle 10 db to a Netezza db. The ORCL db has ~100tables (uncompressed ~ 5TB) with tables which range from a few kb to 1TB. > > I am using Python 2.7/cx_Oracle 5.0.4 to download the data as bzipped csv files and then loading in Netezza. The code to download data looks like this: > > ===================== > self.connection = cx_Oracle.Connection(conninfo) > ... > cursor = self.connection.cursor() > cursor.numbersAsStrings = True > cursor.arraysize = 500 > ... > cursor.execute(query) > > fil_out = bz2.BZ2File(lower("%s.csv.bz2"%fname), "wb") > csv.register_dialect("test", delimiter="|", quoting=csv.QUOTE_NONE, escapechar="\\", doublequote=False) > csv_out = csv.writer(fil_out, "test") > > for row in cursor: > for n,(pars,cell) in enumerate(zip(col_parser,row)): > if cell: > ... > else: > ... > csv_out.writerow(rowparsed) > cursor.close() > ===================== > > The reason the data is being parsed is that I am try to obtain the min and max field size so that I can get the NZ schema to be efficient. Also, the reason I am using numbersAsStrings so that I can easily get the size of the numbers. > > The above method is transferring about 3.5GB/hour (uncompressed) or 0.6GB/hour (bz2 compressed). Using the multiprocessing library, I am able to spawn four processes on a quad-core machine which definitely help! > > I was wondering there way any way to speed this up further? > > Thanks. > > > ------------------------------------------------------------------------------ > The Palm PDK Hot Apps Program offers developers who use the > Plug-In Development Kit to bring their C/C++ apps to Palm for a share > of $1 Million in cash or HP Products. Visit us here for more details: > http://ad.doubleclick.net/clk;226879339;13503038;l? > http://clk.atdmt.com/CRS/go/247765532/direct/01/ > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users > |