Thread: [cx-oracle-users] passing a connection to a process?
From: Chuck W. <chu...@ch...> - 2010-06-20 04:22:42
Hello -- I am trying to export data from an Oracle 10 db (~1TB) to a Netezza db. The ORCL db has >150 tables, which range from a few KB to 300 GB.

I am using Python 2.6.5/cx_Oracle to download the data as bzipped csv files and then loading them into Netezza. I have written a script which spawns as many processes as specified (using the threading, Queue and subprocess modules) and pulls down the data. The large tables are pulled down as multiple files, each of which is a partition. Briefly, here's the code which spawns the processes:

====
jobq = Queue.Queue(...)
# put stuff in the queue

def worker():
    while True:
        j = jobq.get()
        p = subprocess.Popen(command + " --job=%s" % j,
                             stdout=subprocess.PIPE, shell=True)
        p.wait()
        jobq.task_done()

for i in xrange(int(options.nproc)):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

jobq.join()
====

Since I am spawning multiple processes, and the cx_Oracle connection object is not picklable, each process has to start and close its own connection (which adds about 15 seconds per process). While that is fine for the larger tables, it adds a lot of time for the 100 or so small tables.

Is there any way in which all the connections can be started by the master and passed to the worker processes? I realize I could easily spawn multiple threads and just use one process. However, with all the compression that needs to happen, I am thinking that would be much slower. Thank you.
From: Amaury F. d'A. <ama...@gm...> - 2010-06-20 17:47:48
Hi,

2010/6/20 Chuck White <chu...@ch...>:
> [...]
> Is there any way in which all the connections can be started by the master and passed
> to the worker processes? I realize I could easily spawn multiple threads and just use
> one process. However, with all the compression that needs to happen, I am thinking
> that would be much slower.

Did you actually try? The compression functions can truly run concurrently in several threads, since they take care to release the Python "Global Interpreter Lock".

Another solution is to use the "multiprocessing" module: a multiprocessing.Pool keeps the spawned processes alive between jobs, and you can cache the connection in a global variable.

-- 
Amaury Forgeot d'Arc
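The GIL point above can be checked with a short sketch (modern Python shown; the payload and thread count are made up): bz2.compress drops the GIL while the C compression code runs, so several threads can compress different buffers on different cores at the same time.

```python
import bz2
import threading

# Hypothetical payload: in the real script each thread would hold a
# CSV chunk fetched from Oracle.
data = b"some,csv,data\n" * 100000

results = {}

def compress(idx):
    # bz2.compress releases the GIL for the duration of the call, so
    # these threads genuinely run in parallel on a multi-core machine.
    results[idx] = bz2.compress(data)

threads = [threading.Thread(target=compress, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread produced a valid archive that round-trips to the input.
assert all(bz2.decompress(blob) == data for blob in results.values())
```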
From: Chuck W. <chu...@ch...> - 2010-06-20 18:47:22
Thanks for your response.

---- Amaury Forgeot d'Arc <ama...@gm...> wrote:
> Another solution is to use the "multiprocessing" module: a
> multiprocessing.Pool keeps the spawned processes alive between jobs,
> and you can cache the connection in a global variable.

Can you please post an example with the multiprocessing module? My understanding is that objects have to be picklable to spawn processes using multiprocessing. Maybe I am missing something here.

> Did you actually try? The compression functions can truly run
> concurrently in several threads, since they take care to release
> the Python "Global Interpreter Lock".

No, I did not try. Thanks.
From: Amaury F. d'A. <ama...@gm...> - 2010-06-20 20:40:32
2010/6/20 Chuck White <chu...@ch...>:
> Thanks for your response.
>
> Can you please post an example with the multiprocessing module? My understanding is
> that objects have to be picklable to spawn processes using multiprocessing. Maybe I
> am missing something here.

The parameters have to be picklable, yes. But not the state of the worker process...

In the following sample, the "connection" object would be a cx_Oracle connection. The important thing is that it's a global variable, so it is reused the next time the worker process gets a new task.

connection = None

def worker(param):
    import time
    time.sleep(0.1)
    global connection
    if connection is None:
        print "%d Connect" % param
        connection = param
    else:
        print "%d Connected by %d" % (param, connection)
    return param + 10

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool(processes=4)
    results = []
    for x in range(10):
        def callback(result, x=x):
            results.append((x, result))
        pool.apply_async(worker, [x], callback=callback)
    pool.close()
    pool.join()
    print "results", results

-- 
Amaury Forgeot d'Arc
From: Chuck W. <chu...@ch...> - 2010-06-21 14:29:37
Thank you for sending me the example. It worked nicely. If you don't mind, I do have a couple of questions about it:

1. When can I close the connection? Do I have to send a decoy job which is identified by the worker, which in turn closes the connection? Or is there a better way?

2. I see that multiprocessing has support for logging. Is it possible to instantiate logging in the main code, and have the workers write to it?

Thanks again.

---- Amaury Forgeot d'Arc <ama...@gm...> wrote:
> 2010/6/20 Chuck White <chu...@ch...>:
> > Thanks for your response.
> >
> > Can you please post an example with the multiprocessing module? My understanding
> > is that objects have to be picklable to spawn processes using multiprocessing.
> > Maybe I am missing something here.
>
> The parameters have to be picklable, yes. But not the state of the
> worker process...
> In the following sample, the "connection" object would be a cx_Oracle
> connection.
> The important thing is that it's a global variable, so it is reused the next
> time the worker process gets a new task.