[Py4j-users] Using Py4J with parallel programming
Status: Beta
Brought to you by:
barthe
From: Stephen M. <sdm...@gm...> - 2012-03-06 22:55:17
|
Hi all, I have a working implementation of some Java code I inherited, wrapped up in Python using Py4J. Due to the long time for the calculations to complete, and the embarrassingly parallel nature of my problem, I am trying to use the Python multiprocessing module to speed things up. In particular, I create a pool of processes, and then submit jobs to them. Each job is almost identical, and Py4J to create a new Java object in the JVM server, and performs the necessary calculations. The problem is that it fails very quickly inside a Py4J call. There is a lot of code involved in this, but the following is the function that fails. The class "AJDiskAnalysis" contains the call to the JVM server that you can see in the klysOutputPower() function. def klysOutputPower(Pin, AJDiskAnalObj): """Returns the output power of the klystron for a given input power.""" AJDiskAnalObj.klysanal.setInputPower(float(Pin)) AJDiskAnalObj.simulate(disks=20) return AJDiskAnalObj.klysanal.getSimulatedOutputPower() def powerScan(Pin, filename): Pout, result, AJobj = [], [], [] pool = mp.Pool(processes=12) for power in Pin: AJobj.append(AJDiskAnalysis(DSKParsObj=DSKParser(fname=filename))) result.append(pool.apply_async(klysOutputPower, (power, AJobj[-1]))) for i in result: Pout.append(i.get()) return Pout if __name__=="__main__": Pin = arange(0, 150, 10) Pout = powerScan(Pin, "ess-new-704.dsk") This results in the following error: Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner self.run() File "/usr/lib64/python2.6/threading.py", line 484, in run self.__target(*self.__args, **self.__kwargs) File "/usr/lib64/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks put(task) File "/usr/lib/python2.6/site-packages/py4j-0.7-py2.6.egg/py4j/java_gateway.py", line 432, in __call__ self.target_id, self.name) File "/usr/lib/python2.6/site-packages/py4j-0.7-py2.6.egg/py4j/protocol.py", line 271, in get_return_value raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value)) Py4JError: An error occurred while calling o19.__getnewargs__. Trace: py4j.Py4JException: Method __getnewargs__([]) does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:346) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:355) at py4j.Gateway.invoke(Gateway.java:247) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124) at py4j.commands.CallCommand.execute(CallCommand.java:81) at py4j.GatewayConnection.run(GatewayConnection.java:175) at java.lang.Thread.run(Thread.java:662) Is there some problem that prevents Py4J being used alongside the multiprocessing module, or is there something I am doing wrong? I would be very happy for any hints or tips, and will happily provide any more information if it is needed. Steve |