Re: [Py4j-users] Using Py4J with parallel programming
Status: Beta
Brought to you by:
barthe
From: Barthelemy D. <ba...@cs...> - 2012-03-07 13:35:34
|
Hi Stephen, Here are a few pointers: 1. You should be able to use multiprocessing with py4j (multi-threading certainly works). Just make sure that you create a new JavaGateway instance in each process (don't reuse an instance that was created outside the process) and that you do not share JavaObject instances (objects returned by the Java side) across processes. I think you may be doing this in your code. Essentially, when you are sharing an instance across processes, you end up sharing a socket, which is tricky and probably won't work with the way sockets are initialized by py4j. 2. __getnewargs__ is usually used by python to pickle an object. If I remember correctly, pickle is used by multiprocessing to transfer/share objects across processes. 3. If you really want to "share" an object, you should probably save it somewhere on the Java side and request the object in each Python process. Hope this helps, Barthélémy On 2012-03-06, at 5:37 PM, Stephen Molloy wrote: > Hi all, > I have a working implementation of some Java code I inherited, wrapped up in Python using Py4J. Due to the long time for the calculations to complete, and the embarrassingly parallel nature of my problem, I am trying to use the Python multiprocessing module to speed things up. > > In particular, I create a pool of processes, and then submit jobs to them. Each job is almost identical, and Py4J to create a new Java object in the JVM server, and performs the necessary calculations. The problem is that it fails very quickly inside a Py4J call. > > There is a lot of code involved in this, but the following is the function that fails. The class "AJDiskAnalysis" contains the call to the JVM server that you can see in the klysOutputPower() function. > def klysOutputPower(Pin, AJDiskAnalObj): > """Returns the output power of the klystron for a given input power.""" > AJDiskAnalObj.klysanal.setInputPower(float(Pin)) > AJDiskAnalObj.simulate(disks=20) > return AJDiskAnalObj.klysanal.getSimulatedOutputPower() > > def powerScan(Pin, filename): > Pout, result, AJobj = [], [], [] > pool = mp.Pool(processes=12) > for power in Pin: > AJobj.append(AJDiskAnalysis(DSKParsObj=DSKParser(fname=filename))) > result.append(pool.apply_async(klysOutputPower, (power, AJobj[-1]))) > for i in result: > Pout.append(i.get()) > return Pout > > if __name__=="__main__": > Pin = arange(0, 150, 10) > Pout = powerScan(Pin, "ess-new-704.dsk") > > This results in the following error: > > Exception in thread Thread-1: > Traceback (most recent call last): > File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner > self.run() > File "/usr/lib64/python2.6/threading.py", line 484, in run > self.__target(*self.__args, **self.__kwargs) > File "/usr/lib64/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks > put(task) > File "/usr/lib/python2.6/site-packages/py4j-0.7-py2.6.egg/py4j/java_gateway.py", line 432, in __call__ > self.target_id, self.name) > File "/usr/lib/python2.6/site-packages/py4j-0.7-py2.6.egg/py4j/protocol.py", line 271, in get_return_value > raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value)) > Py4JError: An error occurred while calling o19.__getnewargs__. Trace: > py4j.Py4JException: Method __getnewargs__([]) does not exist > at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:346) > at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:355) > at py4j.Gateway.invoke(Gateway.java:247) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124) > at py4j.commands.CallCommand.execute(CallCommand.java:81) > at py4j.GatewayConnection.run(GatewayConnection.java:175) > at java.lang.Thread.run(Thread.java:662) > > Is there some problem that prevents Py4J being used alongside the multiprocessing module, or is there something I am doing wrong? > I would be very happy for any hints or tips, and will happily provide any more information if it is needed. > > Steve > > ------------------------------------------------------------------------------ > Keep Your Developer Skills Current with LearnDevNow! > The most comprehensive online learning library for Microsoft developers > is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, > Metro Style Apps, more. Free future releases when you subscribe now! > http://p.sf.net/sfu/learndevnow-d2d_______________________________________________ > Py4j-users mailing list > Py4...@li... > https://lists.sourceforge.net/lists/listinfo/py4j-users |