Re: [Py4j-users] Multiprocessing error
From: Barthelemy D. <ba...@cs...> - 2012-06-22 22:47:02
Hi,

I'm a bit puzzled. This error, py4j.Py4JException: Method __getnewargs__([]) does not exist, means that something is trying to pickle a JavaObject instance. This typically happens if you try to share an object with multiprocessing.

Here are a couple of debug statements I would add:

1. Try to place the import statements in singlethread(…). I just want to make sure that Py4J is not initializing some global state that is shared between processes (I don't think so, because I have used multiprocessing with Py4J in the past, but this is just to eliminate a possibility).

2. Print the id of the objects you create, e.g., print(config._target_id). The error message contains an object id that we can use to track down which object is supposed to have this __getnewargs__ method.

3. Debug or print a message between each line after the java_import statements, to know exactly when the error occurs.

Thanks,
Barthélémy

On 2012-06-22, at 10:52 AM, Eleftherios Avramidis wrote:

> Hi,
>
> I am sorry, I missed two lines:
>
>
> from py4j.java_gateway import JavaGateway, GatewayClient, java_import
>
> import multiprocessing, time, random
>
> def singlethread(socket_no):
>     print "Thread starting"
>
>     gatewayclient = GatewayClient('localhost', socket_no)
>     gateway = JavaGateway(gatewayclient, auto_convert=True, auto_field=True)
>     #create a new view for the jvm
>     meteor_view = gateway.new_jvm_view()
>     #import required packages
>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>     #initialize the java object
>     java_import(meteor_view, 'edu.cmu.meteor.util.*')
>     #pass the language setting into the meteor configuration object
>     config = meteor_view.MeteorConfiguration();
>     config.setLanguage("en");
>     scorer = meteor_view.MeteorScorer(config)
>     print "object initialized"
>     #run object function
>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>     print stats.score
>     return 1
>
>
> if __name__ == '__main__':
>     socket_no = 25336
>     print "Gclient started"
>
>     p = multiprocessing.Pool(3)
>     print "Multipool initialized"
>     p.map(singlethread, [socket_no, socket_no,socket_no])
>
>
> On 22/06/12 16:50, Eleftherios Avramidis wrote:
>> Hi Barthélémy,
>>
>> I just tried rewriting the code as following. Two threads printed out their result successfully, the third pool worker gave a similar error. (I am running this in a 2-core machine)
>>
>>
>> from py4j.java_gateway import JavaGateway, GatewayClient, java_import
>>
>> import multiprocessing, time, random
>>
>> def singlethread(socket_no):
>>     print "Thread starting"
>>     gatewayclient = GatewayClient('localhost', socket_no)
>>     gateway = JavaGateway(gatewayclient, auto_convert=True, auto_field=True)
>>     #create a new view for the jvm
>>     meteor_view = gateway.new_jvm_view()
>>     #import required packages
>>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>>     #initialize the java object
>>     java_import(meteor_view, 'edu.cmu.meteor.util.*')
>>     #pass the language setting into the meteor configuration object
>>     config = meteor_view.MeteorConfiguration();
>>     config.setLanguage("en");
>>     scorer = meteor_view.MeteorScorer(config)
>>     print "object initialized"
>>     #run object function
>>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>>     print stats.score
>>     return 1
>>
>>
>> if __name__ == '__main__':
>>     socket_no = 25336
>>     # gatewayclient = GatewayClient('localhost', socket_no)
>>     print "Gclient started"
>>
>>
>> output:
>>
>> Gclient started
>> Multipool initialized
>> Thread starting
>> Thread starting
>> Thread starting
>> Process PoolWorker-3:
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
>>     self.run()
>>   File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
>>     self._target(*self._args, **self._kwargs)
>>   File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 99, in worker
>>     put((job, i, result))
>>   File "/usr/local/lib/python2.7/multiprocessing/queues.py", line 392, in put
>>     return send(obj)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 432, in __call__
>>     self.target_id, self.name)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/protocol.py", line 271, in get_return_value
>>     raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value))
>> Py4JError: An error occurred while calling o6.__getnewargs__. Trace:
>> py4j.Py4JException: Method __getnewargs__([]) does not exist
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:346)
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:355)
>>     at py4j.Gateway.invoke(Gateway.java:247)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:81)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:175)
>>     at java.lang.Thread.run(Thread.java:636)
>>
>>
>> object initialized
>> 0.335206780367
>> object initialized
>> 0.335206780367
>>
>>
>> Any suggestions welcome,
>>
>> Lefteris
>>
>>
>> On 22/06/12 16:25, Barthelemy Dagenais wrote:
>>> Hi Eleftherios,
>>>
>>> in this code:
>>>
>>> if __name__ == '__main__':
>>>     socket_no = 25336
>>>     gatewayclient = GatewayClient('localhost', socket_no)
>>>     print "Gclient started"
>>>     gateway = JavaGateway(gatewayclient, auto_convert=True,
>>>                           auto_field=True)
>>>     print "Gateway started"
>>>
>>>     p = multiprocessing.Pool(2)
>>>     print "Multipool initialized"
>>>     p.map(singlethread, [gateway, gateway, gateway])
>>>
>>> You are creating only one instance of JavaGateway and this instance is passed to each worker (the list [gateway, gateway, gateway] is in fact a list of three pointers pointing to the same instance). I usually recommend creating one JavaGateway instance per worker.
>>>
>>> Unless this was not the code that caused the problem, I would start by creating a JavaGateway in each worker and then, if you still encounter the problem, we can start looking at other possibilities.
>>>
>>> Thanks,
>>> Barthélémy
>>>
>>>
>>> On 2012-06-22, at 10:20 AM, Eleftherios Avramidis wrote:
>>>
>>>> Hi Barthélémy,
>>>>
>>>> thanks for your quick response. I am trying to understand your suggestion:
>>>> - On the Java side I have initialized GatewayServer on socket 25336
>>>> - On the Python side I initialized one GatewayClient per thread,
>>>>   connecting to socket 25336.
>>>>
>>>> Is this the right solution so that the sockets are not being shared?
>>>>
>>>> best
>>>> Eleftherios
>>>>
>>>>
>>>> On 22/06/12 15:55, Barthelemy Dagenais wrote:
>>>>> Hi,
>>>>>
>>>>> Please try to initialize the gateway in the thread/process and not outside. I believe you may be sharing sockets across threads/processes, which would explain the weird behavior you are observing.
>>>>>
>>>>> Barthélémy
>>>>>
>>>>> On 2012-06-22, at 9:08 AM, Eleftherios Avramidis wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I would like your help with a particular issue which occurs when I am
>>>>>> trying to use python multiprocessing, along with Py4j Java objects.
>>>>>>
>>>>>>
>>>>>> def singlethread(gateway):
>>>>>>     print "Thread starting"
>>>>>>
>>>>>>     #create a new view for the jvm
>>>>>>     meteor_view = gateway.new_jvm_view()
>>>>>>     #import required packages
>>>>>>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>>>>>>     #initialize the java object
>>>>>>     scorer = meteor_view.MeteorScorer()
>>>>>>     print "object initialized"
>>>>>>     #run object function
>>>>>>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>>>>>>     print stats.score
>>>>>>
>>>>>> if __name__ == '__main__':
>>>>>>     socket_no = 25336
>>>>>>     gatewayclient = GatewayClient('localhost', socket_no)
>>>>>>     print "Gclient started"
>>>>>>     gateway = JavaGateway(gatewayclient, auto_convert=True,
>>>>>>                           auto_field=True)
>>>>>>     print "Gateway started"
>>>>>>
>>>>>>     p = multiprocessing.Pool(2)
>>>>>>     print "Multipool initialized"
>>>>>>     p.map(singlethread, [gateway, gateway, gateway])
>>>>>>
>>>>>>
>>>>>> output:
>>>>>>
>>>>>> Gclient started
>>>>>> Gateway started
>>>>>> Multipool initialized
>>>>>> Exception in thread Thread-2:
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/threading.py", line 552, in __bootstrap_inner
>>>>>>     self.run()
>>>>>>   File "/usr/local/lib/python2.7/threading.py", line 505, in run
>>>>>>     self.__target(*self.__args, **self.__kwargs)
>>>>>>   File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
>>>>>>     put(task)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 432, in __call__
>>>>>>     self.target_id, self.name)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/protocol.py", line 271, in get_return_value
>>>>>>     raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value))
>>>>>> Py4JError: An error occurred while calling t.__getnewargs__. Trace:
>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>>>>>     at java.lang.String.substring(String.java:1949)
>>>>>>     at java.lang.String.substring(String.java:1916)
>>>>>>     at py4j.Gateway.invoke(Gateway.java:250)
>>>>>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124)
>>>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:81)
>>>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:175)
>>>>>>     at java.lang.Thread.run(Thread.java:636)
>>>>>>
>>>>>>
>>>>>> I would be happy if somebody can help me find out what exactly the error
>>>>>> may be, or if you can indicate where I should look for the problem.
>>>>>>
>>>>>> Interestingly enough, this problem doesn't occur when running without
>>>>>> multiprocessing. It also runs fine with other Java programs. But how can
>>>>>> I specify which aspect of the Java program causes the error?
>>>>>>
>>>>>> best
>>>>>> Eleftherios
>>>>>>
>>>>>> --
>>>>>> MSc. Inf. Eleftherios Avramidis
>>>>>> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
>>>>>> Tel. +49-30 238 95-1806
>>>>>> Fax. +49-30 238 95-1810
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------
>>>>>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>>>>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>>>>>
>>>>>> Geschaeftsfuehrung:
>>>>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>>>>>> Dr. Walter Olthoff
>>>>>>
>>>>>> Vorsitzender des Aufsichtsrats:
>>>>>> Prof. Dr. h.c. Hans A. Aukes
>>>>>>
>>>>>> Amtsgericht Kaiserslautern, HRB 2313
>>>>>> -------------------------------------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Py4j-users mailing list
>>>>>> Py4...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/py4j-users
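
The two pieces of advice in this thread, building the JavaGateway inside each worker and keeping JavaObject instances out of anything multiprocessing has to pickle, could be sketched roughly as below. This is only an illustration in Python 3 syntax (the thread itself uses Python 2), not the code that was actually run. The names `score_pair`, `unpicklable_items`, `main` and `PORT` are hypothetical; the Meteor classes and port 25336 are taken from the messages above, and a GatewayServer is assumed to be listening already. The py4j import sits inside the worker, as in debug suggestion 1, and the worker hands back a plain float instead of a Java proxy.

```python
import multiprocessing
import pickle

PORT = 25336  # port used in the thread; assumes a GatewayServer is already listening


def unpicklable_items(items):
    """Return (index, type name, error) for each item that pickle rejects.

    multiprocessing pickles task arguments and return values, so running this
    over them in the parent shows which object would trigger the failure.
    """
    problems = []
    for index, item in enumerate(items):
        try:
            pickle.dumps(item)
        except Exception as exc:  # broad on purpose: the error type depends on the object
            problems.append((index, type(item).__name__, exc))
    return problems


def score_pair(pair):
    """Worker: build a private gateway, score one sentence pair, return a float."""
    # Imported inside the worker so the parent process creates no py4j state.
    from py4j.java_gateway import JavaGateway, GatewayClient, java_import

    reference, hypothesis = pair
    gateway = JavaGateway(GatewayClient('localhost', PORT),
                          auto_convert=True, auto_field=True)
    try:
        view = gateway.new_jvm_view()
        java_import(view, 'edu.cmu.meteor.scorer.*')
        java_import(view, 'edu.cmu.meteor.util.*')
        config = view.MeteorConfiguration()
        config.setLanguage("en")
        scorer = view.MeteorScorer(config)
        stats = scorer.getMeteorStats(reference, hypothesis)
        # The return value is pickled on its way back to the parent,
        # so return a Python float, never the Java proxy itself.
        return float(stats.score)
    finally:
        gateway.close()  # release this worker's connection


def main():
    pairs = [("Test sentence", "Test sentence !")] * 3
    # Only plain strings cross the process boundary, so nothing is reported.
    assert unpicklable_items(pairs) == []
    with multiprocessing.Pool(3) as pool:
        print(pool.map(score_pair, pairs))
```

To try it, call `main()` under an `if __name__ == '__main__':` guard with the Java side running. The sketch opens a gateway per task for simplicity; a Pool initializer that builds one gateway per worker process would be closer to the "one JavaGateway instance per worker" recommendation.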