Re: [Py4j-users] Multiprocessing error
From: Barthelemy D. <ba...@cs...> - 2012-06-22 22:47:02
Hi,
I'm a bit puzzled.
This error, py4j.Py4JException: Method __getnewargs__([]) does not exist, means that something is trying to pickle a JavaObject instance. This typically happens if you try to share an object with multiprocessing.
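To illustrate the mechanism, here is a minimal sketch that does not use Py4J at all. FakeRemoteObject, RemoteCallError and probe_like_pickle are hypothetical names made up for this example; the stand-in only imitates a proxy that treats every unknown attribute as a method on the other side. The probe mimics the __getnewargs__ lookup that the traceback suggests the pickling code performed, so treat it as an approximation of what happens, not as the actual Py4J or pickle internals.

```python
class RemoteCallError(Exception):
    """Raised by the stand-in proxy when the 'remote side' lacks a method."""


class FakeRemoteObject(object):
    """Hypothetical stand-in for a proxy object such as JavaObject.

    Any attribute that is not found locally is assumed to be a remote
    method, so the attribute lookup itself always succeeds.
    """

    def __init__(self, target_id, remote_methods):
        self._target_id = target_id
        self._remote_methods = set(remote_methods)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        def remote_call(*args):
            if name not in self._remote_methods:
                raise RemoteCallError(
                    "An error occurred while calling %s.%s: method does not exist"
                    % (self._target_id, name))
            return "%s.%s called" % (self._target_id, name)
        return remote_call


def probe_like_pickle(obj):
    """Look up and call __getnewargs__, as the traceback suggests pickling did."""
    hook = getattr(obj, "__getnewargs__", None)
    if hook is None:
        return "no hook"
    try:
        hook()
    except RemoteCallError as exc:
        return str(exc)
    return "hook returned"
```

A plain object has no such hook, so the probe finds nothing. The proxy answers the lookup with a remote call, and the remote side then reports that the method does not exist.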
Here are a couple of debug statements I would add:
1. Try placing the import statements inside singlethread(…). I just want to make sure that Py4J is not initializing some global state that is shared between the worker processes (I don't think so, because I have used multiprocessing with Py4J in the past, but this is just to eliminate a possibility).
2. Print the id of each object you create, e.g., print(config._target_id). The error message contains an object id (o6 in your trace) that we can use to track down which object is supposed to have this __getnewargs__ method.
3. Debug or print a message between each line after the java_import statements, to know exactly where the error occurs.
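Steps 2 and 3 could be combined in a small helper like the one below. This is only a sketch: checkpoint is a hypothetical name, and it assumes the object exposes its id as _target_id, as in the print(config._target_id) example above. It tags each message with the process name and pid so that output from different pool workers can be told apart.

```python
import multiprocessing
import os
import sys


def checkpoint(step, obj=None):
    """Report which process reached which step, plus the object's id if any."""
    # Assumption: proxy objects carry their id in a _target_id attribute.
    target_id = getattr(obj, "_target_id", None)
    line = "[%s pid=%d] %s target_id=%s" % (
        multiprocessing.current_process().name, os.getpid(), step, target_id)
    # Write the whole line at once and flush, so it shows up immediately.
    sys.stdout.write(line + "\n")
    sys.stdout.flush()
    return line
```

Inside singlethread(…) this would be called after each statement, for example checkpoint("config created", config), and the last message printed by the failing worker shows how far it got.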
Thanks,
Barthélémy
On 2012-06-22, at 10:52 AM, Eleftherios Avramidis wrote:
> Hi,
>
> I am sorry, I missed two lines:
>
>
> from py4j.java_gateway import JavaGateway, GatewayClient, java_import
>
> import multiprocessing, time, random
>
> def singlethread(socket_no):
>     print "Thread starting"
>
>     gatewayclient = GatewayClient('localhost', socket_no)
>     gateway = JavaGateway(gatewayclient, auto_convert=True, auto_field=True)
>     #create a new view for the jvm
>     meteor_view = gateway.new_jvm_view()
>     #import required packages
>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>     #initialize the java object
>     java_import(meteor_view, 'edu.cmu.meteor.util.*')
>     #pass the language setting into the meteor configuration object
>     config = meteor_view.MeteorConfiguration();
>     config.setLanguage("en");
>     scorer = meteor_view.MeteorScorer(config)
>     print "object initialized"
>     #run object function
>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>     print stats.score
>     return 1
>
>
> if __name__ == '__main__':
>     socket_no = 25336
>     print "Gclient started"
>
>     p = multiprocessing.Pool(3)
>     print "Multipool initialized"
>     p.map(singlethread, [socket_no, socket_no,socket_no])
>
>
>
>
> On 22/06/12 16:50, Eleftherios Avramidis wrote:
>> Hi Barthélémy,
>>
>> I just tried rewriting the code as follows. Two threads printed out their results successfully; the third pool worker gave a similar error. (I am running this on a 2-core machine.)
>>
>>
>> from py4j.java_gateway import JavaGateway, GatewayClient, java_import
>>
>> import multiprocessing, time, random
>>
>> def singlethread(socket_no):
>>     print "Thread starting"
>>     gatewayclient = GatewayClient('localhost', socket_no)
>>     gateway = JavaGateway(gatewayclient, auto_convert=True, auto_field=True)
>>     #create a new view for the jvm
>>     meteor_view = gateway.new_jvm_view()
>>     #import required packages
>>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>>     #initialize the java object
>>     java_import(meteor_view, 'edu.cmu.meteor.util.*')
>>     #pass the language setting into the meteor configuration object
>>     config = meteor_view.MeteorConfiguration();
>>     config.setLanguage("en");
>>     scorer = meteor_view.MeteorScorer(config)
>>     print "object initialized"
>>     #run object function
>>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>>     print stats.score
>>     return 1
>>
>>
>> if __name__ == '__main__':
>>     socket_no = 25336
>>     # gatewayclient = GatewayClient('localhost', socket_no)
>>     print "Gclient started"
>>
>>
>>
>> output:
>>
>> Gclient started
>> Multipool initialized
>> Thread starting
>> Thread starting
>> Thread starting
>> Process PoolWorker-3:
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
>>     self.run()
>>   File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
>>     self._target(*self._args, **self._kwargs)
>>   File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 99, in worker
>>     put((job, i, result))
>>   File "/usr/local/lib/python2.7/multiprocessing/queues.py", line 392, in put
>>     return send(obj)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 432, in __call__
>>     self.target_id, self.name)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/protocol.py", line 271, in get_return_value
>>     raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value))
>> Py4JError: An error occurred while calling o6.__getnewargs__. Trace:
>> py4j.Py4JException: Method __getnewargs__([]) does not exist
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:346)
>>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:355)
>>     at py4j.Gateway.invoke(Gateway.java:247)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:81)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:175)
>>     at java.lang.Thread.run(Thread.java:636)
>>
>>
>> object initialized
>> 0.335206780367
>> object initialized
>> 0.335206780367
>>
>>
>> Any suggestions welcome,
>>
>> Lefteris
>>
>>
>> On 22/06/12 16:25, Barthelemy Dagenais wrote:
>>> Hi Eleftherios,
>>>
>>> in this code:
>>>
>>> if __name__ == '__main__':
>>>     socket_no = 25336
>>>     gatewayclient = GatewayClient('localhost', socket_no)
>>>     print "Gclient started"
>>>     gateway = JavaGateway(gatewayclient, auto_convert=True,
>>>                           auto_field=True)
>>>     print "Gateway started"
>>>
>>>     p = multiprocessing.Pool(2)
>>>     print "Multipool initialized"
>>>     p.map(singlethread, [gateway, gateway, gateway])
>>>
>>> You are creating only one instance of JavaGateway, and this instance is passed to each worker (the list [gateway, gateway, gateway] is in fact a list of three references to the same instance). I usually recommend creating one JavaGateway instance per worker.
>>>
>>> Unless this was not the code that caused the problem, I would start by creating a JavaGateway in each worker and then, if you still encounter the problem, we can start looking at other possibilities.
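The "one gateway per worker" idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: FakeGateway is a hypothetical stand-in for JavaGateway, and worker, run_in_pool and is_picklable are made-up helper names. The point is only that the pool receives plain picklable values (a port number and strings), that the connection object is created inside the worker, and that the worker returns plain data and not a proxy object.

```python
import multiprocessing
import os
import pickle


class FakeGateway(object):
    """Hypothetical stand-in for JavaGateway; it only remembers its owner."""

    def __init__(self, port):
        self.port = port
        self.owner_pid = os.getpid()

    def score(self, hypothesis, reference):
        # Placeholder for a remote call such as scorer.getMeteorStats(...).
        return float(hypothesis == reference)


def is_picklable(value):
    """Return True if the value survives pickling with protocol 2."""
    try:
        pickle.dumps(value, 2)
        return True
    except Exception:
        return False


def worker(task):
    """Build the connection inside the worker and return plain data only."""
    port, hypothesis, reference = task
    gateway = FakeGateway(port)  # created here, never passed in
    return (gateway.owner_pid, gateway.score(hypothesis, reference))


def run_in_pool(port, pairs, processes=2):
    """Send only picklable (int, str, str) tuples to the pool."""
    tasks = [(port, hyp, ref) for hyp, ref in pairs]
    pool = multiprocessing.Pool(processes)
    try:
        return pool.map(worker, tasks)
    finally:
        pool.close()
        pool.join()
```

Called under an if __name__ == '__main__': guard, run_in_pool(25336, [("Test sentence", "Test sentence !")]) would hand each worker a tuple instead of a shared gateway. With the real library, the body of worker would create its own GatewayClient and JavaGateway from the port number.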
>>>
>>> Thanks,
>>> Barthélémy
>>>
>>>
>>> On 2012-06-22, at 10:20 AM, Eleftherios Avramidis wrote:
>>>
>>>> Hi Barthélémy,
>>>>
>>>> thanks for your quick response. I am trying to understand your suggestion:
>>>> - On the Java side I have initialized GatewayServer on socket 25336
>>>> - On the Python side I initialized one GatewayClient per thread,
>>>> connecting to socket 25336.
>>>>
>>>> Is this the right solution so that the sockets are not being shared?
>>>>
>>>> best
>>>> Eleftherios
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 22/06/12 15:55, Barthelemy Dagenais wrote:
>>>>> Hi,
>>>>>
>>>>> Please try to initialize the gateway in the thread/process and not outside. I believe you may be sharing sockets across threads/processes, which would explain the weird behavior you are observing.
>>>>>
>>>>> Barthélémy
>>>>>
>>>>> On 2012-06-22, at 9:08 AM, Eleftherios Avramidis wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I would like your help with a particular issue which occurs when I am
>>>>>> trying to use python multiprocessing, along with Py4j Java objects.
>>>>>>
>>>>>>
>>>>>> def singlethread(gateway):
>>>>>>     print "Thread starting"
>>>>>>
>>>>>>     #create a new view for the jvm
>>>>>>     meteor_view = gateway.new_jvm_view()
>>>>>>     #import required packages
>>>>>>     java_import(meteor_view, 'edu.cmu.meteor.scorer.*')
>>>>>>     #initialize the java object
>>>>>>     scorer = meteor_view.MeteorScorer()
>>>>>>     print "object initialized"
>>>>>>     #run object function
>>>>>>     stats = scorer.getMeteorStats("Test sentence", "Test sentence !");
>>>>>>     print stats.score
>>>>>>
>>>>>> if __name__ == '__main__':
>>>>>>     socket_no = 25336
>>>>>>     gatewayclient = GatewayClient('localhost', socket_no)
>>>>>>     print "Gclient started"
>>>>>>     gateway = JavaGateway(gatewayclient, auto_convert=True,
>>>>>>                           auto_field=True)
>>>>>>     print "Gateway started"
>>>>>>
>>>>>>     p = multiprocessing.Pool(2)
>>>>>>     print "Multipool initialized"
>>>>>>     p.map(singlethread, [gateway, gateway, gateway])
>>>>>>
>>>>>>
>>>>>>
>>>>>> output:
>>>>>>
>>>>>> Gclient started
>>>>>> Gateway started
>>>>>> Multipool initialized
>>>>>> Exception in thread Thread-2:
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/threading.py", line 552, in __bootstrap_inner
>>>>>>     self.run()
>>>>>>   File "/usr/local/lib/python2.7/threading.py", line 505, in run
>>>>>>     self.__target(*self.__args, **self.__kwargs)
>>>>>>   File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
>>>>>>     put(task)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 432, in __call__
>>>>>>     self.target_id, self.name)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/protocol.py", line 271, in get_return_value
>>>>>>     raise Py4JError('An error occurred while calling %s%s%s. Trace:\n%s\n' % (target_id, '.', name, value))
>>>>>> Py4JError: An error occurred while calling t.__getnewargs__. Trace:
>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>>>>>     at java.lang.String.substring(String.java:1949)
>>>>>>     at java.lang.String.substring(String.java:1916)
>>>>>>     at py4j.Gateway.invoke(Gateway.java:250)
>>>>>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:124)
>>>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:81)
>>>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:175)
>>>>>>     at java.lang.Thread.run(Thread.java:636)
>>>>>>
>>>>>>
>>>>>> I would be happy if somebody can help me find out what exactly the error
>>>>>> may be, or if you can indicate where I should look for the problem.
>>>>>>
>>>>>> Interestingly enough, this problem doesn't occur when running without
>>>>>> multiprocessing. It also runs fine with other Java programs. But how can
>>>>>> I specify which aspect of the Java program causes the error?
>>>>>>
>>>>>> best
>>>>>> Eleftherios
>>>>>>
>>>>>> --
>>>>>> MSc. Inf. Eleftherios Avramidis
>>>>>> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
>>>>>> Tel. +49-30 238 95-1806
>>>>>>
>>>>>> Fax. +49-30 238 95-1810
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------
>>>>>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>>>>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>>>>>
>>>>>> Geschaeftsfuehrung:
>>>>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>>>>>> Dr. Walter Olthoff
>>>>>>
>>>>>> Vorsitzender des Aufsichtsrats:
>>>>>> Prof. Dr. h.c. Hans A. Aukes
>>>>>>
>>>>>> Amtsgericht Kaiserslautern, HRB 2313
>>>>>> -------------------------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Live Security Virtual Conference
>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>>> will include endpoint security, mobile security and the latest in malware
>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>> _______________________________________________
>>>>>> Py4j-users mailing list
>>>>>> Py4...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/py4j-users
>>>>
>>
>>
>
>