Re: [Py4j-users] How to resolve "Py4jError: Trying to call a package"
From: Barthelemy D. <bar...@in...> - 2014-12-31 01:28:15
Hi,

The TL;DR: you are trying to access a class unknown to the JVM. I believe either it is not correctly added to the Java classpath, or the classloader of the GatewayServer instance on the JVM does not have access to your class. Someone from the PySpark team may be better placed to help you.

To check whether this is a problem with the classpath/classloader, try something like this:

    # Sanity test
    string_class = gateway.jvm.java.lang.Class.forName("java.lang.String")

    # Will return java.lang.String
    string_class.getName()

    # Will return java.lang.Class
    string_class.getClass().getName()

    # Will raise an exception if the class is not found by the JVM
    jar_test_class = gateway.jvm.java.lang.Class.forName("com.mycompany.spark.test.JarTest")

Now for the long version:

Py4J is extremely lazy and dynamic. It does not try to preload or verify that something exists on the JVM until it absolutely has to. Because of the limited introspection capabilities of the JVM when it comes to available packages, Py4J does not know in advance all available packages and classes.

What happens here is that Py4J tries to find a class "JarTest" in the com.mycompany.spark.test package. Because it cannot find such a class, it considers JarTest to be a package. When you try to call ping(), Py4J tells you that you cannot call a method on a package. I agree that the error message could be improved, but it is essentially the same error as if you were trying to call java.lang.ping().

Regarding java_import(): this serves the same purpose as the import statement in Java, i.e., it lets you refer to a class by its unqualified name. It does not even try to check whether the class or package exists. It does not "load" the class as the import statement does in Python. So you could do:

    # ArrayList2 does not exist, py4j does not complain
    java_import(gateway.jvm, "java.util.ArrayList2")

    # ArrayList exists
    java_import(gateway.jvm, "java.util.ArrayList")

    # No need to use qualified name.
    a_list = gateway.jvm.ArrayList()

    # No need to import a class to use it with a FQN
    another_list = gateway.jvm.java.util.LinkedList()

Hope this helps,
Barthelemy

On Tue, Dec 30, 2014 at 1:21 PM, Stephen Boesch <ja...@gm...> wrote:
>
> My team has added a module for pyspark which is a heavy user of py4j. I
> have not been successful to invoke the newly added scala/java classes from
> python (pyspark) via their java gateway.
>
> The pyspark code creates a java gateway:
>
>     gateway = JavaGateway(GatewayClient(port=gateway_port),
>                           auto_convert=False)
>
> Here is an example of existing (/working) pyspark java_gateway code:
>
>     java_import(gateway.jvm, "org.apache.spark.sql.hive.HiveContext")
>
> I have attempted to import a trivial custom class:
>
>     java_import(gateway.jvm, "com.mycompany.spark.test.JarTest")
>
> But when invoking I get:
>
>     tclass = self._jvm.com.mycompany.spark.test.JarTest
>     tclass.ping()  # Error happens on this line
>
> Py4JError: Trying to call a package.
>
> So I would request guidance / documentation on how to
> (a) incorporate java classes into an existing java gateway
> (b) how to invoke those classes from the python client side
> (c) how to get more visibility/insight into exactly what is going on /
> what the meaning is of "Trying to call a package"
>
> Thanks!
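
P.S. To make the lazy resolution described in the reply above more concrete, here is a small toy model in pure Python. It is not Py4J's code and does not talk to a JVM; the names FakePackage, FakeClass and CallingPackageError are made up for illustration, and the set of "known classes" stands in for whatever the JVM can actually load. It only sketches the idea that an unknown dotted name is silently treated as a package, and that the error only shows up once something is called.

```python
class CallingPackageError(Exception):
    """Hypothetical stand-in for the error raised when a package is called."""


class FakeClass:
    """A dotted name that the toy 'JVM' recognised as a class."""

    def __init__(self, fqn):
        self.fqn = fqn

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        # Any static method "works" in this toy model.
        return lambda *args: f"{self.fqn}.{name} called"


class FakePackage:
    """A dotted name that the toy 'JVM' did not recognise as a class."""

    def __init__(self, fqn, known_classes):
        self._fqn = fqn
        self._known = known_classes

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        fqn = f"{self._fqn}.{name}" if self._fqn else name
        if fqn in self._known:
            return FakeClass(fqn)
        # Unknown names are assumed to be packages; nothing fails here.
        return FakePackage(fqn, self._known)

    def __call__(self, *args):
        # Only now does the mistake surface.
        raise CallingPackageError(f"Trying to call a package: {self._fqn}")


# The toy 'JVM' knows a single class.
jvm = FakePackage("", {"java.util.ArrayList"})

print(jvm.java.util.ArrayList.size())

# No error yet: the unknown name is quietly treated as a package.
missing = jvm.com.mycompany.spark.test.JarTest

try:
    missing.ping()
except CallingPackageError as exc:
    print(exc)
```

In this model, as in the reply, merely referring to JarTest never fails; the error appears only at the ping() call, which is why the message talks about a package rather than a missing class.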