Re: [Py4j-users] How to resolve "Py4jError: Trying to call a package"
From: Barthelemy D. <bar...@in...> - 2014-12-31 01:28:15
Hi,
TL;DR: you are trying to access a class that is unknown to the JVM. I
believe that either it was not correctly added to the Java classpath,
or the classloader of the GatewayServer instance on the JVM does not
have access to your class. Someone from the PySpark team may be better
able to help you.
To check whether this is a classpath/classloader problem, try
something like this:
# Sanity test
string_class = gateway.jvm.java.lang.Class.forName("java.lang.String")
# Will return java.lang.String
string_class.getName()
# Will return java.lang.Class
string_class.getClass().getName()
# Will raise an exception if the class is not found by the JVM
jar_test_class = gateway.jvm.java.lang.Class.forName("com.mycompany.spark.test.JarTest")
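The same check can be wrapped in a small helper so that several class names can be probed in one go. This is only a minimal sketch: the name class_is_visible is hypothetical (it is not part of Py4J), and it assumes "gateway" is an already-connected JavaGateway like the one above.

```python
def class_is_visible(gateway, class_name):
    """Return True if the JVM behind `gateway` can load `class_name`.

    Hypothetical helper, not part of Py4J. It relies only on
    java.lang.Class.forName, exactly as the sanity test above does.
    """
    try:
        gateway.jvm.java.lang.Class.forName(class_name)
    except Exception:
        # Deliberately blunt for a sketch: Py4J reports the Java
        # ClassNotFoundException as py4j.protocol.Py4JJavaError, so with
        # Py4J imported you may prefer to catch that type instead. Note
        # that this broad clause also swallows connection problems.
        return False
    return True


# Usage, assuming `gateway` is a connected JavaGateway:
# for name in ("java.lang.String", "com.mycompany.spark.test.JarTest"):
#     print(name, class_is_visible(gateway, name))
```

If the helper returns False for your own class but True for java.lang.String, the gateway itself is working and the problem is on the classpath/classloader side.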
Now for the long version: Py4J is extremely lazy and dynamic. It does
not try to preload or verify that something exists on the JVM until it
absolutely has to. Because of the limited introspection capabilities
of the JVM when it comes to available packages, Py4J does not know in
advance all available packages and classes.
What happens here is that Py4J tries to find a class "JarTest" in the
com.mycompany.spark.test package. Because it cannot find such a
class, it considers JarTest to be a package. When you try to call
ping(), Py4J tells you that you cannot call a method on a package. I
agree that the error message could be improved, but it is essentially
the same error as if you were trying to call java.lang.ping().
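To make the fallback concrete, here is a toy model in plain Python. It is NOT Py4J's code and needs no JVM: KNOWN_CLASSES, FakePackage, FakeClass and PackageCallError are all made-up names that only imitate the behaviour described above, where any dotted name the JVM does not recognise as a class is treated as a package.

```python
# Toy model only; every name below is hypothetical and stands in for
# what Py4J and the JVM do together.

KNOWN_CLASSES = {"java.lang.String", "java.util.ArrayList"}  # the pretend JVM


class PackageCallError(Exception):
    """Stand-in for Py4JError("Trying to call a package.")."""


class FakeClass:
    """A dotted name that the pretend JVM recognises as a class."""

    def __init__(self, fqn):
        self.fqn = fqn


class FakePackage:
    """A dotted name that the pretend JVM does not recognise as a class."""

    def __init__(self, fqn=""):
        self._fqn = fqn

    def __getattr__(self, name):
        # Invoked only for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        fqn = f"{self._fqn}.{name}" if self._fqn else name
        # Lazy resolution: known class -> class, anything else -> package.
        return FakeClass(fqn) if fqn in KNOWN_CLASSES else FakePackage(fqn)

    def __call__(self, *args):
        raise PackageCallError(f"Trying to call a package: {self._fqn}")


jvm = FakePackage()
found = jvm.java.lang.String                     # resolves to a FakeClass
missing = jvm.com.mycompany.spark.test.JarTest   # silently becomes a FakePackage

try:
    missing.ping()  # "ping" is treated as yet another package, then called
except PackageCallError as exc:
    print(exc)
```

Nothing fails at the point where the unknown name is accessed; the error only surfaces once something is called, which is why it looks unrelated to the real cause.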
Regarding java_import(): this serves the same purpose as the import
statement in Java, i.e., it lets you refer to a class by its
unqualified name. It does not even try to check whether the class or
package exists, and it does not "load" the class the way the import
statement does in Python. So you could do:
# ArrayList2 does not exist, py4j does not complain
java_import(gateway.jvm, "java.util.ArrayList2")
# ArrayList exists
java_import(gateway.jvm, "java.util.ArrayList")
# No need to use qualified name.
a_list = gateway.jvm.ArrayList()
# No need to import a class to use it with a FQN
another_list = gateway.jvm.java.util.LinkedList()
Hope this helps,
Barthelemy
On Tue, Dec 30, 2014 at 1:21 PM, Stephen Boesch <ja...@gm...> wrote:
>
> My team has added a module for pyspark which is a heavy user of py4j. I
> have not been able to invoke the newly added scala/java classes from
> python (pyspark) via their java gateway.
>
>
> The pyspark code creates a java gateway:
> gateway = JavaGateway(GatewayClient(port=gateway_port), auto_convert=False)
>
> Here is an example of existing (/working) pyspark java_gateway code:
>
> java_import(gateway.jvm, "org.apache.spark.sql.hive.HiveContext")
>
> I have attempted to import a trivial custom class:
>
> java_import(gateway.jvm, "com.mycompany.spark.test.JarTest")
>
>
> But when invoking I get:
>
> tclass = self._jvm.com.mycompany.spark.test.JarTest
> tclass.ping() # Error happens on this line
>
>
> Py4JError: Trying to call a package.
>
>
> So I would request guidance / documentation on how to
> (a) incorporate java classes into an existing java gateway
> (b) how to invoke those classes from the python client side
> (c) how to get more visibility/insight into exactly what is going on /
> what the meaning is of "Trying to call a package"
>
> Thanks!