From: John T. <joh...@gm...> - 2008-09-29 15:28:55
Hi, I'm just trying to run the basic word count Jython program that's distributed with Hadoop. I've included a slightly simplified version below:

    from org.apache.hadoop.fs import Path
    from org.apache.hadoop.io import *
    from org.apache.hadoop.mapred import *
    import sys
    import getopt

    class WordCountMap(Mapper, MapReduceBase):
        one = IntWritable(1)

        def map(self, key, value, output, reporter):
            for w in value.toString().split():
                output.collect(Text(w), self.one)

    class Summer(Reducer, MapReduceBase):
        def reduce(self, key, values, output, reporter):
            sum = 0
            while values.hasNext():
                sum += values.next().get()
            output.collect(key, IntWritable(sum))

    def main(args):
        conf = JobConf(WordCountMap)
        conf.setJobName("wordcount")
        conf.setOutputKeyClass(Text)
        conf.setOutputValueClass(IntWritable)
        conf.setMapperClass(WordCountMap)
        conf.setCombinerClass(Summer)
        conf.setReducerClass(Summer)
        conf.setInputPath(Path("in"))
        conf.setOutputPath(Path("out"))
        JobClient.runJob(conf)

    if __name__ == "__main__":
        main(sys.argv)

When I run "jython WordCount.py", Hadoop apparently has trouble finding my Python classes. Sorry about the length, but here's the full output:

08/09/29 08:23:22 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/29 08:23:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/09/29 08:23:22 INFO mapred.FileInputFormat: Total input paths to process : 1
08/09/29 08:23:23 INFO mapred.JobClient: Running job: job_local_1
08/09/29 08:23:23 INFO mapred.MapTask: numReduceTasks: 1
08/09/29 08:23:23 WARN mapred.LocalJobRunner: job_local_1
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
        at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
        ... 7 more
Traceback (innermost last):
  File "WordCount.py", line 54, in ?
  File "WordCount.py", line 51, in main
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
java.io.IOException: java.io.IOException: Job failed!

Has anyone seen this type of problem before? And do you know how I can tell Hadoop about the existence of the classes in org.python.proxies.__main__?

Best,
John