Hi,

I'm just trying to run the basic word-count Jython program that's distributed with Hadoop. I've included a slightly simplified version below:


from org.apache.hadoop.fs import Path
from org.apache.hadoop.io import *
from org.apache.hadoop.mapred import *

import sys

class WordCountMap(Mapper, MapReduceBase):
    one = IntWritable(1)
    def map(self, key, value, output, reporter):
        for w in value.toString().split():
            output.collect(Text(w), self.one)

class Summer(Reducer, MapReduceBase):
    def reduce(self, key, values, output, reporter):
        total = 0
        while values.hasNext():
            total += values.next().get()
        output.collect(key, IntWritable(total))

def main(args):
    conf = JobConf(WordCountMap)
    conf.setJobName("wordcount")

    conf.setOutputKeyClass(Text)
    conf.setOutputValueClass(IntWritable)

    conf.setMapperClass(WordCountMap)
    conf.setCombinerClass(Summer)
    conf.setReducerClass(Summer)
    conf.setInputPath(Path("in"))
    conf.setOutputPath(Path("out"))
    JobClient.runJob(conf)

if __name__ == "__main__":
    main(sys.argv)
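Just to be clear about intent: the job is nothing exotic. The mapper emits (word, 1) for each whitespace-separated token, and the reducer sums the counts per key. The equivalent logic in plain Python (no Hadoop involved) would be roughly:

```python
from collections import Counter

def word_count(lines):
    # Map step: each line yields (word, 1) per token;
    # reduce step: sum the counts per word.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)
```

So the logic itself seems fine; the trouble appears to be purely in how Hadoop loads the classes.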


When I run "jython WordCount.py", Hadoop apparently can't find the Jython-generated proxy classes for my mapper and reducer. Sorry about the length, but here's the full output:


08/09/29 08:23:22 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/29 08:23:22 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/09/29 08:23:22 INFO mapred.FileInputFormat: Total input paths to process : 1
08/09/29 08:23:23 INFO mapred.JobClient: Running job: job_local_1
08/09/29 08:23:23 INFO mapred.MapTask: numReduceTasks: 1
08/09/29 08:23:23 WARN mapred.LocalJobRunner: job_local_1
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
    ... 7 more
Traceback (innermost last):
  File "WordCount.py", line 54, in ?
  File "WordCount.py", line 51, in main
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)

java.io.IOException: java.io.IOException: Job failed!


Has anyone seen this kind of problem before?  And do you know how I can make the classes under org.python.proxies.__main__ visible to Hadoop's class loader?

Best,
John