Build on Red Hat 6, Cloudera 4.1.2 fails

Help
rtb42
2012-11-28
2012-12-07
  • rtb42

    rtb42 - 2012-11-28

    Building pydoop (0.7.0-rc3) on RHEL6, CDH 4.1.2 fails with

    Building java code for hadoop-2.0.0-cdh4.1.2
    Compiling Java classes
    src/it/crs4/pydoop/NoSeparatorTextOutputFormat.java:25: cannot find symbol
    symbol  : class FileSystem
    location: package org.apache.hadoop.fs
    import org.apache.hadoop.fs.FileSystem;

    (many more, varying classes)

    In fact, poking around, I see that setup.py builds a Java CLASSPATH almost exclusively from the jars in /usr/lib/hadoop-0.20-mapreduce, with no hdfs jars in sight.
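
A "cannot find symbol" error for an imported class usually means that no jar on the compile classpath contains that class. The following is a minimal sketch of one way to check which jars, if any, provide a given class. The helper name `find_class_in_jars` and the directory patterns in the example are assumptions chosen for illustration; they are not part of pydoop.

```python
import glob
import os
import zipfile


def find_class_in_jars(class_name, jar_patterns):
    """Return the jars matching jar_patterns that contain class_name."""
    # A class such as org.apache.hadoop.fs.FileSystem is stored in a jar
    # as the entry org/apache/hadoop/fs/FileSystem.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for pattern in jar_patterns:
        for jar in sorted(glob.glob(pattern)):
            if jar in hits:
                continue  # already reported via an earlier pattern
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                # Skip files that are not valid archives and dangling symlinks.
                continue
    return hits


if __name__ == "__main__":
    # Hypothetical search locations; adjust to the local installation.
    patterns = [
        os.path.join("/usr/lib/hadoop", "*.jar"),
        os.path.join("/usr/lib/hadoop-hdfs", "*.jar"),
        os.path.join("/usr/lib/hadoop-0.20-mapreduce", "*.jar"),
    ]
    for jar in find_class_in_jars("org.apache.hadoop.fs.FileSystem", patterns):
        print(jar)
```

If the script prints nothing, none of the searched jars contains the class, which would be consistent with the compiler error above.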

    While that can be hacked for the build (in hadoop_utils.py, by adding globs for 'hadoop-common.jar' and '*/commons-*.jar' to the classpath), I wonder what the impact on running an MR job will be. (I have already tried this: the build succeeds, but I later ran into all sorts of trouble, which makes me doubt that this is the right fix.)

    Any educated guess?

     
  • Simone Leo

    Simone Leo - 2012-11-28

    Hello,

    Pydoop has not been tested on Red Hat. For CDH4, the build system adds the following jars to the classpath:

    /usr/lib/hadoop/client/*.jar
    /usr/lib/hadoop/hadoop-annotations*.jar
    /usr/lib/hadoop-0.20-mapreduce/hadoop*.jar
    

    On Ubuntu this is enough if hadoop-0.20-conf-pseudo and hadoop-client have been installed. The latter is a convenience package that installs a series of symlinks to jars from other packages:

    /usr/lib/hadoop/client/slf4j-api-1.6.1.jar -> ../lib/slf4j-api-1.6.1.jar
    /usr/lib/hadoop/client/jetty-util-6.1.26.cloudera.2.jar -> ../lib/jetty-util-6.1.26.cloudera.2.jar
    /usr/lib/hadoop/client/avro-1.7.1.cloudera.2.jar -> ../lib/avro-1.7.1.cloudera.2.jar
    /usr/lib/hadoop/client/jetty-6.1.26.cloudera.2.jar -> ../lib/jetty-6.1.26.cloudera.2.jar
    /usr/lib/hadoop/client/hadoop-common-2.0.0-cdh4.1.2.jar -> ../hadoop-common-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/hadoop-yarn-common-2.0.0-cdh4.1.2.jar -> ../../hadoop-yarn/hadoop-yarn-common-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/commons-net-3.1.jar -> ../lib/commons-net-3.1.jar
    /usr/lib/hadoop/client/jackson-mapper-asl-1.8.8.jar -> ../lib/jackson-mapper-asl-1.8.8.jar
    /usr/lib/hadoop/client/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar -> ../../hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/jersey-server-1.8.jar -> ../lib/jersey-server-1.8.jar
    /usr/lib/hadoop/client/xmlenc-0.52.jar -> ../lib/xmlenc-0.52.jar
    /usr/lib/hadoop/client/zookeeper-3.4.3-cdh4.1.2.jar -> ../lib/zookeeper-3.4.3-cdh4.1.2.jar
    /usr/lib/hadoop/client/commons-digester-1.8.jar -> ../lib/commons-digester-1.8.jar
    /usr/lib/hadoop/client/junit-4.8.2.jar -> ../lib/junit-4.8.2.jar
    /usr/lib/hadoop/client/jackson-xc-1.8.8.jar -> ../lib/jackson-xc-1.8.8.jar
    /usr/lib/hadoop/client/hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar -> ../../hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/jsp-api-2.1.jar -> ../lib/jsp-api-2.1.jar
    /usr/lib/hadoop/client/stax-api-1.0.1.jar -> ../lib/stax-api-1.0.1.jar
    /usr/lib/hadoop/client/jersey-json-1.8.jar -> ../lib/jersey-json-1.8.jar
    /usr/lib/hadoop/client/hadoop-yarn-api-2.0.0-cdh4.1.2.jar -> ../../hadoop-yarn/hadoop-yarn-api-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/hadoop-auth-2.0.0-cdh4.1.2.jar -> ../hadoop-auth-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/jsr305-1.3.9.jar -> ../lib/jsr305-1.3.9.jar
    /usr/lib/hadoop/client/jline-0.9.94.jar -> ../lib/jline-0.9.94.jar
    /usr/lib/hadoop/client/jaxb-impl-2.2.3-1.jar -> ../lib/jaxb-impl-2.2.3-1.jar
    /usr/lib/hadoop/client/commons-beanutils-1.7.0.jar -> ../lib/commons-beanutils-1.7.0.jar
    /usr/lib/hadoop/client/mockito-all-1.8.5.jar -> ../lib/mockito-all-1.8.5.jar
    /usr/lib/hadoop/client/log4j-1.2.17.jar -> ../lib/log4j-1.2.17.jar
    /usr/lib/hadoop/client/jasper-runtime-5.5.23.jar -> ../lib/jasper-runtime-5.5.23.jar
    /usr/lib/hadoop/client/commons-cli-1.2.jar -> ../lib/commons-cli-1.2.jar
    /usr/lib/hadoop/client/commons-codec-1.4.jar -> ../lib/commons-codec-1.4.jar
    /usr/lib/hadoop/client/asm-3.2.jar -> ../lib/asm-3.2.jar
    /usr/lib/hadoop/client/hadoop-yarn-server-common-2.0.0-cdh4.1.2.jar -> ../../hadoop-yarn/hadoop-yarn-server-common-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/jackson-core-asl-1.8.8.jar -> ../lib/jackson-core-asl-1.8.8.jar
    /usr/lib/hadoop/client/commons-math-2.1.jar -> ../lib/commons-math-2.1.jar
    /usr/lib/hadoop/client/netty-3.2.4.Final.jar -> ../../hadoop-yarn/lib/netty-3.2.4.Final.jar
    /usr/lib/hadoop/client/jaxb-api-2.2.2.jar -> ../lib/jaxb-api-2.2.2.jar
    /usr/lib/hadoop/client/commons-el-1.0.jar -> ../lib/commons-el-1.0.jar
    /usr/lib/hadoop/client/hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar -> ../../hadoop-mapreduce/hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/snappy-java-1.0.4.1.jar -> ../lib/snappy-java-1.0.4.1.jar
    /usr/lib/hadoop/client/guava-11.0.2.jar -> ../lib/guava-11.0.2.jar
    /usr/lib/hadoop/client/commons-logging-1.1.1.jar -> ../lib/commons-logging-1.1.1.jar
    /usr/lib/hadoop/client/commons-configuration-1.6.jar -> ../lib/commons-configuration-1.6.jar
    /usr/lib/hadoop/client/paranamer-2.3.jar -> ../lib/paranamer-2.3.jar
    /usr/lib/hadoop/client/commons-io-2.1.jar -> ../lib/commons-io-2.1.jar
    /usr/lib/hadoop/client/activation-1.1.jar -> ../lib/activation-1.1.jar
    /usr/lib/hadoop/client/jackson-jaxrs-1.8.8.jar -> ../lib/jackson-jaxrs-1.8.8.jar
    /usr/lib/hadoop/client/commons-lang-2.5.jar -> ../lib/commons-lang-2.5.jar
    /usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar -> ../../hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/commons-beanutils-core-1.8.0.jar -> ../lib/commons-beanutils-core-1.8.0.jar
    /usr/lib/hadoop/client/servlet-api-2.5.jar -> ../lib/servlet-api-2.5.jar
    /usr/lib/hadoop/client/commons-collections-3.2.1.jar -> ../lib/commons-collections-3.2.1.jar
    /usr/lib/hadoop/client/jersey-core-1.8.jar -> ../lib/jersey-core-1.8.jar
    /usr/lib/hadoop/client/jsch-0.1.42.jar -> ../lib/jsch-0.1.42.jar
    /usr/lib/hadoop/client/jettison-1.1.jar -> ../lib/jettison-1.1.jar
    /usr/lib/hadoop/client/hadoop-hdfs-2.0.0-cdh4.1.2.jar -> ../../hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client/protobuf-java-2.4.0a.jar -> ../lib/protobuf-java-2.4.0a.jar
    /usr/lib/hadoop/client/slf4j-log4j12-1.6.1.jar -> ../lib/slf4j-log4j12-1.6.1.jar
    /usr/lib/hadoop/client/hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar -> ../../hadoop-mapreduce/hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/slf4j-api-1.6.1.jar -> ../lib/slf4j-api-1.6.1.jar
    /usr/lib/hadoop/client-0.20/jetty-util-6.1.26.cloudera.2.jar -> ../lib/jetty-util-6.1.26.cloudera.2.jar
    /usr/lib/hadoop/client-0.20/avro-1.7.1.cloudera.2.jar -> ../lib/avro-1.7.1.cloudera.2.jar
    /usr/lib/hadoop/client-0.20/jetty-6.1.26.cloudera.2.jar -> ../lib/jetty-6.1.26.cloudera.2.jar
    /usr/lib/hadoop/client-0.20/hadoop-common-2.0.0-cdh4.1.2.jar -> ../hadoop-common-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/commons-net-3.1.jar -> ../lib/commons-net-3.1.jar
    /usr/lib/hadoop/client-0.20/jackson-mapper-asl-1.8.8.jar -> ../lib/jackson-mapper-asl-1.8.8.jar
    /usr/lib/hadoop/client-0.20/jersey-server-1.8.jar -> ../lib/jersey-server-1.8.jar
    /usr/lib/hadoop/client-0.20/xmlenc-0.52.jar -> ../lib/xmlenc-0.52.jar
    /usr/lib/hadoop/client-0.20/zookeeper-3.4.3-cdh4.1.2.jar -> ../lib/zookeeper-3.4.3-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/commons-digester-1.8.jar -> ../lib/commons-digester-1.8.jar
    /usr/lib/hadoop/client-0.20/junit-4.8.2.jar -> ../lib/junit-4.8.2.jar
    /usr/lib/hadoop/client-0.20/jsp-api-2.1.jar -> ../lib/jsp-api-2.1.jar
    /usr/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.1.2.jar -> ../../hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/hadoop-auth-2.0.0-cdh4.1.2.jar -> ../hadoop-auth-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/jsr305-1.3.9.jar -> ../lib/jsr305-1.3.9.jar
    /usr/lib/hadoop/client-0.20/jline-0.9.94.jar -> ../lib/jline-0.9.94.jar
    /usr/lib/hadoop/client-0.20/commons-beanutils-1.7.0.jar -> ../lib/commons-beanutils-1.7.0.jar
    /usr/lib/hadoop/client-0.20/mockito-all-1.8.5.jar -> ../lib/mockito-all-1.8.5.jar
    /usr/lib/hadoop/client-0.20/log4j-1.2.17.jar -> ../lib/log4j-1.2.17.jar
    /usr/lib/hadoop/client-0.20/jasper-runtime-5.5.23.jar -> ../lib/jasper-runtime-5.5.23.jar
    /usr/lib/hadoop/client-0.20/commons-cli-1.2.jar -> ../lib/commons-cli-1.2.jar
    /usr/lib/hadoop/client-0.20/commons-codec-1.4.jar -> ../lib/commons-codec-1.4.jar
    /usr/lib/hadoop/client-0.20/asm-3.2.jar -> ../lib/asm-3.2.jar
    /usr/lib/hadoop/client-0.20/jackson-core-asl-1.8.8.jar -> ../lib/jackson-core-asl-1.8.8.jar
    /usr/lib/hadoop/client-0.20/commons-math-2.1.jar -> ../lib/commons-math-2.1.jar
    /usr/lib/hadoop/client-0.20/hsqldb-1.8.0.10.jar -> ../../hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar
    /usr/lib/hadoop/client-0.20/commons-el-1.0.jar -> ../lib/commons-el-1.0.jar
    /usr/lib/hadoop/client-0.20/snappy-java-1.0.4.1.jar -> ../lib/snappy-java-1.0.4.1.jar
    /usr/lib/hadoop/client-0.20/oro-2.0.8.jar -> ../../hadoop-0.20-mapreduce/lib/oro-2.0.8.jar
    /usr/lib/hadoop/client-0.20/guava-11.0.2.jar -> ../lib/guava-11.0.2.jar
    /usr/lib/hadoop/client-0.20/commons-logging-1.1.1.jar -> ../lib/commons-logging-1.1.1.jar
    /usr/lib/hadoop/client-0.20/commons-configuration-1.6.jar -> ../lib/commons-configuration-1.6.jar
    /usr/lib/hadoop/client-0.20/paranamer-2.3.jar -> ../lib/paranamer-2.3.jar
    /usr/lib/hadoop/client-0.20/commons-io-2.1.jar -> ../lib/commons-io-2.1.jar
    /usr/lib/hadoop/client-0.20/commons-lang-2.5.jar -> ../lib/commons-lang-2.5.jar
    /usr/lib/hadoop/client-0.20/commons-beanutils-core-1.8.0.jar -> ../lib/commons-beanutils-core-1.8.0.jar
    /usr/lib/hadoop/client-0.20/servlet-api-2.5.jar -> ../lib/servlet-api-2.5.jar
    /usr/lib/hadoop/client-0.20/commons-collections-3.2.1.jar -> ../lib/commons-collections-3.2.1.jar
    /usr/lib/hadoop/client-0.20/jersey-core-1.8.jar -> ../lib/jersey-core-1.8.jar
    /usr/lib/hadoop/client-0.20/jsch-0.1.42.jar -> ../lib/jsch-0.1.42.jar
    /usr/lib/hadoop/client-0.20/hadoop-hdfs-2.0.0-cdh4.1.2.jar -> ../../hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.1.2.jar
    /usr/lib/hadoop/client-0.20/protobuf-java-2.4.0a.jar -> ../lib/protobuf-java-2.4.0a.jar
    /usr/lib/hadoop/client-0.20/slf4j-log4j12-1.6.1.jar -> ../lib/slf4j-log4j12-1.6.1.jar
    

    Maybe there is a similar package in the RH suite provided by Cloudera. Any feedback on this is highly appreciated ;-)

    Here is the full <PACKAGE>: <JAR> list on Ubuntu, after resolving symlinks:

    hadoop-0.20-mapreduce: /usr/lib/hadoop-0.20-mapreduce/hadoop-ant-2.0.0-mr1-cdh4.1.2.jar
    hadoop-0.20-mapreduce: /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar
    hadoop-0.20-mapreduce: /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar
    hadoop-0.20-mapreduce: /usr/lib/hadoop-0.20-mapreduce/hadoop-test-2.0.0-mr1-cdh4.1.2.jar
    hadoop-0.20-mapreduce: /usr/lib/hadoop-0.20-mapreduce/hadoop-tools-2.0.0-mr1-cdh4.1.2.jar
    hadoop-hdfs: /usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.1.2.jar
    hadoop-mapreduce: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar
    hadoop-mapreduce: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar
    hadoop-mapreduce: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar
    hadoop-mapreduce: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar
    hadoop-mapreduce: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar
    hadoop: /usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.1.2.jar
    hadoop: /usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.1.2.jar
    hadoop: /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.1.2.jar
    hadoop: /usr/lib/hadoop/lib/activation-1.1.jar
    hadoop: /usr/lib/hadoop/lib/asm-3.2.jar
    hadoop: /usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar
    hadoop: /usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar
    hadoop: /usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar
    hadoop: /usr/lib/hadoop/lib/commons-cli-1.2.jar
    hadoop: /usr/lib/hadoop/lib/commons-codec-1.4.jar
    hadoop: /usr/lib/hadoop/lib/commons-collections-3.2.1.jar
    hadoop: /usr/lib/hadoop/lib/commons-configuration-1.6.jar
    hadoop: /usr/lib/hadoop/lib/commons-digester-1.8.jar
    hadoop: /usr/lib/hadoop/lib/commons-el-1.0.jar
    hadoop: /usr/lib/hadoop/lib/commons-io-2.1.jar
    hadoop: /usr/lib/hadoop/lib/commons-lang-2.5.jar
    hadoop: /usr/lib/hadoop/lib/commons-logging-1.1.1.jar
    hadoop: /usr/lib/hadoop/lib/commons-math-2.1.jar
    hadoop: /usr/lib/hadoop/lib/commons-net-3.1.jar
    hadoop: /usr/lib/hadoop/lib/guava-11.0.2.jar
    hadoop: /usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar
    hadoop: /usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar
    hadoop: /usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar
    hadoop: /usr/lib/hadoop/lib/jackson-xc-1.8.8.jar
    hadoop: /usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar
    hadoop: /usr/lib/hadoop/lib/jaxb-api-2.2.2.jar
    hadoop: /usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar
    hadoop: /usr/lib/hadoop/lib/jersey-core-1.8.jar
    hadoop: /usr/lib/hadoop/lib/jersey-json-1.8.jar
    hadoop: /usr/lib/hadoop/lib/jersey-server-1.8.jar
    hadoop: /usr/lib/hadoop/lib/jettison-1.1.jar
    hadoop: /usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar
    hadoop: /usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar
    hadoop: /usr/lib/hadoop/lib/jline-0.9.94.jar
    hadoop: /usr/lib/hadoop/lib/jsch-0.1.42.jar
    hadoop: /usr/lib/hadoop/lib/jsp-api-2.1.jar
    hadoop: /usr/lib/hadoop/lib/jsr305-1.3.9.jar
    hadoop: /usr/lib/hadoop/lib/junit-4.8.2.jar
    hadoop: /usr/lib/hadoop/lib/log4j-1.2.17.jar
    hadoop: /usr/lib/hadoop/lib/mockito-all-1.8.5.jar
    hadoop: /usr/lib/hadoop/lib/paranamer-2.3.jar
    hadoop: /usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar
    hadoop: /usr/lib/hadoop/lib/servlet-api-2.5.jar
    hadoop: /usr/lib/hadoop/lib/slf4j-api-1.6.1.jar
    hadoop: /usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar
    hadoop: /usr/lib/hadoop/lib/stax-api-1.0.1.jar
    hadoop: /usr/lib/hadoop/lib/xmlenc-0.52.jar
    hadoop-yarn: /usr/lib/hadoop-yarn/hadoop-yarn-api-2.0.0-cdh4.1.2.jar
    hadoop-yarn: /usr/lib/hadoop-yarn/hadoop-yarn-common-2.0.0-cdh4.1.2.jar
    hadoop-yarn: /usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.0.0-cdh4.1.2.jar
    hadoop-yarn: /usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar
    zookeeper: /usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar
    zookeeper: /usr/lib/zookeeper/zookeeper-3.4.3-cdh4.1.2.jar
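
A list of resolved jar paths like the one above can be reproduced on another system to see whether the client directory exists and whether its links point at real files. The sketch below is only an illustration: `resolve_jars` is a hypothetical helper, not part of pydoop, and it does not look up package names (on Debian-based systems `dpkg -S <path>` and on RPM-based systems `rpm -qf <path>` report the owning package).

```python
import glob
import os


def resolve_jars(pattern):
    """Map each path matching pattern to its symlink-free target.

    Dangling symlinks are mapped to None.
    """
    resolved = {}
    for path in sorted(glob.glob(pattern)):
        # os.path.exists follows symlinks, so it is False for a dangling link.
        if os.path.exists(path):
            resolved[path] = os.path.realpath(path)
        else:
            resolved[path] = None
    return resolved


if __name__ == "__main__":
    # Directory taken from the listing above; it may not exist everywhere.
    for link, target in resolve_jars("/usr/lib/hadoop/client/*.jar").items():
        print("%s -> %s" % (link, target or "DANGLING"))
```

An empty result means the pattern matched nothing, for example because the package that provides the links is not installed.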
    
     
  • rtb42

    rtb42 - 2012-11-29

    Thanks,

    The hadoop-client package did not help, but I managed to sort it out in a couple of iterations, down to successfully running wordcount_full.py.

    I "fixed" the classpath in hadoop_utils.py, though I don't know whether this is the best place. Here's my patch, in case it helps others:

    commit 668ae7dd256beacd968ea064fbde464548272219
    Author: Toebbicke <tobbicke@rtb-big-mac.cern.ch>
    Date:   Wed Nov 28 15:07:03 2012 +0100
    
        correct classpath for build & basic tests on CDH 4.1.2
    
    diff --git a/pydoop/hadoop_utils.py b/pydoop/hadoop_utils.py
    index d900067..62a453f 100644
    --- a/pydoop/hadoop_utils.py
    +++ b/pydoop/hadoop_utils.py
    @@ -296,9 +296,14 @@ class PathFinder(object):
           else:  # FIXME: this only covers installed-from-package CDH, not tarball
             hadoop_home = "/usr/lib/hadoop"
             mr1_home = "/usr/lib/hadoop-0.20-mapreduce"
    +        hdfs_home = "/usr/lib/hadoop-hdfs"
             self.__hadoop_classpath = ':'.join(
               glob.glob(os.path.join(hadoop_home, 'client', '*.jar')) +
    -          glob.glob(os.path.join(hadoop_home, 'hadoop-annotations*.jar')) +
    +          glob.glob(os.path.join(hadoop_home, 'hadoop-annotations.jar')) +
    +          glob.glob(os.path.join(hadoop_home, 'hadoop-common.jar')) +
    +          glob.glob(os.path.join(hadoop_home, 'hadoop-auth.jar')) +
    +          glob.glob(os.path.join(hadoop_home, 'lib/*.jar')) +
    +          glob.glob(os.path.join(hdfs_home, 'hadoop-hdfs.jar')) +
               glob.glob(os.path.join(mr1_home, 'hadoop*.jar'))
               )
         return self.__hadoop_classpath
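
One caveat with glob-based classpaths like the one in this patch: `glob.glob` returns an empty list for a pattern that matches nothing, so a wrong or missing path silently drops out of the result. A literal name such as 'hadoop-common.jar' matches only if a file or link with exactly that name exists. The sketch below reuses the patterns from the patch but reports the ones that match nothing; `build_classpath` is a hypothetical standalone helper, not pydoop's actual PathFinder code.

```python
import glob
import os
import sys


def build_classpath(patterns):
    """Return (classpath, unmatched): the ':'-joined jars matching patterns,
    plus the list of patterns that matched no file at all."""
    jars, unmatched = [], []
    for pattern in patterns:
        found = sorted(glob.glob(pattern))
        if not found:
            unmatched.append(pattern)
        for jar in found:
            if jar not in jars:  # keep the first occurrence, drop duplicates
                jars.append(jar)
    return ":".join(jars), unmatched


if __name__ == "__main__":
    hadoop_home = "/usr/lib/hadoop"
    mr1_home = "/usr/lib/hadoop-0.20-mapreduce"
    hdfs_home = "/usr/lib/hadoop-hdfs"
    patterns = [
        os.path.join(hadoop_home, "client", "*.jar"),
        os.path.join(hadoop_home, "hadoop-annotations.jar"),
        os.path.join(hadoop_home, "hadoop-common.jar"),
        os.path.join(hadoop_home, "hadoop-auth.jar"),
        os.path.join(hadoop_home, "lib", "*.jar"),
        os.path.join(hdfs_home, "hadoop-hdfs.jar"),
        os.path.join(mr1_home, "hadoop*.jar"),
    ]
    classpath, unmatched = build_classpath(patterns)
    print(classpath)
    for pattern in unmatched:
        print("no match: %s" % pattern, file=sys.stderr)
```

Running this on the build host shows at a glance which of the expected locations are absent, which may help when adapting the patch to a different distribution.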
    

     

     
