Problems using HadoopImageTerrier

  • Anonymous

    Anonymous - 2014-07-30

    I followed the steps described in the wiki to use the HadoopImageTerrier tool to
    create a Terrier index, but I encountered several problems using it:

    1. I used the command described in the wiki to run HadoopImageTerrier:

    hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint -o hdfs://servername/data/imageterrier-index.idx -m QUANTISED_FEATURES hdfs://servername/data/quantised-sift-features.seq

    but I got a message telling me that I need to specify the -k parameter. This is
    strange, since HadoopImageTerrier builds the index from already-quantised features.
    There is no clustering procedure to run, so why does it need the -k parameter?

    2. I specified the -k parameter and tried to run it on Hadoop 2.0.0-cdh4.7.0 (the
      version the OpenIMAJ 1.3 snapshot uses), but I got an error message:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.imageterrier.hadoop.mapreduce.PositionAwareSequenceFileInputFormat.getSplits(PositionAwareSequenceFileInputFormat.java:71)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:468)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:485)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1304)
    at org.imageterrier.indexers.hadoop.HadoopIndexer.run(HadoopIndexer.java:569)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.imageterrier.indexers.hadoop.HadoopIndexer.main(HadoopIndexer.java:609)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

    It seems to be caused by incompatible Hadoop versions. I found that Terrier uses
    Hadoop 0.20.2. Could that be the reason for the error? Thanks.

     
    • Jonathon Hare

      Jonathon Hare - 2014-07-31

      Regarding 1 - the simple answer is that it doesn't need to know; however, unfortunately, because of the way the argument parsing works and the fact that we "borrow" arguments from other tools (i.e. ImageTerrierTools and OpenIMAJ's ClusterQuantiserTool), the -k argument gets marked as required.
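
      Passing any value just to keep the parser happy should be enough; for example (the -k value below is arbitrary and, as far as I can tell, ignored when the input is already quantised):

      hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint -k 100 -o hdfs://servername/data/imageterrier-index.idx -m QUANTISED_FEATURES hdfs://servername/data/quantised-sift-features.seq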

      Regarding 2 - yes, this looks like a problem with mixed Hadoop versions... I'm trying to make a fix atm.
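
      For reference, JobContext was a concrete class in the Hadoop 0.20.x mapreduce API but became an interface in 2.x, so a binary compiled against the old jars fails at runtime in exactly this way. Roughly speaking - and this is only an illustrative sketch, not the actual ImageTerrier source - the failing call site looks something like this:

      import java.io.IOException;
      import java.util.List;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.InputSplit;
      import org.apache.hadoop.mapreduce.JobContext;
      import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

      // Sketch named after the class in the stack trace, not the real code.
      public class PositionAwareSequenceFileInputFormat<K, V>
              extends SequenceFileInputFormat<K, V> {

          @Override
          public List<InputSplit> getSplits(JobContext context) throws IOException {
              // This is where the mismatch bites: code compiled against 0.20.x
              // invokes getConfiguration() as a class method (invokevirtual),
              // but on a 2.x cluster JobContext is an interface, so the JVM
              // throws IncompatibleClassChangeError here.
              Configuration conf = context.getConfiguration();
              // ... (the real implementation would use conf to build its
              // position-aware splits)
              return super.getSplits(context);
          }
      }

      Recompiling code like this against the Hadoop 2.x jars should be enough; the source itself does not need to change.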

       
      • Jonathon Hare

        Jonathon Hare - 2014-07-31

        Following up, I've just rebuilt Terrier (and deployed it at maven.openimaj.org) to use the same Hadoop version as OpenIMAJ, and committed the required change to the ImageTerrier Maven POMs to pick up the new Terrier version. Untested, but hopefully it will work...

         
        • Anonymous

          Anonymous - 2014-08-01

          I tested it and the same error appeared. I am new to Hadoop, but from the error message:

          Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
          at org.imageterrier.hadoop.mapreduce.PositionAwareSequenceFileInputFormat.getSplits(PositionAwareSequenceFileInputFormat.java:71)

          it seems to be a problem in the MapReduce code of org.imageterrier.hadoop.mapreduce.PositionAwareSequenceFileInputFormat. Does it need to be updated for the Hadoop 2.0 API? Thanks.

           
  • Jonathon Hare

    Jonathon Hare - 2014-08-01

    I think for some reason you still have the old version being linked... Try a "mvn -U clean install" to force an update, and then verify that the only Hadoop version being linked is 2.0.0 by looking through the output of "mvn dependency:tree".
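
    Something like the following should show what's actually on the classpath (the grep is just one way of filtering the tree down to the Hadoop artifacts):

    mvn -U clean install
    mvn dependency:tree | grep -i hadoop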

    I've also made a pre-compiled version for you to try here: http://degas.ecs.soton.ac.uk/~jsh2/HadoopImageTerrier-20140801.jar

     
    • Anonymous

      Anonymous - 2014-08-02

      I downloaded the pre-compiled version and it ran without the previous problem, but it failed to generate the Terrier index. Here is the error I got:

      java.io.IOException: No run status files found in hdfs://localhost:9001/images.idx
      at org.terrier.indexing.HadoopIndexerReducer.loadRunData(HadoopIndexerReducer.java:331)
      at org.terrier.indexing.HadoopIndexerReducer.reduce(HadoopIndexerReducer.java:146)
      at org.terrier.indexing.HadoopIndexerReducer.reduce(HadoopIndexerReducer.java:1)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:648)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:443)

       
  • Jonathon Hare

    Jonathon Hare - 2014-08-06

    I would guess that there was an earlier error that caused that to happen... can you paste the complete output log?

     
  • Anonymous

    Anonymous - 2014-08-07
    Post awaiting moderation.
