Jonathon Hare

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.

  • goalgodo

    Does anyone know how to set up ImageTerrier to run on Hadoop?


  • Anonymous

    So there is no documentation, right?
    I really want to use it on Hadoop; please point me in the right direction. Thanks a lot.

    • Jonathon Hare

      Here are some brief notes on using ImageTerrier with Hadoop. You'll need a collection of tools from OpenIMAJ (from the hadoop/tools directory; compile each one with mvn assembly:assembly) as well as ImageTerrier. You might also need to alter the Maven dependencies to work with your version of Hadoop (currently the tools use the Cloudera Hadoop distribution, version 3u5 I think).

      1. Create a SequenceFile of images and put it on HDFS. The OpenIMAJ SequenceFileTool can help with this.

      2. Extract local features (e.g. DoG/SIFT) using HadoopLocalFeaturesTool:

      hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size -i hdfs://servername/data/images.seq -o hdfs://servername/data/sift-features.seq

      3. Cluster the features to create a vocabulary. Either use OpenIMAJ's ClusterQuantiserTool (on a single machine) or HadoopFastKMeans (on a cluster). You should be able to download a pre-trained vocabulary of 1 million terms for SIFT features from: http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-1000000-sift-fastkmeans.idx. The resulting vocabulary needs to be placed on HDFS.

      4. Quantise features into visual terms using the HadoopClusterQuantiserTool (adjust the number of threads to suit your cluster):

      hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://servername/data/sift-features.seq -o hdfs://servername/data/quantised-sift-features.seq -q hdfs://servername/data/mirflickr-1000000-sift-fastkmeans.idx -mm MULTITHREAD -j 6

      5. Build an ImageTerrier index using HadoopImageTerrier:

      hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint -o hdfs://servername/data/imageterrier-index.idx -m QUANTISED_FEATURES hdfs://servername/data/quantised-sift-features.seq

      * Use the latest SNAPSHOT/trunk versions of ImageTerrier and OpenIMAJ; things have moved along quite a lot since the last releases.
      * HadoopImageTerrier is rather experimental, especially the "-m IMAGES" option!
      * Due to the way Terrier works with HDFS, it seems that you have to use the full HDFS URL, including the fully qualified server name.
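
For convenience, the commands in the notes above could be chained in a small wrapper script. This is only a rough sketch, not something shipped with OpenIMAJ or ImageTerrier: the `NAMENODE`, `DATA_DIR`, `THREADS` and `RUN` variables and the function names are made up for illustration, it assumes `-i` names the input and `-o` the output of each tool, and it assumes the vocabulary file has already been downloaded to the current directory before being copied to HDFS with `hadoop fs -put`. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of steps 2-5 from the notes above.
# NAMENODE and DATA_DIR are placeholders (assumptions), not real defaults.
NAMENODE="${NAMENODE:-servername}"   # should be the fully qualified host name
DATA_DIR="${DATA_DIR:-/data}"
BASE="hdfs://${NAMENODE}${DATA_DIR}"
VOCAB="mirflickr-1000000-sift-fastkmeans.idx"

# Print a command; run it only when RUN=1 is set in the environment.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

extract_features() {
  # Assumes -i is the input SequenceFile of images and -o the output features.
  run hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size \
    -i "$BASE/images.seq" -o "$BASE/sift-features.seq"
}

upload_vocabulary() {
  # Assumes the vocabulary was already downloaded to the current directory.
  run hadoop fs -put "$VOCAB" "$BASE/$VOCAB"
}

quantise_features() {
  run hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT \
    -i "$BASE/sift-features.seq" -o "$BASE/quantised-sift-features.seq" \
    -q "$BASE/$VOCAB" -mm MULTITHREAD -j "${THREADS:-6}"
}

build_index() {
  run hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint \
    -o "$BASE/imageterrier-index.idx" -m QUANTISED_FEATURES \
    "$BASE/quantised-sift-features.seq"
}

# Stop at the first failing step.
extract_features && upload_vocabulary && quantise_features && build_index
```

Run it once as-is to check the printed commands, then something like `NAMENODE=namenode.example.org RUN=1 sh pipeline.sh` (host and file name hypothetical) would execute them. Building every path from one `BASE` value keeps the full HDFS URL consistent across steps, which matters given the note about Terrier needing the fully qualified server name.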


