Home

Jonathon Hare

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.


1 2 > >> (Page 1 of 2)

  • goalgodo
    goalgodo
    2012-11-06

    Does anyone know how to get ImageTerrier running on Hadoop?

     

  • Anonymous
    2014-04-13

    So there is no documentation, right?
    I really want to use it on Hadoop; please point me in the right direction. Thanks a lot.

     
    • Jonathon Hare
      Jonathon Hare
      2014-04-17

      Here are some brief notes on using ImageTerrier with Hadoop. You'll need a collection of tools from OpenIMAJ (from the hadoop/tools directory; compile each one with mvn assembly:assembly) as well as ImageTerrier, and you might need to alter the Maven dependencies to work with your version of Hadoop (currently the tools use the Cloudera Hadoop distribution [version 3u5, I think]).

      1. Create a SequenceFile of images and put it on HDFS. The OpenIMAJ SequenceFileTool can help do this.

      2. Extract local features (i.e. DoG/SIFT) using HadoopLocalFeaturesTool:

      hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size -i hdfs://servername/data/images.seq -o hdfs://servername/data/sift-features.seq

      3. Perform clustering of features to create a vocabulary. Either use OpenIMAJ's ClusterQuantiserTool (on a single machine) or HadoopFastKMeans (on a cluster). You should be able to download a pre-trained vocabulary of 1 million terms for SIFT features from: http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-1000000-sift-fastkmeans.idx. The resultant vocabulary needs to be placed on HDFS.

      4. Quantise features into visual terms using the HadoopClusterQuantiserTool (adjust the number of threads to suit your cluster):

      hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://servername/data/sift-features.seq -o hdfs://servername/data/quantised-sift-features.seq -q hdfs://servername/data/mirflickr-1000000-sift-fastkmeans.idx -mm MULTITHREAD -j 6

      5. Build an ImageTerrier index using HadoopImageTerrier:

      hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint -o hdfs://servername/data/imageterrier-index.idx -m QUANTISED_FEATURES hdfs://servername/data/quantised-sift-features.seq

      Notes:
      * Use the latest SNAPSHOTS/trunk versions of ImageTerrier and OpenIMAJ - things have moved along quite a lot since the last releases.
      * HadoopImageTerrier is rather experimental - especially the "-m IMAGES" option!
      * Due to the way Terrier works with HDFS, it seems that you have to use the full hdfs url, including the fully qualified servername.
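
The Hadoop steps in these notes (feature extraction, quantisation, indexing) could be chained in a small wrapper script. The following is only a minimal sketch: the flags and file names are the ones given in the notes, while `HDFS_BASE`, `DRY_RUN`, `run_step` and `run_pipeline` are hypothetical names introduced for illustration. It assumes `-i` is the input image SequenceFile and `-o` the output feature file, which matches how the later steps consume sift-features.seq. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the three Hadoop jobs from the notes.
# HDFS_BASE, DRY_RUN, run_step and run_pipeline are hypothetical helper names;
# the tool flags and file names are copied from the notes.

# Use the full HDFS URL, including the fully qualified server name.
HDFS_BASE="${HDFS_BASE:-hdfs://servername/data}"

run_step() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run_pipeline() {
    # Extract DoG/SIFT features from the image SequenceFile.
    run_step hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size \
        -i "$HDFS_BASE/images.seq" -o "$HDFS_BASE/sift-features.seq" || return 1

    # Quantise the features against the vocabulary (6 threads, as in the notes).
    run_step hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT \
        -i "$HDFS_BASE/sift-features.seq" \
        -o "$HDFS_BASE/quantised-sift-features.seq" \
        -q "$HDFS_BASE/mirflickr-1000000-sift-fastkmeans.idx" \
        -mm MULTITHREAD -j 6 || return 1

    # Build the ImageTerrier index from the quantised features.
    run_step hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint \
        -o "$HDFS_BASE/imageterrier-index.idx" \
        -m QUANTISED_FEATURES "$HDFS_BASE/quantised-sift-features.seq"
}

run_pipeline
```

Running the script with `DRY_RUN=0` would execute the jobs in order and stop at the first failure; creating the SequenceFile and placing the vocabulary on HDFS still have to be done beforehand.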

       
  • Thanks for the tips, Jonathon.

    Nevertheless, I'm stuck at the step "Quantise features into visual terms using the HadoopClusterQuantiserTool".

    I've tried hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://localhost/sift-features.seq -o hdfs://localhost/quantised-sift-features.seq -q hdfs://localhost/mirflickr-1000000-sift-fastkmeans.voc

    but it gives me the following error:

    Could not identify the clustertype. File: hdfs://localhost/mirflickr-1000000-sift-fastkmeans.voc

    Can you provide more details on the quantisation step?

     
    Last edit: Kazakov Alexander 2014-04-28
    • Jonathon Hare
      Jonathon Hare
      2014-05-12

      Ah, can you try with this vocabulary instead: http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-sift-fastkmeans-1000000-new.voc

      I think the other one had the old format header from earlier versions of OpenIMAJ.
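
Fetching the newer vocabulary and placing it on HDFS might look like the sketch below. This is an assumption-laden illustration, not a documented procedure: `HDFS_DIR`, `DRY_RUN`, `run_step` and `fetch_vocabulary` are hypothetical names, `curl` is assumed to be installed, and the default HDFS location is a placeholder. It prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: download the new-format vocabulary and copy it to HDFS.
# HDFS_DIR, DRY_RUN, run_step and fetch_vocabulary are hypothetical names.

VOC_URL="http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-sift-fastkmeans-1000000-new.voc"
VOC_FILE="mirflickr-sift-fastkmeans-1000000-new.voc"
HDFS_DIR="${HDFS_DIR:-hdfs://servername/data}"   # placeholder location

run_step() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

fetch_vocabulary() {
    # Download the vocabulary to the current directory.
    run_step curl -L -o "$VOC_FILE" "$VOC_URL" || return 1
    # Copy it to HDFS so it can be passed to the quantiser via -q.
    run_step hadoop fs -put "$VOC_FILE" "$HDFS_DIR/$VOC_FILE"
}

fetch_vocabulary
```

The resulting HDFS path would then replace the old vocabulary in the `-q` argument of the HadoopClusterQuantiserTool command.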

       

      • Anonymous
        2014-06-17

        Hi Hare,
        How can I use this existing vocabulary to build an index on a single server (without Hadoop)? I have around 8 million images; how much memory will be required? (The server has 256G of memory.) Can I use the following command:

        java -Xmx250G -jar ImageTerrierTools-3.5-SNAPSHOT-jar-with-dependencies.jar BasicIndexer -m IMAGES -t POSITION -pm SPATIAL -qt RANDOMSET -k 1000000 -p BYTE -q ./mirflickr-sift-fastkmeans-1000000-new.voc -o ./mirflickr-sift-fastkmeans-1000000-new.idx -i image_paths.txt -v
        # -i [file]: I added a function to read a list of files here, instead of searching for image files
        
         
        • Jonathon Hare
          Jonathon Hare
          2014-06-17

          I think that will work. You shouldn't need anywhere near that much RAM though - it can probably be done in around 4G to 8G (Terrier's single pass indexing strategy is designed to require relatively small amounts of RAM irrespective of the number of documents). Memory for feature extraction from the images will be dependent on the number of CPU cores on the server and image size. Unless you've got a lot of cores, indexing 8 million images will take a long time.
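
As a rough illustration of the memory advice, the indexing command from the question could be parameterised on the heap size. The sketch below only prints the command; `HEAP` and `indexer_command` are hypothetical names, the 8G default is taken from the upper end of the 4G to 8G estimate, and `-i image_paths.txt` relies on the poster's own file-list modification rather than a stock option.

```shell
#!/bin/sh
# Sketch: print the single-machine indexing command with a smaller heap.
# HEAP and indexer_command are hypothetical names; the remaining arguments
# are copied from the command in the question.

HEAP="${HEAP:-8G}"   # upper end of the 4G to 8G estimate

indexer_command() {
    # -i image_paths.txt depends on the poster's local file-list modification.
    echo java "-Xmx$HEAP" -jar ImageTerrierTools-3.5-SNAPSHOT-jar-with-dependencies.jar \
        BasicIndexer -m IMAGES -t POSITION -pm SPATIAL -qt RANDOMSET -k 1000000 -p BYTE \
        -q ./mirflickr-sift-fastkmeans-1000000-new.voc \
        -o ./mirflickr-sift-fastkmeans-1000000-new.idx \
        -i image_paths.txt -v
}

indexer_command
```

If feature extraction runs out of memory on a machine with many cores or very large images, `HEAP` can be raised without touching the rest of the command.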

           

  • Anonymous
    2014-06-19

    Hi Hare,
    I'm running BasicIndexer on the oxford5k benchmark with mirflickr-sift-fastkmeans-1000000-new.voc, and the QuantiserTask takes about 1 min per image. Is it because I'm extracting too many features per image?

     
    • Jonathon Hare
      Jonathon Hare
      2014-06-19

      Yes, probably... If I remember correctly some of the oxford dataset images are very large and result in very large numbers of local features, which in turn take a long while to quantise.

       

  • Anonymous
    2014-07-13

    Hello,

    I checked out r94 and tried to run it, but it reports that the class org.openimaj.math.geometry.transforms.error.TransformError2d can't be found. I have checked OpenIMAJ and there is no such class there. How can I solve this problem? Thanks.

     
    • Jonathon Hare
      Jonathon Hare
      2014-07-14

      Sorry, I was making some changes and improvements in OpenIMAJ, and didn't commit the corresponding changes in ImageTerrier. It should be working again in r95.

       

  • Anonymous
    7 days ago

    Hello Jonathon,

    I followed your instructions to run the pipeline of tools, but I'm stuck at the HadoopFastKMeans step. It gave me a NullPointerException. The error is as follows:

    Exception in thread "main" java.lang.NullPointerException
    at org.openimaj.ml.clustering.ByteCentroidsResult.writeBinary(ByteCentroidsResult.java:132)
    at org.openimaj.io.IOUtils.writeBinary(IOUtils.java:602)
    at org.openimaj.io.IOUtils.writeBinary(IOUtils.java:573)
    at org.openimaj.hadoop.tools.fastkmeans.HadoopFastKMeans.replaceSequenceFileWithCluster(HadoopFastKMeans.java:148)
    at org.openimaj.hadoop.tools.fastkmeans.HadoopFastKMeans.run(HadoopFastKMeans.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.openimaj.hadoop.tools.fastkmeans.HadoopFastKMeans.main(HadoopFastKMeans.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

    I noticed that there is no part-r-00000 file, because there was no reducer when I ran the tools pipeline. However, HadoopFastKMeans seems to use the part-r-00000 file to train the codebook.

    Can you please show me how to use HadoopFastKMeans to generate a codebook like the one you mentioned (mirflickr-sift-fastkmeans-1000000-new.voc)? Thanks a lot.

     
    • Jonathon Hare
      Jonathon Hare
      6 days ago

      Can you paste the command you ran?

       

      • Anonymous
        6 days ago

        Hi Jonathon,

        The commands I ran are as follows:

        1. java -cp SequenceFileTool.jar org.openimaj.hadoop.tools.sequencefile.SequenceFileTool -m CREATE -kns FILENAME -o hdfs://servername/images.seq /images

        2. hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size -i hdfs://servername/images.seq -o hdfs://servername/imagefeatures.seq

        3. hadoop jar HadoopFastKMeans.jar -i hdfs://servername/imagefeatures.seq -o hdfs://servername/codebook -k 1000 --force-delete -iters 10

        and the NullPointerException occurred during the run of step 3.

        Thanks a lot.

         