Welcome to your wiki!
This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage].
The wiki uses Markdown syntax.
Where is the guide?
This wiki is still under construction. The guide can be found here: https://sourceforge.net/p/imageterrier/wiki/ImageTerrier%20Tools/
We'll tidy this up soon :)
Does anyone know how to run ImageTerrier within the Hadoop framework?
So there is no documentation, right?
I really want to use it on Hadoop; please point me in the right direction. Thanks a lot.
Here are some brief notes on using ImageTerrier with Hadoop. You'll need a collection of tools from OpenIMAJ (from the hadoop/tools directory; compile each one with mvn assembly:assembly) as well as ImageTerrier. You might need to alter the Maven dependencies to work with your version of Hadoop (currently the tools use the Cloudera Hadoop distribution [version 3u5, I think]).
Create a SequenceFile of images and put it on HDFS. The OpenIMAJ SequenceFileTool can help with this.
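As an aside, a SequenceFile is essentially a container of key/value records, here filename → image bytes. The sketch below uses a simple length-prefixed format purely to illustrate the idea; it is NOT the real Hadoop SequenceFile binary format, which SequenceFileTool handles for you.

```python
import io
import struct

def write_records(stream, records):
    """Write (key, value) pairs as length-prefixed records.
    Illustrative only; not the actual Hadoop SequenceFile format."""
    for key, value in records:
        k = key.encode("utf-8")
        stream.write(struct.pack(">I", len(k)))   # key length
        stream.write(k)                           # key bytes (the filename)
        stream.write(struct.pack(">I", len(value)))  # value length
        stream.write(value)                       # value bytes (the image data)

def read_records(stream):
    """Yield (key, value) pairs back out of the stream."""
    while True:
        header = stream.read(4)
        if not header:
            return
        klen = struct.unpack(">I", header)[0]
        key = stream.read(klen).decode("utf-8")
        vlen = struct.unpack(">I", stream.read(4))[0]
        yield key, stream.read(vlen)

buf = io.BytesIO()
write_records(buf, [("img1.jpg", b"\x89PNG..."), ("img2.jpg", b"...")])
buf.seek(0)
print([k for k, _ in read_records(buf)])  # -> ['img1.jpg', 'img2.jpg']
```

Packing many small files into one big record file like this is what makes Hadoop processing efficient; HDFS and MapReduce work far better with a few large files than with millions of tiny ones.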
Extract local features (e.g. DoG/SIFT) using the HadoopLocalFeaturesTool:

    hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size -i hdfs://servername/data/images.seq -o hdfs://servername/data/sift-features.seq
Perform clustering of the features to create a vocabulary. Either use OpenIMAJ's ClusterQuantiserTool (on a single machine) or HadoopFastKMeans (on a cluster). You should be able to download a pre-trained vocabulary of 1 million terms for SIFT features from: http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-1000000-sift-fastkmeans.idx. The resultant vocabulary needs to be placed on HDFS.
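For intuition: building the vocabulary amounts to running k-means over all the descriptor vectors, and the resulting centroids are the "visual words". Here is a toy sketch of Lloyd's k-means in plain Python, using made-up 2-D points in place of 128-D SIFT descriptors (the real tools are far faster and approximate):

```python
def kmeans(points, k, iters=10):
    """Tiny Lloyd's k-means; returns k centroids (the 'vocabulary')."""
    # spread the initial centroids evenly through the data (toy initialisation)
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # move each centroid to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids

# two well-separated blobs of fake "descriptors" -> expect one centroid per blob
pts = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (10.0, 10.1), (10.1, 9.9), (9.9, 10.0)]
vocab = sorted(kmeans(pts, k=2))
print(vocab)
```

With k = 1,000,000 and hundreds of millions of descriptors, exact k-means is infeasible, which is why HadoopFastKMeans uses a distributed, approximate variant.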
Quantise features into visual terms using the HadoopClusterQuantiserTool (adjust the number of threads to suit your cluster):
    hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://servername/data/sift-features.seq -o hdfs://servername/data/quantised-sift-features.seq -q hdfs://servername/data/mirflickr-1000000-sift-fastkmeans.idx -mm MULTITHREAD -j 6
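Conceptually, quantisation just maps each local feature to the ID of its nearest vocabulary centroid, so every image becomes a bag of "visual terms" that the indexer can treat like words. A toy sketch (2-D vectors stand in for real descriptors; all values are made up):

```python
def quantise(features, vocab):
    """Map each feature vector to the index of its nearest centroid,
    i.e. its visual-term ID."""
    def nearest(f):
        return min(range(len(vocab)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(f, vocab[i])))
    return [nearest(f) for f in features]

# toy 2-D "vocabulary" of 3 visual words (hypothetical values)
vocab = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# four features from one image -> its bag of visual terms
terms = quantise([(0.2, -0.1), (4.8, 5.3), (9.5, 0.4), (0.1, 0.0)], vocab)
print(terms)  # -> [0, 1, 2, 0]
```

The real tool does this nearest-neighbour search against a million 128-D centroids, which is why it is multithreaded (the -j flag) and why, as discussed below, it can be slow when an image produces many features.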
Build the index using HadoopImageTerrier:

    hadoop jar HadoopImageTerrier.jar -t BASIC -nr 1 -fc QuantisedKeypoint -o hdfs://servername/data/imageterrier-index.idx -m QUANTISED_FEATURES hdfs://servername/data/quantised-sift-features.seq
* Use the latest SNAPSHOT/trunk versions of ImageTerrier and OpenIMAJ; things have moved along quite a lot since the last releases.
* HadoopImageTerrier is rather experimental, especially the "-m IMAGES" option!
* Due to the way Terrier works with HDFS, it seems that you have to use the full hdfs URL, including the fully qualified servername.
Thanks for the tips, Jonathon.
Nevertheless, I'm stuck at the "Quantise features into visual terms using the HadoopClusterQuantiserTool" step.
I've tried:

    hadoop jar HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://localhost/sift-features.seq -o hdfs://localhost/quantised-sift-features.seq -q hdfs://localhost/mirflickr-1000000-sift-fastkmeans.voc
but it gives me the following error:

    Could not identify the clustertype. File: hdfs://localhost/mirflickr-1000000-sift-fastkmeans.voc
Can you provide more details on the quantisation step?
Ah, can you try with this vocabulary instead: http://degas.ecs.soton.ac.uk/~jsh2/openimaj/codebooks/mirflickr-sift-fastkmeans-1000000-new.voc
I think the other one had the old format header from earlier versions of OpenIMAJ.
How can I use this existing vocabulary to build an index on a single server (without Hadoop)? I have around 8 million images; how much memory will be required? (The server has 256G of memory.) Can I use the following command:

    java -Xmx250G -jar ImageTerrierTools-3.5-SNAPSHOT-jar-with-dependencies.jar BasicIndexer -m IMAGES -t POSITION -pm SPATIAL -qt RANDOMSET -k 1000000 -p BYTE -q ./mirflickr-sift-fastkmeans-1000000-new.voc -o ./mirflickr-sift-fastkmeans-1000000-new.idx -i image_paths.txt -v

# -i [file]: I added a function to read a file list here, instead of searching for image files
I think that will work. You shouldn't need anywhere near that much RAM though; it can probably be done in around 4G to 8G (Terrier's single-pass indexing strategy is designed to require relatively small amounts of RAM irrespective of the number of documents). Memory for feature extraction from the images will depend on the number of CPU cores on the server and the image size. Unless you've got a lot of cores, indexing 8 million images will take a long time.
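To give a feel for why single-pass indexing keeps memory bounded: postings are buffered in RAM, flushed to disk as sorted runs whenever the buffer fills, and the runs are merged at the end. This toy Python sketch illustrates the strategy (it is not Terrier's implementation; the "runs" stay in memory here, where a real indexer spills them to disk):

```python
import heapq
from collections import defaultdict

def single_pass_index(docs, max_terms_in_memory=4):
    """Sketch of single-pass indexing: flush sorted runs when the
    in-memory posting buffer fills, then merge the runs."""
    runs, buffer = [], defaultdict(list)

    def flush():
        # a real indexer would write this sorted run to disk
        if buffer:
            runs.append(sorted(buffer.items()))
            buffer.clear()

    for doc_id, terms in enumerate(docs):
        for t in terms:
            buffer[t].append(doc_id)
        if len(buffer) >= max_terms_in_memory:
            flush()
    flush()

    # merge the sorted runs into the final inverted index
    index = defaultdict(list)
    for term, postings in heapq.merge(*runs):
        index[term].extend(postings)
    return dict(index)

# "documents" are bags of (visual) terms
docs = [["cat", "dog"], ["dog", "fish"], ["cat", "fish", "bird"]]
idx = single_pass_index(docs, max_terms_in_memory=2)
print(idx["cat"])  # -> [0, 2]
```

The peak memory is governed by the buffer size, not the collection size, which is why 8 million images doesn't imply needing 250G of heap.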
Can anyone tell me the options for BasicIndexer and BasicSearcher for ImageTerrier 3.5?
The 3.0.1 options don't seem to work.
Take a look at the documentation in the svn: https://sourceforge.net/p/imageterrier/code/HEAD/tree/trunk/ImageTerrierTools/Documentation.markdown and https://sourceforge.net/p/imageterrier/code/HEAD/tree/trunk/ImageTerrierTools/Examples.markdown
Which file contains the source code of org.terrier.indexing.ExtensibleSinglePassIndexer? I can't find this class in this project.
It's not part of this project - it's part of the Terrier retrieval engine: http://terrier.org/docs/v3.5/javadoc/org/terrier/indexing/ExtensibleSinglePassIndexer.html
I'm running BasicIndexer on the oxford5k benchmark with mirflickr-sift-fastkmeans-1000000-new.voc, and the QuantiserTask takes about 1 minute per image. Is it because I'm extracting too many features per image?
Yes, probably... If I remember correctly, some of the Oxford dataset images are very large and result in very large numbers of local features, which in turn take a long while to quantise.
I checked out the r94 and try to run it, but it shows me that org.openimaj.math.geometry.transforms.error.TransformError2d class can't be found. I have checked openimaj and there is no such a class there. How can I solve this problem? Thanks.
Sorry, I was making some changes and improvements in OpenIMAJ, and didn't commit the corresponding changes in ImageTerrier. It should be working again in r95.
I followed your instructions to run the pipeline of tools, but I'm stuck at the HadoopFastKMeans step. It throws a NullPointerException. The errors are as follows:
    Exception in thread "main" java.lang.NullPointerException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
I noticed that there is no part-r-00000 file, since there is no reducer when running the tools pipeline. However, HadoopFastKMeans seems to use the part-r-00000 file to train the codebook.
Can you please show me how to use HadoopFastKMeans to generate a codebook like the one you mentioned (mirflickr-sift-fastkmeans-1000000-new.voc)? Thanks a lot.
Can you paste the command you ran?
The commands I ran are as follows:
    java -cp SequenceFileTool.jar org.openimaj.hadoop.tools.sequencefile.SequenceFileTool -m CREATE -kns FILENAME -o hdfs://servername/images.seq /images
    hadoop jar HadoopLocalFeaturesTool.jar --mode SIFT --no-double-size -i hdfs://servername/images.seq -o hdfs://servername/imagefeatures.seq
    hadoop jar HadoopFastKMeans.jar -i hdfs://servername/imagefeatures.seq -o hdfs://servername/codebook -k 1000 --force-delete -iters 10
and the NullPointerException occurred during step 3.
Thanks a lot.
I've traced the code and found there is no ByteCentroidsResult class (ByteCentroidsResult.java) in OpenIMAJ's trunk. Has it already been replaced with something else? Thanks.
ByteCentroidsResult is auto-generated from #T#CentroidsResult.jtemp at build time - https://sourceforge.net/p/openimaj/code/HEAD/tree/trunk/machine-learning/clustering/src/main/jtemp/org/openimaj/ml/clustering/%23T%23CentroidsResult.jtemp (we do this so we don't have to maintain separate versions for all the native types)
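For illustration, this kind of template-based generation amounts to substituting type-name placeholders into one source template at build time. A rough Python sketch of the idea (this is not the actual OpenIMAJ jtemp machinery, and the placeholder conventions here are simplified):

```python
def generate_from_template(template, type_name):
    """Substitute #T# (capitalised type name) and #t# (primitive type name)
    placeholders, producing one concrete class per native type."""
    return (template
            .replace("#T#", type_name.capitalize())
            .replace("#t#", type_name.lower()))

# a tiny stand-in for #T#CentroidsResult.jtemp (hypothetical content)
JTEMP = """public class #T#CentroidsResult {
    public #t#[][] centroids;
}"""

for t in ["byte", "int", "float"]:
    print(generate_from_template(JTEMP, t))
```

Running the template once per native type yields ByteCentroidsResult, IntCentroidsResult, FloatCentroidsResult, and so on, which is why the generated .java files never appear in the repository, only the .jtemp source.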