
Image Classification using Hadoop sequence file

  • Anonymous

    Anonymous - 2015-03-12

    I have obtained a sequence file from the Hadoop cluster quantiser, and I am trying to classify images from that file.

    I am able to read the sequence file using TextBytesSequenceFileUtility. I've also been able to classify images by following the demo at http://www.openimaj.org/tutorial/classification101.html. However, I do not know how to convert the sequence file data into a sparse vector, or anything else that can be used by LiblinearAnnotator.

    Here's what I have so far:

    public static void main( String[] args ) throws IOException
    {
        //Read sequence file
        TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
    
        //Convert sequence file data into hard assigner????
        HardAssigner<byte[], float[], IntFloatPair> assigner = reader;
    
        //Train image classification
        FeatureExtractor<DoubleFV, Record<FImage>> extractor = new PHOWExtractor(pdsift, assigner);
        LiblinearAnnotator<Record<FImage>, String> ann = new LiblinearAnnotator<Record<FImage>, String>(
                extractor, Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
        ann.train(splits.getTrainingDataset());
    }
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-12

    So, the Hadoop cluster quantiser tool wasn't really designed with this use case in mind (it was more of a quick hack to make files to feed into ImageTerrier, with an eye to reading features from other [non-OpenIMAJ] tools). The sequence files contain an encoded list of QuantisedLocalFeature objects for each image, which can be read back into your program and converted to sparse vectors by doing something along the following lines:

    TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
    for (Entry<Text, BytesWritable> kv : reader) {
        // decode the list of quantised features for one image from the record value
        MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);

        // aggregate the quantised features into a sparse bag-of-visual-words vector
        SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);
    }
    

    You might need to change QuantisedKeypoint.class to something else if your un-quantised features were something other than Keypoints.

     
    • Anonymous

      Anonymous - 2015-03-15

      How would we pass the SparseIntFV to LiblinearAnnotator? LiblinearAnnotator takes an extractor, so do we have to write our own extractor?

       
  • Anonymous

    Anonymous - 2015-03-15

    How do I pass this vector to a FeatureExtractor so that I can train using LibLinear?

    LiblinearAnnotator<FImage, String=""> ann;

        HomogeneousKernelMap hkm = new HomogeneousKernelMap(HomogeneousKernelMap.KernelType.Chi2, HomogeneousKernelMap.WindowType.Uniform);
    
        FeatureExtractor<SparseIntFV, FImage> extractor = hkm.createWrappedExtractor();
        ann = new LiblinearAnnotator<FImage, String>(extractor, LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-15

    You'll need to create a Liblinear annotator typed on SparseIntFV (i.e. new LiblinearAnnotator<SparseIntFV, String>(...)).

    For the feature extractor, there's no reason to write your own as SparseIntFV is already a FeatureVector, and you can just use an instance of IdentityFeatureExtractor to pass through the input features.
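
    For example, a minimal sketch (IdentityFeatureExtractor just returns its input unchanged; the solver settings here are copied from your earlier snippet):

    // the features are already FeatureVectors, so they can be passed straight through
    FeatureExtractor<SparseIntFV, SparseIntFV> extractor = new IdentityFeatureExtractor<SparseIntFV>();
    LiblinearAnnotator<SparseIntFV, String> ann = new LiblinearAnnotator<SparseIntFV, String>(
            extractor, Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);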

     
    • Anonymous

      Anonymous - 2015-03-15

      Ooh, but this way how would my LiblinearAnnotator know which class each vector belongs to? Right now we just have feature vectors, and we would pass the feature vector data to ann.train(featurevectordata); before, it was a GroupedDataset.

       
  • Jonathon Hare

    Jonathon Hare - 2015-03-15

    You need to construct either a GroupedDataset (maybe using MapBackedDataset?) or a List of Annotated objects to hold your feature vectors and their class assignments (based on the image names from the sequence file keys); the Annotated route is sketched below.
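
    A minimal sketch of the Annotated route (lookupClass, which maps a sequence file key to a class name, is a hypothetical helper here; ann is the LiblinearAnnotator from above):

    // collect one Annotated pair (vector + class label) per image
    List<Annotated<SparseIntFV, String>> data = new ArrayList<Annotated<SparseIntFV, String>>();
    for (Entry<Text, BytesWritable> kv : reader) {
        MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
        SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);

        // lookupClass is a hypothetical helper mapping the key to a class name
        data.add(new AnnotatedObject<SparseIntFV, String>(vector, lookupClass(kv.getKey().toString())));
    }
    ann.train(data);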

     
    • Anonymous

      Anonymous - 2015-03-16

          GroupedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV> gds;

          //Map<Text, SparseIntFV> data_new = new HashMap<Text, SparseIntFV>();

          for (Map.Entry<Text, BytesWritable> kv : reader) {
              MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                      new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
              SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
              // data_new.put(kv.getKey(), vector);

              gds = new MapBackedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV>(kv.getKey(), ??);
          }


      Is it something like this?

       
      • Jonathon Hare

        Jonathon Hare - 2015-03-16

        Not quite - something like this:

        final GroupedDataset<String, ListDataset<SparseIntFV>, SparseIntFV> gds = new MapBackedDataset<String, ListDataset<SparseIntFV>, SparseIntFV>();
        for (final Map.Entry<Text, BytesWritable> kv : reader) {
            final MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                    new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
            final SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
        
            final String clz = lookupClass(kv.getKey().toString());
        
            if (!gds.containsKey(clz))
                gds.put(clz, new ListBackedDataset<SparseIntFV>());
            gds.get(clz).add(vector);
        }
        

        You'll have to implement the lookupClass method to determine what class your feature belongs to based on the filename stored in the sequence file key.
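
        For example, a minimal sketch of lookupClass, assuming the keys are relative paths of the form "className/imageName.jpg" (adjust to however your filenames encode the class):

        // hypothetical: extract the class name from a "className/imageName.jpg" key
        static String lookupClass(String key) {
            return key.substring(0, key.indexOf('/'));
        }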

         
        • Anonymous

          Anonymous - 2015-03-16

          The key is in UUID form (0006bede-cbf6-11e4-8731-1681e6b88ec1). Do I need to regenerate the quantised feature vectors with a string key, or can I convert this one?

          Thank you for your time.

           
          • Jonathon Hare

            Jonathon Hare - 2015-03-17

            (see below)

             
  • Anonymous

    Anonymous - 2015-03-16

    Because toString() of getKey() returns the same thing (0006bede-cbf6-11e4-8731-1681e6b88ec1).

     
    • Jonathon Hare

      Jonathon Hare - 2015-03-17

      Right, I think that means that when you built the original sequencefile with images in it, you chose to use the MD5UUID of the file as the key rather than the name (unfortunately MD5UUID is the default [I'll change that for future versions of the tool]).

      When you run the SequenceFileTool in create mode you need to use the option -kns FILENAME or -kns RELATIVEPATH to ensure that the keys are the original image names or relative paths (to where the tool is run from).

       
      • Jonathon Hare

        Jonathon Hare - 2015-03-17

        You have two options:
        (a) regenerate the image sequencefile with FILENAME keys and go through the feature extraction and quantisation steps again
        (b) re-run the command you used to generate the original image sequencefile, and add the option -wm. This will create an additional file <name>-map.txt (where <name> is the name of the output sequencefile) which contains the mapping from filenames to UUIDs. You can then read the mapping into your program and use it to reverse the UUIDs back to the filenames in order to get the correct classes; see the sketch below.
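
        Reading the mapping back in might look something like this (a sketch only: it assumes each line of the map file holds a filename and its UUID separated by whitespace, so check the actual format of your file and swap the indices if needed):

        // load <name>-map.txt into a UUID -> filename lookup table
        Map<String, String> uuidToName = new HashMap<String, String>();
        BufferedReader br = new BufferedReader(new FileReader("name-map.txt")); // your actual <name>-map.txt
        String line;
        while ((line = br.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2)
                uuidToName.put(parts[1], parts[0]);
        }
        br.close();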

         
        • Anonymous

          Anonymous - 2015-03-17

          Thank you very much for your time. I was able to do the training. Now on to predicting.

           
  • Anonymous

    Anonymous - 2015-03-18

    Is this the correct way of defining the annotator in my case? Even though it's working, I think there is a problem, because ann.classify() only takes one feature vector. Is it supposed to take an image, or a list?

          ann = new LiblinearAnnotator<SparseIntFV, String>(new IdentityFeatureExtractor<SparseIntFV>(), LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
    

    Do I have to transform my test data the same way I did the training data?

     
    • Jonathon Hare

      Jonathon Hare - 2015-03-18

      Yes, that looks correct; I didn't consider that you might want to train the annotator in a different way to using it (i.e. train with precomputed features and then test directly with images). The solution is to extract the features from the images and then pass each one to the annotator to get the class estimate.
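
      For instance, if you have a quantised sequence file for your test images as well, a minimal sketch would be (pathToTestFile is a placeholder, and numVisualWords must match the vocabulary size used for training):

      // build test vectors exactly as for training, then classify each one
      TextBytesSequenceFileUtility testReader = new TextBytesSequenceFileUtility("pathToTestFile", true);
      for (Map.Entry<Text, BytesWritable> kv : testReader) {
          MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                  new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
          SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);

          // the annotator acts as a Classifier over feature vectors
          ClassificationResult<String> guess = ann.classify(vector);
      }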

       
  • Anonymous

    Anonymous - 2015-03-25

    Do these commands and code look correct? I am getting an accuracy of around 30-35%.

    =====================
    Get Seq File
    =====================
    java -Xmx2G -jar ../SequenceFileTool.jar -m CREATE -R -name train.seq -wm train/

    hadoop dfs -copyFromLocal train.seq /user/hue/newtrain/

    =====================
    Extract Features
    =====================
    hadoop jar ../HadoopLocalFeaturesTool.jar --mode PYRAMID_DENSE_SIFT -s 7 -i hdfs://172.16.1.128/user/hue/newtrain/train.seq -o hdfs://172.16.1.128/user/hue/newtrain/sift.seq

    ===================
    Fast KMeans
    ===================
    hadoop jar ../HadoopFastKMeans.jar -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -k 700 -o hdfs://172.16.1.128/user/hue/newtrain/

    ====================
    Quantise Descriptor
    ===================
    hadoop jar ../HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -q hdfs://172.16.1.128/user/hue/newtrain/final -o hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq

    ===================
    Sequence File Merger
    ===================
    hadoop jar ../SequenceFileMerger-1.4-SNAPSHOT-jar-with-dependencies.jar -i hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq -n 1 -o hdfs://172.16.1.128/user/hue/newtrain/QuantisedSift/

    ================
    Code for testing
    ================

        GroupedRandomSplitter<String, SparseIntFV> splits =
                new GroupedRandomSplitter<String, SparseIntFV>(gds , 100, 0, 100);
    
        LiblinearAnnotator<SparseIntFV, String> ann;
    
        HomogeneousKernelMap hkm = new HomogeneousKernelMap(HomogeneousKernelMap.KernelType.JensonShannon, HomogeneousKernelMap.WindowType.Rectangular);
        //IdentityFeatureExtractor<SparseIntFV> extractor = new IdentityFeatureExtractor<SparseIntFV>();
        //extractor.extractFeature(vector);
        ann = new LiblinearAnnotator<SparseIntFV, String>(hkm.createWrappedExtractor(new IdentityFeatureExtractor<SparseIntFV>()) , LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L1LOSS_SVC_DUAL, 3, 1);
    
        ClassificationEvaluator<CMResult<String>, String, SparseIntFV> eval =
                new ClassificationEvaluator<CMResult<String>, String, SparseIntFV>(
                        ann, splits.getTestDataset(), new CMAnalyser<SparseIntFV, String>(CMAnalyser.Strategy.SINGLE));
        Map<SparseIntFV, ClassificationResult<String>> guesses = eval.evaluate();
        CMResult<String> result = eval.analyse(guesses);
        System.out.println(result);
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-30

    I don't see anything fundamentally wrong with that. Given that you're only dealing with 100 test and 100 training examples per class, how does that compare to doing the process without Hadoop?

     
