
Image Classification using Hadoop sequence file

  • Anonymous

    Anonymous - 2015-03-12

    I have obtained a sequence file from the Hadoop cluster quantiser, and I am trying to classify images from that file.

    I am able to read the sequence file using TextBytesSequenceFileUtility. I've also been able to classify images by following the demo at http://www.openimaj.org/tutorial/classification101.html. However, I do not know how to convert the sequence file data into a sparse vector, or anything else that can be used by LiblinearAnnotator.

    Here's what I have so far:

    public static void main( String[] args ) throws IOException
    {
        //Read sequence file
        TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
    
        //Convert sequence file data into hard assigner????
        HardAssigner<byte[], float[], IntFloatPair> assigner = reader;
    
        //Train image classification
        FeatureExtractor<DoubleFV, Record<FImage>> extractor = new PHOWExtractor(pdsift, assigner);
        LiblinearAnnotator<Record<FImage>, String> ann = new LiblinearAnnotator<Record<FImage>, String>(
                extractor, Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
        ann.train(splits.getTrainingDataset());
    }
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-12

    So, the Hadoop cluster quantiser tool wasn't really designed with this use case in mind (it was more of a quick hack to make files to feed into ImageTerrier, with an eye to reading features from other [non-OpenIMAJ] tools). The sequence files contain an encoded list of QuantisedLocalFeature objects for each image, which can be read back into your program and converted to sparse vectors by doing something along the following lines:

    TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
    for (Entry<Text, BytesWritable> kv : reader) {
        // decode the list of quantised features for one image from the record value
        MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);

        // aggregate the quantised features into a sparse bag-of-visual-words vector
        SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);
    }
    

    You might need to change QuantisedKeypoint.class to something else if your un-quantised features were something other than Keypoints.

     
    • Anonymous

      Anonymous - 2015-03-15

      How would we pass the SparseIntFV to LiblinearAnnotator? LiblinearAnnotator takes an extractor, so do we have to write our own extractor?

       
  • Anonymous

    Anonymous - 2015-03-15

    How do I pass this vector to a FeatureExtractor so that I can train using LibLinear?

    LiblinearAnnotator<FImage, String=""> ann;

        HomogeneousKernelMap hkm = new HomogeneousKernelMap(HomogeneousKernelMap.KernelType.Chi2, HomogeneousKernelMap.WindowType.Uniform);
    
        FeatureExtractor<SparseIntFV, FImage> extractor = hkm.createWrappedExtractor();
        ann = new LiblinearAnnotator<FImage, String>(extractor, LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-15

    You'll need to create a Liblinear annotator typed on SparseIntFV (i.e. new LiblinearAnnotator<SparseIntFV, String>(...)).

    For the feature extractor, there's no reason to write your own as SparseIntFV is already a FeatureVector, and you can just use an instance of IdentityFeatureExtractor to pass through the input features.
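
    For example, a minimal sketch (IdentityFeatureExtractor just returns its input unchanged; the solver settings here are copied from your earlier snippet):

    // the features are already FeatureVectors, so they can be passed straight through
    FeatureExtractor<SparseIntFV, SparseIntFV> extractor = new IdentityFeatureExtractor<SparseIntFV>();
    LiblinearAnnotator<SparseIntFV, String> ann = new LiblinearAnnotator<SparseIntFV, String>(
            extractor, Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);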

     
    • Anonymous

      Anonymous - 2015-03-15

      Ooh, but this way how would my LiblinearAnnotator know which class each vector belongs to? Right now we just have feature vectors, and we would pass the feature vector data to ann.train(featurevectordata); before, it was a GroupedDataset.

       
  • Jonathon Hare

    Jonathon Hare - 2015-03-15

    You need to construct either a GroupedDataset (maybe using MapBackedDataset?) or a List of Annotated objects to hold your feature vectors and their class assignments (based on the image names from the sequence file keys); the Annotated route is sketched below.
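
    A minimal sketch of the Annotated route (lookupClass, which maps a sequence file key to a class name, is a hypothetical helper here; ann is the LiblinearAnnotator from above):

    // collect one Annotated pair (vector + class label) per image
    List<Annotated<SparseIntFV, String>> data = new ArrayList<Annotated<SparseIntFV, String>>();
    for (Entry<Text, BytesWritable> kv : reader) {
        MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
        SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);

        // lookupClass is a hypothetical helper mapping the key to a class name
        data.add(new AnnotatedObject<SparseIntFV, String>(vector, lookupClass(kv.getKey().toString())));
    }
    ann.train(data);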

     
    • Anonymous

      Anonymous - 2015-03-16

          GroupedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV> gds;

          //Map<Text, SparseIntFV> data_new = new HashMap<Text, SparseIntFV>();

          for (Map.Entry<Text, BytesWritable> kv : reader) {
              MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                      new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
              SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
              // data_new.put(kv.getKey(), vector);

              gds = new MapBackedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV>(kv.getKey(), ??);
          }


      Is it something like this?

       
      • Jonathon Hare

        Jonathon Hare - 2015-03-16

        Not quite - something like this:

        final GroupedDataset<String, ListDataset<SparseIntFV>, SparseIntFV> gds = new MapBackedDataset<String, ListDataset<SparseIntFV>, SparseIntFV>();
        for (final Map.Entry<Text, BytesWritable> kv : reader) {
            final MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                    new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
            final SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
        
            final String clz = lookupClass(kv.getKey().toString());
        
            if (!gds.containsKey(clz))
                gds.put(clz, new ListBackedDataset<SparseIntFV>());
            gds.get(clz).add(vector);
        }
        

        You'll have to implement the lookupClass method to determine what class your feature belongs to based on the filename stored in the sequence file key.
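
        For example, a minimal sketch of lookupClass, assuming the keys are relative paths of the form "className/imageName.jpg" (adjust to however your filenames encode the class):

        // hypothetical: extract the class name from a "className/imageName.jpg" key
        static String lookupClass(String key) {
            return key.substring(0, key.indexOf('/'));
        }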

         
        • Anonymous

          Anonymous - 2015-03-16

          The key is in UUID form (0006bede-cbf6-11e4-8731-1681e6b88ec1). Do I need to regenerate the quantised feature vectors with a string key, or can I convert this one?

          Thank you for your time.

           
          • Jonathon Hare

            Jonathon Hare - 2015-03-17

            (see below)

             
  • Anonymous

    Anonymous - 2015-03-16

    Because toString() of getKey() returns the same thing (0006bede-cbf6-11e4-8731-1681e6b88ec1).

     
    • Jonathon Hare

      Jonathon Hare - 2015-03-17

      Right, I think that means that when you built the original sequencefile with images in it, you chose to use the MD5UUID of the file as the key rather than the name (unfortunately MD5UUID is the default [I'll change that for future versions of the tool]).

      When you run the SequenceFileTool in create mode you need to use the option -kns FILENAME or -kns RELATIVEPATH to ensure that the keys are the original image names or relative paths (to where the tool is run from).

       
      • Jonathon Hare

        Jonathon Hare - 2015-03-17

        You have two options:
        (a) regenerate the image sequencefile with FILENAME keys and go through the feature extraction and quantisation steps again
        (b) re-run the command you used to generate the original image sequencefile, and add the option -wm. This will create an additional file <name>-map.txt (where <name> is the name of the output sequencefile) which contains the mapping from filenames to UUIDs. You can then read the mapping into your program and use it to reverse the UUIDs back to the filenames in order to get the correct classes; see the sketch below.
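
        Reading the mapping back in might look something like this (a sketch only: it assumes each line of the map file holds a filename and its UUID separated by whitespace, so check the actual format of your file and swap the indices if needed):

        // load <name>-map.txt into a UUID -> filename lookup table
        Map<String, String> uuidToName = new HashMap<String, String>();
        BufferedReader br = new BufferedReader(new FileReader("name-map.txt")); // your actual <name>-map.txt
        String line;
        while ((line = br.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2)
                uuidToName.put(parts[1], parts[0]);
        }
        br.close();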

         
        • Anonymous

          Anonymous - 2015-03-17

          Thank you very much for your time. I was able to do the training. Now on to predicting.

           
  • Anonymous

    Anonymous - 2015-03-18

    Is this the correct way of defining the annotator in my case? Even though it's working, I think there is a problem, because ann.classify() only takes one feature vector. Is it supposed to take an image, or a list?

          ann = new LiblinearAnnotator<SparseIntFV, String>(new IdentityFeatureExtractor<SparseIntFV>(), LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
    

    Do I have to transform my test data the same way I did the training data?

     
    • Jonathon Hare

      Jonathon Hare - 2015-03-18

      Yes, that looks correct; I didn't consider that you might want to train the annotator in a different way to using it (i.e. train with precomputed features and then test directly with images). The solution is to extract the features from the images and then pass each one to the annotator to get the class estimate.
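
      For instance, if you have a quantised sequence file for your test images as well, a minimal sketch would be (pathToTestFile is a placeholder, and numVisualWords must match the vocabulary size used for training):

      // build test vectors exactly as for training, then classify each one
      TextBytesSequenceFileUtility testReader = new TextBytesSequenceFileUtility("pathToTestFile", true);
      for (Map.Entry<Text, BytesWritable> kv : testReader) {
          MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
                  new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
          SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);

          // the annotator acts as a Classifier over feature vectors
          ClassificationResult<String> guess = ann.classify(vector);
      }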

       
  • Anonymous

    Anonymous - 2015-03-25

    Do these commands and code look correct? I am getting an accuracy of around 30-35%.

    =====================
    Get Seq File
    =====================
    java -Xmx2G -jar ../SequenceFileTool.jar -m CREATE -R -name train.seq -wm train/

    hadoop dfs -copyFromLocal train.seq /user/hue/newtrain/

    =====================
    Extract Features
    =====================
    hadoop jar ../HadoopLocalFeaturesTool.jar --mode PYRAMID_DENSE_SIFT -s 7 -i hdfs://172.16.1.128/user/hue/newtrain/train.seq -o hdfs://172.16.1.128/user/hue/newtrain/sift.seq

    ===================
    Fast KMeans
    ===================
    hadoop jar ../HadoopFastKMeans.jar -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -k 700 -o hdfs://172.16.1.128/user/hue/newtrain/

    ====================
    Quantise Descriptor
    ===================
    hadoop jar ../HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -q hdfs://172.16.1.128/user/hue/newtrain/final -o hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq

    ===================
    Sequence File Merger
    ===================
    hadoop jar ../SequenceFileMerger-1.4-SNAPSHOT-jar-with-dependencies.jar -i hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq -n 1 -o hdfs://172.16.1.128/user/hue/newtrain/QuantisedSift/

    ================
    Code for testing
    ================

        GroupedRandomSplitter<String, SparseIntFV> splits =
                new GroupedRandomSplitter<String, SparseIntFV>(gds , 100, 0, 100);
    
        LiblinearAnnotator<SparseIntFV, String> ann;
    
        HomogeneousKernelMap hkm = new HomogeneousKernelMap(HomogeneousKernelMap.KernelType.JensonShannon, HomogeneousKernelMap.WindowType.Rectangular);
        //IdentityFeatureExtractor<SparseIntFV> extractor = new IdentityFeatureExtractor<SparseIntFV>();
        //extractor.extractFeature(vector);
        ann = new LiblinearAnnotator<SparseIntFV, String>(hkm.createWrappedExtractor(new IdentityFeatureExtractor<SparseIntFV>()) , LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L1LOSS_SVC_DUAL, 3, 1);
    
        ClassificationEvaluator<CMResult<String>, String, SparseIntFV> eval =
                new ClassificationEvaluator<CMResult<String>, String, SparseIntFV>(
                        ann, splits.getTestDataset(), new CMAnalyser<SparseIntFV, String>(CMAnalyser.Strategy.SINGLE));
        Map<SparseIntFV, ClassificationResult<String>> guesses = eval.evaluate();
        CMResult<String> result = eval.analyse(guesses);
        System.out.println(result);
    
     
  • Jonathon Hare

    Jonathon Hare - 2015-03-30

    I don't see anything fundamentally wrong with that. Given that you're only dealing with 100 test and 100 training examples per class, how does that compare to doing the process without Hadoop?

     
