I have obtained a Hadoop cluster quantizer sequence file, and I am trying to classify images from that file.
I am able to read the sequence file using TextBytesSequenceFileUtility. I've also been able to classify images by following the demo at http://www.openimaj.org/tutorial/classification101.html. However, I do not know how to convert the sequence file data into a sparse vector, or anything else that can be used by LiblinearAnnotator.
Here's what I have so far:
public static void main( String[] args ) throws IOException
{
    // Read sequence file
    TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
    // Convert sequence file data into hard assigner????
    HardAssigner<byte[], float[], IntFloatPair> assigner = reader;
    // Train image classification
    FeatureExtractor<DoubleFV, Record<FImage>> extractor = new PHOWExtractor(pdsift, assigner);
    LiblinearAnnotator<Record<FImage>, String> ann = new LiblinearAnnotator<Record<FImage>, String>(
        extractor, Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
    ann.train(splits.getTrainingDataset());
}
So, the Hadoop cluster quantiser tool wasn't really designed with this use case in mind (it was more of a quick hack to make files to feed into ImageTerrier, with an eye to reading features from other [non-OpenIMAJ] tools). The sequence files contain an encoded list of QuantisedLocalFeature objects for each image, which can be read back into your program and converted to sparse vectors by doing something along the following lines:
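TextBytesSequenceFileUtility reader = new TextBytesSequenceFileUtility("pathToFile", true);
for (Entry<Text, BytesWritable> kv : reader) {
    // getBytes() returns the whole backing buffer, so only read the first getLength() bytes
    BytesWritable value = kv.getValue();
    MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
        new ByteArrayInputStream(value.getBytes(), 0, value.getLength()), QuantisedKeypoint.class);
    // numVisualWords is the vocabulary size (the number of clusters used for quantisation)
    SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, numVisualWords);
}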
You might need to change QuantisedKeypoint.class to something else if your un-quantised features were something other than Keypoints.
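To make the "sparse vector" step a little more concrete, here is a minimal plain-Java sketch of the counting idea behind a bag-of-visual-words histogram. It is an illustration only and is not OpenIMAJ's implementation: the class name BagOfWordsSketch, the histogram method and the sample ids are all made up for the example, and the visual-word ids are assumed to have already been pulled out of the quantised features.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a plain-Java stand-in, not the OpenIMAJ BagOfVisualWords code.
final class BagOfWordsSketch {

    /**
     * Counts how often each visual-word id occurs in one image.
     * Only ids that actually occur are stored, which is what makes the
     * representation sparse.
     */
    static Map<Integer, Integer> histogram(int[] wordIds, int numVisualWords) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int id : wordIds) {
            // An id outside the vocabulary means the vocabulary size is wrong.
            if (id < 0 || id >= numVisualWords) {
                throw new IllegalArgumentException("word id out of range: " + id);
            }
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical quantised ids for one image, with a 200-word vocabulary.
        int[] ids = {3, 17, 3, 199, 17, 3};
        System.out.println(histogram(ids, 200)); // prints {3=3, 17=2, 199=1}
    }
}
```

The range check is the useful part in practice: the vocabulary size passed in has to be at least as large as the number of clusters used when the features were quantised.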
Anonymous - 2015-03-15
How would we pass the SparseIntFV to LiblinearAnnotator, given that LiblinearAnnotator takes an extractor? Do we have to write our own extractor?
Anonymous - 2015-03-15
How do I pass this vector to a FeatureExtractor so that I can train using Liblinear?
LiblinearAnnotator<FImage, String> ann;
HomogeneousKernelMap hkm = new HomogeneousKernelMap(HomogeneousKernelMap.KernelType.Chi2, HomogeneousKernelMap.WindowType.Uniform);
FeatureExtractor<SparseIntFV, FImage> extractor = hkm.createWrappedExtractor();
ann = new LiblinearAnnotator<FImage, String>(extractor, LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
You'll need to create a Liblinear annotator typed on SparseIntFV (i.e. new LiblinearAnnotator<SparseIntFV, String>(...)).
For the feature extractor, there's no reason to write your own as SparseIntFV is already a FeatureVector, and you can just use an instance of IdentityFeatureExtractor to pass through the input features.
Anonymous - 2015-03-15
Oh, but this way how would my LiblinearAnnotator know which class each feature vector belongs to? Right now we just have feature vectors, and we would pass the feature vector data to ann.train(featurevectordata). Before, it was a GroupedDataset.
You need to construct either a GroupedDataset (maybe using MapBackedDataset?) or a List of Annotated to hold your feature vectors and their class assignments (based on the image names from the sequence file keys)
Anonymous - 2015-03-16
GroupedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV> gds;
// Map<Text, SparseIntFV> data_new = new HashMap<Text, SparseIntFV>();
for (Map.Entry<Text, BytesWritable> kv : reader) {
    MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(new ByteArrayInputStream(kv.getValue().getBytes()), QuantisedKeypoint.class);
    SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
    // data_new.put(kv.getKey(), vector);
    gds = new MapBackedDataset<Text, ListDataset<SparseIntFV>, SparseIntFV>(kv.getKey(), ??);
}
Is it something like this?
Not quite - something like this:
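final GroupedDataset<String, ListDataset<SparseIntFV>, SparseIntFV> gds = new MapBackedDataset<String, ListDataset<SparseIntFV>, SparseIntFV>();
for (final Map.Entry<Text, BytesWritable> kv : reader) {
    // getBytes() returns the whole backing buffer, so only read the first getLength() bytes
    final BytesWritable value = kv.getValue();
    final MemoryLocalFeatureList<QuantisedKeypoint> features = MemoryLocalFeatureList.read(
        new ByteArrayInputStream(value.getBytes(), 0, value.getLength()), QuantisedKeypoint.class);
    // 200 is the number of visual words; it should match the vocabulary used for quantisation
    final SparseIntFV vector = BagOfVisualWords.extractFeatureFromQuantised(features, 200);
    // Group the vectors by class, creating the per-class list on first use
    final String clz = lookupClass(kv.getKey().toString());
    if (!gds.containsKey(clz))
        gds.put(clz, new ListBackedDataset<SparseIntFV>());
    gds.get(clz).add(vector);
}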
You'll have to implement the lookupClass method to determine what class your feature belongs to based on the filename stored in the sequence file key.
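As a rough idea of what lookupClass could look like, here is a minimal sketch. It assumes (this is not stated in the thread) that the sequence file keys are relative paths in which the image's parent directory is the class name, e.g. train/airplanes/image_0001.jpg; the class name LookupClassSketch and the sample paths are hypothetical. If your class labels live somewhere else, the body will need to change accordingly.

```java
// Illustration only: assumes keys look like "<anything>/<className>/<imageFile>".
final class LookupClassSketch {

    /** Returns the name of the directory that directly contains the image. */
    static String lookupClass(String key) {
        // Accept Windows-style separators as well.
        String normalised = key.replace('\\', '/');
        int lastSlash = normalised.lastIndexOf('/');
        if (lastSlash <= 0) {
            throw new IllegalArgumentException("key has no parent directory: " + key);
        }
        String parent = normalised.substring(0, lastSlash);
        // lastIndexOf returns -1 when there is no further slash, so +1 gives 0.
        String className = parent.substring(parent.lastIndexOf('/') + 1);
        if (className.isEmpty()) {
            throw new IllegalArgumentException("empty class directory in key: " + key);
        }
        return className;
    }

    public static void main(String[] args) {
        System.out.println(lookupClass("train/airplanes/image_0001.jpg")); // prints airplanes
    }
}
```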
Anonymous - 2015-03-16
The key is in UUID form (0006bede-cbf6-11e4-8731-1681e6b88ec1); calling toString() on getKey() returns that same UUID. Do I need to regenerate the quantised feature vectors with a string key, or can I convert this one?
Thank you for your time.
Right, I think that means when you built the original sequencefile with images in it you chose to use the MD5UUID of the file as the key, rather than the name (unfortunately MD5UUID is the default [I'll change that for future versions of the tool]).
When you run the SequenceFileTool in create mode you need to use the option -kns FILENAME or -kns RELATIVEPATH to ensure that the keys are the original image names or relative paths (to where the tool is run from).
You have two options:
(a) regenerate the image sequencefile with FILENAME keys and go through the feature extraction and quantisation steps again
(b) re-run the command you used to generate the original image sequencefile, and add the option -wm. This will create an additional file <name>-map.txt (where <name> is the name of the output sequencefile) which contains the mapping from filenames to UUIDs. You can then read the mapping into your program and use it to reverse the UUIDs back to the filenames in order to get the correct classes.
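For option (b), a sketch of reading the mapping back in might look like the following. The exact layout of the map file is not shown in this thread, so this is an assumption: each line is taken to hold one filename and one UUID separated by whitespace, in either order, and filenames are assumed not to contain UUID-like text themselves. The class name UuidMapSketch and the sample line are made up; check a few lines of your own <name>-map.txt and adjust the parsing if it differs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: the map file layout is an assumption, not documented here.
final class UuidMapSketch {

    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    /** Builds a UUID -> filename lookup from the lines of the map file. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> uuidToFilename = new HashMap<>();
        for (String line : lines) {
            Matcher m = UUID_PATTERN.matcher(line);
            if (!m.find()) {
                continue; // skip blank or unrecognised lines
            }
            String uuid = m.group().toLowerCase(Locale.ROOT);
            // Whatever remains on the line once the UUID is cut out is the filename.
            String filename = (line.substring(0, m.start()) + line.substring(m.end())).trim();
            if (!filename.isEmpty()) {
                uuidToFilename.put(uuid, filename);
            }
        }
        return uuidToFilename;
    }

    public static void main(String[] args) {
        // Hypothetical line; in practice use Files.readAllLines on the map file.
        List<String> lines = Arrays.asList(
            "train/airplanes/image_0001.jpg\t0006bede-cbf6-11e4-8731-1681e6b88ec1");
        Map<String, String> map = parse(lines);
        System.out.println(map.get("0006bede-cbf6-11e4-8731-1681e6b88ec1"));
    }
}
```

The filename recovered this way can then be handed to lookupClass in place of the raw sequence file key.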
Anonymous - 2015-03-17
Thank you very much for your time. I was able to do the training. Now on to predicting.
Anonymous - 2015-03-18
Is this the correct way of defining the annotator in my case? It works, but I think there is a problem because ann.classify() only takes one feature vector. Is it supposed to take an image or a list?
ann = new LiblinearAnnotator<SparseIntFV, String>(new IdentityFeatureExtractor<SparseIntFV>(), LiblinearAnnotator.Mode.MULTICLASS, SolverType.L2R_L2LOSS_SVC, 1.0, 0.00001);
Do I have to transform my test data the same way I did with the training data?
Yes, that looks correct; I didn't consider that you might want to train the annotator in a different way to using it (i.e. train with precomputed features and then test directly with images). The solution is to extract the features from the images and then pass each one to the annotator to get the class estimate.
Anonymous - 2015-03-25
Do these commands and code look correct? I am getting an accuracy of around 30-35%.
=====================
Get Seq File
=====================
java -Xmx2G -jar ../SequenceFileTool.jar -m CREATE -R -name train.seq -wm train/
hadoop dfs -copyFromLocal train.seq /user/hue/newtrain/
=====================
Extract Features
=====================
hadoop jar ../HadoopLocalFeaturesTool.jar --mode PYRAMID_DENSE_SIFT -s 7 -i hdfs://172.16.1.128/user/hue/newtrain/train.seq -o hdfs://172.16.1.128/user/hue/newtrain/sift.seq
=====================
Fast K-Means
=====================
hadoop jar ../HadoopFastKMeans.jar -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -k 700 -o hdfs://172.16.1.128/user/hue/newtrain/
=====================
Quantise Descriptor
=====================
hadoop jar ../HadoopClusterQuantiserTool.jar -t BINARY_KEYPOINT -i hdfs://172.16.1.128/user/hue/newtrain/sift.seq -q hdfs://172.16.1.128/user/hue/newtrain/final -o hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq
=====================
Sequence File Merger
=====================
hadoop jar ../SequenceFileMerger-1.4-SNAPSHOT-jar-with-dependencies.jar -i hdfs://172.16.1.128/user/hue/newtrain/quantised-sift.seq -n 1 -o hdfs://172.16.1.128/user/hue/newtrain/QuantisedSift/
=====================
Code for testing
=====================
I don't see anything fundamentally wrong with that. Given that you're only dealing with 100 test and training examples per class, how does that compare to doing the process without hadoop?
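When comparing the Hadoop and non-Hadoop pipelines, it can help to compute the accuracy in exactly the same way for both. The following is a minimal plain-Java sketch of that calculation; it does not use the OpenIMAJ API. The classifier is a stand-in java.util.function.Function that maps one feature to one predicted label (in the real program that would wrap the annotator call and pick the best-scoring class), and the class name EvaluationSketch and the toy odd/even data are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustration only: the classifier is a stand-in, not the LiblinearAnnotator API.
final class EvaluationSketch {

    /** Fraction of test features whose predicted label equals the true label. */
    static <F> double accuracy(List<F> features, List<String> trueLabels,
                               Function<F, String> classifier) {
        if (features.isEmpty() || features.size() != trueLabels.size()) {
            throw new IllegalArgumentException("need equally sized, non-empty lists");
        }
        int correct = 0;
        for (int i = 0; i < features.size(); i++) {
            // Each test feature is classified on its own, one at a time.
            if (trueLabels.get(i).equals(classifier.apply(features.get(i)))) {
                correct++;
            }
        }
        return (double) correct / features.size();
    }

    public static void main(String[] args) {
        // Toy data: integers stand in for feature vectors.
        List<Integer> features = Arrays.asList(1, 2, 3, 4);
        List<String> labels = Arrays.asList("odd", "even", "odd", "odd");
        double acc = accuracy(features, labels, x -> x % 2 == 0 ? "even" : "odd");
        System.out.println(acc); // prints 0.75
    }
}
```

As a reference point, with k classes of equal size a classifier that guesses at random scores about 1/k, so whether 30-35% is poor depends on how many classes there are.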