I have been working on a multi-label classification problem using multi-label k-nearest neighbour (MLkNN) in Mulan.
Your library is fantastic, but there are not many resources or much documentation available. I have been trying out different things and got some results, but I'm not sure whether I'm right or totally wrong. I have a few questions.
- Say I want to predict two variables, predvar1 and predvar2. Am
I supposed to give these in the xml file?
- One of the variables I want to predict, say, predvar1 can take three
values - 0, 1 or 2. When I give these in the xml file, I get the
error, "The format of label attribute 'predvar1' is not valid".
However, if I do the same for a variable which can take only two
values 0 or 1, it works fine and gives me some accuracy and other
metrics. Why is this?
- Also, for the variables that I want to predict, should I give a
'?' in the arff file?
- If I give fewer than three labels in the xml file, I get an error (using Mulan 1.3).
This is the code I'm using:
import mulan.classifier.lazy.MLkNN;
import mulan.classifier.meta.RAkEL;
import mulan.classifier.transformation.LabelPowerset;
import mulan.data.MultiLabelInstances;
import mulan.evaluation.Evaluator;
import mulan.evaluation.MultipleEvaluation;
import weka.classifiers.trees.J48;
import weka.core.Utils;

public class MulanExp1 {
    public static void main(String[] args) throws Exception {
        String arffFilename = Utils.getOption("arff", args); // e.g. -arff emotions.arff
        String xmlFilename = Utils.getOption("xml", args); // e.g. -xml emotions.xml
        MultiLabelInstances dataset = new MultiLabelInstances(arffFilename, xmlFilename);
        RAkEL learner1 = new RAkEL(new LabelPowerset(new J48()));
        MLkNN learner2 = new MLkNN();
        Evaluator eval = new Evaluator();
        MultipleEvaluation results;
        int numFolds = 10;
        results = eval.crossValidate(learner1, dataset, numFolds);
        System.out.println(results);
        results = eval.crossValidate(learner2, dataset, numFolds);
        System.out.println(results);
    }
}
Dear Aftab,
I answer your questions inline.
On 16/2/2015 4:33 AM, Aftab Hassan wrote:
There are two ways you can do this:
a) Just have these two variables as the last two variables of your dataset and call the MultiLabelInstances constructor with the second argument being the number of output variables, e.g. new MultiLabelInstances(arffFilename, 2);
b) Put these in an xml file according to our schema (http://mulan.sourceforge.net/format.html) and call the constructor you are already using.
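For option (b), the labels file is a short XML document with one entry per target variable. Below is a minimal sketch in plain Java (no Mulan dependency) that builds such a file's text for predvar1 and predvar2. The class name LabelsXmlSketch and its methods are hypothetical helpers, and the namespace URI is written as I recall it from the example datasets shipped with Mulan, so please check it against the schema page linked above before relying on it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mulan: builds the text of a labels XML file.
final class LabelsXmlSketch {

    // Returns an XML document with one <label> element per label name.
    static String build(List<String> labelNames) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        // Namespace as recalled from Mulan's sample datasets (assumption).
        xml.append("<labels xmlns=\"http://mulan.sourceforge.net/labels\">\n");
        for (String name : labelNames) {
            xml.append("  <label name=\"").append(escape(name)).append("\"></label>\n");
        }
        xml.append("</labels>\n");
        return xml.toString();
    }

    // Escapes the characters that are unsafe inside a double-quoted XML attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // The label names must match attribute names in the ARFF file.
        System.out.print(build(Arrays.asList("predvar1", "predvar2")));
    }
}
```

Saving the printed text as, say, labels.xml and passing that path as the second constructor argument corresponds to option (b). Option (a) needs no XML at all, only the count of trailing target attributes.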
Mulan primarily addresses multi-label learning tasks, i.e. all target variables should be binary. Recently we have also started addressing problems in which all target variables are numeric. Supporting mixed types of target variables, as well as nominal attributes like {0, 1, 2}, is a future goal.
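Until nominal targets are supported, one possible workaround (my suggestion here, not something Mulan does for you) is to recode a three-valued target as three binary indicator labels before writing the ARFF file. The sketch below shows that recoding in plain Java. The class name NominalToBinarySketch and the indicator naming (predvar1_is0 and so on) are hypothetical. Note that a multi-label learner will not enforce that exactly one indicator is set in its predictions, so this is only an approximation of a true nominal target.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mulan: one-hot recoding of a nominal target.
final class NominalToBinarySketch {

    // Returns numValues binary indicators with a 1 at position `value`.
    static int[] toIndicators(int value, int numValues) {
        if (value < 0 || value >= numValues) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        int[] indicators = new int[numValues];
        indicators[value] = 1;
        return indicators;
    }

    public static void main(String[] args) {
        // predvar1 takes the values 0, 1 or 2, so it becomes three binary labels,
        // e.g. predvar1_is0, predvar1_is1, predvar1_is2 (hypothetical names).
        for (int value = 0; value < 3; value++) {
            System.out.println(value + " -> " + Arrays.toString(toIndicators(value, 3)));
        }
    }
}
```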
Yes, this is Weka's way of indicating unknown values.
Fewer than three labels doesn't make sense for RAkEL, which builds its models on random subsets of the labels.
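To illustrate why, here is a small sketch in plain Java that counts how many distinct label subsets of a given size exist. It assumes a subset size of 3, which I believe is RAkEL's default in Mulan but is an assumption here. The class name LabelsetCountSketch is hypothetical.

```java
// Hypothetical illustration, not part of Mulan: counts distinct label subsets.
final class LabelsetCountSketch {

    // Number of ways to choose subsetSize labels out of numLabels (binomial coefficient).
    static long countSubsets(int numLabels, int subsetSize) {
        if (subsetSize < 0 || subsetSize > numLabels) {
            return 0; // no subset of that size can be formed
        }
        long result = 1;
        for (int i = 1; i <= subsetSize; i++) {
            // After step i, result equals C(numLabels - subsetSize + i, i), so the division is exact.
            result = result * (numLabels - subsetSize + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        int subsetSize = 3; // assumed default subset size
        for (int numLabels = 2; numLabels <= 6; numLabels++) {
            System.out.println(numLabels + " labels: "
                    + countSubsets(numLabels, subsetSize) + " subsets of size " + subsetSize);
        }
    }
}
```

With two labels there is no subset of size three to draw, and with exactly three there is only one, in which case the ensemble reduces to a single Label Powerset model.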
Cheers,
Greg