tsoumakas - 2015-02-16

Dear Aftab,

I answer your questions inline.

On 16/2/2015 4:33 AM, Aftab Hassan wrote:

I was working on a multi-label classification problem using multi-label k-nearest neighbour (MLkNN) in Mulan.

Your library is fantastic, but there are not a lot of resources or documentation available. I was trying out different things and got some results, but I'm not sure if I'm right or totally wrong. I have a few questions.

  1. Say, I want to predict the two variables, predvar1 and predvar2, am
    I supposed to give these in the xml file?

There are two ways you can do this:
a) Just have these two variables as the last two variables of your dataset and call the MultiLabelInstances constructor with the second argument being the number of output variables, e.g. new MultiLabelInstances(arffFilename, 2);
b) Put these in an xml file according to our schema (http://mulan.sourceforge.net/format.html) and call the constructor you are already using.
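
As a rough illustration of option (b), the labels file for the two variables could be generated along these lines. This is only a sketch: the class and method names (LabelsXmlSketch, build) and the variable name xmlFilename are made up for the example, and the element layout is assumed from the format page linked above, so check it against the schema before relying on it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mulan): builds a labels XML document
// for a flat list of label names. Assumes the names contain no XML
// special characters, since no escaping is done.
class LabelsXmlSketch {

    // Returns XML text that declares each given name as a label.
    static String build(List<String> labelNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        sb.append("<labels xmlns=\"http://mulan.sourceforge.net/labels\">\n");
        for (String name : labelNames) {
            sb.append("  <label name=\"").append(name).append("\"></label>\n");
        }
        sb.append("</labels>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(Arrays.asList("predvar1", "predvar2")));
        // With the Mulan jar on the classpath, the dataset would then be loaded with
        //   new MultiLabelInstances(arffFilename, 2);            // option (a)
        //   new MultiLabelInstances(arffFilename, xmlFilename);  // option (b)
    }
}
```

The label names in the XML file have to match attribute names in the ARFF file.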

  2. One of the variables I want to predict, say, predvar1 can take three
    values - 0, 1 or 2. When I give these in the xml file, I get the
    error, "The format of label attribute 'predvar1' is not valid".
    However, if I do the same for a variable which can take only two
    values 0 or 1, it works fine and gives me some accuracy and other
    metrics. Why is this?

Mulan has primarily addressed multi-label learning tasks, i.e. tasks in which all target variables are binary. Recently we have also started addressing problems in which all target variables are numeric. Supporting mixed types of target variables, as well as nominal target variables with more than two values such as {0, 1, 2}, is a future goal.
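
To spot such variables before loading the data, one could run a quick check over the ARFF attribute declarations of the intended labels. The sketch below is not Mulan's own validation code; BinaryLabelCheck and isBinary are hypothetical names, and it assumes binary labels are declared as nominal attributes with the values 0 and 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-check (not Mulan's own validation code).
class BinaryLabelCheck {

    // True if an ARFF nominal declaration such as "@attribute x {0,1}"
    // lists exactly the two values 0 and 1.
    static boolean isBinary(String attributeLine) {
        int open = attributeLine.indexOf('{');
        int close = attributeLine.lastIndexOf('}');
        if (open < 0 || close < open) {
            return false; // not a nominal attribute
        }
        Set<String> values = new HashSet<>();
        for (String v : attributeLine.substring(open + 1, close).split(",")) {
            values.add(v.trim());
        }
        return values.equals(new HashSet<>(Arrays.asList("0", "1")));
    }

    public static void main(String[] args) {
        System.out.println(isBinary("@attribute predvar2 {0,1}"));   // prints true
        System.out.println(isBinary("@attribute predvar1 {0,1,2}")); // prints false
    }
}
```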

  3. Also, for the variables, which I want to predict, should I give a
    '?' in the arff file?

Yes, this is Weka's way of indicating unknown values.
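
For example, a data row for an instance whose two labels are unknown could be written as in the sketch below. The helper name UnknownLabelsRow and the feature values are made up for illustration; the sketch assumes the labels are the last attributes of the dataset and that there is at least one feature.

```java
// Hypothetical helper (not part of Mulan or Weka): formats one ARFF data row
// whose trailing label columns are unknown.
class UnknownLabelsRow {

    // Joins the feature values with commas and appends one '?' per label.
    static String row(String[] featureValues, int numLabels) {
        StringBuilder sb = new StringBuilder(String.join(",", featureValues));
        for (int i = 0; i < numLabels; i++) {
            sb.append(",?");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two made-up feature values followed by two unknown labels.
        System.out.println(row(new String[] {"5.1", "3.5"}, 2)); // prints 5.1,3.5,?,?
    }
}
```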

  4. If I give fewer than three labels in the xml file, it gives me an error
    (using Mulan 1.3)

Fewer than three labels doesn't make sense for RAkEL, since it trains its models on random subsets of k labels and k is 3 by default.

Cheers,
Greg