Hello!
I am using RankLib to solve a simple ranking problem, but it is not an IR problem.
I have some nominal features, for example temperature: {hot, mild, cool} and weather: {sunny, overcast, rainy}.
So how should I deal with nominal features?
Can I encode them as hot=0, mild=1, cool=2?
Thank you very much!
Conventional wisdom says you shouldn't convert nominal feature data directly into numeric feature data, because the arithmetic the ranking algorithms perform on those values has no real meaning for arbitrary categories.
For some nominal data, such as your examples, there does seem to be a gradient from cold to hot or from rainy to sunny, so a straight numerical substitution may not be meaningless. In those cases the nominal values have an implied order that the numeric values can preserve, as long as the numbers are assigned in that order.
You might want to normalize the converted values by the size of the nominal range (e.g. cold=0/3, cool=1/3, mild=2/3 and hot=3/3), so that the feature lies between 0 and 1.
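As a rough sketch of that normalization, the snippet below maps an ordered list of labels to evenly spaced values between 0 and 1. The function name ordinal_scale and the four-level temperature list are only illustrative assumptions, not anything RankLib provides.

```python
def ordinal_scale(levels):
    """Map ordered category labels to evenly spaced values in [0, 1].

    `levels` must be listed from lowest to highest; the order is an
    assumption you supply, it is not inferred from the labels.
    """
    if len(levels) < 2:
        raise ValueError("need at least two levels to define a scale")
    top = len(levels) - 1
    return {label: index / top for index, label in enumerate(levels)}


# Hypothetical ordering for the temperature feature (cold = 0/3 ... hot = 3/3).
TEMPERATURE = ordinal_scale(["cold", "cool", "mild", "hot"])

# Convert a column of raw labels into numeric feature values.
temperature_values = [TEMPERATURE[t] for t in ["hot", "cool", "mild"]]
```

This only makes sense when the order really carries meaning for the ranking task; for purely nominal categories the spacing is arbitrary.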
You may also want to check out pandas and scikit-learn. They are Python libraries rather than part of RankLib, but they do have some capabilities for converting nominal data to numeric values. I don't know much about the details, but they are widely used. You could use them to generate numeric feature values from the nominal values as input for RankLib.
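If no order can be assumed, a common alternative is one-hot encoding, where each category becomes its own 0/1 feature (pandas.get_dummies and scikit-learn's OneHotEncoder do this). The sketch below does the same thing in plain Python and writes a line in the "label qid:N index:value" text format that RankLib reads. The helper names, the relevance label and the query id are made up for illustration.

```python
TEMPERATURE = ["hot", "mild", "cool"]
WEATHER = ["sunny", "overcast", "rainy"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def to_ranklib_line(label, qid, temperature, weather):
    """Build one training line: '<label> qid:<qid> 1:<v1> 2:<v2> ...'."""
    features = one_hot(temperature, TEMPERATURE) + one_hot(weather, WEATHER)
    # Feature indices in this file format start at 1.
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} qid:{qid} {pairs}"


# Hypothetical example: relevance label 2, query id 1.
line = to_ranklib_line(2, 1, "mild", "rainy")
print(line)  # prints: 2 qid:1 1:0 2:1 3:0 4:0 5:0 6:1
```

One-hot encoding costs more features (six here instead of two), but it avoids implying an order or a distance between categories.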