Hi,
I have been working with the library for a while. I am very happy with the results: RankLib is an extremely useful library. However, when I switched to using ERR@20, things fell apart. It looks like:
1) ERR@K is computed incorrectly. For example, I get negative values.
2) Training with ERR@K doesn't work. I tried coordinate ascent and LambdaMART.
With coordinate ascent, I get a very bad model. With LambdaMART, I get an exception, which I paste below.
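For reference, this is how I understand ERR@K should be computed. It is a minimal sketch of the standard cascade-model definition, not RankLib's actual code: the class name, the method name, and the maximum grade of 4 used in the example are my own assumptions.

```java
public class ErrAtK {
    /**
     * Expected Reciprocal Rank truncated at depth k.
     * grades:   relevance labels of the documents, in ranked order
     * maxGrade: highest possible relevance label (assumed, e.g. 4 on a 0..4 scale)
     */
    static double errAtK(int[] grades, int k, int maxGrade) {
        double err = 0.0;
        double notSatisfied = 1.0; // probability the user has not stopped at an earlier rank
        int depth = Math.min(k, grades.length);
        for (int i = 0; i < depth; i++) {
            if (grades[i] < 0 || grades[i] > maxGrade) {
                throw new IllegalArgumentException("grade out of range: " + grades[i]);
            }
            // Probability that the document at rank i+1 satisfies the user.
            double r = (Math.pow(2, grades[i]) - 1) / Math.pow(2, maxGrade);
            err += notSatisfied * r / (i + 1);
            notSatisfied *= (1.0 - r);
        }
        return err;
    }

    public static void main(String[] args) {
        int[] ranked = {3, 0, 2, 1}; // made-up labels, for illustration only
        System.out.println(errAtK(ranked, 20, 4));
    }
}
```

As long as no grade exceeds maxGrade, every term in the sum is non-negative and the total cannot exceed 1, so a negative ERR@K should not be possible under this definition.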
To reproduce the results, I attach a file with two features, each of which should get a substantial weight. For example, a good coordinate ascent model would have weights 0.7 and 0.3.
I also attach helper scripts that I used to train/test models just in case.
Note that training works with e.g. NDCG@20, but not with ERR@20.
I used Oracle Java 8 on Linux.
Many thanks!
Reading feature file [/home/ubuntu/sample.feat]... [Done.]
(9240 ranked lists, 138600 entries read)
Initializing... [Done]
1 | -20.9899 |
2 | Exception in thread "main" java.lang.NullPointerException
at ciir.umass.edu.learning.tree.RegressionTree.insert(RegressionTree.java:150)
at ciir.umass.edu.learning.tree.RegressionTree.fit(RegressionTree.java:64)
at ciir.umass.edu.learning.tree.LambdaMART.learn(LambdaMART.java:203)
at ciir.umass.edu.learning.RankerTrainer.train(RankerTrainer.java:43)
at ciir.umass.edu.eval.Evaluator.evaluate(Evaluator.java:730)
at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:503)
PS: RankLib versions that I tried: 2.5 and 2.7.
I also noticed that all metric values (P@K, MAP, NDCG@K) seem to be multiplied by 10, though, of course, this doesn't affect the outcome of training.