What is wrong with my data if I keep getting this error?
[+] General Parameters:
Training data: imat2009-datasets\out_learning.txt
Feature vector representation: Dense.
Ranking method: LambdaMART
Feature description file: Unspecified. All features will be used.
Train metric: ERR@10
Test metric: ERR@10
Highest relevance label (to compute ERR): 4
Feature normalization: No
Model file: mymodel.txt
[+] LambdaMART's Parameters:
No. of trees: 1000
No. of leaves: 10
No. of threshold candidates: 256
Min leaf support: 1
Learning rate: 0.1
Stop early: 100 rounds without performance gain on validation data
Reading feature file [imat2009-datasets\out_learning.txt]... [Done.]
(9124 ranked lists, 97290 entries read)
Initializing... Exception in thread "pool-1-thread-4" ciir.umass.edu.utilities.RankLibError: Error in DenseDataPoint::getFeatureValue(): requesting unspecified feature, fid=245
at ciir.umass.edu.utilities.RankLibError.create(RankLibError.java:26)
at ciir.umass.edu.learning.DenseDataPoint.getFeatureValue(DenseDataPoint.java:26)
at ciir.umass.edu.learning.tree.LambdaMART.sortSamplesByFeature(LambdaMART.java:444)
at ciir.umass.edu.learning.tree.LambdaMART.sortSamplesByFeature(LambdaMART.java:546)
at ciir.umass.edu.learning.tree.LambdaMART$SortWorker.run(LambdaMART.java:562)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.NullPointerException
at ciir.umass.edu.learning.tree.LambdaMART.init(LambdaMART.java:120)
at ciir.umass.edu.learning.RankerTrainer.train(RankerTrainer.java:42)
at ciir.umass.edu.eval.Evaluator.evaluate(Evaluator.java:678)
at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:470)
Apparently feature ID 245 is larger than the number of features in your training file.
Do you really have 245 features?
Is your data file in the correct format?
<label> qid:<id> [<fid>:<id>]+ [#optional comment]
Each query-doc feature set on one line?
SMH
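(For anyone hitting the same wall: a quick way to confirm the diagnosis above is to scan the training file yourself and report the maximum feature ID plus any rows that skip IDs. This is a hypothetical helper, not part of RankLib; the function name and parsing are my own.)

```python
def check_ranklib_file(lines):
    """Scan RankLib rows (<label> qid:<id> <fid>:<value> ... [# comment]) and
    return the global max feature ID plus indices of rows whose IDs are not
    the full contiguous range 1..max (those rows break the dense reader)."""
    max_fid = 0
    row_fids = []
    for line in lines:
        body = line.split("#", 1)[0].strip()  # drop optional trailing comment
        if not body:
            continue
        toks = body.split()
        # toks[0] is the label, toks[1] is qid:<id>; the rest are fid:value pairs
        fids = sorted(int(t.split(":", 1)[0]) for t in toks[2:])
        row_fids.append(fids)
        max_fid = max(max_fid, fids[-1])
    gaps = [i for i, fids in enumerate(row_fids)
            if fids != list(range(1, max_fid + 1))]
    return max_fid, gaps

rows = [
    "2 qid:1 1:0.5 2:0.1 3:0.9",
    "0 qid:1 1:0.0 2:0.3 3:0.2",
    "1 qid:2 1:0.7 3:0.4  # fid 2 missing",
]
print(check_ranklib_file(rows))  # -> (3, [2]): third row lacks fid 2
```

Any row index it reports is one the dense `DenseDataPoint` representation will choke on when the trainer asks for a feature that row never defined.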
Hey guys,
I am getting exactly the same error...
Which is weird; doesn't RankLib support sparse training sets?
I just tried a small training set with sparse features added, and I get that error for the last feature ID (when it is not present in every row).
Where do I define the number of features?
If I have a training set with 50 features and one entry has 51, why is an error raised instead of treating it as a sparse feature?
Cheers
RankLib's models are not consistent in this behavior: some are more robust to sparse features than others. I think CoorAscent ("-ranker 4") is the one I've had the most luck with on this count. Many of the tree-based learners assume that every instance has every feature, so that they can split on that feature's statistics. There ought to be an option to pad missing features with zeros, or with a user-defined default.
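(Until such a padding option exists, one workaround is to densify the file yourself before training, filling any absent feature IDs with zeros. A minimal sketch under that assumption; the function name is my own, not a RankLib API.)

```python
def pad_ranklib_row(line, n_features):
    """Rewrite one RankLib row so every fid in 1..n_features is present,
    filling missing ones with 0 and preserving any trailing # comment."""
    body, sep, comment = line.partition("#")
    toks = body.split()
    label, qid = toks[0], toks[1]  # "<label>" and "qid:<id>"
    # map of fid -> value string for the features this row actually has
    present = {int(t.split(":", 1)[0]): t.split(":", 1)[1] for t in toks[2:]}
    feats = " ".join(f"{fid}:{present.get(fid, '0')}"
                     for fid in range(1, n_features + 1))
    return f"{label} {qid} {feats}" + (f" #{comment}" if sep else "")

print(pad_ranklib_row("1 qid:2 1:0.7 3:0.4", 3))
# -> 1 qid:2 1:0.7 2:0 3:0.4
```

Running every row through this (with n_features set to the largest fid in the whole file) should keep the dense reader from ever requesting an unspecified feature.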