The Lemur Project / Discussion / RankLib: Length of feature array returned by Ranker.getFeatures() is sometimes less than expected

Ilya Zavorin - 2016-04-18

I've been experimenting with RankLib using different feature sets and different models. I just noticed that when I load some trained models and then get their feature array using "int[] features = ranker.getFeatures()", the resulting array is shorter by 1 than I expected. For example, I trained the model with 20 features, but features.length gives me only 19. However, when I apply the model to rank a feature matrix of the expected length (20), I don't see any errors from RankLib. Also, when I look at the model file, I see "<feature> 20 </feature>" lines in it.

Furthermore, for other models, it does return the right count.

I've been using MART and LambdaMART methods and for both I have examples of both matching and mismatching counts.

Do you know why I am seeing this? (I promise you that I am not hallucinating :-))

Lemur Project - 2016-04-19

What version of RankLib are you using? What was your command line?

You should be aware that when loading a model, there can be more or fewer features listed than you actually defined when training or testing the model. It can depend on what ranker you used.

I'm not sure if this is the same features.length you are referring to. Some models load only the features that were represented (with weights) in the resulting model, so some features could be left out and some features could even be repeated.

But for the test data, I think it should read in the proper number of features in the data file, although it puts them in an array that starts at 0. Still, the length should be correct.

I've got data to use, so just send me the command line you used and perhaps I can replicate what is happening...after I have a few hallucinogenic mushrooms of course!
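One way to check which features a saved model actually references is to scan the model file for the "<feature> N </feature>" lines quoted in the first post and count the distinct IDs. The following is only a sketch using the standard Java library, not RankLib code: the class name ModelFeatureScan and the method distinctFeatureIds are made up for illustration, and it assumes the feature tags appear in the file exactly in that textual form.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RankLib.
public class ModelFeatureScan {
    // Assumption: feature IDs appear as "<feature> 20 </feature>" in the model file.
    private static final Pattern FEATURE_TAG =
            Pattern.compile("<feature>\\s*(\\d+)\\s*</feature>");

    /** Returns the distinct feature IDs found in the model text, in ascending order. */
    public static List<Integer> distinctFeatureIds(String modelText) {
        TreeSet<Integer> ids = new TreeSet<>();
        Matcher m = FEATURE_TAG.matcher(modelText);
        while (m.find()) {
            ids.add(Integer.parseInt(m.group(1)));
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ModelFeatureScan <model-file>");
            return;
        }
        // Read the whole model file as text and list the IDs it mentions.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<Integer> ids = distinctFeatureIds(text);
        System.out.println(ids.size() + " distinct feature IDs: " + ids);
    }
}
```

If the count printed here matches features.length from the loaded ranker but is lower than the number of features in the training data, that would be consistent with a feature simply not being used by the trained model.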

I am using whichever latest version I downloaded a few months ago (2.5?). I am using it as an API. My method that does the actual training given training data looks like this:

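    // Note: rankerModelPath, mFact, trainScorer, rankerType, rType2, ranker and log
    // are members of the enclosing class (not shown here).
    public static void trainAndSaveModel(List<RankList> train)
    {
        // Nothing to do if no output path has been configured.
        if(rankerModelPath == null)
        {
            return;
        }

        // Metric optimized during training.
        final String trainMetric = "ERR@10";

        mFact = new MetricScorerFactory();
        trainScorer = mFact.createScorer(trainMetric);

        // Create a ranker of the selected type only to print its parameters.
        RankerFactory rf = new RankerFactory();
        rf.createRanker(rType2[rankerType.ordinal()]).printParameters();

        // No validation set is used.
        List<RankList> validation = null;

        // Derive the feature IDs from the training samples, then train.
        int[] features = FeatureManager.getFeatureFromSampleVector(train);
        RankerTrainer trainer = new RankerTrainer();
        ranker = trainer.train(rankerType, train, validation, features, trainScorer);

        ranker.save(rankerModelPath);
        log.info("Model saved to: " + rankerModelPath);
    }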

I've been only using MART and LambdaMART methods in my experiments, and have observed both matches and mismatches on both.
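To pin down which feature goes missing, the array passed to training can be compared with the array the loaded model reports. Here is a minimal sketch of such a comparison in plain Java; FeatureDiff and missing are names invented for this post, and the two arrays in main are stand-in values where the real ones would come from FeatureManager.getFeatureFromSampleVector(train) and ranker.getFeatures().

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical helper, not part of RankLib.
public class FeatureDiff {
    /** Returns the IDs present in expected but absent from actual, in ascending order. */
    public static int[] missing(int[] expected, int[] actual) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int id : actual) {
            seen.add(id);
        }
        TreeSet<Integer> result = new TreeSet<>();
        for (int id : expected) {
            if (!seen.contains(id)) {
                result.add(id);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Stand-in values; in the real code these would be the training
        // feature array and the array returned by the loaded ranker.
        int[] expected = {1, 2, 3, 4, 5};
        int[] actual = {1, 2, 4, 5};
        System.out.println(Arrays.toString(missing(expected, actual))); // prints [3]
    }
}
```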

I am hoping no mushrooms will be needed to resolve it...

Thanks