
Length of feature array returned by Ranker.getFeatures() is sometimes less than expected

RankLib
2016-04-18
2016-04-20
  • Ilya Zavorin

    Ilya Zavorin - 2016-04-18

    I've been experimenting with RankLib using different feature sets and different models. I just noticed that when I load some trained models and then get their feature array using "int[] features = ranker.getFeatures()", the resulting array is one element shorter than I expected. For example, I trained the model with 20 features, but features.length gives me only 19. However, when I apply the model to rank a feature matrix of the expected length (20), I don't see any errors from RankLib. Also, when I look at the model file, I see "<feature> 20 </feature>" lines in it.

    Furthermore, for other models, it does return the right count.

    I've been using MART and LambdaMART methods and for both I have examples of both matching and mismatching counts.

    Do you know why I am seeing this? (I promise you that I am not hallucinating :-))

     
  • Lemur Project

    Lemur Project - 2016-04-19

    What version of RankLib are you using? What was your command line?

    You should be aware that when loading a model, there can be more or fewer features listed than you actually defined when training or testing the model. It can depend on what ranker you used.

    I'm not sure if this is the same feature.length you are referring to. Some models load only the features that were represented (with weights) in the resulting model, so some features could be left out and some features could even be repeated.
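    To illustrate the point above, here is a minimal sketch of how a loader that keeps only the features actually referenced by the model could end up with a shorter array. It assumes the model file marks each split with a "<feature> N </feature>" line, as quoted earlier in this thread. The class and method names are hypothetical and are not part of the RankLib API, and this is not a claim about how RankLib itself builds the array.

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of RankLib.
class ModelFeatureIds {
    // Assumed tag format, based on the "<feature> 20 </feature>" lines quoted in the thread.
    private static final Pattern FEATURE_TAG =
            Pattern.compile("<feature>\\s*(\\d+)\\s*</feature>");

    // Collects the distinct feature ids mentioned in the model text, in ascending order.
    static int[] distinctFeatureIds(String modelText) {
        TreeSet<Integer> ids = new TreeSet<>();
        Matcher m = FEATURE_TAG.matcher(modelText);
        while (m.find()) {
            ids.add(Integer.parseInt(m.group(1)));
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Made-up model fragment: feature 3 is used twice, feature 2 is never used.
        String model = "<feature> 3 </feature>\n"
                + "<feature> 1 </feature>\n"
                + "<feature> 3 </feature>\n"
                + "<feature> 20 </feature>\n";
        int[] ids = distinctFeatureIds(model);
        // Prints: 3 distinct ids: [1, 3, 20]
        System.out.println(ids.length + " distinct ids: " + Arrays.toString(ids));
    }
}
```

    Under this assumption, a model trained on 20 features in which one feature never appears in any split would yield 19 ids, even though the highest id seen is still 20.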

    But for the test data, I think it should read in the proper number of features in the data file, although it puts them in an array that starts at 0. Still, the length should be correct.

    I've got data to use, so just send me the command line you used and perhaps I can replicate what is happening... after I have a few hallucinogenic mushrooms, of course!

     
  • Ilya Zavorin

    Ilya Zavorin - 2016-04-20

    I am using whichever latest version I downloaded a few months ago (2.5?). I am using it as an API. My method that does the actual training given training data looks like this:

        public static void trainAndSaveModel(List<RankList> train)
        {
            // Nothing to do if no output path has been configured.
            if(rankerModelPath == null)
            {
                return;
            }
    
            // Metric optimized during training.
            final String trainMetric = "ERR@10";
    
            mFact = new MetricScorerFactory();
            trainScorer = mFact.createScorer(trainMetric);
    
            // Create a throwaway ranker of the selected type just to print its parameters.
            RankerFactory rf = new RankerFactory();
            rf.createRanker(rType2[rankerType.ordinal()]).printParameters();
    
            // No validation set is used.
            List<RankList> validation = null;
    
            // Derive the feature ids from the training samples.
            int[] features = FeatureManager.getFeatureFromSampleVector(train);
            RankerTrainer trainer = new RankerTrainer();
            ranker = trainer.train(rankerType, train, validation, features, trainScorer);
    
            ranker.save(rankerModelPath);
            log.info("Model saved to: " + rankerModelPath);
    
            return;
        }
    

    I've been only using MART and LambdaMART methods in my experiments, and have observed both matches and mismatches on both.

    I am hoping no mushrooms will be needed to resolve it...

    Thanks

     
