RankLib error with normalization and saved models

RankLib
Edgar Meij
2013-11-14
2016-11-09
  • Edgar Meij

    Edgar Meij - 2013-11-14

    Let me start by saying thanks for creating this toolkit and sharing it with the world.

    I think I hit a bug, however. Using the downloadable binary, it seems that z-score normalization breaks a saved model: reloading the model gives a different score. For instance, note the different "MAP on test data" values below (all runs use the MQ2008 data, Fold1):

    $ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt
    
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: zscore
    Model file: model.txt
    
    [+] RankBoost's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: No
    Test metric:    MAP
    Model:      RankBoost
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.3902
    

    Without normalization it works fine:

    $ ranklib_jar -train train.txt -test test.txt -ranker 6 -silent -metric2t MAP -save model.txt
    
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: LambdaMART
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: No
    Model file: model.txt
    
    [+] LambdaMART's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.4279
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: No
    Test metric:    MAP
    Model:      LambdaMART
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.4279
    

    Note that when I try to do the same normalization during testing, I get an error:

    $ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: zscore
    Test metric:    MAP
    Model:      LambdaMART
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    Exception in thread "main" java.lang.NullPointerException
        at ciir.umass.edu.features.ZScoreNormalizor.normalize(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.normalize(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.test(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.main(Unknown Source)
    

    The NullPointerException is fixed in the SVN version. However, compiling the trunk and using that still yields differing scores:

    $ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: zscore
    Model file: model.txt
    
    [+] RankBoost's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: zscore
    Test metric:    MAP
    Model:      RankBoost
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.3937
    

    Could it be that the z-score normalization is applied to the combination of training and testing instances?
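
To illustrate why the pooling question matters: z-scores depend on which instances supply the mean and standard deviation. The following is a minimal sketch, not RankLib's code; the `zscore` helper and the toy feature values are made up for illustration. It normalizes the same test values once against the test set alone and once against train and test pooled together.

```python
from statistics import mean, pstdev


def zscore(values, reference):
    """Normalize `values` using the mean and population std of `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        # A constant feature carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values of a single feature (not taken from MQ2008).
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 20.0, 30.0]

test_alone = zscore(test, test)            # statistics from test only
test_pooled = zscore(test, train + test)   # statistics from train + test

print(test_alone)
print(test_pooled)
```

If the statistics used at training time differ from those used after loading, the model sees different feature values for the same documents, which could account for a change in MAP.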

     
  • Van Dang

    Van Dang - 2013-11-19

    This issue has been fixed. Please do an update from trunk and let us know if it's really gone.

     
  • Edgar Meij

    Edgar Meij - 2013-11-21

    Yep, this is fixed now.

     
    • Van Dang

      Van Dang - 2013-11-21

      Sounds good. Thanks for confirming.

       
  • Mohammad A

    Mohammad A - 2016-09-16

    I'm still experiencing the same issue using this version: 2.1-patched-2

     
  • Stephen Harding

    Stephen Harding - 2016-09-19

    Please use a more recent version.

    Current release version is RankLib-2.7.

     
  • Stephen Harding

    Stephen Harding - 2016-11-09

    I don't see any difference in scores between a model saved with normalization and the same model loaded with normalization. I definitely am not getting the NullPointerException reported in the original problem from 2013.

    I would not expect the scores to match when normalization differs between the two runs, for example when normalized data is loaded into a model that was saved without normalization.
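
A toy sketch of why a normalization mismatch changes scores. It assumes a hypothetical single-threshold weak ranker (`threshold_scores`) and invented feature values; it is not how RankLib stores or applies its models. A threshold chosen on z-scored features separates the documents, but the same threshold applied to the raw features does not.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of feature values using its own mean and population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def threshold_scores(values, threshold):
    """Hypothetical weak ranker: score 1.0 when the feature exceeds the threshold."""
    return [1.0 if v > threshold else 0.0 for v in values]


raw = [120.0, 80.0, 100.0, 60.0]  # invented raw values of one feature
threshold = 0.5                   # hypothetical threshold learned in z-score space

on_normed = threshold_scores(zscore(raw), threshold)  # matches training conditions
on_raw = threshold_scores(raw, threshold)             # normalization skipped

print(on_normed)
print(on_raw)
```

On the normalized input only the first document clears the threshold, whereas on the raw input every document does, so the resulting ranking (and any metric computed from it) differs.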

    I have tried running some save/load tests using RankLib-2.3, which was the version in which the original problem was reportedly fixed, as well as the RankLib-2.7 jar from the downloads page and a current jar built from RankLib-2.8-SNAPSHOT sources.

    Have I misunderstood your reported problem, or inadequately attempted to reproduce the problem?

    Below is a listing of the testing I did based on what was done in the original error report as well as the Bug 221 report ( https://sourceforge.net/p/lemur/bugs/221/ ). I used MQ2008/Fold1 data for the runs.

    // Downloaded RankLib-2.7
    // Create the normed model
    $ java -jar RankLib-2.7-download.jar -train train.txt -test test.txt -ranker 2 -norm zscore \
        -silent -metric2t MAP -save download-2.7-save.txt

    [+] General Parameters:
    Training data: train.txt
    Test data: test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file: Unspecified. All features will be used.
    Train metric: MAP
    Test metric: MAP
    Feature normalization: zscore
    Model file: download-2.7-save.txt

    [+] RankBoost's Parameters:

    Reading feature file [train.txt]: 0...
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453

    Model saved to: download-2.7-save.txt

    // Run loaded normed model
    $ java -jar RankLib-2.7-download.jar -test test.txt -norm zscore \
        -silent -metric2T MAP -load download-2.7-save.txt

    [+] General Parameters:
    Model file: download-2.7-save.txt
    Feature normalization: zscore
    Test metric: MAP
    Model: RankBoost

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453

    // Current RankLib-2.8-SNAPSHOT
    // Create normed model
    $ java -jar RankLib-2.8-SNAPSHOT.jar -train train.txt -test test.txt -ranker 2 \
        -norm zscore -silent -metric2t MAP -save RL2.8-save.txt

    [+] General Parameters:
    Training data: train.txt
    Test data: test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file: Unspecified. All features will be used.
    Train metric: MAP
    Test metric: MAP
    Feature normalization: zscore
    Model file: RL2.8-save.txt

    [+] RankBoost's Parameters:

    Reading feature file [train.txt]: 0...
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453

    Model saved to: RL2.8-save.txt

    // Run loaded normed model
    $ java -jar RankLib-2.8-SNAPSHOT.jar -test test.txt -norm zscore \
        -silent -metric2T MAP -load RL2.8-save.txt

    [+] General Parameters:
    Model file: RL2.8-save.txt
    Feature normalization: zscore
    Test metric: MAP
    Model: RankBoost

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453

    // RankLib-2.3 in which Bug 221 was reported fixed.
    // Create normed model
    $ java -jar RankLib-2.3.jar -train train.txt -test test.txt -ranker 2 -norm zscore \
        -silent -metric2t MAP -save RL2.3-save.txt

    [+] General Parameters:
    Training data: train.txt
    Test data: test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file: Unspecified. All features will be used.
    Train metric: MAP
    Test metric: MAP
    Feature normalization: zscore
    Model file: RL2.3-save.txt

    [+] RankBoost's Parameters:

    Reading feature file [train.txt]: 0...
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453

    Model saved to: RL2.3-save.txt

    // Run loaded normed model
    $ java -jar RankLib-2.3.jar -test test.txt -norm zscore \
        -silent -metric2T MAP -load RL2.3-save.txt

    [+] General Parameters:
    Model file: RL2.3-save.txt
    Feature normalization: zscore
    Test metric: MAP
    Model: RankBoost

    Reading feature file [test.txt]: 0...
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453
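
The save/load comparisons in this thread all come down to comparing the `MAP on test data` line of two runs. A minimal sketch of automating that check, assuming the console output of each run has already been captured as a string; the helper names `extract_map` and `scores_match` and the tolerance are hypothetical and not part of RankLib.

```python
import re

# Matches lines such as "MAP on test data: 0.453" in captured console output.
MAP_LINE = re.compile(r"^\s*MAP on test data:\s*([0-9.]+)\s*$", re.MULTILINE)


def extract_map(output):
    """Return the first reported test MAP in `output` as a float."""
    match = MAP_LINE.search(output)
    if match is None:
        raise ValueError("no 'MAP on test data' line found")
    return float(match.group(1))


def scores_match(train_output, load_output, tol=1e-4):
    """True when the train-time and load-time test MAP agree within `tol`."""
    return abs(extract_map(train_output) - extract_map(load_output)) <= tol


# Output excerpts copied from the runs reported in this thread.
train_run = "(156 ranked lists, 2874 entries read)\nMAP on test data: 0.453\n"
load_2013 = "(156 ranked lists, 2874 entries read)\nMAP on test data: 0.3902\n"
load_2016 = "(156 ranked lists, 2874 entries read)\nMAP on test data: 0.453\n"

print(scores_match(train_run, load_2013))
print(scores_match(train_run, load_2016))
```

The first comparison reproduces the mismatch from the original 2013 report and the second the matching scores from the 2016 runs.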

     
