RankLib error with normalization and saved models

RankLib
Edgar Meij
2013-11-14
2013-11-21
  • Edgar Meij
    2013-11-14

    Let me start by saying thanks for creating this toolkit and sharing it with the world.

    I think I've hit a bug, however. With the downloadable binary, z-score normalization seems to break a saved model: the "MAP on test data" reported after training differs from the one reported after reloading the model (all runs use the MQ2008 data, Fold1):

    $ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt
    
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: zscore
    Model file: model.txt
    
    [+] RankBoost's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: No
    Test metric:    MAP
    Model:      RankBoost
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.3902
    

    Without normalization it works fine; the reloaded model reproduces the same score (shown here with LambdaMART):

    $ ranklib_jar -train train.txt -test test.txt -ranker 6 -silent -metric2t MAP -save model.txt
    
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: LambdaMART
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: No
    Model file: model.txt
    
    [+] LambdaMART's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.4279
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: No
    Test metric:    MAP
    Model:      LambdaMART
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.4279
    

    Note that when I try to apply the same normalization during testing, I get a NullPointerException:

    $ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: zscore
    Test metric:    MAP
    Model:      LambdaMART
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    Exception in thread "main" java.lang.NullPointerException
        at ciir.umass.edu.features.ZScoreNormalizor.normalize(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.normalize(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.test(Unknown Source)
        at ciir.umass.edu.eval.Evaluator.main(Unknown Source)
    

    The exception is fixed in the SVN version. However, compiling trunk and using that build still yields different scores:

    $ ranklib_jar -train train.txt -test test.txt -ranker 2 -norm zscore -silent -metric2t MAP -save model.txt
    [+] General Parameters:
    Training data:  train.txt
    Test data:  test.txt
    Feature vector representation: Dense.
    Ranking method: RankBoost
    Feature description file:   Unspecified. All features will be used.
    Train metric:   MAP
    Test metric:    MAP
    Feature normalization: zscore
    Model file: model.txt
    
    [+] RankBoost's Parameters:
    
    Reading feature file [train.txt]... [Done.]
    (471 ranked lists, 9630 entries read)
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.453
    
    Model saved to: model.txt
    
    $ ranklib_jar -load model.txt -test test.txt -norm zscore -metric2T MAP
    
    [+] General Parameters:
    Model file: model.txt
    Feature normalization: zscore
    Test metric:    MAP
    Model:      RankBoost
    Reading feature file [test.txt]... [Done.]
    (156 ranked lists, 2874 entries read)
    MAP on test data: 0.3937
    

    Could it be that the z-score normalization is applied to the combination of training and test instances?
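
    To illustrate why that would matter, here is a minimal sketch in plain Python. It is not RankLib code and makes no claim about what RankLib actually does; the feature values and the helper names `mean_std` and `zscore` are made up for illustration. It only shows that standardizing the same test values with statistics from a different set of instances produces different feature values, which a model trained on one scale would then score differently.

```python
import math


def mean_std(values):
    """Return the population mean and standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


def zscore(values, mean, std):
    """Standardize values with the given mean and standard deviation."""
    if std == 0:
        # A constant feature carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical values of a single feature (not taken from MQ2008).
train_feature = [1.0, 2.0, 3.0, 4.0]
test_feature = [10.0, 12.0, 14.0]

# (a) Statistics computed from the test instances alone.
test_only = zscore(test_feature, *mean_std(test_feature))

# (b) Statistics computed from training and test instances pooled together.
pooled = zscore(test_feature, *mean_std(train_feature + test_feature))

print("test-only statistics:", test_only)
print("pooled statistics:   ", pooled)
```

    The two printed lists differ even though the raw test values are identical, so any mismatch between the statistics used at training time and at load time could be enough to change the reported MAP.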

     
  • Van Dang
    2013-11-19

    This issue has been fixed. Please update from trunk and let us know whether it's really gone.

     
  • Edgar Meij
    2013-11-21

    Yep, this is fixed now.

     
    • Van Dang
      2013-11-21

      Sounds good. Thanks for confirming.